#1
Database structure for web search engine - HELP!
I'm trying to build a web search engine to index around 2000 web sites. I've chosen MySQL as the database and came up with three table definitions to store the data:

- words: the list of words, each with an id key
- webpages: the list of URLs, each with an id
- wordlinks: (wordid, webpageid) pairs, each meaning that the URL whose id is webpageid contains the word whose id is wordid

This sounded like a good solution in theory, but when I implemented it and indexed a medium-sized web site, the wordlinks table alone grew to 17 MB, which is far from space-efficient. Could anyone suggest an alternative structure?
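For readers following along, the three-table layout described in this post could look roughly like the sketch below. It is only an illustration: Python's built-in sqlite3 module (in memory) stands in for MySQL so the example runs anywhere, the column names other than wordid and webpageid are assumptions, and the whitespace tokenizer is deliberately naive.

```python
import sqlite3

# sqlite3 (in-memory) stands in for MySQL; column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words     (id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
CREATE TABLE webpages  (id INTEGER PRIMARY KEY, url  TEXT NOT NULL UNIQUE);
CREATE TABLE wordlinks (
    wordid    INTEGER NOT NULL REFERENCES words(id),
    webpageid INTEGER NOT NULL REFERENCES webpages(id),
    PRIMARY KEY (wordid, webpageid)
);
""")

def index_page(url, text):
    """Store one (wordid, webpageid) row for each distinct word on the page."""
    conn.execute("INSERT OR IGNORE INTO webpages (url) VALUES (?)", (url,))
    page_id = conn.execute(
        "SELECT id FROM webpages WHERE url = ?", (url,)).fetchone()[0]
    for word in set(text.lower().split()):  # naive whitespace tokenizer
        conn.execute("INSERT OR IGNORE INTO words (word) VALUES (?)", (word,))
        word_id = conn.execute(
            "SELECT id FROM words WHERE word = ?", (word,)).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO wordlinks VALUES (?, ?)", (word_id, page_id))

def search(word):
    """Return the URLs of pages that contain the given word."""
    rows = conn.execute(
        "SELECT p.url FROM words w "
        "JOIN wordlinks l ON l.wordid = w.id "
        "JOIN webpages p ON p.id = l.webpageid "
        "WHERE w.word = ? ORDER BY p.url", (word.lower(),))
    return [row[0] for row in rows]

index_page("http://example.com/a", "MySQL search engine")
index_page("http://example.com/b", "web search")
print(search("search"))
```

Each word is stored once in words, but wordlinks still gets one row per distinct word per page, which is where the growth reported above comes from.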
#2
Rather than getting all the words from each site, how about getting only the keywords (title, meta tags, etc.)?
__________________
Regards,
James Yang
.NET Developer / Network Engineer
MCSE, MCDBA, MCSA, CCNA
http://www.yellowpin.com/
http://www.opentechsupport.com/
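A minimal sketch of the keywords-only idea from this reply, using Python's standard html.parser. The class and function names are hypothetical, and treating the meta "keywords" and "description" tags as the keyword sources is an assumption; the reply only says "title, meta tag etc".

```python
from html.parser import HTMLParser

class KeywordExtractor(HTMLParser):
    """Collect words from <title> and selected <meta> tags only."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.words = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):  # assumed sources
                self._add(attrs.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Body text is ignored; only text inside <title> is kept.
        if self.in_title:
            self._add(data)

    def _add(self, text):
        for word in text.replace(",", " ").lower().split():
            self.words.add(word)

def extract_keywords(html):
    """Return the sorted distinct keywords found in the page head."""
    parser = KeywordExtractor()
    parser.feed(html)
    return sorted(parser.words)

sample = ('<html><head><title>Cheap Widgets</title>'
          '<meta name="keywords" content="widgets, gadgets"></head>'
          '<body>Lots of body text that is ignored.</body></html>')
print(extract_keywords(sample))
```

Indexing only these words keeps the number of word/page pairs small, at the cost of not being able to find pages by words that appear only in the body.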
#3
Or you can save each page's text in a TEXT column that has a FULLTEXT index. Then you can use MySQL's full-text searching.
When MySQL 4 comes, you will be able to do boolean searches too.

    SELECT col1, col2
    FROM table1
    WHERE MATCH (col1, col2) AGAINST ('some text you type in the searchbox')

If you'd like to get the rating too, you can just do this (don't worry, MySQL remembers the search result, so it won't search two times):

    SELECT col1, col2,
           MATCH (col1, col2) AGAINST ('some text you type in the searchbox')
    FROM table1
    WHERE MATCH (col1, col2) AGAINST ('some text you type in the searchbox')
__________________
Best Regards, Håvard Lindset
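To make the FULLTEXT suggestion a little more concrete, here is a hedged sketch of how the table and the ranked query might be prepared from application code. The table name pages and its columns are hypothetical, the DDL is MySQL syntax held in a string (it is not executed here), and the %s placeholder is the parameter style commonly accepted by MySQL DB-API drivers, so the search-box text is never pasted into the SQL itself.

```python
# Hypothetical table and column names; MySQL DDL kept as a string only.
CREATE_PAGES = """
CREATE TABLE pages (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    url   VARCHAR(255) NOT NULL,
    title VARCHAR(255),
    body  TEXT,
    FULLTEXT (title, body)
)
"""

def build_search_query(columns=("title", "body"), table="pages"):
    """Build a ranked MATCH ... AGAINST query with driver placeholders.

    The column list passed to MATCH should be the same list the
    FULLTEXT index was created on.
    """
    match = f"MATCH ({', '.join(columns)}) AGAINST (%s)"
    return (f"SELECT url, {match} AS score FROM {table} "
            f"WHERE {match} ORDER BY score DESC")

def search_params(text):
    """The search text appears twice in the query, so it is passed twice."""
    return (text, text)

query = build_search_query()
params = search_params("some text you type in the searchbox")
print(query)
# With a MySQL driver cursor this would be run as: cursor.execute(query, params)
```

The repeated MATCH expression mirrors the second query in the post above: once in the select list to expose the score, once in WHERE to filter.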
#4
Thanks Lindset,
Wouldn't that increase the size, though? Now every word is stored again for every page being indexed, whereas before each word was added only once and its id was associated with the URLs that contained it.
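One way to approach this size question is to measure both layouts on a sample of your own pages. The sketch below is a rough estimate only: the function name is hypothetical, the 8 bytes per wordlinks row assumes two 4-byte integer columns with no row or index overhead, and the size of the FULLTEXT index itself is not counted on the other side.

```python
def compare_storage(pages, bytes_per_link_row=8):
    """Rough comparison for a dict of url -> page text.

    Returns (raw_text_bytes, link_rows, link_bytes), where link_rows is
    one row per distinct word per page. bytes_per_link_row is an assumed
    figure that ignores row and index overhead.
    """
    raw_text_bytes = sum(len(text.encode("utf-8")) for text in pages.values())
    link_rows = sum(len(set(text.lower().split())) for text in pages.values())
    return raw_text_bytes, link_rows, link_rows * bytes_per_link_row

sample_pages = {
    "http://example.com/a": "search search search engine",
    "http://example.com/b": "web search",
}
raw_bytes, rows, link_bytes = compare_storage(sample_pages)
print(raw_bytes, rows, link_bytes)
```

Which side comes out smaller depends on the pages: repeated words inflate the raw text but not the row count, while pages with many distinct short words favor storing the text.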