Database Development
 
#1 | TNT_ | August 17th, 2002, 01:18 PM
Database structure for web search engine - HELP!

I'm trying to build a web search engine to index around 2000 web sites. I've chosen MySQL as the database and came up with three table definitions to store the data: a table words that contains the list of words, each with an id key; a table webpages that contains the list of URLs, each with an id; and a table wordlinks that contains (wordid, webpageid) pairs, each meaning that the URL whose id is webpageid contains the word whose id is wordid. This sounded like a good solution in theory, but when I implemented it and indexed a medium-sized web site, the wordlinks table alone grew to 17 MB, which is far from space-efficient. Could anyone suggest an alternative structure?
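
For readers who want to try the layout described above, here is a minimal sketch of the three tables. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server. The column names, the composite primary key on wordlinks, the tokenizing rule, and the helper names index_page and search are all assumptions made for illustration, not the poster's actual schema.

```python
import re
import sqlite3

# Assumed schema: one row per word, one row per page,
# and one row per distinct (word, page) pair.
SCHEMA = """
CREATE TABLE words     (id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
CREATE TABLE webpages  (id INTEGER PRIMARY KEY, url  TEXT NOT NULL UNIQUE);
CREATE TABLE wordlinks (
    wordid    INTEGER NOT NULL REFERENCES words(id),
    webpageid INTEGER NOT NULL REFERENCES webpages(id),
    PRIMARY KEY (wordid, webpageid)
);
"""

def index_page(conn, url, text):
    """Store one page and link it to each distinct word it contains."""
    conn.execute("INSERT OR IGNORE INTO webpages (url) VALUES (?)", (url,))
    page_id = conn.execute(
        "SELECT id FROM webpages WHERE url = ?", (url,)).fetchone()[0]
    # Hypothetical tokenizer: lowercase runs of letters and digits.
    for word in set(re.findall(r"[a-z0-9]+", text.lower())):
        conn.execute("INSERT OR IGNORE INTO words (word) VALUES (?)", (word,))
        word_id = conn.execute(
            "SELECT id FROM words WHERE word = ?", (word,)).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO wordlinks (wordid, webpageid) VALUES (?, ?)",
            (word_id, page_id))

def search(conn, word):
    """Return the URLs of all pages that contain the given word, sorted."""
    rows = conn.execute(
        "SELECT p.url FROM words w "
        "JOIN wordlinks l ON l.wordid = w.id "
        "JOIN webpages p ON p.id = l.webpageid "
        "WHERE w.word = ? ORDER BY p.url",
        (word.lower(),))
    return [url for (url,) in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
index_page(conn, "http://example.com/a", "MySQL full-text search")
index_page(conn, "http://example.com/b", "Search engine structure")
print(search(conn, "search"))
```

Because wordlinks gets one row for every distinct word on every page, it is the table whose size grows fastest, which matches the behaviour described in the post.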

#2 | James Yang | August 18th, 2002, 01:03 AM
Rather than indexing every word from each site, how about indexing only the keywords (title, meta tags, etc.)?
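
One possible way to do this is sketched below with Python's standard html.parser module. It keeps only the words found in the page title and in the keywords and description meta tags. The class name KeywordExtractor, the function extract_keywords, the choice of meta tags, and the tokenizing rule are assumptions for illustration only.

```python
import re
from html.parser import HTMLParser

class KeywordExtractor(HTMLParser):
    """Collects the <title> text and the content of selected <meta> tags."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and name in ("keywords", "description"):
            self.meta[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Body text is ignored; only text inside <title> is kept.
        if self.in_title:
            self.title += data

def extract_keywords(html):
    """Return the sorted distinct words from the title and meta tags."""
    parser = KeywordExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join([parser.title, *parser.meta.values()])
    return sorted(set(re.findall(r"[a-z0-9]+", text.lower())))

page = (
    '<html><head><title>Cyprus Travel Guide</title>'
    '<meta name="keywords" content="beaches, hotels, travel"></head>'
    '<body><p>Lots of body text that is ignored.</p></body></html>'
)
print(extract_keywords(page))
```

Only these few words per page would then be written to the index, so the number of word-to-page rows drops sharply, at the cost of not being able to find pages by words that appear only in the body text.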
__________________
Regards,

James Yang
.NET Developer / Network Engineer
MCSE, MCDBA, MCSA, CCNA

http://www.yellowpin.com/
http://www.opentechsupport.com/

#3 | Lindset | August 18th, 2002, 03:22 AM
Or you can store each page's text in a TEXT column that has a FULLTEXT index. Then you can use MySQL's full-text searching.

When MySQL 4 comes out, you will be able to do boolean searches too.

SELECT col1, col2 FROM table1 WHERE MATCH (col1, col2) AGAINST ('some text you type in the searchbox');

If you'd like to get the relevance score too, you can do this (don't worry, MySQL notices that the two MATCH expressions are identical, so it won't run the search twice):

SELECT col1, col2, MATCH (col1, col2) AGAINST ('some text you type in the searchbox') AS score FROM table1 WHERE MATCH (col1, col2) AGAINST ('some text you type in the searchbox');
__________________
Best Regards,
Håvard Lindset

#4 | TNT_ | August 18th, 2002, 08:34 AM
Thanks Lindset,
Wouldn't that cause an increase in size, though? Now every word is stored again for every page it appears on, whereas before each word was stored only once and its id was linked to the URLs that contained it.
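
The trade-off raised here can be put into rough numbers. The sketch below is only a back-of-the-envelope calculator: the function names and every input figure (page count, words per page, average word length) are hypothetical, and it ignores per-row overhead and the size of any index, including the FULLTEXT index itself. The only fixed facts it relies on are MySQL's integer sizes: INT is 4 bytes, MEDIUMINT is 3 bytes, and SMALLINT is 2 bytes.

```python
def wordlinks_bytes(pages, distinct_words_per_page,
                    word_id_bytes=4, page_id_bytes=4):
    """Raw data size of wordlinks: one (wordid, webpageid) row
    per distinct word on each page. Row and index overhead excluded."""
    return pages * distinct_words_per_page * (word_id_bytes + page_id_bytes)

def page_text_bytes(pages, words_per_page, avg_word_bytes=6):
    """Raw size of storing every page's full text, counting each word
    (plus a separator) at an assumed average length."""
    return pages * words_per_page * avg_word_bytes

# Hypothetical site: 1000 pages, 500 words each, 300 of them distinct.
print(wordlinks_bytes(1000, 300))         # two INT columns
print(wordlinks_bytes(1000, 300, 3, 2))   # MEDIUMINT wordid, SMALLINT webpageid
print(page_text_bytes(1000, 500))         # full text, no index
```

With these made-up inputs the two designs land in the same order of magnitude, because both grow with pages times words per page. Narrower id columns shrink the link table directly, so the real answer depends on measuring actual pages rather than on the schema alone.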
