Database Development
 
Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
 
User Name:
Password:
Remember me
 
Go Back   Dev Articles Community ForumsDatabasesDatabase Development

Reply
Add This Thread To:
  Del.icio.us   Digg   Google   Spurl   Blink   Furl   Simpy   Y! MyWeb 
Thread Tools Search this Thread Display Modes
 
Unread Dev Articles Community Forums Sponsor:
  #1  
Old June 21st, 2004, 11:17 AM
Kostko Kostko is offline
Registered User
Dev Articles Newbie (0 - 499 posts)
 
Join Date: Apr 2003
Posts: 1 Kostko User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
Post Weighted random selection in PostgreSQL

I would like to implement a weighted random selection, so that rows with larger k would be selected more often, but the selection would still be random.

What would be the best way to implement it ?

Reply With Quote
  #2  
Old June 21st, 2004, 01:23 PM
dhouston's Avatar
dhouston dhouston is offline
Contributing User
Dev Articles Beginner (1000 - 1499 posts)
 
Join Date: May 2003
Location: Tennessee
Posts: 1,355 dhouston User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 8
Send a message via ICQ to dhouston
If there's weighting, it's not really random, is it? The only way I can think of right off to do something like this is to break the larger k into individual units. So in a sample of three items with weights 5, 3, and 2, you'd have ten rather than three items in your random selection, 5 for the item with k 5, 3 for k 3, and 2 for k 2. The greater frequency of the representatives of k 5 would increase the likelihood that that item would be selected. How to do that in a select without either having duplicate rows (rather than weight field k) or having to do some programming logic I can't suggest.
__________________
Please don't PM me asking for solutions outside the scope of a thread.
Keeping all responses in a thread stands to help others who come along later,
which is after all what this forum's all about.

Reply With Quote
  #3  
Old July 24th, 2004, 05:24 PM
jschmitt jschmitt is offline
Registered User
Dev Articles Newbie (0 - 499 posts)
 
Join Date: Jul 2004
Posts: 3 jschmitt User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
there are plenty of places that a weighted-random selection is useful (those dont need to be listed here)...

multiple entries can break table structure (relationships and unique keys), this allows you to do so with a "weighting" field.

SELECT [FieldsList]
FROM [TableName]
WHERE [WhereStatements]
ORDER BY Rand()*(1/Weight) LIMIT 1;

Assuming your weight field is an int, larger value has more weight, and named "Weight"

Reply With Quote
  #4  
Old July 26th, 2004, 08:14 AM
dhouston's Avatar
dhouston dhouston is offline
Contributing User
Dev Articles Beginner (1000 - 1499 posts)
 
Join Date: May 2003
Location: Tennessee
Posts: 1,355 dhouston User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 8
Send a message via ICQ to dhouston
I don't see that this ever allows a lower-weighted result to be returned. For a given random number, rows with a higher weight will always evaluate to a lower number than rows with a lower weight and thus will always be returned (because the default is to order ascending). My solution, because it relies on frequency of a given weight, has an increased likelihood of returning a higher-weighted row but does allow lower-weighted rows to be returned. You can get around the table structure issues by having a separate weights table tied to the main table. Maybe I'm missing something in your solution. Can you show me a situation in which a lower-weighted row will ever be returned? If not, then it's a flawed solution.

Reply With Quote
  #5  
Old July 26th, 2004, 08:15 AM
dhouston's Avatar
dhouston dhouston is offline
Contributing User
Dev Articles Beginner (1000 - 1499 posts)
 
Join Date: May 2003
Location: Tennessee
Posts: 1,355 dhouston User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 8
Send a message via ICQ to dhouston
I'm interested in seeing a more elegant solution than mine, incidentally. If you can show me the situation I've requested, I'll gladly admit that I'm wrong and add your solution to my toolbox. I just don't see it as a valid solution yet.

Reply With Quote
  #6  
Old July 29th, 2004, 11:22 AM
jschmitt jschmitt is offline
Registered User
Dev Articles Newbie (0 - 499 posts)
 
Join Date: Jul 2004
Posts: 3 jschmitt User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
running it is all that is needed to see a sample:

sample data:
ID Fruit Weight
1 oranges 1
2 apples 3
3 strawberry 2
4 pineapple 1
5 cherry 3
6 peach 2

results:
ID fruit run1(1000) run2(1000) run3(100,000)
1 oranges: 79 (7.9%) 66 (6.6%) 7345 (7.345%)
2 apples: 268 (26.8%) 282 (28.2%) 26965 (26.965%)
3 strawberry: 153 (15.3%) 150 (15%) 15377 (15.377%)
4 pineapple: 77 (7.7%) 73 (7.3%) 7851 (7.851%)
5 cherry: 274 (27.4%) 276 (27.6%) 26418 (26.418%)
6 peach: 149 (14.9%) 153 (15.3%) 16044 (16.044%)

whatcha need to remember is the random number will always be between 0 - 1. this will be multiplied by the inverse of weight... the weighting works because a higher weight returns smaller inverse, helping to achieve a smaller number (query defaults to select in ASC order). the weight is a constant for that line item, it skews the results, not kill them.

technically it would be a hair faster to Rand()/Weight (1 less operation) but it was for the example, because if you used weight, where a lower number had higher precedence, you would remove the inverse like so:
SELECT [FieldsList]
FROM [TableName]
WHERE [WhereStatements]
ORDER BY Rand()*(Weight) LIMIT 1;

this is all dependant on user entered values. there are many variations you could play into this, normalize the weights (say with dated material and TO_DAYS()/DateDiff()/Age()), or a cos()/sin()/ln() ... but that doesnt really belong here


Reply With Quote
  #7  
Old July 29th, 2004, 11:40 AM
dhouston's Avatar
dhouston dhouston is offline
Contributing User
Dev Articles Beginner (1000 - 1499 posts)
 
Join Date: May 2003
Location: Tennessee
Posts: 1,355 dhouston User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 8
Send a message via ICQ to dhouston
Gotcha. I had been thinking in terms of RAND() returning whole numbers rather than decimals between 0 and 1, which obviously changes things a little. Thanks for the good followup. Hope you'll stick around and continue to shed light on the various topics that interest you.

Reply With Quote
  #8  
Old July 31st, 2004, 12:04 PM
jschmitt jschmitt is offline
Registered User
Dev Articles Newbie (0 - 499 posts)
 
Join Date: Jul 2004
Posts: 3 jschmitt User rank is Just a Lowly Private (1 - 20 Reputation Level) 
Time spent in forums: < 1 sec
Reputation Power: 0
glad to help, i'll definately try in the future. this was one of those things i've been searching for... to off-load processing time to my DB server rather than my code, and it seemed there were many more questions for it than answers.

its running well in a few places, i have a few more places to convert in a similar fashion with cos() and ln() functions for weighting.

Reply With Quote
Reply

Viewing: Dev Articles Community ForumsDatabasesDatabase Development > Weighted random selection in PostgreSQL


Thread Tools  Search this Thread 
Search this Thread:

Advanced Search
Display Modes  Rate This Thread 
Rate This Thread:


Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
View Your Warnings | New Posts | Latest News | Latest Threads | Shoutbox
Forum Jump




 Free IT White Papers!
 
How to Present Effectively Online
This white paper offers practical and actionable advice on the key steps that any presenter should consider as they plan and execute a Webinar or online meeting.

Request Your Free Technology Downloads!
 
Open Source Security Myths
Open Source Software (OSS) is computer software whose source code is available to the general public with relaxed or non-existent intellectual property restrictions (or arrangement such as the public domain), and is usually developed with the input of many contributors.

Request Your Free Technology Downloads!
 
Power and Cooling Capacity Management for Data Centers
This paper describes the principles for achieving power and cooling capacity management.

Request Your Free Technology Downloads!
 
Scalable, Fault-Tolerant NAS for Oracle - The Next Generation
For several years NAS has been evolving as a storage alternative for Oracle databases, and for good reason: NAS is quite often the simplest, most cost-effective storage approach for Oracle. Learn about the benefits that HP's approach to scalable NAS brings to Oracle environments in this comprehensive white paper.

Request Your Free Technology Downloads!
 
Understanding Web Application Security Challenges
This white paper discusses many common threats and preventive measures for Web application security, and explains what you can do to help protect your organization.

Request Your Free Technology Downloads!
 

Forums: » Register « |  User CP |  Games |  Calendar |  Members |  FAQs |  Sitemap |  Support | 
  
 




© 2003-2009 by Developer Shed. All rights reserved. DS Cluster 5 Hosted by Hostway
Stay green...Green IT