|
|
|||||||||
|
|||||||||
|
|||||||||
| |
|||
| |||||||||
![]() |
|
|
«
Previous Thread
|
Next Thread
»
|
Thread Tools | Search this Thread | Display Modes |
|
#1
|
|||
|
|||
|
I would like to implement a weighted random selection, so that rows with larger k would be selected more often, but the selection would still be random.
What would be the best way to implement it ? ![]() |
|
#2
|
||||
|
||||
|
If there's weighting, it's not really random, is it? The only way I can think of right off to do something like this is to break the larger k into individual units. So in a sample of three items with weights 5, 3, and 2, you'd have ten rather than three items in your random selection, 5 for the item with k 5, 3 for k 3, and 2 for k 2. The greater frequency of the representatives of k 5 would increase the likelihood that that item would be selected. How to do that in a select without either having duplicate rows (rather than weight field k) or having to do some programming logic I can't suggest.
__________________
Please don't PM me asking for solutions outside the scope of a thread. Keeping all responses in a thread stands to help others who come along later, which is after all what this forum's all about. |
|
#3
|
|||
|
|||
|
there are plenty of places that a weighted-random selection is useful (those dont need to be listed here)...
multiple entries can break table structure (relationships and unique keys), this allows you to do so with a "weighting" field. SELECT [FieldsList] FROM [TableName] WHERE [WhereStatements] ORDER BY Rand()*(1/Weight) LIMIT 1; Assuming your weight field is an int, larger value has more weight, and named "Weight" |
|
#4
|
||||
|
||||
|
I don't see that this ever allows a lower-weighted result to be returned. For a given random number, rows with a higher weight will always evaluate to a lower number than rows with a lower weight and thus will always be returned (because the default is to order ascending). My solution, because it relies on frequency of a given weight, has an increased likelihood of returning a higher-weighted row but does allow lower-weighted rows to be returned. You can get around the table structure issues by having a separate weights table tied to the main table. Maybe I'm missing something in your solution. Can you show me a situation in which a lower-weighted row will ever be returned? If not, then it's a flawed solution.
|
|
#5
|
||||
|
||||
|
I'm interested in seeing a more elegant solution than mine, incidentally. If you can show me the situation I've requested, I'll gladly admit that I'm wrong and add your solution to my toolbox. I just don't see it as a valid solution yet.
|
|
#6
|
|||
|
|||
|
running it is all that is needed to see a sample:
sample data: ID Fruit Weight 1 oranges 1 2 apples 3 3 strawberry 2 4 pineapple 1 5 cherry 3 6 peach 2 results: ID fruit run1(1000) run2(1000) run3(100,000) 1 oranges: 79 (7.9%) 66 (6.6%) 7345 (7.345%) 2 apples: 268 (26.8%) 282 (28.2%) 26965 (26.965%) 3 strawberry: 153 (15.3%) 150 (15%) 15377 (15.377%) 4 pineapple: 77 (7.7%) 73 (7.3%) 7851 (7.851%) 5 cherry: 274 (27.4%) 276 (27.6%) 26418 (26.418%) 6 peach: 149 (14.9%) 153 (15.3%) 16044 (16.044%) whatcha need to remember is the random number will always be between 0 - 1. this will be multiplied by the inverse of weight... the weighting works because a higher weight returns smaller inverse, helping to achieve a smaller number (query defaults to select in ASC order). the weight is a constant for that line item, it skews the results, not kill them. technically it would be a hair faster to Rand()/Weight (1 less operation) but it was for the example, because if you used weight, where a lower number had higher precedence, you would remove the inverse like so: SELECT [FieldsList] FROM [TableName] WHERE [WhereStatements] ORDER BY Rand()*(Weight) LIMIT 1; this is all dependant on user entered values. there are many variations you could play into this, normalize the weights (say with dated material and TO_DAYS()/DateDiff()/Age()), or a cos()/sin()/ln() ... but that doesnt really belong here ![]() |
|
#7
|
||||
|
||||
|
Gotcha. I had been thinking in terms of RAND() returning whole numbers rather than decimals between 0 and 1, which obviously changes things a little. Thanks for the good followup. Hope you'll stick around and continue to shed light on the various topics that interest you.
![]() |
|
#8
|
|||
|
|||
|
glad to help, i'll definately try in the future. this was one of those things i've been searching for... to off-load processing time to my DB server rather than my code, and it seemed there were many more questions for it than answers.
its running well in a few places, i have a few more places to convert in a similar fashion with cos() and ln() functions for weighting. |
![]() |
| Viewing: Dev Articles Community Forums > Databases > Database Development > Weighted random selection in PostgreSQL |
| Thread Tools | Search this Thread |
| Display Modes | Rate This Thread |
|
|
|
|
|