TAGS :Viewed: 5 - Published at: a few seconds ago

[ Limiting records in a random select ]

I've searched all over and can't seem to find the answer to this anywhere so I'm asking for help.

I'm trying to randomly select 5 people. Kind of like a drawing. Everyone can enter themselves however many times they want to enter but only 3 of those entries can be considered. So... 5 distinct people will win, but all the entries (dups included) need to be considered up to 3 entries.

For example:

Adam,1
Adam,2
Adam,1
Adam,1
Sally,2
Timmy,3
John,1
John,1
Jenny,2
Wendy,3
Wendy,3
Wendy,3
Wendy,5
Wendy,5

Here's the code I've got so far:

select top 5 
    name, vote 
from 
    (SELECT 
         name, vote 
     FROM 
         Entries 
     GROUP BY 
          name, vote) winners 
ORDER BY
    NEWID();

My problem is (1) I'm not limiting them to 3 per for consideration cause I'm blanking on how to do that and (2) Distinct isn't working because they could enter themselves 100 times and have voted on a different entry each time.

Answer 1


Try this. It will only include up to three entries per name.

WITH UpToThree AS
(
    SELECT
         Name
        ,RN = ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Vote)
        ,NEWID() as RandID
    FROM
        Entries
)
SELECT TOP 5
    Name, MAX(RandID)
FROM
    UpToThree
WHERE
    RN < 4
GROUP BY
    Name
ORDER BY
    MAX(RandID)

Answer 2


I'm not sure if any of the following are unavailable in SQL 2000:

SELECT TOP 5 Name
FROM (SELECT Name,ABS(CHECKSUM(NEWID())) nid, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY NEWID()) RN
      FROM Table1) as sub
WHERE RN <= 3
GROUP BY Name
ORDER BY MAX(nid)

Demo: SQL Fiddle

I'm not sure how you want to decide which vote you want to display, but that should be handled in a separate join to the winners list if you want the entries to be considered on virtue of name alone.

Answer 3


I see a challenge in ensuring that the people with 3 entries are 3 times more likely to win than people with 1 entry. I am assuming that votes is not important, because it is not mentioned in the question or comments.

The strategy is to first limit the entries to 3 per person randomly using row_number() (and a where). Then, order the results again and enumerate each row using row_number() randomly. By enumerating randomly, any entry has an equal change of being at the top -- so a person with 3 entries has three times the probability of being the best as someone with one entry.

Finally, choose the first five people based on this sequence number:

with entries_3 as (
      select e.*
      from (select e.*, row_number() over (partition by name order by newid()) as seqnum
            from entries
           ) e
      where seqnum <= 3
     ),
     entries_3_ordered (
      select e.*, row_number() over (order by newid()) as seqnum2
      from entries_3
     )
select top 5 name, votes
from from entries_3_ordered
group by name, votes
order by min(seqnum2);