Advertisement
If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)
Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie

SQL Select Distinct

Options
  • 18-08-2011 9:35am
    #1
    Registered Users Posts: 1,657 ✭✭✭


    Is there any way of telling which record you are getting when you use SELECT DISTINCT - i.e. is it the first of the duplicates or the last or what?

    Let's say I wanted a list of email addresses and the time they last ordered, so I say

    SELECT DISTINCT email, order_time FROM orders

    Do I know whether I'm getting the maximum order_time? If not how do I force it to be so?


Comments

  • Registered Users Posts: 1,311 ✭✭✭Procasinator


    I would assume* DISTINCT would just use the first encountered row and discard other rows that are the same.

    If you want to control this behaviour, you would use ORDER BY. Such as ORDER BY order_time DESC.

    Saying that, are you really looking for all unique email/order pairs. As in, are you looking just for the latest order for an email only (which would be 1 row), or are you try to remove duplicate orders (order_time is the same)?

    * Like all assumptions, I could be wrong. It could possibly differ from implementation. Maybe check your RDBMS documentation and/or SQL standards.


  • Registered Users Posts: 1,657 ✭✭✭komodosp


    OK I've just realised I should be using "GROUP BY" instead of DISTINCT... But still the question remains, if I say

    SELECT email, order_time FROM orders GROUP BY email

    How do I guarantee I'm always getting the latest order_time for each email?


  • Registered Users Posts: 1,311 ✭✭✭Procasinator


    Try something like:
    SELECT email, MAX(order_time) FROM orders GROUP BY email
    


  • Registered Users Posts: 7,182 ✭✭✭Genghiz Cohen


    This may be a job for a stored proc.
    You could create a view out of a:
    select * from database order by order_time DESC

    then

    select distinct email, order time from view.

    Would that select the first record (thereby the largest as it has been ordered such)?


  • Moderators, Technology & Internet Moderators Posts: 1,334 Mod ✭✭✭✭croo


    I think I'd try something like
    SELECT DISTINCT email, 
               (SELECT MAX(order_time) 
                 FROM orders so2 
                 WHERE so2.email = so.email) as lastOrderTime
    FROM orders so
    

    Though there might be a more efficient way!? The amount of time I would spend thinking about it would depend on the number of orders and the frequency that I needed to run the query!

    PS. I had thought Procrastinator's answer
    SELECT email, MAX(order_time) FROM orders GROUP BY email
    would be the best approach but on reflection I think the MAX function would return the max in the table rather than the max for that email. Though I could be wrong.


  • Advertisement
  • Closed Accounts Posts: 3,357 ✭✭✭Beano


    croo wrote: »
    I think I'd try something like
    SELECT DISTINCT email, 
               (SELECT MAX(order_time) 
                 FROM orders so2 
                 WHERE so2.email = so.email) as lastOrderTime
    FROM orders so
    

    Though there might be a more efficient way!? The amount of time I would spend thinking about it would depend on the number of orders and the frequency that I needed to run the query!

    PS. I had thought Procrastinator's answer

    would be the best approach but on reflection I think the MAX function would return the max in the table rather than the max for that email. Though I could be wrong.

    it would return the max order time for each email not for the table as a whole. if you are using sql server 2008 you could use a cross apply or an outer apply


  • Registered Users Posts: 1,311 ✭✭✭Procasinator


    croo wrote: »
    PS. I had thought Procrastinator's answer

    would be the best approach but on reflection I think the MAX function would return the max in the table rather than the max for that email. Though I could be wrong.

    I believe you are wrong. The GROUP BY means the maximum order_time is searched for per group (i.e. email) not the whole data set.

    If it wasn't done that way, aggregate functions in group by queries would be a lot less useful.


  • Moderators, Technology & Internet Moderators Posts: 1,334 Mod ✭✭✭✭croo


    I believe you are wrong. The GROUP BY means the maximum order_time is searched for per group (i.e. email) not the whole data set.

    If it wasn't done that way, aggregate functions in group by queries would be a lot less useful.
    I didn't test to confirm but that makes sense... In which case I would think your solution a better one than my proposal.


  • Registered Users Posts: 1,657 ✭✭✭komodosp


    Thanks Procrastinator for that ironically quick response... haven't had a chance to test this yet as something more urgent came up but it looks like it would do the trick


Advertisement