Advertisement
If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)
Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie

SQL: Using the COUNT() function

Options
  • 12-01-2004 5:48pm
    #1
    Registered Users Posts: 68,317 ✭✭✭✭


    OK, have a DB with 2.2 million rows, and 5 fields, let's call them a, b, c, d and url.

    a, b, c, & d are just integers corresponding to an id in another table. url is a text field.

    Basically what we want to do is find all the of unique urls, ie every url that appears only once.

    so we have
    "SELECT url, COUNT(*) FROM result GROUP BY url"
    Works fine. But we only want those results where the COUNT(*) row contains one. Is it either possible to only return these rows, or if not, sort the COUNT(*) row in ascending order (ie all the 1's first).

    I'm assuming it's also possible to be more specific in this Query, ie
    "SELECT b, c, url, COUNT(*) FROM result GROUP BY url WHERE b=x AND c=y"?

    :)


Comments

  • Registered Users Posts: 1,562 ✭✭✭Snaga


    How about the distinct keyword? As long as your not worried about the different values of the other attributes this should do wht your looking for in conjunction withcount.

    http://www.1keydata.com/sql/sqldistinct.html


  • Registered Users Posts: 5,695 ✭✭✭jd


    having count(*)=1

    however...
    distinct is yer man in this case, i think..


  • Closed Accounts Posts: 4 Scof


    Try:

    SELECT url
    FROM result
    GROUP BY url
    HAVING (COUNT(*) = 1)


  • Closed Accounts Posts: 9,314 ✭✭✭Talliesin


    SELECT
    	url
    FROM
    	result
    GROUP BY url
    HAVING COUNT(*) = 1
    
    HAVING is like WHERE, but it acts on aggregate functions like COUNT, MAX etc.


  • Closed Accounts Posts: 5,564 ✭✭✭Typedef


    select distinct count(url) from blah;

    or similar.


  • Advertisement
  • Registered Users Posts: 68,317 ✭✭✭✭seamus


    Great stuff lads. Loadsa things to get our teeth into there

    :)


  • Closed Accounts Posts: 9,314 ✭✭✭Talliesin


    Originally posted by Typedef
    select distinct count(url) from blah;

    or similar.
    Similar meaning what Scof and myself both said?:rolleyes:


  • Registered Users Posts: 640 ✭✭✭Kernel32


    As far as I know using the sql statement
    select distinct count(url) from result
    will return the same result as
    select count(url) from result

    The reason being that the count is first done, which will return a single value, then the distinct is done againist that single value giving you the same value back

    Interestingly on some RDBM's (SQL Server being one) the following SQL
    select count(distinct url) from result
    will tell you how many unique urls there are, but that means that url exists one or more times. Interesting to know but not the answer to the question

    The answer is this bit of SQL

    SELECT
    url
    FROM
    result
    GROUP BY url
    HAVING COUNT(*) = 1


    Over a large dataset this would be slow probably because the count and group by would have to process into memory and then the Having Count = 1 would process over the results. Thats one of the drawbacks of using the Having clause, it can be much slower than a where clause because it causes the data to be processed twice, but I don't see any other way.


  • Registered Users Posts: 15,443 ✭✭✭✭bonkey


    Originally posted by seamus
    I'm assuming it's also possible to be more specific in this Query, ie
    "SELECT b, c, url, COUNT(*) FROM result GROUP BY url WHERE b=x AND c=y"?

    Given that you've been given what I think is the best answer to the first half of the question (HAVING keyword is your friend here), I'll address this bit.

    The short asnwer is yes, except that if you are only grouping by URL, then you will have to apply aggregate functions to b and c. In an aggregate query (i.e. with a GROUP BY), every output field must either be handled by an aggregate function or be part of the group by clause.

    GROUP BY url, b, c would not be a good idea, because then your count would be for every unique url/b/c combination, rather than for every url.

    Instead, you would have :

    SELECT MIN(b), MIN(c), url, count(*)
    FROM ...
    WHERE ...
    GROUP BY url
    HAVING count(*) = 1

    Given that each output row will be an aggregate of only one line (count = 1), the MIN function has no effect...and in most DB implementations will work fine on string values as well as numerics.

    jc


Advertisement