SQL: Using the COUNT() function

seamus · 12-01-2004 05:48PM #1

OK, have a DB with 2.2 million rows, and 5 fields, let's call them a, b, c, d and url.

a, b, c, & d are just integers corresponding to an id in another table. url is a text field.

Basically what we want to do is find all the of unique urls, ie every url that appears only once.

so we have
"SELECT url, COUNT(*) FROM result GROUP BY url"
Works fine. But we only want those results where the COUNT(*) row contains one. Is it either possible to only return these rows, or if not, sort the COUNT(*) row in ascending order (ie all the 1's first).

I'm assuming it's also possible to be more specific in this Query, ie
"SELECT b, c, url, COUNT(*) FROM result GROUP BY url WHERE b=x AND c=y"?

Snaga · 12-01-2004 05:53PM

How about the distinct keyword? As long as your not worried about the different values of the other attributes this should do wht your looking for in conjunction withcount.

http://www.1keydata.com/sql/sqldistinct.html

jd · 12-01-2004 05:56PM

having count(*)=1

however...
distinct is yer man in this case, i think..

Scof · 12-01-2004 05:58PM

Try:

SELECT url
FROM result
GROUP BY url
HAVING (COUNT(*) = 1)

Talliesin · 12-01-2004 05:59PM

SELECT
	url
FROM
	result
GROUP BY url
HAVING COUNT(*) = 1

HAVING is like WHERE, but it acts on aggregate functions like COUNT, MAX etc.

Typedef · 12-01-2004 06:23PM

select distinct count(url) from blah;

or similar.

seamus · 13-01-2004 12:28AM

Great stuff lads. Loadsa things to get our teeth into there

Talliesin · 13-01-2004 03:37PM

Originally posted by Typedef
select distinct count(url) from blah;

or similar.

Similar meaning what Scof and myself both said?:rolleyes:

Kernel32 · 13-01-2004 08:06PM

As far as I know using the sql statement
select distinct count(url) from result
will return the same result as
select count(url) from result

The reason being that the count is first done, which will return a single value, then the distinct is done againist that single value giving you the same value back

Interestingly on some RDBM's (SQL Server being one) the following SQL
select count(distinct url) from result
will tell you how many unique urls there are, but that means that url exists one or more times. Interesting to know but not the answer to the question

The answer is this bit of SQL

SELECT
url
FROM
result
GROUP BY url
HAVING COUNT(*) = 1

Over a large dataset this would be slow probably because the count and group by would have to process into memory and then the Having Count = 1 would process over the results. Thats one of the drawbacks of using the Having clause, it can be much slower than a where clause because it causes the data to be processed twice, but I don't see any other way.

bonkey · 14-01-2004 10:02AM

Originally posted by seamus
I'm assuming it's also possible to be more specific in this Query, ie
"SELECT b, c, url, COUNT(*) FROM result GROUP BY url WHERE b=x AND c=y"?

Given that you've been given what I think is the best answer to the first half of the question (HAVING keyword is your friend here), I'll address this bit.

The short asnwer is yes, except that if you are only grouping by URL, then you will have to apply aggregate functions to b and c. In an aggregate query (i.e. with a GROUP BY), every output field must either be handled by an aggregate function or be part of the group by clause.

GROUP BY url, b, c would not be a good idea, because then your count would be for every unique url/b/c combination, rather than for every url.

Instead, you would have :

SELECT MIN(b), MIN(c), url, count(*)
FROM ...
WHERE ...
GROUP BY url
HAVING count(*) = 1

Given that each output row will be an aggregate of only one line (count = 1), the MIN function has no effect...and in most DB implementations will work fine on string values as well as numerics.

jc

SQL: Using the COUNT() function

Comments