SQL - Counting occurrences of a distinct value in a column.

seamus · 21-09-2007 11:34AM #1

I think I've probably asked something similar to this before, but I'm still not convinced I'm doing this the right way.

I have a table which stores survey results. Each question in the survey has its own column which stores the user's answer (it's multiple choice, so there are only certain possible answers which can appear).

On particular column stores a single character - a, b, c or d.

What I want to do is to count the number of surveys for which each answer is given.

That is, I want to see a result that looks like this:

| ans |  num_ans |
------------------
|  'a'  |     120     |
------------------
|  'b'  |      20      |
------------------
|  'c'  |     103     |
------------------
|  'd'  |     154     |
------------------

This can work fine by just calling
SELECT count(q3) from survey group by q3

But if count(q3) is zero, it doesn't appear in the result.

Previously I've gotten around it by having a separate table, e.g. "Q3Answer" containing one row for each possible answer.
Then you call a query like

SELECT COUNT(survey.q3)
FROM q3answer
LEFT JOIN
survey ON survey.q3 = q3.answer.id
GROUP BY survey.q3

But this seems stupidly inefficient. Is there any other way of doing this. Would an ENUM datatype by any use here?

bonkey · 21-09-2007 12:12PM

seamus wrote:

I have a table which stores survey results. Each question in the survey has its own column which stores the user's answer (it's multiple choice, so there are only certain possible answers which can appear).

I dunno if its an option for you, but this design appears to be somewhat broken. Rather than having a column per question, you'd be better off (I think) have a row per question, so your table would be more like:

QuestionnaireNo
QuestionNo
Answer

rather than :

QuestionnaireNo
Answer1
Answer2
...
AnswerN

On particular column stores a single character - a, b, c or d.

What I want to do is to count the number of surveys for which each answer is given.

Do you mean each answer for a specific question, or each answer taken across all questions?

In the latter case, you'll have to add up the results for Question1, Question2, etc. This would be avoided by the design I suggested above.

But if count(q3) is zero, it doesn't appear in the result.

More correctly, if a specific multichoice option for the answer was never picked, then you can't find any entries to add up. This is (obviously) correct behaviour.

Previously I've gotten around it by having a separate table, e.g. "Q3Answer" containing one row for each possible answer.
Then you call a query like

SELECT COUNT(survey.q3)
FROM q3answer
LEFT JOIN
survey ON survey.q3 = q3.answer.id
GROUP BY survey.q3

But this seems stupidly inefficient.

I wouldn't say its stupidly inefficient at all. There's an infinite number of values which don't appear in your data. You have to define *somewhere* which of those values you want to see. With a relational database, having a lookup isn't a bad way of doing it. In a "robust" design, I'd probably recommend that this table exist in some form anyway with a foreign-key-constraint used to make sure that no-one entered bogus values into the answer fields.

Is there any other way of doing this.

Short of changing your table-structure, I can't see one.

Incidentally, if you know that all questions will always have 4 choices, you could do a table-design like:

QuestionnaireNo
QuestionNo
AnswerA
AnswerB
AnswerC
AnswerD

And store a 1 or a 0 in each of the Answer columns depending on what the user picked.

In general, though, its not a normalised structure which is the flaw I'd pick with your original structure.

If you could live with one row of output, and if your DB supports CASE statements, then you could do something like this:

SELECT SUM(CASE WHEN Q3='a' THEN 1 ELSE 0 END) as AnswerA
      , SUM(CASE WHEN Q3='b' THEN 1 ELSE 0 END) as AnswerB
      , SUM(CASE WHEN Q3='c' THEN 1 ELSE 0 END) as AnswerC
      , SUM(CASE WHEN Q3='d' THEN 1 ELSE 0 END) as AnswerD
FROM <table>

(note the lack of GROUP BY. I'm grouping over everything.

I wouldn't swear its more efficient, though

seamus · 21-09-2007 01:47PM

Thanks for that extensive reply as usual bonkey. A great help

I've always kind of felt that having columns as questions was a bit poorly designed, but I'm altering a system so I'm trying to stick to the initial setup to minimise my workload. Typical mistake.

It turns out though that I've deleted the main piece of code which relied on having the questions all in one row, so I may look at splitting out the details. Four tables may sort it - One for the survey details, one for the possible questions, another linking each survey to each question/answer, and another linking each question to its possible answers.

Thanks again.

Dun · 21-09-2007 10:57PM

If I'm understanding this correctly, this might work, as unpretty as it might be:

SELECT   'a'           AS ans,
	COUNT(q3)    AS num_ans
FROM    <table>
WHERE   q3 = 'a'

UNION

SELECT   'b',
	COUNT(q3)
FROM    <table>
WHERE   q3 = 'b'

UNION

SELECT   'c',
	COUNT(q3)
FROM    <table>

WHERE   q3 = 'c'

UNION

SELECT   'd',
	COUNT(q3)
FROM    <table>
WHERE   q3 = 'd'

Smoggy · 24-09-2007 07:36PM

UNION ALL is faster.

SQL - Counting occurrences of a distinct value in a column.

Comments