Advertisement
If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)
Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie

SQL - Counting occurrences of a distinct value in a column.

Options
  • 21-09-2007 11:34am
    #1
    Registered Users Posts: 68,317 ✭✭✭✭


    I think I've probably asked something similar to this before, but I'm still not convinced I'm doing this the right way.

    I have a table which stores survey results. Each question in the survey has its own column which stores the user's answer (it's multiple choice, so there are only certain possible answers which can appear).

    On particular column stores a single character - a, b, c or d.

    What I want to do is to count the number of surveys for which each answer is given.

    That is, I want to see a result that looks like this:
    | ans |  num_ans |
    ------------------
    |  'a'  |     120     |
    ------------------
    |  'b'  |      20      |
    ------------------
    |  'c'  |     103     |
    ------------------
    |  'd'  |     154     |
    ------------------
    

    This can work fine by just calling
    SELECT count(q3) from survey group by q3

    But if count(q3) is zero, it doesn't appear in the result.

    Previously I've gotten around it by having a separate table, e.g. "Q3Answer" containing one row for each possible answer.
    Then you call a query like

    SELECT COUNT(survey.q3)
    FROM q3answer
    LEFT JOIN
    survey ON survey.q3 = q3.answer.id
    GROUP BY survey.q3

    But this seems stupidly inefficient. Is there any other way of doing this. Would an ENUM datatype by any use here?


Comments

  • Registered Users Posts: 15,443 ✭✭✭✭bonkey


    seamus wrote:
    I have a table which stores survey results. Each question in the survey has its own column which stores the user's answer (it's multiple choice, so there are only certain possible answers which can appear).
    I dunno if its an option for you, but this design appears to be somewhat broken. Rather than having a column per question, you'd be better off (I think) have a row per question, so your table would be more like:

    QuestionnaireNo
    QuestionNo
    Answer

    rather than :

    QuestionnaireNo
    Answer1
    Answer2
    ...
    AnswerN
    On particular column stores a single character - a, b, c or d.

    What I want to do is to count the number of surveys for which each answer is given.
    Do you mean each answer for a specific question, or each answer taken across all questions?

    In the latter case, you'll have to add up the results for Question1, Question2, etc. This would be avoided by the design I suggested above.
    But if count(q3) is zero, it doesn't appear in the result.
    More correctly, if a specific multichoice option for the answer was never picked, then you can't find any entries to add up. This is (obviously) correct behaviour.
    Previously I've gotten around it by having a separate table, e.g. "Q3Answer" containing one row for each possible answer.
    Then you call a query like

    SELECT COUNT(survey.q3)
    FROM q3answer
    LEFT JOIN
    survey ON survey.q3 = q3.answer.id
    GROUP BY survey.q3

    But this seems stupidly inefficient.
    I wouldn't say its stupidly inefficient at all. There's an infinite number of values which don't appear in your data. You have to define *somewhere* which of those values you want to see. With a relational database, having a lookup isn't a bad way of doing it. In a "robust" design, I'd probably recommend that this table exist in some form anyway with a foreign-key-constraint used to make sure that no-one entered bogus values into the answer fields.
    Is there any other way of doing this.
    Short of changing your table-structure, I can't see one.

    Incidentally, if you know that all questions will always have 4 choices, you could do a table-design like:


    QuestionnaireNo
    QuestionNo
    AnswerA
    AnswerB
    AnswerC
    AnswerD

    And store a 1 or a 0 in each of the Answer columns depending on what the user picked.

    In general, though, its not a normalised structure which is the flaw I'd pick with your original structure.


    If you could live with one row of output, and if your DB supports CASE statements, then you could do something like this:
    SELECT SUM(CASE WHEN Q3='a' THEN 1 ELSE 0 END) as AnswerA
          , SUM(CASE WHEN Q3='b' THEN 1 ELSE 0 END) as AnswerB
          , SUM(CASE WHEN Q3='c' THEN 1 ELSE 0 END) as AnswerC
          , SUM(CASE WHEN Q3='d' THEN 1 ELSE 0 END) as AnswerD
    FROM <table>
    
    (note the lack of GROUP BY. I'm grouping over everything.

    I wouldn't swear its more efficient, though


  • Registered Users Posts: 68,317 ✭✭✭✭seamus


    Thanks for that extensive reply as usual bonkey. A great help :)

    I've always kind of felt that having columns as questions was a bit poorly designed, but I'm altering a system so I'm trying to stick to the initial setup to minimise my workload. Typical mistake. :)

    It turns out though that I've deleted the main piece of code which relied on having the questions all in one row, so I may look at splitting out the details. Four tables may sort it - One for the survey details, one for the possible questions, another linking each survey to each question/answer, and another linking each question to its possible answers.

    Thanks again.


  • Registered Users Posts: 1,967 ✭✭✭Dun


    If I'm understanding this correctly, this might work, as unpretty as it might be:
    SELECT   'a'           AS ans,
    	COUNT(q3)    AS num_ans
    FROM    <table>
    WHERE   q3 = 'a'
    
    UNION
    
    SELECT   'b',
    	COUNT(q3)
    FROM    <table>
    WHERE   q3 = 'b'
    
    UNION
    
    SELECT   'c',
    	COUNT(q3)
    FROM    <table>
    
    WHERE   q3 = 'c'
    
    UNION
    
    SELECT   'd',
    	COUNT(q3)
    FROM    <table>
    WHERE   q3 = 'd'
    


  • Registered Users Posts: 1,466 ✭✭✭Smoggy


    UNION ALL is faster.


Advertisement