Advertisement
If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)
Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie

another SQL prob(subqueries)

Options
  • 18-01-2004 3:29pm
    #1
    Registered Users Posts: 7,580 ✭✭✭


    I've have a table with Engine, query, round, position, and url as my columns. For a given Engine and query I want be able to return a result set for two subsequent rounds.

    select url, position, round from result
    where Engine=?
    and query=?
    and (round =i OR round =i+1);

    would work only something I hadn't anticipated occurs in the table, urls are repeated in the same round and my calculations don't allow for that. So i want to remove repeats altogether. Or at least only return the first instance of the repeated url.

    So i came up with this
    **************
    SELECT url, round, position from result
    WHERE url= (SELECT url from result
    WHERE engine_id=?
    AND query_id=?
    AND round=i
    HAVING COUNT(*)=1)
    OR url= (SELECT url from result
    WHERE engine_id=?
    AND query_id=?
    AND round = i+1
    HAVING COUNT(*)=1)
    GROUP BY url;
    ***************

    which you've prob guessed doesn't work :)
    instead of having count I could use DISTINCT which would return all the unique urls in each round.

    the trouble I'm having is with the syntax of the subquery. Any pointers appreciated.

    All this btw is for MySQL :)


Comments

  • Registered Users Posts: 15,443 ✭✭✭✭bonkey


    Originally posted by uberwolf
    urls are repeated in the same round and my calculations don't allow for that. So i want to remove repeats altogether.

    OK. Thats actually pretty easy.

    If each repeated URL has the same position (which I doubt), you could use the straightforward SELECT DISTINCT.

    But that won't do it if the positions are different, so how about this...

    select round, url, min(position) as position
    from ...
    where ...
    group by round, url

    That should get rid of your duplicates nicely :)

    jc


  • Registered Users Posts: 7,580 ✭✭✭uberwolf


    Originally posted by bonkey

    select round, url, min(position) as position
    from ...
    where ...
    group by round, url

    That should get rid of your duplicates nicely :)

    It actually gets rid of everything :p empty sets for that. If I get rid of the min I get the full set grouped as I would hope.

    The whole idea is to track url movement from one round to the nxt (well spotted bonkey) So I want to identical urls to be returned from subsequent rnds together so I can do maths with their postions (so obviously if a url occurs three times in my results it'll mess everything)

    not sure why min(pos) returns empty...


  • Registered Users Posts: 2,781 ✭✭✭amen


    post a few sample data rows (with/without the duplicate ursl)
    and tell us what u think the results should be
    easier to figure our the query that way


  • Registered Users Posts: 7,580 ✭✭✭uberwolf


    kinda hard to do that, but I'll try and give u an idea of what it looks like

    engine_id is an integer between 1 and 5 for the 5 engines, query_id an integer for one of the 246 querys, round an integer for one, position - that urls ranking(1 to 300) and url - a string.
    **************************
    engine_id | query_id | round | position | url
    1
    |
    1 | ---- 2 | - |
    1
    |
    1 | ---- 3 | - |

    so engine 1, query 1, round 2, would have 300 positions and 300 corresponding urls. I hadn't anticipated teh fact that some urls are repeated in the url list at diff positions.
    I want to return all urls which occur once, or all distinct urls(with the highest postition for any repeated urls-highest = closest to one) in an individual round, whilst returning two rounds at a time. Then group them so any repeated urls appear subsequently in the result set.

    The aim is to calc the diff between a urls position one week to the nxt. If a url occurs twice in one round it'll throw a spanner in the works so I want to remove them from consideration.
    I hope this clarfies, if any other carifications are needed please ask, and thanks for the help
    E


  • Registered Users Posts: 15,443 ✭✭✭✭bonkey


    Originally posted by uberwolf

    I want to return all urls which occur once, or all distinct urls(with the highest postition for any repeated urls-highest = closest to one) in an individual round,
    whilst returning two rounds at a time.

    That is still describing exactly what my group by query should return. Just in case there's some mistake, here's the full query.

    There is simply no way you could be getting results when omitting the MIN and GROUP BY, and yet no results when they're back in - as you described previously - unless your database is corrupt. Is this really the query you ran?


    SELECT url, MIN(position) as FirstPosition, round
    FROM result
    WHERE Engine=?
    AND query=?
    AND (round =i OR round =i+1)
    GROUP BY round, url;

    Then group them so any repeated urls appear subsequently in the result set.

    Assuming that what you mean is that if the same URL appears in both rounds (as you only eliminate duplicates *within* a round) then you want them appearing beside each other in the resultset.....

    that would be ordering, not grouping. You're not changing any data any further...simply re-ordering the resultset data so that records you want to proces together are indeed together.

    You can implement this in the same query :

    SELECT url, MIN(position) as FirstPosition, round
    FROM result
    WHERE Engine=?
    AND query=?
    AND (round =i OR round =i+1)
    GROUP BY round, url
    ORDER BY url, round;

    jc


  • Advertisement
  • Registered Users Posts: 15,443 ✭✭✭✭bonkey


    One last thing....

    The above queries will produce a blank FirstPosition if (and only if) there are records which have a blank Position value....as MIN will sort NULL and empty strings before the numeric values.

    If this type of record exists, then your logic needs revision, as you need to decide how to handle something without a specified position....either filter it out in the WHERE clause, or figure how you wish to handle blank Position values.

    jc


  • Registered Users Posts: 7,580 ✭✭✭uberwolf


    Originally posted by bonkey
    That is still describing exactly what my group by query should return. Just in case there's some mistake, here's the full query.

    There is simply no way you could be getting results when omitting the MIN and GROUP BY, and yet no results when they're back in - as you described previously - unless your database is corrupt. Is this really the query you ran?
    jc

    As you say, some mistake :$ I must have left out the as position part when running that query yesterday, mitigating was my bleary eyed state, but I'm sorry anyway. Thanks for your help :)


Advertisement