another SQL prob(subqueries)

uberwolf · 18-01-2004 03:29PM #1

I've have a table with Engine, query, round, position, and url as my columns. For a given Engine and query I want be able to return a result set for two subsequent rounds.

select url, position, round from result
where Engine=?
and query=?
and (round =i OR round =i+1);

would work only something I hadn't anticipated occurs in the table, urls are repeated in the same round and my calculations don't allow for that. So i want to remove repeats altogether. Or at least only return the first instance of the repeated url.

So i came up with this
**************
SELECT url, round, position from result
WHERE url= (SELECT url from result
WHERE engine_id=?
AND query_id=?
AND round=i
HAVING COUNT(*)=1)
OR url= (SELECT url from result
WHERE engine_id=?
AND query_id=?
AND round = i+1
HAVING COUNT(*)=1)
GROUP BY url;
***************

which you've prob guessed doesn't work

instead of having count I could use DISTINCT which would return all the unique urls in each round.

the trouble I'm having is with the syntax of the subquery. Any pointers appreciated.

All this btw is for MySQL

bonkey · 18-01-2004 04:07PM

Originally posted by uberwolf
urls are repeated in the same round and my calculations don't allow for that. So i want to remove repeats altogether.

OK. Thats actually pretty easy.

If each repeated URL has the same position (which I doubt), you could use the straightforward SELECT DISTINCT.

But that won't do it if the positions are different, so how about this...

select round, url, min(position) as position
from ...
where ...
group by round, url

That should get rid of your duplicates nicely

jc

uberwolf · 18-01-2004 04:35PM

Originally posted by bonkey

select round, url, min(position) as position
from ...
where ...
group by round, url

That should get rid of your duplicates nicely

It actually gets rid of everything

empty sets for that. If I get rid of the min I get the full set grouped as I would hope.

The whole idea is to track url movement from one round to the nxt (well spotted bonkey) So I want to identical urls to be returned from subsequent rnds together so I can do maths with their postions (so obviously if a url occurs three times in my results it'll mess everything)

not sure why min(pos) returns empty...

amen · 19-01-2004 02:41PM

post a few sample data rows (with/without the duplicate ursl)
and tell us what u think the results should be
easier to figure our the query that way

uberwolf · 19-01-2004 03:08PM

kinda hard to do that, but I'll try and give u an idea of what it looks like

engine_id is an integer between 1 and 5 for the 5 engines, query_id an integer for one of the 246 querys, round an integer for one, position - that urls ranking(1 to 300) and url - a string.
**************************
engine_id | query_id | round | position | url
1

|

1 | ---- 2 | - |
1

|

1 | ---- 3 | - |

so engine 1, query 1, round 2, would have 300 positions and 300 corresponding urls. I hadn't anticipated teh fact that some urls are repeated in the url list at diff positions.
I want to return all urls which occur once, or all distinct urls(with the highest postition for any repeated urls-highest = closest to one) in an individual round, whilst returning two rounds at a time. Then group them so any repeated urls appear subsequently in the result set.

The aim is to calc the diff between a urls position one week to the nxt. If a url occurs twice in one round it'll throw a spanner in the works so I want to remove them from consideration.
I hope this clarfies, if any other carifications are needed please ask, and thanks for the help
E

bonkey · 19-01-2004 03:22PM

Originally posted by uberwolf

I want to return all urls which occur once, or all distinct urls(with the highest postition for any repeated urls-highest = closest to one) in an individual round,
whilst returning two rounds at a time.

That is still describing exactly what my group by query should return. Just in case there's some mistake, here's the full query.

There is simply no way you could be getting results when omitting the MIN and GROUP BY, and yet no results when they're back in - as you described previously - unless your database is corrupt. Is this really the query you ran?

SELECT url, MIN(position) as FirstPosition, round
FROM result
WHERE Engine=?
AND query=?
AND (round =i OR round =i+1)
GROUP BY round, url;

Then group them so any repeated urls appear subsequently in the result set.

Assuming that what you mean is that if the same URL appears in both rounds (as you only eliminate duplicates *within* a round) then you want them appearing beside each other in the resultset.....

that would be ordering, not grouping. You're not changing any data any further...simply re-ordering the resultset data so that records you want to proces together are indeed together.

You can implement this in the same query :

SELECT url, MIN(position) as FirstPosition, round
FROM result
WHERE Engine=?
AND query=?
AND (round =i OR round =i+1)
GROUP BY round, url
ORDER BY url, round;

jc

bonkey · 19-01-2004 03:25PM

One last thing....

The above queries will produce a blank FirstPosition if (and only if) there are records which have a blank Position value....as MIN will sort NULL and empty strings before the numeric values.

If this type of record exists, then your logic needs revision, as you need to decide how to handle something without a specified position....either filter it out in the WHERE clause, or figure how you wish to handle blank Position values.

jc

uberwolf · 19-01-2004 03:39PM

Originally posted by bonkey
That is still describing exactly what my group by query should return. Just in case there's some mistake, here's the full query.

There is simply no way you could be getting results when omitting the MIN and GROUP BY, and yet no results when they're back in - as you described previously - unless your database is corrupt. Is this really the query you ran?
jc

As you say, some mistake :$ I must have left out the as position part when running that query yesterday, mitigating was my bleary eyed state, but I'm sorry anyway. Thanks for your help

another SQL prob(subqueries)

Comments