Advertisement
If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)
Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie

SQL "intersect" statement

Options
  • 02-06-2011 12:22pm
    #1
    Registered Users Posts: 1,657 ✭✭✭


    Hi folks,

    I have an SQL statement which is trying to find records which are common to a number of different search terms

    e.g. a table containing two fields post_id and search_word. each post_id could have many records each with one value for search_word.
    I want to get a list of post_ids which contain all of the searched words.

    SELECT post_id FROM wp_index_search WHERE search_word = 'Hello' AND post_id IN
    (SELECT post_id FROM wp_index_search WHERE search_word = 'this' AND post_id IN
    (SELECT post_id FROM wp_index_search WHERE search_word = 'is' AND post_id IN
    (SELECT post_id FROM wp_index_search WHERE search_word = 'a' AND post_id IN
    (SELECT post_id FROM wp_index_search WHERE search_word = 'long' AND post_id IN
    (SELECT post_id FROM wp_index_search WHERE search_word = 'search')))))

    Now, one of those SELECTs takes about 0.0006 seconds, however each one you add makes the search take exponentially longer - by the time you get to about 5, it takes several minutes and 6 takes over an hour. There is an index each for both post_id and search_word.

    Why is that?

    Is there any "correct" way of getting the same effect?


Comments

  • Registered Users Posts: 2,781 ✭✭✭amen


    First of all which DB are you using MSSQL Server, Oracle, MySQL or something else?

    I'm not sure what you are trying do. Are you
    1: trying to search each post for the following words "My", "Search"
    2: Searching for all posts that contain the words "My Search"


  • Registered Users Posts: 1,657 ✭✭✭komodosp


    MySQL

    and it's #1 above. I want to return the post if it has all of the requested words. There can be many records with the same post_id.


  • Registered Users Posts: 14,714 ✭✭✭✭Earthhorse


    Perhaps I'm not understanding the question correctly but can you not replace the search_word = 'hello' part of your query with search_word IN ('hello', 'this') etc. and do it in one SELECT?


  • Registered Users Posts: 4,766 ✭✭✭cython


    Earthhorse wrote: »
    Perhaps I'm not understanding the question correctly but can you not replace the search_word = 'hello' part of your query with search_word IN ('hello', 'this') etc. and do it in one SELECT?

    I think the OP wants to match only the posts that have all the words being searched for, rather than all those posts that contain any one of the terms. Basically in the manner that the IN clause eliminates the need for multiple OR statements, what they seek would be something that would eliminate the need for multiple AND statements.


  • Registered Users Posts: 4,766 ✭✭✭cython


    OP, I'm not too sure of a solution, but when I searched for nested IN statements, as you have above, I found this page outlining how joins may be more efficient. Would this be of any use to you?


  • Advertisement
  • Registered Users Posts: 2,781 ✭✭✭amen


    This is a tricky problem.

    For instance if you are searching for "my","list" they have to be in that order or can they be in any order?

    Using IN in most databases often equates to using OR and thus hurts performance.

    I think what your really want is "Full Text" search and you are in luck as MySql supports this. A good starting point is


  • Registered Users Posts: 1,657 ✭✭✭komodosp


    For instance if you are searching for "my","list" they have to be in that order or can they be in any order?
    Order doesn't matter - the statement I posted in the OP returns the right results, it's just the performance I'm worried about.

    I'll take a look at the links posted anyway...
    It takes place in a PHP program so if the worst comes to the worst I will just put in a sequence of sql statements... I'd rather not do it that way though.


  • Registered Users Posts: 1,657 ✭✭✭komodosp


    Cython's link seems to be doing the trick!

    Having a query with 4 levels of nested INs took 33 seconds.
    Doing the equivalent with INNER JOINs only took 0.126 seconds

    Now I just have to make absolutely sure I am getting the right results


  • Registered Users Posts: 2,089 ✭✭✭henryporter


    Would you not be better joining the tables in a View containing all the queries you need?


  • Registered Users Posts: 4,766 ✭✭✭cython


    komodosp wrote: »
    Cython's link seems to be doing the trick!

    Having a query with 4 levels of nested INs took 33 seconds.
    Doing the equivalent with INNER JOINs only took 0.126 seconds

    Now I just have to make absolutely sure I am getting the right results

    Good to hear. The part in bold was why I thought it better to post a link like that than try to figure it out and post a solution, because without a sample structure/set of data, it would be very difficult to verify


  • Advertisement
Advertisement