Advertisement
If you have a new account but are having problems posting or verifying your account, please email us on hello@boards.ie for help. Thanks :)
Hello all! Please ensure that you are posting a new thread or question in the appropriate forum. The Feedback forum is overwhelmed with questions that are having to be moved elsewhere. If you need help to verify your account contact hello@boards.ie

search algorithm

Options
  • 22-03-2002 3:10am
    #1
    Registered Users Posts: 762 ✭✭✭


    Hi guys,

    I've recently designed a links directory for my site and it works pretty well except the search algorithm is not so good.

    The database uses a standard directory structure, e.g. each link has a Title, URL, Description, etc.

    The current select statement is

    Select * From links where title like '%" &q& "%' or description like '%" &q& "%' (where q is the search term inputted from a search box.)

    This pretty much finds all relevant records but I need something that shows the nearest matches at the top of the results.

    E.g., show sites that contain 'q' in the title and description first, followed by sites that contain 'q' in the title, then 'q' in description, and so on.

    Anybody done anything in this area?


Comments

  • Registered Users Posts: 2,494 ✭✭✭kayos


    You could do something like the following
    /*
    ** Create a temp table with an identity column so we have an order on the results
    */
    create table #tmp
    (
    LinkID 		int NOT NULL IDENTITY (1, 1),
    Title  		varchar(50),
    Url  		varchar(50),
    [Description]  	varchar(50)
    )
    
    /*
    ** First insert where the search string is found in both the title and the desc
    */
    insert
    	#tmp
    select
    	L.Title,
    	L.Url,
    	L.[Description]
    from
    	Links as L
    where
    	L.Title like '%q%'
    and
    	L.[Description] like '%q%'
    order by
    	l.Title asc,
    	l.[Description] asc
    
    /*
    ** Next insert where the search string is found the title only
    */
    insert
    	#tmp
    select
    	L.Title,
    	L.Url,
    	L.[Description]
    from
    	Links as L
    where
    	L.Title like '%q%'
    order by
    	l.Title asc,
    	l.[Description] asc
    
    /*
    ** Next insert where the search string is found the desc only
    */
    insert
    	#tmp
    select
    	L.Title,
    	L.Url,
    	L.[Description]
    from
    	Links as L
    where
    	L.[Description] like '%q%'
    order by
    	l.Title asc,
    	l.[Description] asc
    
    /*
    ** Select out our results
    */
    Select Distinct
    	*
    from
    	#tmp
    order by
    	linkid asc
    
    /*
    ** drop the temp table
    */
    drop table #tmp
    
    
    But I dont know if this is really what you want to do. I think the better way would be to count the occurances of the search string in the description and order by that me thinks.

    One thing description is a reserved word that is way I have it wrapped in []

    kayos


  • Registered Users Posts: 762 ✭✭✭Terminator


    Populating a temporary table sounds like a great idea (genius in fact) but lets say the search results return 100 links spanning 4 pages and there's every chance that the user is gonna view all 4 pages rather than just one. How can I ensure that the table is only dropped after they've left the site completely?


  • Registered Users Posts: 2,494 ✭✭✭kayos


    What are you doing this in ASP/Perl/PHP/CGI if its ASP I'll give a looksee to what you can do but any of the others someone else will have to help you.

    kayos


  • Closed Accounts Posts: 19,777 ✭✭✭✭The Corinthian


    Originally posted by kayos
    What are you doing this in ASP/Perl/PHP/CGI if its ASP I'll give a looksee to what you can do but any of the others someone else will have to help you.
    Don’t you think the ampersands in his code give that away?


  • Registered Users Posts: 762 ✭✭✭Terminator


    I'm using access and asp.

    I could have used MSQL but decided against it because the database is only being used when somebody does a search or when I'm doing a build of the directory.


  • Advertisement
  • Registered Users Posts: 2,494 ✭✭✭kayos


    Originally posted by The Corinthian

    Don’t you think the ampersands in his code give that away?

    Well if they gave it away to me I wouldn't be asking. I am not a web developer. I do business objects, architecture and db's not web stuff.

    To me the following code gives it away but how about you...
    WITH
    /*
    ** Company %
    */
    MEMBER [Measures].[Company %] AS '(SUM ( CROSSJOIN( {[Company].CurrentMember},{  [FundManager].CurrentMember }) , [Measures].[Holding] ) / SUM ( CROSSJOIN({ [FundManager].[All FundManager] },{[Company].CurrentMember}) , [Measures].[Holding] ) ) ', SOLVE_ORDER = 1, Format_String = 'PERCENT'
    /*
    ** Sector %
    */
    MEMBER [Measures].[Sector %] AS '( SUM ( CROSSJOIN ( { [Company].CurrentMember.Parent },{[FundManager].CurrentMember } ) , [Measures].[Holding]) / SUM ( CROSSJOIN ( { [Company].CurrentMember.Parent },{[FundManager].[All FundManager] } ) , [Measures].[Holding])) ', SOLVE_ORDER = 2, FORMAT_STRING = 'PERCENT'
    /*
    ** Market %
    */
    MEMBER [Measures].[Market %] AS '( SUM ( CROSSJOIN ( { [Company].[All Company] },{[FundManager].CurrentMember } ) , [Measures].[Holding]) / SUM ( CROSSJOIN ({ [Company].[All Company] },{[FundManager].[All FundManager] } ) , [Measures].[Holding])) ', SOLVE_ORDER = 3, FORMAT_STRING = 'PERCENT'
    /*
    ** Sector Weight
    */
    MEMBER [Measures].[Sector Weight] AS 'VAL([Measures].[Company %] / [Measures].[Sector %]) * 100' , SOLVE_ORDER = 3,FORMAT = '###'
    /*
    ** Market Weight
    */
    MEMBER [Measures].[Market Weight] AS 'VAL([Measures].[Company %] / [Measures].[Market %]) * 100' , SOLVE_ORDER = 4,FORMAT = '###'
    
    SELECT
    NON EMPTY { [Measures].[Holding] , [Measures].[Company %] , [Measures].[Sector %],[Measures].[Market %],[Measures].[Sector Weight],[Measures].[Market Weight] } on Columns,
    NON EMPTY { ORDER ( {Filter( CROSSJOIN({[Company].[All Company].[Telecommunication Services].[Vodafone Group] },{DESCENDANTS( [FundManager].[All FundManager] ,  [FundManager].[Fund Manager]  ) }),[Measures].[Company %] > 0.01)} , [Measures].[Company %] , BDESC)  } on ROWS
    FROM
    [Online]
    

    Dont flame me boy.

    kayos


  • Registered Users Posts: 2,494 ✭✭✭kayos


    Originally posted by Terminator
    I'm using access and asp.

    Ok you could create a permnament table for the results and add a sessionID column and fill it with a unique identifer for that persons session.

    Then when you are doing a search you would pass both the search string and the session id and check to see if there are any results in the table for that session if there are delete them if not just add the new results.

    Then when it comes to going through the pages just pass the last LinkID on the page you are leaving and only select the next results over that ID.

    This is most likely not the bast way of doing things but hey it would work :)

    kayos


  • Registered Users Posts: 762 ✭✭✭Terminator


    I suppose I could keep the table size in check by deleting old records everytime somebody does a new search.

    Like you say there's probably a better way but if it works and its not too slow then that will be all I need. :D

    Thanks again


  • Registered Users Posts: 2,494 ✭✭✭kayos


    Sorry I did mean to tell you to clear the results table for any session that finished. THere is a session terminate/end/close type event isn't there?

    Also what you want to do is once a night or every couple of hours is to Truncate the table this deletes the table but also set back the identity column to start at 1. Just make sure there are no active results in the table when you issue that command

    kayos


  • Registered Users Posts: 7,412 ✭✭✭jmcc


    Originally posted by Terminator
    Hi guys,

    I've recently designed a links directory for my site and it works pretty well except the search algorithm is not so good.

    The database uses a standard directory structure, e.g. each link has a Title, URL, Description, etc.

    The current select statement is

    Select * From links where title like '%" &q& "%' or description like '%" &q& "%' (where q is the search term inputted from a search box.)

    This pretty much finds all relevant records but I need something that shows the nearest matches at the top of the results.

    It is late and this may not make much sense. :)
    First split the query into a few queries. Just going on the title/description aspect:

    select url from links where title like '%$q%' and url like '%$q%';

    Then use a single queries for $q in title and for $q in description.

    Another thing that you can use is rlike which is a precise and case sensitive version of "like".

    The alternative is to use a free search engine like Udmsearch (mnogo ?). http://search.mnogo.ru/

    Regards...jmcc


  • Advertisement
  • Registered Users Posts: 762 ✭✭✭Terminator


    Sounds interesting. But as the results for certain queries, e.g., "Irish" or "Ireland" will be spanning several pages, I'd need to know how to convert these multiple queries into a single recordset?


  • Registered Users Posts: 2,781 ✭✭✭amen


    fpr paging look at something like
    http://stardeveloper.com:8080/asp_paging_2.asp

    or just enter
    asp recordset in google


  • Registered Users Posts: 7,412 ✭✭✭jmcc


    Originally posted by Terminator
    Sounds interesting. But as the results for certain queries, e.g., "Irish" or "Ireland" will be spanning several pages, I'd need to know how to convert these multiple queries into a single recordset?

    Use temporary tables to build up the queries and have an auto_incrementing record_id. this way you can use ORDER BY record_id DESC. The safe thing to do would be to block words that will appear on every page or title/description.

    Doing this offline would be a craftier way of handling user queries: A keyword table would have a list of keywords and a list or urls. Basically you would have a set of urls for each as in keyword{url(0),url(1)....url(n)}. Thus when a user searches, the keyword table would be searched as in: SELECT url from keywordtable where $q rlike keyword. Common phrases that appear in every page/title/description can also be blocked when generating this keyword table. However since you are controlling what the user gets, the server doesn't get stressed with a huge query.

    I still think that if the site is going to have more than a few hundred pages, using a search engine link Udmsearch is a better way of doing things. You can set it to reindex the site every night or every few hours. A search engine prog has the advantage that it will base the results on the url/title/description and body text.

    Regards...jmcc


Advertisement