
Screen scraping web program

  • 05-06-2003 8:42pm
    #1
    Registered Users Posts: 237 ✭✭


    Would anyone know how easy or difficult it would be to write a program that queries a website and downloads all fares? For example, on ryanair.com, put in a route and then get all dates and prices over a one-month range.
    aerlingus.com is a bit painful because it's slow and I have to keep entering the towns and dates each time.


Comments

  • Registered Users Posts: 7,739 ✭✭✭mneylon


    I don't know how hard it would be to write it tbh, but you could look at some of the scrapers available for JSP


  • Banned (with Prison Access) Posts: 16,659 ✭✭✭✭dahamsta


    Your best bet for a question like this is ILUG; there are some world-class scraper-writers on the list. Justin Mason's pretty hot at it, as I recall.

    adam


  • Closed Accounts Posts: 304 ✭✭Zaltais


    As regards this specific application, I don't know how valuable it would be, as airline rates are based on availability and your information would become 'stale' very quickly.


  • Registered Users Posts: 1,842 ✭✭✭phaxx


    Take a look at www.flytowork.ie - isn't that what you're trying to do?


  • Registered Users Posts: 7,412 ✭✭✭jmcc


    Originally posted by lukegriffen
    Would anyone know how easy or difficult it would be to write a program that queries a website and downloads all fares? For example, on ryanair.com, put in a route and then get all dates and prices over a one-month range.
    aerlingus.com is a bit painful because it's slow and I have to keep entering the towns and dates each time.

    Not particularly difficult (if you have a good knowledge of the language you want to use, of HTTP, and of regular expressions) unless sessions are involved. It basically comes down to four operations:

    Creating the query.
    Presenting the query.
    Grabbing the results page.
    Processing the results.
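
    The four operations above can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the endpoint `fares.example.com`, the parameter names `from`/`to`/`date`, and the results markup matched by `FARE_RE` are all made up, and a real airline site will use different URLs and HTML and may require sessions.

    ```python
    import re
    from datetime import date, timedelta
    from urllib.parse import urlencode
    from urllib.request import urlopen

    # Hypothetical endpoint; a real site will have its own URL and parameters.
    BASE_URL = "https://fares.example.com/search"

    # Hypothetical results markup, assumed to look like:
    #   <td class="date">2003-06-05</td><td class="price">49.99</td>
    FARE_RE = re.compile(
        r'<td class="date">(\d{4}-\d{2}-\d{2})</td>\s*'
        r'<td class="price">(\d+\.\d{2})</td>'
    )


    def build_query(origin, destination, day):
        """Operation 1: create the query as a URL with encoded parameters."""
        params = {"from": origin, "to": destination, "date": day.isoformat()}
        return BASE_URL + "?" + urlencode(params)


    def fetch(url):
        """Operations 2 and 3: present the query and grab the results page."""
        with urlopen(url, timeout=30) as response:
            return response.read().decode("utf-8", errors="replace")


    def parse_fares(html):
        """Operation 4: extract (date, price) pairs with a regular expression."""
        return [(d, float(p)) for d, p in FARE_RE.findall(html)]


    def dates_in_range(start, days=30):
        """List the travel dates to query, one per day from the start date."""
        return [start + timedelta(days=n) for n in range(days)]
    ```

    For a one-month range you would loop over `dates_in_range(...)`, call `build_query` and `fetch` for each date, and feed each page to `parse_fares`, ideally with a pause between requests.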

    If you are running a comparison of fares over routes, then you would need some sort of database to do it properly. Ryanair's site seems to be well integrated. The problem with the data is that it is volatile: seat availability and booking data change live.
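
    As a rough idea of what "some sort of database" could look like, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout and the function names (`open_db`, `save_fare`, `cheapest`) are assumptions for illustration only. Storing a `fetched_at` timestamp with each row is one way to keep track of how stale a volatile fare is.

    ```python
    import sqlite3


    def open_db(path=":memory:"):
        """Open (or create) the fares database. Schema is illustrative only."""
        conn = sqlite3.connect(path)
        conn.execute(
            """CREATE TABLE IF NOT EXISTS fares (
                   airline TEXT,
                   origin TEXT,
                   destination TEXT,
                   travel_date TEXT,
                   price REAL,
                   fetched_at TEXT
               )"""
        )
        return conn


    def save_fare(conn, airline, origin, destination, travel_date, price, fetched_at):
        """Store one scraped fare along with the time it was fetched."""
        conn.execute(
            "INSERT INTO fares VALUES (?, ?, ?, ?, ?, ?)",
            (airline, origin, destination, travel_date, price, fetched_at),
        )
        conn.commit()


    def cheapest(conn, origin, destination):
        """Return (airline, travel_date, price) for the lowest stored fare, or None."""
        return conn.execute(
            """SELECT airline, travel_date, price FROM fares
               WHERE origin = ? AND destination = ?
               ORDER BY price ASC, fetched_at DESC
               LIMIT 1""",
            (origin, destination),
        ).fetchone()
    ```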

    Is this just a one-off idea or is it intended to be part of a website? Also, what language/platform will be used?

    The application is a bit more complex than a simple scraper/spider such as the ones used by search engines. (Writing spiders can be a bit more complex than writing scrapers ;) as the data changes so often and some webdevs invent their own META data categories.)

    This link may be useful, but you would need a Perl-capable box, some Perl familiarity and perhaps some Perl heads to sort out a handler (scraper): http://www.newsclipper.com/

    There is a tendency among SysAdmins to block scrapers.

    Regards...jmcc


  • Registered Users Posts: 237 ✭✭lukegriffen


    The information would be just for my own use, for booking weekends away, so once I got the info, I'd book within the hour. It would just save a lot of time keying in different dates on different routes.

    Thanks for all the replies.

