
Scraping info from website

  • 05-11-2010 8:52pm
    #1
    Registered Users Posts: 207 ✭✭


    I was hoping to start downloading information from a website on a daily basis in order to analyse the data. The files are published in HTML and XML format. I then want to store the data on the hard drive of a laptop that has gone past its usefulness and operate it almost like a network drive.

    My query is: is there a “scraper” that would allow me to download this publicly available information from the website on a daily basis and put it in a database for analysis? I am mildly technically minded and obviously use computers on a daily basis, but this is pushing the realms of my abilities, so I apologise if the query is a bit naive.

    Any help is appreciated

    Shakeydude


Comments

  • Registered Users Posts: 2,370 ✭✭✭Knasher


    There is probably an application that can parse websites in some sort of automated way, but unless somebody can come up with a better suggestion I'd recommend just using regular expressions to parse the raw xml/html and pull the data you need into a database yourself. Provided they publish the data in a reasonably standardized way (which their use of xml would suggest they do) it really shouldn't be all that difficult.
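    Something like this rough Python sketch, for example -- the feed URL, the regular expression and the table layout are just placeholders, since I don't know what the site actually publishes:

        import re
        import sqlite3
        import urllib.request

        # Placeholders -- swap in the real feed URL and a pattern that matches
        # whatever structure the site actually publishes.
        FEED_URL = "http://example.com/data.xml"
        RECORD_PATTERN = re.compile(
            r"<item>\s*<name>(.*?)</name>\s*<value>(.*?)</value>\s*</item>", re.S)

        def scrape_once(db_path="scrape.db"):
            # Download the raw XML/HTML as text
            raw = urllib.request.urlopen(FEED_URL).read().decode("utf-8")

            # Pull out (name, value) pairs with the regular expression
            records = RECORD_PATTERN.findall(raw)

            # Store them in a local SQLite database on the old laptop
            conn = sqlite3.connect(db_path)
            conn.execute("CREATE TABLE IF NOT EXISTS readings (name TEXT, value TEXT)")
            conn.executemany("INSERT INTO readings (name, value) VALUES (?, ?)", records)
            conn.commit()
            conn.close()

        if __name__ == "__main__":
            scrape_once()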


  • Registered Users Posts: 1,530 ✭✭✭CptSternn


    Yeah, if the data is already in XML format, what would you need a scraper for?

    Just write a script that will download the XML and put it into a database.
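    Something along those lines -- a minimal sketch, assuming the feed is well-formed XML. It uses Python's built-in ElementTree parser rather than regular expressions; the URL, the tag names and the table layout are again only guesses:

        import sqlite3
        import urllib.request
        import xml.etree.ElementTree as ET

        FEED_URL = "http://example.com/data.xml"  # placeholder -- the real feed URL goes here

        def download_to_db(db_path="scrape.db"):
            # Fetch today's XML file
            xml_bytes = urllib.request.urlopen(FEED_URL).read()

            # Parse it as XML rather than with regular expressions
            root = ET.fromstring(xml_bytes)

            # The <item>/<name>/<value> structure is an assumption; adjust to the real schema
            rows = [(item.findtext("name"), item.findtext("value"))
                    for item in root.iter("item")]

            conn = sqlite3.connect(db_path)
            conn.execute("CREATE TABLE IF NOT EXISTS readings (name TEXT, value TEXT)")
            conn.executemany("INSERT INTO readings (name, value) VALUES (?, ?)", rows)
            conn.commit()
            conn.close()

        if __name__ == "__main__":
            download_to_db()

    Run it once a day from cron (or Windows Task Scheduler) and the daily snapshots build up in the database for analysis.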

