
Scraping info from website

  • 05-11-2010 8:52pm
    #1
    Registered Users Posts: 207 ✭✭


    I was hoping to start downloading information from a website on a daily basis in order to analyse the data. The files are published in HTML and XML format. I then want to store the data on the hard drive of a laptop that has gone past its usefulness and operate it almost like a network drive.

    My query is: is there a “scraper” that would allow me to download this publicly available information from the website on a daily basis and put it in a database for analysis? I am mildly technically minded and obviously use computers on a daily basis, but this is pushing the realms of my abilities, so I apologise if the query is a bit naive.

    Any help is appreciated

    Shakeydude


Comments

  • Registered Users Posts: 2,370 ✭✭✭Knasher


    There is probably an application that can parse websites in some sort of automated way, but unless somebody can come up with a better suggestion I'd recommend just using regular expressions to parse the raw xml/html and pull the data you need into a database yourself. Provided they publish the data in a reasonably standardized way (which their use of xml would suggest they do) it really shouldn't be all that difficult.
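    Something like this rough Python sketch, for example -- the feed URL, the regular expression and the table layout are just placeholders, since I don't know what the site actually publishes:

        import re
        import sqlite3
        import urllib.request

        # Placeholders -- swap in the real feed URL and a pattern that matches
        # whatever structure the site actually publishes.
        FEED_URL = "http://example.com/data.xml"
        RECORD_PATTERN = re.compile(
            r"<item>\s*<name>(.*?)</name>\s*<value>(.*?)</value>\s*</item>", re.S)

        def scrape_once(db_path="scrape.db"):
            # Download the raw XML/HTML as text
            raw = urllib.request.urlopen(FEED_URL).read().decode("utf-8")

            # Pull out (name, value) pairs with the regular expression
            records = RECORD_PATTERN.findall(raw)

            # Store them in a local SQLite database on the old laptop
            conn = sqlite3.connect(db_path)
            conn.execute("CREATE TABLE IF NOT EXISTS readings (name TEXT, value TEXT)")
            conn.executemany("INSERT INTO readings (name, value) VALUES (?, ?)", records)
            conn.commit()
            conn.close()

        if __name__ == "__main__":
            scrape_once()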


  • Registered Users Posts: 1,530 ✭✭✭CptSternn


    Yeah, if the data is already in XML format, what would you need a scraper for?

    Just write a script that will download the XML and put it into a database.
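    Something along those lines -- a minimal sketch, assuming the feed is well-formed XML. It uses Python's built-in ElementTree parser rather than regular expressions; the URL, the tag names and the table layout are again only guesses:

        import sqlite3
        import urllib.request
        import xml.etree.ElementTree as ET

        FEED_URL = "http://example.com/data.xml"  # placeholder -- the real feed URL goes here

        def download_to_db(db_path="scrape.db"):
            # Fetch today's XML file
            xml_bytes = urllib.request.urlopen(FEED_URL).read()

            # Parse it as XML rather than with regular expressions
            root = ET.fromstring(xml_bytes)

            # The <item>/<name>/<value> structure is an assumption; adjust to the real schema
            rows = [(item.findtext("name"), item.findtext("value"))
                    for item in root.iter("item")]

            conn = sqlite3.connect(db_path)
            conn.execute("CREATE TABLE IF NOT EXISTS readings (name TEXT, value TEXT)")
            conn.executemany("INSERT INTO readings (name, value) VALUES (?, ?)", rows)
            conn.commit()
            conn.close()

        if __name__ == "__main__":
            download_to_db()

    Run it once a day from cron (or Windows Task Scheduler) and the daily snapshots build up in the database for analysis.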

