Data Mining/Building HTTP Requests

Raskolnikov · 25-03-2010 07:09PM #1

Hey, I'm looking for some help with a side-project I'm working on.

I'm trying to scrape a number of IDs from a website, and then use those IDs to build a URL that gets me the data for each ID. The problem is that the website isn't processing my requests properly (I think that going through the site in a browser sets some sort of cookie or session). I don't care how I get the information I'm looking for, I just want any solution!

The website I'm trying to scrape from is at http://www.tse.or.jp/tseHpFront/HPLCDS0101E.do?method=init&callJorEFlg=1

Do a search and you'll get a number of search items. Click "Display Stock Price" and you get a page of information for that stock. Basically, I want to try and get all the data off the page for all stocks.

Any thoughts, ideas or suggestions on how I do this?

clearz · 28-03-2010 08:56PM

Yes, it seems to be using some method to check whether the request comes from the same place as the search. It could be a cookie, a session, or the Referer HTTP header. You can check this by copying and pasting the URL into another browser or into a private session in your current browser: if the page doesn't load there, the site depends on state set up by the earlier search.

When I'm developing a scraper for any site and run into problems, I use Wireshark to see exactly what is being sent in the requests and responses.
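If it does turn out to be a cookie, session or Referer check, something along these lines might be a starting point. It's only a rough sketch using Python's standard library: it keeps a cookie jar across requests and sets the Referer header by hand. The helper names (make_session, build_request, fetch) are made up for illustration, and the real detail URL and form fields aren't known here, so they would have to be copied from a request captured in the browser or in Wireshark.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Search page from the original post.
SEARCH_URL = "http://www.tse.or.jp/tseHpFront/HPLCDS0101E.do?method=init&callJorEFlg=1"


def make_session():
    """Return (opener, jar): an opener that stores cookies and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_request(url, referer, form=None):
    """Build a request with a Referer header; it becomes a POST if form data is given."""
    data = urllib.parse.urlencode(form).encode("ascii") if form else None
    req = urllib.request.Request(url, data=data)
    req.add_header("Referer", referer)
    return req


def fetch(opener, req):
    """Send the request through the cookie-aware opener and return the raw body."""
    with opener.open(req, timeout=30) as resp:
        return resp.read()


if __name__ == "__main__":
    opener, jar = make_session()
    # Load the search page first so any session cookie lands in the jar.
    fetch(opener, urllib.request.Request(SEARCH_URL))
    print([cookie.name for cookie in jar])
```

After that first request, each stock page would be fetched with build_request(detail_url, SEARCH_URL, form_fields) through the same opener, so the cookies and Referer match what the browser sends. Whether that is enough depends on what the site actually checks.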
