
Data Mining/Building HTTP Requests

  • 25-03-2010 7:09pm
    #1
    Registered Users Posts: 10,148 ✭✭✭✭


    Hey, I'm looking for some help with a side-project I'm working on.

    I'm trying to scrape a number of IDs from a website, and then use those IDs to build a URL that returns the data for each ID. The problem is that the website isn't processing my requests properly (I think browsing the site normally sets some sort of cookie or session). I don't care how I get the information, I just want any solution that works!

    The website I'm trying to scrape from is at http://www.tse.or.jp/tseHpFront/HPLCDS0101E.do?method=init&callJorEFlg=1

    Do a search and you'll get a list of results. Click "Display Stock Price" on one and you get a page of information for that stock. Basically, I want to get all the data off that page for every stock.

    Any thoughts, ideas or suggestions on how I do this?


Comments

  • Registered Users Posts: 885 ✭✭✭clearz


    Yes, it seems the site is using some method to check that the request comes from the same place as the search. It could be a cookie, a session, or the Referer HTTP header. You can check this by copying and pasting the URL into another browser or a private session in your current browser. If the page fails to load there, it depends on something that was set during the search.
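
If it does turn out to be a cookie, session, or Referer check, something along these lines might get past it. This is only a rough sketch using Python's standard library: the search URL is the one from the first post, but the detail-page path and the "stockId" parameter name are made up for illustration, so swap in whatever the real "Display Stock Price" request uses. The idea is to load the search page first so any session cookie gets stored, then send each ID request through the same opener with a Referer header set.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# The search URL is taken from the original post. The detail path and the
# "stockId" parameter name below are hypothetical placeholders.
SEARCH_URL = "http://www.tse.or.jp/tseHpFront/HPLCDS0101E.do?method=init&callJorEFlg=1"
DETAIL_URL = "http://www.tse.or.jp/tseHpFront/EXAMPLE_DETAIL.do"  # hypothetical


def make_opener():
    """Return an opener that stores and resends cookies, like a browser session."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_detail_request(stock_id, referer=SEARCH_URL):
    """Build the request for one ID, with the Referer header the site may check."""
    query = urllib.parse.urlencode({"stockId": stock_id})  # hypothetical parameter
    request = urllib.request.Request(DETAIL_URL + "?" + query)
    request.add_header("Referer", referer)
    return request


def fetch_all(stock_ids):
    """Visit the search page first to collect cookies, then fetch each ID."""
    opener, _jar = make_opener()
    opener.open(SEARCH_URL).read()  # any session cookie lands in the jar here
    pages = {}
    for stock_id in stock_ids:
        with opener.open(build_detail_request(stock_id)) as response:
            pages[stock_id] = response.read()
    return pages
```

Calling fetch_all(["1234", "5678"]) would return a dict of raw page bytes keyed by ID, assuming the placeholder URL and parameter are replaced with the real ones. If the site expects a POST or extra form fields, the request builder would need to change to match.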

    When I'm developing a scraper for a site and run into problems, I use Wireshark to see exactly what is being sent in the requests and responses.

