Data Mining/Building HTTP Requests

  • 25-03-2010 07:09PM
    #1
    Registered Users, Registered Users 2 Posts: 10,148 ✭✭✭✭


    Hey, I'm looking for some help with a side-project I'm working on.

    I'm trying to scrape a number of IDs from a website, and then use those IDs to build a URL that returns the data for each ID. The problem I'm facing is that the website is not processing my requests properly (I think browsing through the site normally sets some sort of cookie or session). I don't care how I get the information I'm looking for; I just want any solution!

    The website I'm trying to scrape from is at http://www.tse.or.jp/tseHpFront/HPLCDS0101E.do?method=init&callJorEFlg=1

    Do a search and you'll get a number of search items. Click "Display Stock Price" and you get a page of information for that stock. Basically, I want to try and get all the data off the page for all stocks.

    Any thoughts, ideas or suggestions on how I do this?


Comments

  • Registered Users, Registered Users 2 Posts: 885 ✭✭✭clearz


    Yes, it seems that the site is checking whether the request comes from the same place as the search. It could be a cookie, a session, or the Referer HTTP header. You can check this by copying and pasting the URL into another browser or into a private session in your current browser.

    When I am developing a scraper for any site and run into problems, I use Wireshark to see exactly what is being sent in the requests and responses.
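
To make the cookie/session/Referer idea above concrete, here is a minimal Python sketch using only the standard library. It is not a tested scraper for this site: the detail URL, the form field names, and the assumption that the detail page expects a POST are all hypothetical placeholders, and the real values would have to be read from a capture of what the browser actually sends.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Search page URL taken from the original post.
SEARCH_URL = ("http://www.tse.or.jp/tseHpFront/HPLCDS0101E.do"
              "?method=init&callJorEFlg=1")


def make_session(debug=False):
    """Return (opener, jar): an opener that stores cookies and resends them."""
    jar = http.cookiejar.CookieJar()
    handlers = [urllib.request.HTTPCookieProcessor(jar)]
    if debug:
        # debuglevel=1 prints the raw HTTP request and response headers to
        # stdout, a lightweight stand-in for a Wireshark capture.
        handlers.append(urllib.request.HTTPHandler(debuglevel=1))
    return urllib.request.build_opener(*handlers), jar


def build_detail_request(detail_url, params, referer=SEARCH_URL):
    """Build a form POST that names the search page as its Referer.

    detail_url and params are hypothetical: copy the real values from the
    request the browser sends when "Display Stock Price" is clicked.
    """
    body = urllib.parse.urlencode(params).encode("ascii")
    req = urllib.request.Request(detail_url, data=body)
    req.add_header("Referer", referer)
    # Assumption: some sites reject the default Python user agent.
    req.add_header("User-Agent", "Mozilla/5.0")
    return req


# Intended flow (not run here, since it needs network access):
#   opener, jar = make_session(debug=True)
#   opener.open(SEARCH_URL, timeout=30).read()   # collects any session cookie
#   req = build_detail_request(DETAIL_URL, {"someField": "someId"})
#   html = opener.open(req, timeout=30).read()   # cookie is sent automatically
```

Loading the search page first through the same opener matters: if the site does issue a session cookie, the cookie jar picks it up there and sends it back with every later request, which is what a browser does implicitly.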

