screen scraper help

  • 25-03-2010 04:37PM
    #1
    Registered Users, Registered Users 2 Posts: 495 ✭✭


    Like buses... Similar to the topic in this thread - http://www.boards.ie/vbulletin/showthread.php?t=2055861895 - I'm writing a scraper for soccerbase.com in Java.

    I can open a connection to the front page successfully, but if I try to open a connection to any of the results pages (e.g. http://www.soccerbase.com/results_by_date.sd?date=2010-01-27+00%3A00%3A00 ) or the details for a particular game (e.g. http://www.soccerbase.com/results3.sd?gameid=609770 ) using the code below, the server returns a 500 error.

    I've examined the request and response headers in Firefox using Live HTTP Headers, and the first response does have the 500 status code, but for some reason Firefox (and Opera and Internet Explorer) ignore it and display the page correctly.

    I've tried connecting to the above URLs using cURL and wget. Bizarrely, cURL returns the HTML for the page correctly, but wget fails with the same 500 error.

    Any ideas?
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;

    public class Test {

        public static void main(String[] args) {
            try {
                URLConnection conn = new URL("http://www.soccerbase.com/results_by_date.sd?date=2010-01-27+00%3A00%3A00").openConnection();
                // getInputStream() throws an IOException when the server answers with a 500
                InputStreamReader inputStreamReader = new InputStreamReader(conn.getInputStream());
            }
            catch (java.io.IOException e) {
                System.out.println("Exception thrown.\ne.getMessage(): " + e.getMessage());
            }
        }

    }
    
    


Comments

  • Registered Users, Registered Users 2 Posts: 495 ✭✭tetsujin1979


    Never mind, figured it out.
    The page text is coming in on the error stream, so when the exception is thrown, get the error stream from the URL connection (getErrorStream() on HttpURLConnection) and read from there.
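
For anyone landing here later, a rough sketch of that fix might look like the following. It is not the exact code from the thread: the class name ErrorStreamFetch, the helpers readAll and fetch, and the UTF-8 charset are assumptions made for illustration. It checks the status code up front and reads from getErrorStream() for 4xx/5xx responses instead of waiting for getInputStream() to throw.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class ErrorStreamFetch {

    /** Reads the whole stream as text. The UTF-8 charset is an assumption. */
    public static String readAll(InputStream in) throws IOException {
        if (in == null) {
            return ""; // getErrorStream() returns null when there is no error body
        }
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    /** Returns the response body even when the status code is 4xx or 5xx. */
    public static String fetch(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        try {
            int status = conn.getResponseCode();
            // getInputStream() throws for error statuses, so switch streams here.
            InputStream body = (status >= 400)
                    ? conn.getErrorStream()
                    : conn.getInputStream();
            return readAll(body);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        String url = "http://www.soccerbase.com/results_by_date.sd?date=2010-01-27+00%3A00%3A00";
        try {
            System.out.println(fetch(url));
        } catch (IOException e) {
            System.out.println("Request failed: " + e.getMessage());
        }
    }
}
```

Checking getResponseCode() first avoids using the exception for control flow, but catching the IOException and then calling getErrorStream(), as described above, should work just as well.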

