
.Net LoadXml (string) gives 403 - forbidden

  • 30-10-2010 1:22pm
    #1
    Registered Users Posts: 6,465 ✭✭✭


    I'm trying to pull data from a website in VB.Net.

    I've read the data with WebRequest and WebResponse objects and converted it to a string with a StreamReader.

    When I try to generate an XmlDocument from that string using LoadXml, I get a 403 - Forbidden error.

    I don't get this. I've already got the data, so the LoadXml method shouldn't be trying to access the server; it should just be using the string I pass it, shouldn't it?
    Am I misunderstanding something here?
    webRequest = System.Net.WebRequest.Create("http://thewebpage.com")
    webResponse = webRequest.GetResponse()

    sr = New StreamReader(webResponse.GetResponseStream())
    result = sr.ReadToEnd()

    xml.LoadXml(result)
    


Comments

  • Registered Users Posts: 330 ✭✭leahcim


    Check what data is in the result variable just before you call LoadXml; it might be empty or contain garbage.

    You could also have the XmlDocument class load the XML directly from a URL (C# syntax here):

    XmlDocument xml = new XmlDocument();
    xml.Load(@"http://thewebpage.com/page.xml");


  • Registered Users Posts: 339 ✭✭duffman85


    You may have got an HTTP 403 response when you called GetResponse(). This means you are not allowed to access this page or folder, and you receive just the headers and not the XML/HTML you want.

    Check the first header in the webResponse.Headers collection:

    http://msdn.microsoft.com/en-us/library/system.net.webresponse.headers%28v=VS.90%29.aspx

    If the first entry is HTTP/1.1 403 Forbidden, then check in a browser to see what happens.
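
    A rough sketch of another way to check the status, for what it's worth. In .NET a 4xx reply from GetResponse() normally surfaces as a WebException rather than as an entry in the headers, so the status code can be read from the exception's Response. The names StatusCheck and GetStatusCode are made up for this example, and it assumes an http/https URL.

```vbnet
Imports System.Net

Module StatusCheck
    ' Hypothetical helper: returns the HTTP status code for a URL, whether the
    ' request succeeded or the server answered with an error such as 403.
    Function GetStatusCode(ByVal url As String) As HttpStatusCode
        Dim request As WebRequest = WebRequest.Create(url)
        Try
            ' Assumes an http/https URL, so the response is an HttpWebResponse.
            Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
                Return response.StatusCode
            End Using
        Catch ex As WebException
            ' For protocol errors (4xx/5xx) the server's reply is attached to the exception.
            Dim failed As HttpWebResponse = TryCast(ex.Response, HttpWebResponse)
            ' No reply at all (DNS failure, timeout, ...): let the exception propagate.
            If failed Is Nothing Then Throw
            Return failed.StatusCode
        End Try
    End Function
End Module
```

    If GetStatusCode("http://thewebpage.com") returns HttpStatusCode.Forbidden, the 403 really is coming from the page itself; if it returns OK, the 403 is coming from somewhere else.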


  • Registered Users Posts: 6,465 ✭✭✭MOH


    It's not a 403 from the web page I'm scraping, because the results are coming back fine in the result string.

    I think the problem is that the LoadXml call is trying to fetch the DTD specified in the doctype, and it's that request which is returning the 403.

    [edit]
    This is in fact the problem. I tried loading it through an XmlReader object and this time got a useful error message, which confirmed that it was the DTD URL that was forbidden. Dunno why the same error gave me sod all info the other way.
    Apparently w3.org blocks requests based on User-Agent to avoid excessive traffic.
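
    One possible workaround, sketched under the assumption that the DTD isn't actually needed: clear the document's XmlResolver before calling LoadXml, so the parser never tries to download the external DTD. The names DtdFreeLoading and LoadXmlWithoutDtdFetch are invented for this example and aren't from the original code.

```vbnet
Imports System.Xml

Module DtdFreeLoading
    ' Hypothetical helper: parses an XML string without fetching any
    ' external DTD referenced by its DOCTYPE.
    Function LoadXmlWithoutDtdFetch(ByVal xmlText As String) As XmlDocument
        Dim doc As New XmlDocument()
        ' With no resolver set, external resources such as the DTD URL
        ' in the DOCTYPE are not requested over the network.
        doc.XmlResolver = Nothing
        doc.LoadXml(xmlText)
        Return doc
    End Function
End Module
```

    With the code from the first post that would be xml = LoadXmlWithoutDtdFetch(result). The catch is that entities declared only in the DTD (&nbsp; in XHTML, for example) are then unknown to the parser, so a page that uses them may still fail to load.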

