.Net LoadXml (string) gives 403 - forbidden

MOH · 30-10-2010 1:22pm #1

Trying to pull data from a website in VB.Net

I've read the data with a WebRequest and WebResponse object and converted it to a string with a StreamReader.

When I try to generate an XmlDocument from that using LoadXml, I'm getting a 403 - Forbidden error.

I don't get this - I've already got the data, so the LoadXml method shouldn't be trying to access the server. Shouldn't it just be using the string I pass it?
Am I misunderstanding something here?

webRequest = System.Net.WebRequest.Create("http://thewebpage.com")
webResponse = webRequest.GetResponse()

sr = New StreamReader(webResponse.GetResponseStream())
result = sr.ReadToEnd()

xml.LoadXml(result)

leahcim · 30-10-2010 2:24pm

Check what data is in the result variable just before you call LoadXml; it might be empty or contain garbage.

You could also use the XmlDocument class to load XML directly from a web site (C#):

XmlDocument xml = new XmlDocument();
xml.Load(@"http://thewebpage.com/page.xml");

duffman85 · 31-10-2010 12:36pm

You may have got an HTTP 403 response when you called GetResponse() - this means you are not allowed to access this page or folder, and you just receive the headers and not the XML/HTML you want.

Check the first header in the webResponse.Headers collection:

http://msdn.microsoft.com/en-us/library/system.net.webresponse.headers%28v=VS.90%29.aspx

If the first entry is HTTP/1.1 403 Forbidden, then check in a browser to see what happens.
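
For what it's worth, as far as I know GetResponse() on an HTTP request throws a WebException for a 4xx/5xx status rather than handing back a response, so the status can also be read from the exception. A rough sketch of that check, assuming the placeholder URL from the first post (the DescribeFailure helper and the module name are made up for illustration):

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module FetchWithStatusCheck
    ' Hypothetical helper: turns a WebException into a short description.
    ' ex.Response is only populated when the server actually answered.
    Function DescribeFailure(ex As WebException) As String
        Dim httpResponse As HttpWebResponse = TryCast(ex.Response, HttpWebResponse)
        If httpResponse Is Nothing Then
            Return "No HTTP response: " & ex.Status.ToString()
        End If
        Return String.Format("HTTP {0} {1}", CInt(httpResponse.StatusCode), httpResponse.StatusDescription)
    End Function

    Sub Main()
        Try
            ' Placeholder URL taken from the original question.
            Dim request As WebRequest = WebRequest.Create("http://thewebpage.com")
            Using response As WebResponse = request.GetResponse()
                Using reader As New StreamReader(response.GetResponseStream())
                    Dim result As String = reader.ReadToEnd()
                    Console.WriteLine("Read {0} characters", result.Length)
                End Using
            End Using
        Catch ex As WebException
            ' A 403 from the page itself would end up here, before any XML parsing.
            Console.WriteLine(DescribeFailure(ex))
        End Try
    End Sub
End Module
```

If the request succeeds and the exception only shows up later at LoadXml, the 403 is coming from somewhere other than the page being scraped.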

MOH · 01-11-2010 10:04pm

It's not a 403 from the web page I'm scraping, because the results are coming back fine in the result string.

I think the problem is the LoadXml call is trying to access the DTD specified in the doctype, and it's that which is returning the 403.

[edit]
This is in fact the problem. I tried loading it through an XmlReader object and this time got a useful error message, which confirmed that it was the DTD URL that was forbidden - dunno why the same error gave me sod all info the other way.
Apparently w3.org blocks requests based on User-Agent to avoid excessive traffic.
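
In case it helps anyone else hitting this: one common way around it is to stop the parser from fetching the DTD at all by clearing the document's XmlResolver before calling LoadXml. A minimal sketch, assuming an XHTML doctype like the one that seems to be involved here (the function name, module name, and sample markup are made up for illustration):

```vbnet
Imports System
Imports System.Xml

Module LoadXmlWithoutDtdFetch
    ' Hypothetical helper: parses an XML string without resolving
    ' external resources such as the DTD named in the DOCTYPE.
    Function LoadWithoutExternalDtd(xmlText As String) As XmlDocument
        Dim doc As New XmlDocument()
        ' Must be set before LoadXml; with no resolver the DTD URL is never requested.
        doc.XmlResolver = Nothing
        doc.LoadXml(xmlText)
        Return doc
    End Function

    Sub Main()
        ' Sample input standing in for the scraped page (an assumption, not the real page).
        Dim sample As String =
            "<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Strict//EN"" " &
            """http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"">" &
            "<html xmlns=""http://www.w3.org/1999/xhtml"">" &
            "<head><title>t</title></head><body><p>hello</p></body></html>"

        Dim doc As XmlDocument = LoadWithoutExternalDtd(sample)
        Console.WriteLine(doc.DocumentElement.Name)
    End Sub
End Module
```

One caveat: with the DTD skipped, named entities that it defines (such as &nbsp;) are no longer declared, so a page that uses them will probably fail to parse with an undeclared-entity error. If you go through an XmlReader instead, XmlReaderSettings.DtdProcessing (in .NET 4.0 and later) offers similar control.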
