JavaScript & XML

Peteee · 06-02-2007 10:44PM #1

Hey,

I'm looking for a way to only search the characters that are outside tags.

e.g.

[PHP]<h1 class="firstHeading">United States dollar</h1>
<div id="bodyContent">
<h3 id="siteSub">From Wikipedia, the free encyclopedia</h3>
<div id="contentSub"></div>[/PHP]

I want my JavaScript to ignore everything that's in the tags and only parse "United States dollar" and "From Wikipedia, the free encyclopedia", regardless of what tag it's in (because text that isn't part of a tag is generally the raw text).

The closest example I can find is something like this

[PHP]var x=xmlDoc.getElementsByTagName("title")[0].childNodes[0][/PHP]

That returns the text node inside the first title tag (its nodeValue holds the actual string).

I want something that will do this regardless of the tag name (i.e. if there's no text in a particular tag, go on to the next tag and try there; when you get to a tag that has text, do something).

edit: Apparently the 'wholeText' from http://www.w3schools.com/dom/dom_text.asp does something similar, but it's unsupported!
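
The tag-agnostic walk described above could be sketched roughly like this. It is only a sketch: `collectText` is a made-up helper name, not a built-in DOM method, and it assumes it is handed a DOM node (for example `document.body` or `xmlDoc.documentElement`) that exposes the standard `childNodes`, `nodeType` and `nodeValue` properties.

```javascript
// Hypothetical helper (not a DOM API): gathers the text of every text node
// found under `node`, no matter which tag encloses it.
function collectText(node, out) {
  out = out || [];
  var TEXT_NODE = 3; // numeric value of Node.TEXT_NODE
  for (var i = 0; i < node.childNodes.length; i++) {
    var child = node.childNodes[i];
    if (child.nodeType === TEXT_NODE) {
      // Trim, and skip nodes that only hold whitespace between tags.
      var text = child.nodeValue.replace(/^\s+|\s+$/g, "");
      if (text !== "") out.push(text);
    } else if (child.childNodes) {
      // No text directly here, so go on to whatever is inside this tag.
      collectText(child, out);
    }
  }
  return out;
}
```

In a browser, something like `collectText(document.body)` should then give an array of strings to search through. Note that the contents of script and style tags are text nodes too, so they would probably need filtering out separately.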

amen · 07-02-2007 02:36PM

So what you are trying to do is extract everything that isn't HTML formatting code from the DOM?

I suppose you could set up a regular expression and extract the text between > and < on a per-line basis.
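
A minimal sketch of that regular-expression idea, assuming the page source is already available as a string. `textBetweenTags` is a hypothetical name, and the pattern is deliberately naive: it would be confused by a > inside an attribute value, and it treats script and comment contents like any other text.

```javascript
// Hypothetical helper: returns every non-blank run of text that sits
// between a ">" and the next "<" in an HTML string.
function textBetweenTags(html) {
  var results = [];
  var re = />([^<>]+)</g; // capture whatever lies between > and <
  var match;
  while ((match = re.exec(html)) !== null) {
    // Trim, and drop matches that are only line breaks or spaces.
    var text = match[1].replace(/^\s+|\s+$/g, "");
    if (text !== "") results.push(text);
  }
  return results;
}
```

Because `[^<>]` also matches line breaks, this works on the whole string at once instead of line by line, so text that wraps across lines is still picked up.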

I don't think your XML example is really going to help.
You could take the HTML and try to turn it into XML and then go through it, but that would be very messy.

Maybe if you explained why you want the text from the page you might get some more suggestions?
Is this for screen scraping or some other reason?

Peteee · 10-02-2007 07:24PM

amen wrote:

So what you are trying to do is extract everything that isn't HTML formatting code from the DOM?

I suppose you could set up a regular expression and extract the text between > and < on a per-line basis.

I don't think your XML example is really going to help.
You could take the HTML and try to turn it into XML and then go through it, but that would be very messy.

Maybe if you explained why you want the text from the page you might get some more suggestions?
Is this for screen scraping or some other reason?

I'll try the regex.

Yeah, it's for screen scraping (I'm looking for a certain string, which I extract and then replace with a new value).
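
For the replace step, one possible approach is to split the markup into tag and non-tag pieces and only touch the non-tag ones. This is a rough sketch, not a tested scraper: `replaceOutsideTags` is a hypothetical name, the search string is assumed to be non-empty plain text (not a pattern), and the same caveats about > inside attribute values apply.

```javascript
// Hypothetical helper: replaces `search` with `replacement` only in the
// text that lies outside tags, leaving tag names and attributes alone.
function replaceOutsideTags(html, search, replacement) {
  // Splitting on a capturing group keeps the tags in the array, so the
  // pieces alternate: text, tag, text, tag, ...
  var parts = html.split(/(<[^>]*>)/);
  for (var i = 0; i < parts.length; i += 2) {
    // Even indexes are the text between tags; replace every occurrence.
    parts[i] = parts[i].split(search).join(replacement);
  }
  return parts.join("");
}
```

For example, calling it on `<h1 class="dollar">United States dollar</h1>` with "dollar" and "euro" should change the heading text but leave the class attribute as it was.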
