JavaScript & XML

  • 06-02-2007 10:44PM
    #1
    Registered Users, Registered Users 2 Posts: 2,432 ✭✭✭


    Hey,

    I'm looking for a way to search only the characters that are outside tags.

    e.g.

    [PHP]<h1 class="firstHeading">United States dollar</h1>
    <div id="bodyContent">
    <h3 id="siteSub">From Wikipedia, the free encyclopedia</h3>
    <div id="contentSub"></div>[/PHP]

    I want my JavaScript to ignore everything that's in the tags and only parse "United States dollar" and "From Wikipedia, the free encyclopedia", regardless of what tag the text is in (because generally the text that isn't part of a tag is the raw text).

    The closest example I can find is something like this:

    [PHP]var x=xmlDoc.getElementsByTagName("title")[0].childNodes[0][/PHP]

    Which returns the text node within the title tag (its nodeValue holds the actual text).

    I want something that will do this regardless of the tag name (i.e. if there is no text in that particular tag, go on to the next tag and try there; when you get to a tag that has text, do something).

    edit: Apparently the 'wholeText' from http://www.w3schools.com/dom/dom_text.asp does something similar, but it's unsupported!
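
    The "go on to the next tag until one has text" idea above can be sketched as a recursive walk over childNodes that checks nodeType instead of the tag name. This is only a sketch under assumptions: collectText is a made-up helper name, and it uses nothing beyond the standard nodeType, nodeValue and childNodes properties, so it should apply to a parsed XML document and to an HTML page alike.

```javascript
// Minimal sketch: gather the text of every text node under a root node,
// whatever tag it sits in. collectText is a hypothetical helper name.
var TEXT_NODE = 3; // numeric value of Node.TEXT_NODE in the W3C DOM

function collectText(node, out) {
  out = out || [];
  if (node.nodeType === TEXT_NODE) {
    // Skip whitespace-only nodes (e.g. the line breaks between tags).
    if (/\S/.test(node.nodeValue)) {
      out.push(node.nodeValue);
    }
  } else {
    // Element or document node: try each child in turn.
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
      collectText(children[i], out);
    }
  }
  return out;
}
```

    In a page this would be called as collectText(document.body). Note that it would also pick up the contents of script and style elements, which you would probably want to skip.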


Comments

  • Registered Users, Registered Users 2 Posts: 2,781 ✭✭✭amen


    So what you are trying to do is extract everything that isn't HTML formatting code from the DOM?

    I suppose you could set up a regular expression and extract the text between > and < on a per-line basis.

    I don't think your XML example is really going to help.
    You could take the HTML, try to turn it into XML and then go through it, but that would be very messy.

    Maybe if you explained why you want the text from the page you might get some more suggestions?
    Is this for screen scraping or some other reason?
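
    A rough sketch of the regular expression idea, assuming the markup is available as a string. textBetweenTags is a hypothetical name, and the pattern runs over the whole string rather than line by line. It is not a real HTML parser: a > inside an attribute value, a comment or a script block would confuse it.

```javascript
// Rough sketch: pull out whatever sits between a ">" and the next "<".
// textBetweenTags is a hypothetical helper name.
function textBetweenTags(html) {
  var pattern = />([^<>]+)</g; // one or more non-bracket characters between > and <
  var results = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    // Trim surrounding whitespace and drop runs that are only whitespace.
    var text = match[1].replace(/^\s+|\s+$/g, "");
    if (text) {
      results.push(text);
    }
  }
  return results;
}
```

    Run against the snippet from the first post, this should give the two strings "United States dollar" and "From Wikipedia, the free encyclopedia".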


  • Registered Users, Registered Users 2 Posts: 2,432 ✭✭✭Peteee


    amen wrote:
    So what you are trying to do is extract everything that isn't HTML formatting code from the DOM?

    I suppose you could set up a regular expression and extract the text between > and < on a per-line basis.

    I don't think your XML example is really going to help.
    You could take the HTML, try to turn it into XML and then go through it, but that would be very messy.

    Maybe if you explained why you want the text from the page you might get some more suggestions?
    Is this for screen scraping or some other reason?

    I'll try the regex.

    Yeah, it's for screen scraping (I'm looking for a certain string, extracting it, then replacing it with a new value).
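
    For the find-and-replace step, one possible sketch is to split the markup into tags and text runs with a single pattern and only touch the text runs. replaceOutsideTags is a made-up name, the search is a plain substring match, and the same caveats about using a regex instead of a real parser apply.

```javascript
// Sketch: replace every occurrence of `search` with `replacement`,
// but only in text outside tags. replaceOutsideTags is a hypothetical name.
function replaceOutsideTags(html, search, replacement) {
  // First alternative matches a whole tag, second matches a run of text.
  return html.replace(/(<[^>]*>)|([^<]+)/g, function (whole, tag, text) {
    if (tag) {
      return tag; // leave tags, including their attributes, untouched
    }
    // split/join replaces all occurrences without regex escaping issues.
    return text.split(search).join(replacement);
  });
}
```

    For example, replaceOutsideTags('<h1 class="dollar">United States dollar</h1>', "dollar", "euro") leaves the class attribute alone and changes only the heading text.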

