
java regex

  • 05-02-2009 11:43pm
    #1
    Registered Users Posts: 163 ✭✭


    I need two Java regular expressions to extract the anchor text and the URL from an HTML link like this: "<a href="http://www.example.com/chapter2.html">chapter two</a>"

    So what I want to be left with is:
    url: http://www.example.com/chapter2.html
    anchor: chapter two

    I have something like this for the URL: "http://[a-zA-Z_0-9.-&/+=]+"

    It works for simple URLs, but exotic characters break it. Also, I'm using the " at the end to terminate the match, and I don't think this is the best way.

    I have ">[a-zA-Z_0-9[\\W]]+<a/>" for the anchor, but the two don't seem to cover every eventuality. Has anyone got a set that would handle any permutation?


Comments

  • Subscribers Posts: 4,076 ✭✭✭IRLConor


    Untested:
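    // Needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
    // Group 1 = double-quoted href, group 2 = single-quoted href, group 3 = anchor text.
    // The backslash in \s must be doubled inside a Java string literal.
    Pattern p = Pattern.compile("<a\\s+.*?href=(?:\"(.*?)\"|'(.*?)').*?>(.*?)</a>");
    Matcher m = p.matcher(theStringToSearch);
    // find() locates a link anywhere in the input; matches() would require
    // the whole input to be exactly one link.
    if (m.find()) {
        String anchorText = m.group(3);
        String url = m.group(1);
        if (url == null) {
            // The href was single-quoted, so the URL is in group 2.
            url = m.group(2);
        }
    }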
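
    For anyone who wants to try this out, here is a minimal self-contained sketch of the same idea. The class name LinkExtractor and the method extract are made up for illustration, and the pattern is a slightly tightened variant of the one above (it uses [^>] and [^"] instead of .*? so a match cannot run past the end of the tag). It assumes the href value is quoted and that anchors are not nested; a regex like this is not a full HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {
    // Group 1: double-quoted href value, group 2: single-quoted href value,
    // group 3: the text between <a ...> and </a>.
    private static final Pattern LINK = Pattern.compile(
        "<a\\s+[^>]*?href\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')[^>]*>(.*?)</a>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns {url, anchorText} for the first link found, or null if there is none. */
    public static String[] extract(String html) {
        Matcher m = LINK.matcher(html);
        if (!m.find()) {
            return null;
        }
        // Exactly one of the two href groups participates in a match.
        String url = m.group(1) != null ? m.group(1) : m.group(2);
        return new String[] { url, m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = extract(
            "<p>See <a href=\"http://www.example.com/chapter2.html\">chapter two</a></p>");
        System.out.println("url: " + parts[0]);
        System.out.println("anchor: " + parts[1]);
    }
}
```

    Running main on the example link from the first post prints "url: http://www.example.com/chapter2.html" and "anchor: chapter two". To collect every link in a page rather than just the first, call m.find() in a while loop instead of a single if.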
    


  • Registered Users Posts: 163 ✭✭stephenlane80


    Thanks, I will try it this afternoon, but it looks pretty good.

