Tricky RegEx Problem (Java)

  • 09-02-2009 11:30AM
    #1
    Registered Users, Registered Users 2 Posts: 40


    I have a piece of code (Java) which matches arbitrary tokens in strings using regular expressions. I'm using the Apache RegEx library.

    What I'd like to do is craft a regular expression that only matches a token if the preceding character meets a certain criterion. The tricky part is that I don't want the preceding character to be part of the match.

    Here's the Java code:
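    public int match(RE regEx, String s) {
       int start = 0;
       int count = 0;
       // Scan the string, resuming after the end of each match
       while (regEx.match(s, start)) {
          handleToken(regEx.getParen(0));
          start = regEx.getParenEnd(0);
          count++;
       }
       // Assumed return value: the number of tokens handled
       return count;
    }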
    

    An example would be searching for printf-style placeholders. The following expression finds them alright:
    %(( |-|\+|0|#)*([0-9]+|\*)?(\.([0-9]+|\*))?)[hlL]?[cCdiouxXeEfgGnpsS]
    
    However, this would also match the "%s" in "%%s", which is incorrect. So, I'm looking for a way to match in "foo %s bar", but not in "foo %%s bar".

    Obviously, I could do this in the Java code, but in order to keep it as flexible (and preferably simple) as possible, I'd like to do it using a regular expression only. This is especially important as not all expressions are going to have this requirement.


Comments

  • Closed Accounts Posts: 286 ✭✭Kev


    Does the Apache regex library you are using support negative lookbehind assertions?

    In Perl I would put this at the start of your pattern:
    (?<!%)
    


  • Registered Users, Registered Users 2 Posts: 40 dob99


    Kev wrote:
    Does the Apache regex library you are using support negative lookbehind assertions?

    In Perl I would put this at the start of your pattern:
    (?<!%)
    

    That works with the built-in JDK Pattern class. Unfortunately, it doesn't work with Apache's. Looks like I'll have to look into switching.

    Thanks for the help.
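
For reference, here is a minimal sketch of the lookbehind approach using the JDK's java.util.regex classes, as the post above says it works there. The pattern is the one from the first post with (?<!%) prepended; the class name PlaceholderFinder and the method findPlaceholders are made up for illustration and are not from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for this example.
final class PlaceholderFinder {
    // The printf-style expression from the first post, prefixed with the
    // negative lookbehind (?<!%) so that a '%' directly preceded by
    // another '%' never starts a match.
    private static final Pattern PLACEHOLDER = Pattern.compile(
        "(?<!%)%(( |-|\\+|0|#)*([0-9]+|\\*)?(\\.([0-9]+|\\*))?)[hlL]?[cCdiouxXeEfgGnpsS]");

    // Returns every placeholder found in s, in order of appearance.
    static List<String> findPlaceholders(String s) {
        List<String> tokens = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(s);
        while (m.find()) {
            // group() is the whole match; the lookbehind consumes nothing,
            // so the preceding character is not part of it.
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(findPlaceholders("foo %s bar"));
        System.out.println(findPlaceholders("foo %%s bar"));
    }
}
```

One caveat with this simple lookbehind: an input such as "%%%s" (an escaped percent sign followed by a real placeholder) produces no match at all, because the final '%' is also preceded by a '%'. Whether that matters depends on the strings being scanned.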


  • Registered Users, Registered Users 2 Posts: 5,618 ✭✭✭Civilian_Target


    Have you considered combining the tokenizer and the regular expressions?

    I find this to be a handy way of doing replacements on XML files that are not tag based: use regexes to get to the right part, then tokenize on some relevant character, like "

