
Tricky RegEx Problem (Java)

  • 09-02-2009 11:30am
    #1
    Registered Users Posts: 40


    I have a piece of code (Java) which matches arbitrary tokens in strings using regular expressions. I'm using the Apache RegEx library.

    What I'd like to do is craft a regular expression that only matches a token if the preceding character meets a certain criterion. The tricky part is that I don't want the preceding character to be part of the match.

    Here's the Java code:
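    import org.apache.regexp.RE;

    // Passes each match of regEx in s to handleToken(); returns the number of matches
    public int match(RE regEx, String s) {
       int start = 0;
       int count = 0;
       while (start <= s.length() && regEx.match(s, start)) {
          handleToken(regEx.getParen(0));  // paren 0 is the whole match
          count++;
          // Resume after this match; the +1 guard stops an empty match looping forever
          start = Math.max(regEx.getParenEnd(0), start + 1);
       }
       return count;
    }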
    

    An example would be searching for printf-style placeholders. The following expression finds them just fine:
    %(( |-|\+|0|#)*([0-9]+|\*)?(\.([0-9]+|\*))?)[hlL]?[cCdiouxXeEfgGnpsS]
    
    However, this would also match the "%s" in "%%s", which is incorrect. So I'm looking for a way to match the placeholder in "foo %s bar" but not in "foo %%s bar".

    Obviously, I could do this in the Java code, but in order to keep it as flexible (and preferably simple) as possible, I'd like to do it using a regular expression only. This is especially important as not all expressions are going to have this requirement.
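A minimal sketch that reproduces the problem, using the JDK's java.util.regex rather than the Apache library (the class name PlaceholderDemo and the helper findAll are made up for illustration). With the expression above unchanged, the first "%" of "%%s" cannot start a placeholder, so the engine simply retries one character later and finds "%s".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderDemo {
    // The placeholder expression from the post, escaped for a Java string literal
    static final String PLACEHOLDER =
        "%(( |-|\\+|0|#)*([0-9]+|\\*)?(\\.([0-9]+|\\*))?)[hlL]?[cCdiouxXeEfgGnpsS]";

    // Collects every substring the pattern finds in s, left to right
    static List<String> findAll(Pattern p, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(PLACEHOLDER);
        System.out.println(findAll(p, "foo %s bar"));   // [%s]
        System.out.println(findAll(p, "foo %%s bar"));  // [%s]  <- the unwanted match
    }
}
```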


Comments

  • Closed Accounts Posts: 286 ✭✭Kev


    Does the Apache Regex library you are using support negative lookbehind assertions?

    In Perl, I would put this at the start of your pattern:
    (?<!%)
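For reference, here is roughly what that looks like with java.util.regex, which does support lookbehind. This is a sketch; LookbehindDemo and findAll are illustrative names. The (?<!%) group is zero-width: it checks the character before the "%" but never becomes part of the match, which is exactly the "preceding character must not be part of the match" requirement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {
    // (?<!%) succeeds only if the character before this position is not '%'.
    // It consumes nothing, so the '%' it inspects is not part of the match.
    static final Pattern PLACEHOLDER = Pattern.compile(
        "(?<!%)%(( |-|\\+|0|#)*([0-9]+|\\*)?(\\.([0-9]+|\\*))?)[hlL]?[cCdiouxXeEfgGnpsS]");

    // Collects every placeholder found in s, left to right
    static List<String> findAll(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("foo %s bar"));   // [%s]
        System.out.println(findAll("foo %%s bar"));  // []
        System.out.println(findAll("%d%s"));         // [%d, %s]
    }
}
```

One caveat worth knowing: the lookbehind only looks one character back, so it also rejects the "%s" in "%%%s", even though printf reads that as an escaped "%%" followed by a real "%s".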
    


  • Registered Users Posts: 40 dob99


    Kev wrote: »
    Does the Apache Regex library you are using support negative lookbehind assertions?

    In Perl, I would put this at the start of your pattern:
    (?<!%)
    

    That works with the built-in JDK Pattern class. Unfortunately, it doesn't work with Apache's. Looks like I'll have to look into switching.

    Thanks for the help.
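If switching to the JDK classes, the match() method from the first post might port over roughly as below. This is a sketch, not tested against the poster's code: handleToken() here is a hypothetical stand-in that just records tokens, and returning the match count is an assumption about what the int return value was meant to be.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenMatcher {
    // Placeholder expression with the (?<!%) lookbehind in front
    static final Pattern PLACEHOLDER = Pattern.compile(
        "(?<!%)%(( |-|\\+|0|#)*([0-9]+|\\*)?(\\.([0-9]+|\\*))?)[hlL]?[cCdiouxXeEfgGnpsS]");

    // Hypothetical stand-in for the poster's handleToken(): just records tokens
    final List<String> tokens = new ArrayList<>();

    void handleToken(String token) {
        tokens.add(token);
    }

    // java.util.regex version of match(RE, String); returns the number of tokens found
    public int match(Pattern regEx, String s) {
        Matcher m = regEx.matcher(s);
        int count = 0;
        while (m.find()) {           // each call resumes where the previous match ended
            handleToken(m.group());  // group() is group(0), the whole match
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        TokenMatcher tm = new TokenMatcher();
        int n = tm.match(PLACEHOLDER, "%-5d items, 100%% done, name=%s");
        System.out.println(n + " " + tm.tokens);  // 2 [%-5d, %s]
    }
}
```

Matcher.find() keeps track of where the previous match ended, so the manual start offset from the Apache version isn't needed.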


  • Registered Users Posts: 5,618 ✭✭✭Civilian_Target


    Have you considered combining the tokenizer with regular expressions?

    Certainly, I find this a handy way of doing replacements in XML files that aren't tag-based: use regexps to get to the right part, then tokenize on some relevant character, like "
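One possible reading of this suggestion applied to the printf case (an interpretation, not necessarily what was meant): tokenize on the "%%" escape first, then run the original expression, with no lookbehind, on each piece. The sketch uses String.split rather than StringTokenizer, because StringTokenizer treats every character of its delimiter string as a separate delimiter and so cannot split on the two-character sequence "%%". SplitThenMatch and placeholders are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitThenMatch {
    // The original placeholder expression, with no lookbehind
    static final Pattern PLACEHOLDER = Pattern.compile(
        "%(( |-|\\+|0|#)*([0-9]+|\\*)?(\\.([0-9]+|\\*))?)[hlL]?[cCdiouxXeEfgGnpsS]");

    static List<String> placeholders(String s) {
        List<String> out = new ArrayList<>();
        // Step 1 (tokenize): cut the string at every "%%" escape, so an escaped
        // percent sign can never start a placeholder. split() takes a regex,
        // but "%%" contains no regex metacharacters.
        for (String piece : s.split("%%")) {
            // Step 2 (regex): find placeholders inside each escape-free piece
            Matcher m = PLACEHOLDER.matcher(piece);
            while (m.find()) {
                out.add(m.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(placeholders("foo %s bar"));   // [%s]
        System.out.println(placeholders("foo %%s bar"));  // []
        System.out.println(placeholders("%%%s"));         // [%s]
    }
}
```

A side effect of splitting on the escape is that "%%%s" correctly yields "%s", which the one-character lookbehind misses. Because it needs no lookbehind, the same idea may carry over to an engine without it, though that hasn't been checked against the Apache library. Match positions in the original string are lost, so it suits extracting tokens better than replacing them in place.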

