
Can this RegEx be simplified?

  • 11-11-2011 12:31pm
    #1
    Registered Users Posts: 19,019 ✭✭✭✭


    I have a RegEx:
    ^(\d)+((((,(?!\d+\s))|\s)|((,(?!((\d+\s)|(\s+\d+\s+))))\s))\d+)*$

    Its purpose is simple enough, so I think I'm probably overcomplicating it.

    It is used as a mask in a Stripes (Java frontend framework) validation to validate user input.

    The user is expected to enter a list of integers, separated by EITHER one comma OR one space OR one comma followed by one space. No other input formats are acceptable.

    The user must be consistent, however: no mixing comma separation with space separation (to prevent typos causing problems later). To this end I reject the entire input if there's any instance of a comma followed by a space, then digits, then a space, OR any instance of a comma followed by digits and then a space.

    I'm fairly satisfied that the RegEx works...but is there a more elegant solution?
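To make the rules above concrete, here is a minimal Java sketch that runs the original mask against a handful of inputs. The class name `OriginalMask` and the sample strings are made up for illustration, and plain `java.util.regex` is assumed to behave like the Stripes mask, which the thread does not confirm.

```java
import java.util.regex.Pattern;

class OriginalMask {
    // The mask from the post, escaped for a Java string literal.
    static final Pattern MASK = Pattern.compile(
        "^(\\d)+((((,(?!\\d+\\s))|\\s)|((,(?!((\\d+\\s)|(\\s+\\d+\\s+))))\\s))\\d+)*$");

    // True when the whole input matches the mask.
    static boolean isValid(String input) {
        return MASK.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Illustrative inputs only; none of these lists come from the thread.
        String[] samples = {"1,2,3", "1 2 3", "1, 2, 3", "1, 2 3, 4", "1,2 3", "1,2,", "1 2,3"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

One thing this turns up: "1 2,3" is accepted, because the lookaheads only inspect what follows a comma, so a space-separated item that comes before a comma-separated one slips through.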


Comments

  • Registered Users Posts: 339 ✭✭duffman85


    I got this:
    ^(\d)+(\s|,|,\s)((\d+)(\2))*\d+
    

    to match

    ", " => 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
    " " => 1 2 3 4 5 6 7 8 9 10
    "," => 1,2,3,4,5,6,7,8,9,10,


    on the regex tester here: http://www.regular-expressions.info/javascriptexample.html


  • Registered Users Posts: 19,019 ✭✭✭✭murphaph


    Hi duffman, thx for your efforts.

    I had a look but it matched the "1, 2" part from "1, 2 3, 4", which would not be desired as it would be a typo (missing comma: the user wanted to enter "1, 2, 3, 4" and the validation mask is supposed to catch this and reject the whole string). I've been asked to also make sure a CSV with a trailing comma is rejected by the validator.

    Using your solution with a "$" stuck on the end however seems perfect, and much neater than mine! Cheers.

    I'm not great at RegEx. Can you explain what the \2 bit does? My googling says \n means nth group/subpattern but I don't get what that means here.

    Ah I get it, you use the \2 to say "whatever text the second group (the group with the commas and spaces) matched, the same text has to appear at every subsequent separator". Very elegant, I'll be using that again, thanks duffman!


  • Registered Users Posts: 339 ✭✭duffman85


    murphaph wrote: »
    Hi duffman, thx for your efforts.

    I had a look but it matched the "1, 2" part from "1, 2 3, 4", which would not be desired as it would be a typo (missing comma: the user wanted to enter "1, 2, 3, 4" and the validation mask is supposed to catch this and reject the whole string). I've been asked to also make sure a CSV with a trailing comma is rejected by the validator.

    Using your solution with a "$" stuck on the end however seems perfect, and much neater than mine! Cheers.

    No problem - glad it helped.
    murphaph wrote: »
    I'm not great at RegEx. Can you explain what the \2 bit does? My googling says \n means nth group/subpattern but I don't get what that means here.

    Ah I get it, you use the \2 to say "whatever text the second group (the group with the commas and spaces) matched, the same text has to appear at every subsequent separator". Very elegant, I'll be using that again, thanks duffman!

    Yes, that's it - very handy. I'm no regex expert myself, usually just trial & error and some googling. :D

    A proper explanation of grouping and backreferences (\2 etc.) is here: http://www.regular-expressions.info/brackets.html
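For anyone who wants to see the backreference at work, here is a small Java sketch of the pattern above with the "$" added. The class name `SeparatorBackreference` and the sample inputs are made up for illustration, and it uses plain `java.util.regex` rather than a Stripes mask.

```java
import java.util.regex.Pattern;

class SeparatorBackreference {
    // Group 2 captures the first separator (" ", "," or ", ").
    // \2 then demands exactly that captured text between all later numbers.
    static final Pattern LIST =
        Pattern.compile("^(\\d)+(\\s|,|,\\s)((\\d+)(\\2))*\\d+$");

    // True when the whole input is a list using one separator throughout.
    static boolean isValid(String input) {
        return LIST.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Illustrative inputs only.
        String[] samples = {"1,2,3", "1 2 3", "1, 2, 3", "1, 2 3, 4", "1,2,3,", "1,2, 3", "7"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

Two things to be aware of: the separator group is mandatory, so a lone integer such as "7" is rejected, and \s also matches tabs and newlines, not just a space. Whether either matters depends on requirements the thread doesn't spell out.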


  • Registered Users Posts: 1,311 ✭✭✭Procasinator


    Would this work:
    ^\d+((\s\d+)*|(,\s\d+)*|(,\d+)*)$
    

    Should match:

    1 2 3 4
    1,2,3,4
    1, 2, 3, 4

    If you don't mind:
    1,2, 3, 4 (i.e. inconsistent usage of comma + space, but always a comma) you can change it to:
    ^\d+((\s\d+)*|(,\s?\d+)*)$
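A quick way to compare the two variants is to run both against the same inputs. The sketch below does that in plain Java; the class name `SeparatorAlternation`, the method names and the sample strings are assumptions made for illustration.

```java
import java.util.regex.Pattern;

class SeparatorAlternation {
    // Strict variant: the whole list uses " ", ", " or "," and nothing else.
    static final Pattern STRICT =
        Pattern.compile("^\\d+((\\s\\d+)*|(,\\s\\d+)*|(,\\d+)*)$");

    // Relaxed variant: always a comma, but the space after it is optional per item.
    static final Pattern RELAXED =
        Pattern.compile("^\\d+((\\s\\d+)*|(,\\s?\\d+)*)$");

    static boolean strict(String input) {
        return STRICT.matcher(input).matches();
    }

    static boolean relaxed(String input) {
        return RELAXED.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Illustrative inputs only.
        String[] samples = {"7", "1 2 3 4", "1,2,3,4", "1, 2, 3, 4", "1,2, 3, 4", "1, 2 3, 4", "1,2,"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" strict=" + strict(s) + " relaxed=" + relaxed(s));
        }
    }
}
```

Because every alternative is a starred group, a single integer on its own is accepted by both variants. The capturing groups are not used for anything here, so non-capturing groups `(?:...)` would work equally well.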
    


  • Registered Users Posts: 1,419 ✭✭✭Cool Mo D


    Surely the Java String.split() method is a more appropriate approach? If you split on each delimiter in turn and test whether the returned array of strings can be converted into an array of integers, you avoid the messy corner cases the regexes can introduce, and the code is more readable to boot.
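A rough sketch of that split-based idea might look like the following. The class name `SplitValidator`, the separator order and the digit check are assumptions of this sketch, not something spelled out in the post; in particular it checks for plain ASCII digits instead of calling Integer.parseInt, so that signs are not accepted and long numbers do not overflow.

```java
import java.util.regex.Pattern;

class SplitValidator {
    // Candidate separators, each tried on its own (assumed set: ", ", "," and " ").
    private static final String[] SEPARATORS = {", ", ",", " "};

    // True when some single separator splits the input into digit-only tokens.
    static boolean isValid(String input) {
        for (String sep : SEPARATORS) {
            // Pattern.quote treats the separator literally; the limit of -1 keeps
            // trailing empty strings, so "1,2," produces an empty last token.
            String[] tokens = input.split(Pattern.quote(sep), -1);
            if (allDigits(tokens)) {
                return true;
            }
        }
        return false;
    }

    // Every token must be non-empty and consist only of the characters 0-9.
    private static boolean allDigits(String[] tokens) {
        for (String token : tokens) {
            if (token.isEmpty()) {
                return false;
            }
            for (int i = 0; i < token.length(); i++) {
                char c = token.charAt(i);
                if (c < '0' || c > '9') {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative inputs only.
        String[] samples = {"7", "1,2,3", "1 2 3", "1, 2, 3", "1, 2 3, 4", "1,2,", "1,2, 3"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

Unlike the regexes using \s, this version only accepts a literal space, and it allows a single integer with no separator at all.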


  • Registered Users Posts: 19,019 ✭✭✭✭murphaph


    It has to be a regex for the Stripes framework validation AFAIK. It's a mask applied to the input. I do something similar to what you suggest inside the ActionBean (Stripes name for a servlet) though. Cheers.

