
Regular Expression Help

  • 20-11-2008 10:14am
    #1
    Registered Users Posts: 500 ✭✭✭


    I am using a language that requires regular expressions. It's all new to me.

    What I want is to match whatever comes after www.example.com.

    So for instance I want to match just "/" and "/search/"

    But I don't know how to match both URLs in one expression. Any help?
    This is what I have so far:

    /?[search]?/?

    I'll explain my thinking. Everything is optional because the URL may have nothing after the domain, and I want that to match. It may have just a /. It may have /search or /search/. I am not sure whether I made the search bit optional by putting it in a character class.

    Please help


Comments

  • Registered Users Posts: 85 ✭✭slavigo


    Hi,
    You seem to have tried so I presume it's ok to write the following.
    At first it can be hard to get your head around the options and what they can do.
    They can be very powerful when used in the correct place.
    They can also have high processing times associated with them but in an instance like this you wouldn't worry about something like that.

    Note: All my examples below are written in Python RE syntax, so you may have to change them to match the RE syntax of the language you're using.

    Here is the easiest of several options.
    (?<=com)(?:/search/|/search|/)(?=$)
    

    The central portion is where the matching is done:
    /search/|/search|/

    This will match '/search/' or '/search' or '/'
    The main thing to take away is that only one of them is matched. The '|' is the OR operator: this OR this OR this.

    This is all you need for your example "www.example.com", but if you are processing full URLs you're also likely to have an 'http://' at the start, and the '/'s from that are also going to get matched.

    To counter that we add a condition before the string we are matching that says "only match if it is preceded by this string"
    That's what this does,
    (?<=com)

    Only match if preceded by the string 'com'

    The trailing portion,
    (?=$)

    Says that it must be followed by the end of the string.
    '$' denotes the end of the string (or of a line in some cases). This is only useful if you are running your RE across a string that holds a URL and nothing else. If you are pulling the URL out of a larger body of text, you won't be able to anchor on the end of the string.

    The '(?:' and closing ')' around the strings we are matching just group the three options together, without marking them as a group in the results.
    Have a read about RE groups for more.
    This applies the pre and post conditions to whichever of the central strings gets matched.
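
    To make the explanation above concrete, here is a minimal Python sketch that runs the pattern over a few URLs. The URL list and the names PATTERN, match_path, results and no_lookbehind are made up for illustration and are not part of the original question. It also shows what happens to the slashes in 'http://' when the lookbehind is dropped.

```python
import re

# Pattern from the post: one of three alternatives, preceded by 'com'
# and followed by the end of the string.
PATTERN = re.compile(r"(?<=com)(?:/search/|/search|/)(?=$)")

# Hypothetical sample URLs, chosen only to exercise each alternative.
urls = [
    "www.example.com/",
    "www.example.com/search",
    "www.example.com/search/",
    "http://www.example.com/search/",
    "www.example.com",
]


def match_path(url):
    """Return the matched path, or None if the pattern does not match."""
    m = PATTERN.search(url)
    return m.group(0) if m else None


results = {url: match_path(url) for url in urls}
for url, path in results.items():
    print(url, "->", path)

# Same alternation without the lookbehind or the end-of-string check:
# findall now also returns the two slashes from 'http://'.
no_lookbehind = re.findall(r"/search/|/search|/", "http://www.example.com/search/")
print(no_lookbehind)  # ['/', '/', '/search/']
```

    Note that the bare "www.example.com" gives None here: the pattern needs at least a '/', so the "nothing after the domain" case from the question would have to be handled separately.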

    Hope the above helped.

    If you fancy learning more, I know it might not be the language you're using, but you can still read up on the possibilities (in Python anyway) at
    python RE module

    I use a handy wee python application called 'kiki' which allows you to test your regular expressions on bodies of text when developing.

    Also, just for fun (because everyone knows regular expressions are fun :)) here's another way of doing it. The best way to learn regular expressions is to practice and to work through examples until you understand them.
    (?<=com)/(?=search|$)(?:search)?(?:(?<=search)/)?
    
    A '/' preceded by 'com' and followed by 'search' or end of string.
    Followed by 0 or 1 instances of 'search'.
    Followed by 0 or 1 instances of '/' if it's preceded by 'search'

    Matching the single '/' in this example depends on what you're searching.
    For this regular expression it must be at the end of the string.
    (Or at the end of a line if you've marked it as a multiline search. See the module documentation.)
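
    A small Python sketch of this second pattern, again with made-up sample strings and helper names (first_match, two_lines and so on are assumptions for illustration). It checks the three cases from the question, one case that should not match, and the multiline behaviour mentioned above.

```python
import re

# Second pattern from the post, unchanged.
EXPR = r"(?<=com)/(?=search|$)(?:search)?(?:(?<=search)/)?"
PATTERN = re.compile(EXPR)


def first_match(text):
    """Return the first match in text, or None if there is none."""
    m = PATTERN.search(text)
    return m.group(0) if m else None


a = first_match("www.example.com/")           # '/'
b = first_match("www.example.com/search")     # '/search'
c = first_match("www.example.com/search/")    # '/search/'
d = first_match("www.example.com/about")      # None: '/' is not followed by 'search' or the end
e = first_match("www.example.com/searching")  # '/search': nothing anchors the end of the match

# Two URLs on separate lines: '$' only matches before the newline
# when re.MULTILINE is set.
two_lines = "www.example.com/\nwww.example.com/search/"
default_mode = re.findall(EXPR, two_lines)
multiline_mode = re.findall(EXPR, two_lines, re.MULTILINE)
print(default_mode)    # ['/search/']
print(multiline_mode)  # ['/', '/search/']
```

    The '/searching' case is worth keeping in mind: unlike the first pattern, this one does not require the end of the string after 'search', so it will happily match the start of a longer path.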


    I hope this wasn't too long-winded.
    If it was, just say and I'll start a little lower.

    Enjoy.


  • Registered Users Posts: 501 ✭✭✭rtmie


    As someone who still struggles to remember regex syntax, I fall back on:

    txt2regex

    R


  • Closed Accounts Posts: 13 Deserved


    My solution is:

    [w]{3}\.[\w\-\.]+\.[\w]{2,5}(([\/])([\w\-]+[\/]{0,1}))

    From the link:
    http://www.blabla.com/big/

    It will select:
    /big/
    big/
    /

    If you want to select everything after the /, then your expression has to look like:

    [w]{3}\.[\w\-\.]+\.[\w]{2,5}[\/]([\w\-\/]+)
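
    As a rough check of the two expressions above, here is a short Python sketch. Python is an assumption here, since the post does not say which regex flavour it was written for, and the names FULL, REST, groups, rest, deep and bare are invented for the example.

```python
import re

# Both expressions copied from the post.
FULL = re.compile(r"[w]{3}\.[\w\-\.]+\.[\w]{2,5}(([\/])([\w\-]+[\/]{0,1}))")
REST = re.compile(r"[w]{3}\.[\w\-\.]+\.[\w]{2,5}[\/]([\w\-\/]+)")

url = "http://www.blabla.com/big/"

# Capture groups of the first expression, in group-number order.
groups = FULL.search(url).groups()
print(groups)  # ('/big/', '/', 'big/')

# The second expression captures everything after the first '/'.
rest = REST.search(url).group(1)
print(rest)  # 'big/'
deep = REST.search("http://www.blabla.com/big/deep/page").group(1)
print(deep)  # 'big/deep/page'

# Both expressions need at least one character after the '/',
# so a bare trailing slash is not matched.
bare = REST.search("http://www.blabla.com/")
print(bare)  # None
```

    So these two cover '/search/'-style paths, but the single '/' case from the original question would need the path part made optional.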


    And for practice, this software will be useful:

    http://www.weitz.de/regex-coach/

    And also try to find this book:

    http://oreilly.com/catalog/9781565922570/

