Regular Expression Help

warrenaldo · 20-11-2008 10:14AM #1

I am using a language that requires regular expressions. It's all new to me.

What I want is to match anything after www.example.com.

So for instance I want to match just "/" and "/search/"

But I don't know how to match both URLs in one expression. Any help?

This is what I have so far:

/?[search]?/?

I'll explain my thinking. Everything is optional because there may be nothing after the domain, and I want that to match too. It may just have a /. It may have /search or /search/. I am not sure whether I made the search bit optional by putting it in a character class.
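For anyone following along, here is a quick check in Python, chosen only as a test bed since the language in question isn't named. A character class like [search] matches exactly one character from the set s, e, a, r, c, h, not the word "search", so this pattern can never match more than three characters. The helper name matches_whole and the test paths are illustrative assumptions.

```python
import re

# The attempt from the question: an optional '/', then an optional SINGLE
# character from the set {s, e, a, r, c, h}, then another optional '/'.
attempt = re.compile(r"/?[search]?/?")


def matches_whole(path):
    """Hypothetical helper: True if the pattern matches the entire path."""
    return attempt.fullmatch(path) is not None


for path in ["", "/", "/s/", "/search", "/search/"]:
    print(repr(path), matches_whole(path))
```

The first three paths print True and the last two print False, which is why a character class can't stand in for an optional word.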

Please help

slavigo · 20-11-2008 01:28PM

Hi,
You seem to have had a go yourself, so I presume it's OK to write out the following.
At first it can be hard to get your head around the options and what they can do.
They can be very powerful when used in the correct place.
They can also carry a high processing cost, but in a case like this you wouldn't need to worry about that.

Note: all my examples below are written in Python RE syntax, so you may have to adapt them to the RE syntax of the language you're using.

Here is the easiest of several options.

(?<=com)(?:/search/|/search|/)(?=$)

The central portion is where the matching is done:
/search/|/search|/

This will match '/search/' or '/search' or '/'
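A minimal Python sketch of the central alternation on its own. It uses re.fullmatch so that the whole test string has to be one of the three alternatives; the helper is_wanted_path and the test paths are made up for illustration.

```python
import re

# Three alternatives separated by '|'; exactly one of them has to match.
central = re.compile(r"/search/|/search|/")


def is_wanted_path(path):
    """Hypothetical helper: True if path is exactly '/search/', '/search' or '/'."""
    return central.fullmatch(path) is not None


for path in ["/search/", "/search", "/", "/searching", "/about"]:
    print(path, is_wanted_path(path))
```

Because fullmatch anchors both ends, '/searching' is rejected even though it starts with '/search'.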
The main thing to take away is that only one of the alternatives gets matched. The '|' is the OR operator: this OR this OR this.

This is all you need for your example "www.example.com", but if you are processing full URLs you're also likely to have an 'http://' at the start, and the '/'s in it are also going to get matched.

To counter that, we add a condition before the string we are matching that says "only match if it is preceded by this string".
That's what this lookbehind does:
(?<=com)

Only match if preceded by the string 'com'.
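To see why the lookbehind matters, this Python sketch runs re.findall over a full URL with and without (?<=com). The example URL is an assumption based on the question.

```python
import re

url = "http://www.example.com/search/"

# Without the lookbehind, each '/' in 'http://' is picked up as well.
without_lookbehind = re.findall(r"/search/|/search|/", url)

# With (?<=com), a match may only start right after the text 'com'.
with_lookbehind = re.findall(r"(?<=com)(?:/search/|/search|/)", url)

print(without_lookbehind)  # ['/', '/', '/search/']
print(with_lookbehind)     # ['/search/']
```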

The trailing portion, a lookahead,
(?=$)

says that the match must be followed by the end of the string.
'$' denotes the end of the string (or of a line in some cases). This is only useful if you are running your RE across a bare URL and nothing else. If you are pulling the URL out of a larger body of text, you won't be able to rely on the end of the string.
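Putting the pieces together, here is a sketch of the full pattern from this answer applied with re.search to a few sample strings (the helper tail and the sample strings are illustrative). The last two show the limitation just described: nothing is found once the path continues or the URL sits in the middle of a sentence.

```python
import re

pattern = re.compile(r"(?<=com)(?:/search/|/search|/)(?=$)")


def tail(text):
    """Hypothetical helper: the matched path after 'com', or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


examples = [
    "http://www.example.com/",
    "http://www.example.com/search",
    "http://www.example.com/search/",
    "http://www.example.com/search/more",    # path continues: no match
    "Visit http://www.example.com/ today",   # not at end of string: no match
]
for text in examples:
    print(repr(text), "->", tail(text))
```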

The '(?:' and ending ')' around the strings we are matching just group the three options together, without marking them as a group in the results.
This applies the pre- and post-conditions to whichever of the central strings gets matched.
Have a read about RE groups for more.
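A small Python comparison of the non-capturing group, a capturing group, and no group at all. The ungrouped variant is included only to show what the grouping protects against: without it, '|' splits the whole pattern, so the lookbehind only applies to the first alternative and the lookahead only to the last. The .org URL is a made-up test input.

```python
import re

url = "http://www.example.com/search/"

non_capturing = re.search(r"(?<=com)(?:/search/|/search|/)(?=$)", url)
capturing = re.search(r"(?<=com)(/search/|/search|/)(?=$)", url)

print(non_capturing.group(0), non_capturing.groups())  # /search/ ()
print(capturing.group(0), capturing.groups())          # /search/ ('/search/',)

# Grouped vs ungrouped on a URL that does NOT contain 'com'.
grouped = r"(?<=com)(?:/search/|/search|/)(?=$)"
ungrouped = r"(?<=com)/search/|/search|/(?=$)"
other = "http://www.example.org/search"
print(re.findall(grouped, other))    # []
print(re.findall(ungrouped, other))  # ['/search']
```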

Hope the above helped.

If you fancy learning more, I know it might not be the language you're using, but you can still read up on the possibilities (in Python, anyway) at the
Python RE module documentation.

I use a handy wee Python application called 'kiki', which lets you test your regular expressions on bodies of text while developing them.

Also, just for fun (because everyone knows regular expressions are fun), here's another way of doing it. The best way to learn regular expressions is to practice, and to try to understand examples.

(?<=com)/(?=search|$)(?:search)?(?:(?<=search)/)?

A '/' preceded by 'com' and followed by 'search' or end of string.
Followed by 0 or 1 instances of 'search'.
Followed by 0 or 1 instances of '/' if it's preceded by 'search'

Whether the single '/' matches in this example depends on what you're searching.
For this regular expression it must be at the end of the string
(or at the end of a line, if you've enabled multiline mode; see the module documentation).
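Here is the second expression tried in Python on the same kind of sample URLs (the helper alt_tail and the inputs are illustrative). Only the lone '/' depends on the end of the string: '/search/more' still yields '/search/'. The last lines show the effect of the re.MULTILINE flag, under which '$' also matches just before each newline.

```python
import re

alt = re.compile(r"(?<=com)/(?=search|$)(?:search)?(?:(?<=search)/)?")


def alt_tail(text):
    """Hypothetical helper: the matched path after 'com', or None."""
    m = alt.search(text)
    return m.group(0) if m else None


for text in [
    "http://www.example.com/",
    "http://www.example.com/search",
    "http://www.example.com/search/",
    "http://www.example.com/search/more",
    "http://www.example.com/about",
]:
    print(repr(text), "->", alt_tail(text))

# Two URLs on separate lines: without MULTILINE, '$' can't match after the
# first URL because more text follows the newline.
two_lines = "http://www.example.com/\nhttp://www.example.com/search/"
print(alt.findall(two_lines))                            # ['/search/']
print(re.findall(alt.pattern, two_lines, re.MULTILINE))  # ['/', '/search/']
```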

I hope this wasn't too long-winded.
If it is, just say so and I'll start a little lower.

Enjoy.

rtmie · 20-11-2008 09:46PM

As someone who still struggles to remember regex syntax, I fall back on:

txt2regex

R

Deserved · 21-11-2008 11:20AM

My solution is:

[w]{3}\.[\w\-\.]+\.[\w]{2,5}(([\/])([\w\-]+[\/]{0,1}))

From the link:
http://www.blabla.com/big/

It will select these capture groups:
/big/ (group 1)
/ (group 2)
big/ (group 3)
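A quick Python check of the capture groups above, using the expression exactly as posted. The second test is an added observation rather than something from the post: a bare trailing '/' (one of the cases in the original question) finds no match, because the path part requires at least one word character after the slash.

```python
import re

# Deserved's first expression, copied as posted.
first = re.compile(r"[w]{3}\.[\w\-\.]+\.[\w]{2,5}(([\/])([\w\-]+[\/]{0,1}))")

m = first.search("http://www.blabla.com/big/")
print(m.group(1), m.group(2), m.group(3))  # /big/ / big/

# No word character after the final '/', so the whole search fails.
print(first.search("http://www.blabla.com/"))  # None
```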

If you want to select everything after the / that follows the domain, your expression has to look like:

[w]{3}\.[\w\-\.]+\.[\w]{2,5}[\/]([\w\-\/]+)
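The same kind of check for this second expression, again in Python. Group 1 holds everything after the slash that follows the domain, without the slash itself; the second test URL is made up.

```python
import re

# Deserved's second expression, copied as posted.
second = re.compile(r"[w]{3}\.[\w\-\.]+\.[\w]{2,5}[\/]([\w\-\/]+)")

for url in ["http://www.blabla.com/big/", "http://www.blabla.com/search/more"]:
    m = second.search(url)
    # group(1) is the captured path after the domain's '/'
    print(url, "->", m.group(1))
```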

For practice, this software will be useful:

http://www.weitz.de/regex-coach/

Also try to find this book:

http://oreilly.com/catalog/9781565922570/
