Help needed with preg_match regex

smcelhinney · 22-07-2011 12:36pm #1

Hi guys

I have this line in a text file

X Y ZZZ 444444 <a name="6">developer name</a>

And I need to extract certain values from it. So far I have this

$developer_details = preg_split("/[\s,]+/", $line);

which will make an array, so the first 4 elements are populated fine (based on creating array items of values separated by space or comma). However, I need to take the 5th element as a whole string. But there might be more elements in between 4 and 5. So the pseudo code would be

Extract all elements separated by a space or comma as separate array items, except when you encounter an HTML tag, in which case take the whole element (from opening tag to matching closing tag) as a single string.

The one consolation is that I know the HTML is well formed.

Any ideas?

tehjimmeh · 22-07-2011 7:14pm

Would this not work? (note: I don't know PHP very well at all)

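$developer_details = preg_split("/[\s,]+/", $line);
// Glue everything from index 5 onwards back onto the 5th element
for($i=5; $i < count($developer_details); $i++)
   $developer_details[4] .= " ".$developer_details[$i];
// Drop the pieces that have just been merged into element 4
array_splice($developer_details, 5);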

EDIT: Actually I think it'll only work if you can guarantee there'll be no commas in the 5th item.
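
One way around the comma problem, as a rough sketch: preg_split() takes an optional limit argument, and with a limit of 5 the fifth piece is whatever is left of the line, untouched. This assumes the tag always comes straight after the first four values, which the original post says is not guaranteed, and it uses the sample line with the closing tag written as </a>.

```php
<?php
// Sample line from the first post (closing tag assumed to be </a>).
$line = 'X Y ZZZ 444444 <a name="6">developer name</a>';

// The third argument caps the result at 5 pieces. The 5th piece is the
// unsplit remainder of the line, so spaces and commas inside it survive.
$developer_details = preg_split('/[\s,]+/', $line, 5);

var_export($developer_details);
```

If extra plain values can appear before the tag, the limit would have to change per line, so this only covers the fixed five-item case.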

Inspector Gadget · 23-07-2011 8:07pm

If the format of the file you're reading is fixed, and looks like what you've got there, then maybe preg_split() is the wrong function. You could write a regex that matches the whole line at once (i.e. matches each desired item, assuming there'll always be five items), or perhaps take every match whose index is greater than 3 (0..3 should be your first four terms) and implode() them together?
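
Both suggestions could look roughly like this. It is only a sketch and assumes every line has exactly four plain values followed by the tag; the variable names $pattern, $first_four and $rest are made up for illustration.

```php
<?php
// Sample line from the first post (closing tag assumed to be </a>).
$line = 'X Y ZZZ 444444 <a name="6">developer name</a>';

// Option 1: match the whole line at once. Four fields separated by runs of
// whitespace or commas, then everything that remains as the fifth field.
$pattern = '/^([^\s,]+)[\s,]+([^\s,]+)[\s,]+([^\s,]+)[\s,]+([^\s,]+)[\s,]+(.+)$/';

$developer_details = [];
if (preg_match($pattern, $line, $m)) {
    array_shift($m);            // drop index 0, which holds the full match
    $developer_details = $m;
}

// Option 2: split everything, keep items 0..3, implode() the rest.
// The original separators are lost here: every gap becomes a single space.
$parts      = preg_split('/[\s,]+/', $line);
$first_four = array_slice($parts, 0, 4);
$rest       = implode(' ', array_slice($parts, 4));

var_export($developer_details);
var_export($rest);
```

Option 1 keeps the fifth item byte for byte, while option 2 rebuilds it, so they only agree when the tag contains single spaces and no commas.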

There are a lot of ways of skinning this particular cat, but it's possible that you haven't provided enough examples of what you're parsing?

CuLT · 25-07-2011 11:12am

If you're trying to parse free-format HTML, regex alone won't do the trick; you'll need an HTML parser.
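
For the parser route, a minimal sketch using PHP's bundled DOMDocument class might look like the following. It assumes the DOM extension is enabled, and $developer_name and $anchor_name are just illustrative variable names.

```php
<?php
// Sample line from the first post (closing tag assumed to be </a>).
$line = 'X Y ZZZ 444444 <a name="6">developer name</a>';

$doc = new DOMDocument();

// The line is a fragment, not a full page, so collect any parser
// warnings internally rather than printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($line);
libxml_clear_errors();

$developer_name = null;
$anchor_name    = null;

// Take the first <a> element, if the line has one.
$link = $doc->getElementsByTagName('a')->item(0);
if ($link !== null) {
    $developer_name = $link->textContent;           // text inside the element
    $anchor_name    = $link->getAttribute('name');  // value of name="..."
}

var_export($developer_name);
var_export($anchor_name);
```

This only pulls out the element; the plain values before it would still need to be split separately.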

If you know it's always going to be some text followed by a single "a" element containing everything else, then it's straightforward and can be broken into two simple expressions:
[php]
<?php
$string = 'X Y ZZZ 444444 <a name="6">developer name</a>';

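/* Breaks the string into two components: everything before the a tag, and the a element itself.
   [^<]+ stops at the first "<"; <a\b requires the tag name to be exactly "a". */
preg_match('/^([^<]+)(<a\b.*)$/', $string, $matches);

/* Splits the first component on runs of spaces or commas, skipping empty pieces */
$split = preg_split('/[\s,]+/', $matches[1], -1, PREG_SPLIT_NO_EMPTY);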

var_export($matches);
var_export($split);
?>
[/php]
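
If plain values and tags can be mixed in any order, the same idea can be done in a single pass with preg_match_all(). This is a sketch rather than a general solution: it relies on the HTML being well formed, as stated in the first post, and it does not cope with an element nested inside another element of the same name. The function name tokenize_line is hypothetical.

```php
<?php
/**
 * Splits a line into tokens on whitespace or commas, but keeps any
 * <tag ...>...</tag> element together as one token.
 */
function tokenize_line($line)
{
    // First alternative: an opening tag, the shortest possible content, and
    // the closing tag with the same name (\1 refers back to the tag name).
    // Second alternative: a run of characters that are not whitespace,
    // a comma, or "<".
    $pattern = '/<([a-z][a-z0-9]*)\b[^>]*>.*?<\/\1>|[^\s,<]+/i';

    preg_match_all($pattern, $line, $matches);

    return $matches[0]; // the full text of each token, in order
}

$tokens = tokenize_line('X Y ZZZ 444444 <a name="6">developer name</a>');
var_export($tokens);
```

With the sample line, the last token is the whole a element, and any extra values before it simply become extra array items.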
