
Help needed with preg_match regex

  • 22-07-2011 12:36pm


    Hi guys

    I have this line in a text file
    X Y ZZZ 444444 <a name="6">developer name</a>
    

    I need to extract certain values from it. So far I have this:
    $developer_details = preg_split("/[\s,]+/", $line);
    

    This produces an array whose first four elements are populated correctly (each value separated by a space or comma becomes its own array item). However, I need the fifth element taken as a whole string, and there might be more elements between the fourth and the fifth. So the pseudocode would be:

    Extract all elements separated by a space or comma as separate array items, except when you encounter an HTML tag; in that case, take everything up to the matching closing tag as a single item.


    The one consolation is that I know the HTML is well formed.

    Any ideas?
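    One way to implement the pseudocode above directly is a single preg_match_all() call whose pattern is an alternation: either a whole <a ...>...</a> element, or a run of characters that are neither spaces nor commas. This is a minimal sketch assuming the only HTML on the line is a well-formed, non-nested <a> element; tokenise_line() is a hypothetical helper name, not an existing function.

```php
<?php
// Hypothetical helper: split a line on spaces/commas, but keep any <a>...</a>
// element (assumed well formed and not nested) together as one item.
function tokenise_line(string $line): array
{
    // Alternatives are tried left to right at each position: first a whole
    // <a ...>...</a> element (lazy .*? stops at the first closing tag),
    // otherwise a run of characters that are not whitespace or commas.
    preg_match_all('/<a\b[^>]*>.*?<\/a>|[^\s,]+/', $line, $matches);
    return $matches[0];
}

$line = 'X Y ZZZ 444444 <a name="6">developer name</a>';
var_export(tokenise_line($line));
```

    Because the tag alternative is listed first, a space or comma inside the link text never causes a split, and the number of plain fields before the HTML can vary.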


Comments

  • tehjimmeh


    Would this not work? (note: I don't know PHP very well at all)
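    $developer_details = preg_split("/[\s,]+/", $line);
    // Append every piece after index 4 onto element 4, then drop it from the array
    for ($i = 5, $n = count($developer_details); $i < $n; $i++) {
        $developer_details[4] .= " " . $developer_details[$i];
        unset($developer_details[$i]);
    }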
    

    EDIT: Actually I think it'll only work if you can guarantee there'll be no commas in the 5th item.
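    The comma problem can be avoided with the optional limit argument of PHP's preg_split(): with a limit of 5, at most five substrings are returned and the last one holds the rest of the string unsplit. A minimal sketch, assuming there are always exactly four plain fields before the HTML (the comma in the link text is added here only to show it survives):

```php
<?php
$line = 'X Y ZZZ 444444 <a name="6">developer name, jr</a>';

// With a limit of 5, preg_split() stops after four splits, so the fifth
// element is the untouched remainder of the line (spaces and commas included).
// Assumes exactly four plain fields before the HTML.
$developer_details = preg_split('/[\s,]+/', $line, 5);

var_export($developer_details);
```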


  • Inspector Gadget


    If the format of the file you're reading is fixed and looks like what you've got there, then maybe preg_split() is the wrong function. You could write a regex that matches the whole line at once (i.e. matches each desired item, assuming there'll always be five items), or perhaps take every element of the preg_split() result whose index is greater than 3 (0..3 should be your first four terms) and implode() them back together?

    There are a lot of ways of skinning this particular cat, but it's possible that you haven't provided enough examples of what you're parsing?
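    The "match the whole line at once" option might look like the sketch below. It assumes exactly four plain fields (separated by spaces and/or commas) followed by a single <a> element at the end of the line; if a line doesn't fit that shape, preg_match() returns 0 and the line can be reported instead of silently mis-split.

```php
<?php
$line = 'X Y ZZZ 444444 <a name="6">developer name</a>';

// One regex for the whole line, assuming four plain fields separated by
// spaces and/or commas, followed by a single <a> element.
$pattern = '/^([^\s,]+)[\s,]+([^\s,]+)[\s,]+([^\s,]+)[\s,]+([^\s,]+)[\s,]+(<a\b.*<\/a>)\s*$/';

if (preg_match($pattern, $line, $m)) {
    // $m[0] is the whole match; capture groups 1..5 are the five fields
    $developer_details = array_slice($m, 1);
    var_export($developer_details);
} else {
    echo "Line did not match the expected format\n";
}
```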


  • CuLT


    If you're trying to parse free-format HTML, regex alone won't do the trick; you'll need an HTML parser.

    If you know it's always going to be some text followed by a single "a" element containing everything, then it's straightforward and can be broken into two simple expressions:
    [php]
    <?php
    $string = 'X Y ZZZ 444444 <a name="6">developer name</a>';

    /* Break the string into two components: the plain text before the <a> tag, and the tag itself onwards */
    preg_match('/^([^<]*)(<a\b.*)$/', $string, $matches);

    /* Split the first component on runs of spaces and/or commas */
    $split = preg_split('/[\s,]+/', trim($matches[1]));

    var_export($matches);
    var_export($split);
    ?>
    [/php]
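    For the free-format HTML case, the anchor part can be handed to PHP's DOM extension instead of being picked apart with a regex. This sketch assumes the DOM extension is available (it is enabled in most PHP builds) and that the HTML is plain ASCII; it reuses the same before/after split idea, with the pattern ^([^<]*)(<a\b.*)$ as an assumption, then reads the name attribute and link text from the parsed element.

```php
<?php
$string = 'X Y ZZZ 444444 <a name="6">developer name</a>';

// Split into the plain text before the <a> tag and the tag onwards
preg_match('/^([^<]*)(<a\b.*)$/', $string, $matches);

// Parse the HTML part with DOMDocument rather than regex
$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($matches[2]);        // libxml wraps the fragment in <html><body>
libxml_clear_errors();

$anchor = $doc->getElementsByTagName('a')->item(0);
$name = $anchor->getAttribute('name');
$text = $anchor->textContent;

var_export([$name, $text]);
```

    The parser copes with attributes in any order, entities and nested inline tags inside the link, which a hand-written regex would have to anticipate one case at a time.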

