Grep type question

09-01-2010 4:45pm #1

I want to be able to select and copy multiple lines of code that lie within an HTML tag with a unique id. So, if the source page looks like this:

Blah
Blah
<div id="unique">
Line 1
Line 2
Line 3
</div>
blah
blah

Grep seems to fail me here as I can't select more than one line of code, and I want to select Lines 1-3. I'm guessing this is simple enough but I can't find much on the internet. Anyone have any ideas? Thanks!

09-01-2010 5:49pm

jQuery would be perfect for this - it has selectors to target the relevant div and it's then easy to pull out the content. You'll find all you need on the jQuery site - a line of code will do it.

09-01-2010 5:53pm

Cheers - I'd rather not use javascript, I was hoping to get this done as a shell script.

Jonathan · 09-01-2010 5:54pm

Regular Expressions

Skrynesaver · 09-01-2010 7:31pm

Something like the following should do it. WARNING: untested.

perl -e 'while (<>){$found = 0 if (/<\/div/);print if $found;$found = 1 if (/div id="unique"/);}'  $FILENAME

daymobrew · 09-01-2010 8:52pm

Or look at the 3 dot perl range operator. Again, this is untested.

perl -e 'while (<>){print if /div id="unique"/ ... /<\/div/;}'  $FILENAME

There is a 2 dot version too:

perl -e 'while (<>){print if /div id="unique"/ .. /<\/div/;}'  $FILENAME

IIRC one will print the div lines, the other won't.

Edit: From a quick experiment this morning, both code snippets print the div lines. I might be doing something wrong.

<div id="unique">
Line 1
Line 2
Line 3
</div>

daymobrew · 10-01-2010 11:12pm

I haven't been able to get my code snippet working.
The range operator docs say that the operator returns values that could be useful but I couldn't figure out how to access this returned value.

The value returned is either the empty string for false, or a sequence number (beginning with 1) for true. The sequence number is reset for each range encountered. The final sequence number in a range has the string "E0" appended to it, which doesn't affect its numeric value, but gives you something to search for if you want to exclude the endpoint. You can exclude the beginning point by waiting for the sequence number to be greater than 1.

This is getting closer:

perl -e 'while (<>){print if ((/div id="unique"/ ... /<\/div/) > 1)}' $FILENAME

This returns:

Line 1
Line 2
Line 3
</div>

daymobrew · 11-01-2010 11:05am

I think I finally got it (I just couldn't let this one go):

~> perl -e 'while (<>){ $range = (/div id="unique"/ ... /<\/div/); print if ($range > 1 && $range !~ /E0/)}' $FILENAME

It stores the result of the range operator and then checks it. The range operator returns '5E0' for the last matching line (the one with '</div>').
