
Comparing text files using grep/diff/whatever...

  • 26-11-2009 6:04pm
    #1
    Registered Users Posts: 6,790 ✭✭✭


    Hello all, this should be an easy exercise, but it has me stumped. I need to compare two text files and output the lines that they have in common: if a line exists in both file A and file B, it needs to end up in file C. What's the most elegant way to achieve this in Perl? I would prefer to use Unix CLI tools like grep, diff, etc.

    An example of what I'm talking about:

    File A
    banana
    orange
    apple
    pear
    

    File B
    mango
    banana
    blueberry
    pear
    

    The output will consist of the lines that both files contain:
    banana
    pear
    

    The files are not likely to be very large (maybe 20-30 lines each max).
    I could write something that iterates through each line of File A, comparing it with each line of File B but if someone has done that already, then that would be great.
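
For reference, the brute-force idea described above (check every line of File A against every line of File B) could be sketched in plain shell roughly like this. The names fileA, fileB and fileC are assumptions taken from the example, and the scratch directory is only there to keep the sketch self-contained.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data from the example above.
printf '%s\n' banana orange apple pear > fileA
printf '%s\n' mango banana blueberry pear > fileB

# Outer loop walks fileA; inner loop rescans fileB for each line.
while IFS= read -r line_a; do
    while IFS= read -r line_b; do
        if [ "$line_a" = "$line_b" ]; then
            printf '%s\n' "$line_a"
            break   # one match is enough, stop scanning fileB
        fi
    done < fileB
done < fileA > fileC

cat fileC
```

With 20-30 lines per file the repeated rescans of fileB cost next to nothing, so this is mostly a question of taste rather than speed.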


Comments

  • Registered Users Posts: 701 ✭✭✭fuse




  • Registered Users Posts: 3,721 ✭✭✭E39MSport


    I'd use one file as a pattern file for grep:

    # /usr/xpg4/bin/grep -f fileB fileA
    banana
    pear
    #
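
One caveat with a plain `grep -f`: each line of the pattern file is treated as a regular expression and may match anywhere in a line. A minimal sketch of the difference, assuming a grep that supports the POSIX `-F` (fixed strings) and `-x` (whole-line match) options; the extra "pineapple" line is a hypothetical addition to the sample data, there only to trigger the problem.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data from the thread, plus "pineapple" (hypothetical) in fileB.
printf '%s\n' banana orange apple pear > fileA
printf '%s\n' mango banana blueberry pear pineapple > fileB

# Plain -f: patterns are regexes and match substrings,
# so the pattern "apple" also selects "pineapple".
loose=$(grep -f fileA fileB)

# -F: treat patterns as fixed strings; -x: match whole lines only.
strict=$(grep -Fxf fileA fileB)

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

For the fruit lists in the original question both forms give the same answer; the stricter form only matters once one line can be a substring of another or contains regex characters.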


  • Closed Accounts Posts: 1,150 ✭✭✭Ross


    Should do the trick:
    grep -f fileA fileB > result
    

    Or does it have to use Perl?


  • Registered Users Posts: 304 ✭✭PhantomBeaker


    My own take in Perl (in Python, I'd just use sets :D ) would be to use hashes.

    Iterate over your first file, A, and read each line. Use that line as a key for your hash. Put anything in as the value, just so long as you have the key there.

    Then you can read over each line in B and check if the key is in your hash (I'd use something like "if exists($myhash{$line})", which works whatever value you stored) and then print out your line.

    It'll partially preserve ordering in that you'll get matching lines as they occur in B, but if you were to rewrite A as:
    pear
    orange
    apple
    banana
    

    you'd still get a result of:
    banana
    pear
    

    This may matter in some applications that want lines that match in sequence (as in, you want to match "pear" but not "banana" because in A, banana comes after pear and in B it comes before pear).
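
If matching in sequence is what you need, diff already works that way, since it only treats lines as unchanged when they line up in the same order in both files. A possible sketch, assuming GNU diff (the three `--*-line-format` options are GNU extensions and are not available in every diff); the function name in_sequence and the file names are hypothetical.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# File A in its original order, File A rewritten as above, and File B.
printf '%s\n' banana orange apple pear > fileA
printf '%s\n' pear orange apple banana > fileA2
printf '%s\n' mango banana blueberry pear > fileB

# in_sequence A B: print only the lines diff considers unchanged.
# Added and removed lines are formatted as empty strings.
# diff exits with status 1 when the files differ, hence the "|| true".
in_sequence() {
    diff --unchanged-line-format='%L' \
         --old-line-format='' \
         --new-line-format='' "$1" "$2" || true
}

original_order=$(in_sequence fileA fileB)
rewritten_order=$(in_sequence fileA2 fileB)

printf 'original A:\n%s\n\nrewritten A:\n%s\n' "$original_order" "$rewritten_order"
```

With the original File A both banana and pear are reported. With the rewritten File A only one of the two can line up in order, so a single line comes back instead of both.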

    Hope that helps (if you haven't already cracked it)

    Aoife

