
Perl text parsing

  • 07-06-2012 1:59pm
    #1
    Registered Users Posts: 43


    I'm new to Perl scripting, so I was wondering if anyone could help with this problem. I have a tab-separated file that's set up like this:
    q 1
    q 3
    b 3
    b 4
    b 4
    a 1
    a 3
    a 3
    I want to add up the values in column 2 for each letter in column 1, e.g. q should equal 4.

    Thanks a lot for any help


Comments

  • Registered Users Posts: 962 ✭✭✭darjeeling


    Here's a script I wrote a while ago to do this job. There are probably lots of other ways to do this, but this should work:
    #!/usr/bin/perl -w
    
    # get_totals.pl
    # reads in a set of identifiers with replicates and corresponding scores,
    # computes the total for each replicate and outputs
    
    # enforce strict pragma for variable declaration etc (good practice):
    use strict;
    
    # read input file name from command line args,
    # or use STDIN if none is provided ('-' makes the two-argument open read STDIN):
    my $infile=shift;
    if (! $infile){$infile = '-';}
    open(INFILE, $infile) || die "opening $infile: $!";
    
    # variables:
    my $item;                  # name of current item
    my $value;                 # value of item from current line
    my %item_totals;           # hash of totalised values for each item
    
    # read through input, one line at a time, 
    # adding values to running totals for each item:
    while(<INFILE>) {
      chomp;
      ($item,$value) = split(/\t/, $_);
      if ( exists ( $item_totals{$item} ) ){
        $item_totals{$item} += $value;
      }
      else {
        $item_totals{$item} = $value;
      }
    }
    close (INFILE);
    
    # print out the totals:
    print STDOUT "Item\tScore\n";
    for $item ( sort {$a cmp $b} keys %item_totals){
      print STDOUT "$item\t$item_totals{$item}\n";
    }
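
For anyone who wants the summing step in a reusable form, here is a minimal sketch of the same hash-accumulation idea written as a subroutine. The name sum_by_key and the decision to skip lines that lack a second field are assumptions made for illustration; they are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# sum_by_key is a hypothetical helper name (not from the script above).
# It takes a list of "key<TAB>value" lines and returns a hash reference
# mapping each key to the sum of its values.
sub sum_by_key {
    my @lines = @_;
    my %totals;
    for my $line (@lines) {
        chomp $line;
        my ($item, $value) = split /\t/, $line;
        # Assumption: lines without both fields are skipped silently.
        next unless defined $item && defined $value;
        # += on a key that does not exist yet starts from 0,
        # so no separate exists() check is needed.
        $totals{$item} += $value;
    }
    return \%totals;
}

# The sample data from the question.
my $totals = sum_by_key("q\t1", "q\t3", "b\t3", "b\t4", "b\t4",
                        "a\t1", "a\t3", "a\t3");
print "$_\t$totals->{$_}\n" for sort keys %$totals;
# prints a 7, b 11 and q 4, one tab-separated pair per line
```

Keeping the loop in a subroutine makes it easy to feed it lines from a file, from STDIN, or from a test.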
    
    


  • Registered Users Posts: 43 Rhavin


    Thanks very much, works perfectly!


  • Registered Users Posts: 1,109 ✭✭✭Skrynesaver


    or more concisely
    perl -e 'while(<>){$value{$1}+=$2 if (/\w+\s+\d+/);}END{for (sort keys %value){print "$_\t$value{$_}\n";}}' <FILENAME>
    


  • Registered Users Posts: 962 ✭✭✭darjeeling


    Skrynesaver wrote: »
    or more concisely
    perl -e 'while(<>){$value{$1}+=$2 if (/\w+\s+\d+/);}END{for (sort keys %value){print "$_\t$value{$_}\n";}}' <FILENAME>
    

    Thanks. I added a couple of pairs of brackets in the regexp to get your one-liner to work for me:
    perl -e 'while(<>){$value{$1}+=$2 if (/^(\w+)\s+(\d+)/);}END{for (sort keys %value){print "$_\t$value{$_}\n";}}' <FILENAME>
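
To see why the brackets matter, here is a small sketch that checks what $1 and $2 hold after a match with and without capture groups. The helper name captures is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# captures() is a hypothetical helper: it matches $line against the
# compiled pattern $re and reports what $1 and $2 contain afterwards,
# with the string 'undef' standing in for an undefined value.
sub captures {
    my ($line, $re) = @_;
    return unless $line =~ $re;          # empty list if there is no match
    return map { defined $_ ? $_ : 'undef' } ($1, $2);
}

# No parentheses: the line matches, but nothing is captured.
print join(", ", captures("q\t3", qr/^\w+\s+\d+/)), "\n";      # undef, undef

# Parentheses around \w+ and \d+ fill $1 and $2.
print join(", ", captures("q\t3", qr/^(\w+)\s+(\d+)/)), "\n";  # q, 3
```

Without the groups, $1 and $2 are undefined after a successful match, so a one-liner that uses them as key and value ends up accumulating everything under a single empty key.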
    


  • Registered Users Posts: 43 Rhavin


    Why would the second answer round the numbers while the first one gave the whole number?


  • Registered Users Posts: 962 ✭✭✭darjeeling


    Rhavin wrote: »
    Why would the second answer round the numbers while the first one gave the whole number?

    It's down to the regular expression used:

    This matches one or more word characters at the start of a line, followed by one or more whitespace characters, followed by one or more digits. Because \d+ stops at the decimal point, a value such as 3.5 is captured as 3:
    /^(\w+)\s+(\d+)/

    This additionally allows for an optional decimal point and optional following digits:
    /^(\w+)\s+(\d+\.?\d*)/

    The bits of text matching the terms in brackets (i.e. '\w+' and '\d+\.?\d*') are stored in the match variables $1 and $2, which are then used to populate Skrynesaver's hash %value.

    Here's the revised code:
    perl -e 'while(<>){$value{$1}+=$2 if (/^(\w+)\s+(\d+\.?\d*)/);}END{for (sort keys %value){print "$_\t$value{$_}\n";}}' <FILENAME>
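
The difference between the two patterns can be checked on a single line. This is a minimal sketch; the helper name second_field and the sample value 3.75 are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# second_field() is a hypothetical helper: it returns whatever the
# second capture group of $re matched in $line, or undef if the
# pattern does not match at all.
sub second_field {
    my ($line, $re) = @_;
    return $line =~ $re ? $2 : undef;
}

my $line = "a\t3.75";

# \d+ alone stops at the decimal point, so only the integer part is kept.
print second_field($line, qr/^(\w+)\s+(\d+)/), "\n";         # 3

# Allowing an optional '.' and trailing digits keeps the whole number.
print second_field($line, qr/^(\w+)\s+(\d+\.?\d*)/), "\n";   # 3.75
```

Note that neither pattern accepts a leading minus sign or a value written without a leading digit (such as .5); lines like that would simply not match and would be left out of the totals.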
    


  • Registered Users Posts: 43 Rhavin


    Thanks for all the help! It made work today a lot easier :D


  • Registered Users Posts: 1,414 ✭✭✭Fluffy88


    It's not Perl if it can't be written in one line :P

