
Japanese encoding in Perl

  • 21-04-2006 4:21pm
    #1
    Registered Users Posts: 90 ✭✭


    Hi, it's my first time dealing with a foreign character set in Perl and I'm having a few problems.

    For the moment I just want to read in a file which is encoded in Unicode and output each line to another file (later I want to deal with regexes, but I wanted to start off simple). When I open the new output file, it doesn't match the original. Could someone please tell me where I'm going wrong? Cheers, and here's the code I've written so far.

    #!/usr/bin/perl

    $inFile=$ARGV[0];
    $outFile=$ARGV[1];

    open (IN, $inFile) || die "cannot open file: $inFile\n";
    chomp(@lines = <IN>);
    close (IN);

    open (OUT, "$outFile") || die "cannot write to file: $outFile\n";
    for($i = 0; $i <= $#lines; $i++) {
    print OUT "$lines[$i]\n";
    }


Comments

  • Registered Users Posts: 6,508 ✭✭✭daymobrew


    What if you do it without using chomp?
    #!/usr/bin/perl
    
    $inFile=$ARGV[0];
    $outFile=$ARGV[1];
    
    open (IN, $inFile) || die "cannot open file: $inFile\n";
    open (OUT, "$outFile") || die "cannot write to file: $outFile\n";
    while ( <IN> )
    {
      print OUT;
    }
    close (IN);
    close (OUT);
    

    Don't forget to close OUT filehandle (I realise that it will be closed when the script ends).


  • Registered Users Posts: 90 ✭✭Alligator Wine


    It doesn't make any difference if I use chomp or not. Same with closing the output file.

    The problem is with character encoding and what Perl interprets as a character. I'm just wondering how people normally handle foreign character sets such as Japanese.


  • Registered Users Posts: 6,508 ✭✭✭daymobrew


    I don't have to deal with non-English data so I don't know the right way.
    The perllocale page (perldoc perllocale) might help. It explains how to make Perl assume a different locale when working on data.
    Also look at:
    perldoc perluniintro (Perl Unicode introduction)
    perldoc perlunicode (Unicode support in perl)

    Google Groups is a good place to look too.
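
The perluniintro page mentioned above covers PerlIO encoding layers, which let open() decode bytes into characters on read and encode them again on write. Below is a minimal sketch of the line-by-line copy from the first post using those layers. The helper name copy_with_encoding and the UTF-8 default are illustrative assumptions, not something from the thread; the correct input encoding depends on how the file was actually saved (for example "UTF-16", "shiftjis" or "euc-jp").

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy a text file line by line, decoding on input and encoding on output.
# $in_enc and $out_enc are encoding names known to the Encode module,
# e.g. "UTF-8", "UTF-16", "shiftjis", "euc-jp".
# (copy_with_encoding is a hypothetical helper name used for illustration.)
sub copy_with_encoding {
    my ($in_file, $in_enc, $out_file, $out_enc) = @_;

    # "<" opens for reading; the :encoding layer turns bytes into characters.
    open(my $in, "<:encoding($in_enc)", $in_file)
        or die "cannot open file: $in_file: $!\n";

    # ">" opens for writing; the :encoding layer turns characters into bytes.
    open(my $out, ">:encoding($out_enc)", $out_file)
        or die "cannot write to file: $out_file: $!\n";

    while (my $line = <$in>) {
        print {$out} $line;
    }

    close($in);
    close($out) or die "cannot close file: $out_file: $!\n";
    return;
}

# Usage: perl copy.pl in.txt out.txt [input-encoding [output-encoding]]
# Defaults are an assumption: UTF-8 in, and the same encoding out.
if (@ARGV >= 2) {
    my ($in_file, $out_file, $in_enc, $out_enc) = @ARGV;
    $in_enc  ||= 'UTF-8';
    $out_enc ||= $in_enc;
    copy_with_encoding($in_file, $in_enc, $out_file, $out_enc);
}
```

With the same encoding on both sides the output should match the input; with different names the script converts between encodings, which is also the point at which regexes start to see characters rather than bytes.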


  • Closed Accounts Posts: 146 ✭✭MrScruff


    Your code works for me using a UTF-8 Japanese XML file, with the addition of a ">", i.e.
    open (OUT, ">$outFile") || die "cannot write to file: $outFile\n";

    What encoding is your input file using?
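
One rough way to answer that question is to look at the first bytes of the file for a byte-order mark (BOM). The sketch below is only a heuristic under stated assumptions: guess_bom_encoding is a made-up helper name, a file without a BOM (common for UTF-8, Shift_JIS and EUC-JP) returns undef, and a UTF-16LE file whose first character is U+0000 would be misreported as UTF-32LE.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Guess an encoding name from a byte-order mark at the start of a file.
# Returns undef when no known BOM is present.
# (guess_bom_encoding is a hypothetical helper name used for illustration.)
sub guess_bom_encoding {
    my ($file) = @_;

    # :raw makes sure we see the bytes exactly as they are stored on disk.
    open(my $fh, '<:raw', $file) or die "cannot open file: $file: $!\n";
    my $head = '';
    read($fh, $head, 4);
    close($fh);

    # The 4-byte UTF-32 marks are checked first because the UTF-32LE mark
    # starts with the same two bytes as the UTF-16LE mark.
    return 'UTF-32LE' if $head =~ /^\xFF\xFE\x00\x00/;
    return 'UTF-32BE' if $head =~ /^\x00\x00\xFE\xFF/;
    return 'UTF-8'    if $head =~ /^\xEF\xBB\xBF/;
    return 'UTF-16LE' if $head =~ /^\xFF\xFE/;
    return 'UTF-16BE' if $head =~ /^\xFE\xFF/;
    return;
}

# Usage: perl bom.pl file.txt
if (@ARGV) {
    my $enc = guess_bom_encoding($ARGV[0]);
    print defined $enc ? "$enc\n" : "no BOM found\n";
}
```

If this reports UTF-16LE or UTF-16BE, the file is not UTF-8, and opening it with a matching encoding layer is probably the next thing to try.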

