
Getting a PrintWriter to accept UTF-8?

  • 14-09-2007 6:57pm
    #1
    Registered Users Posts: 21,264 ✭✭✭✭


    Ok this is driving me nuts.

    I have a method that returns a PrintWriter (let's call the variable pita). I have no control over this method.

    However, the data coming back through it is UTF-8, but that PrintWriter doesn't recognise it (so won't format it).

    So normally I would do something like:
    PrintWriter pw = new PrintWriter(new OutputStreamWriter(xx.getOutputStream(), "UTF8"), true);
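For reference, here is a self-contained version of that wrapping. A ByteArrayOutputStream stands in for xx.getOutputStream() (an assumption for illustration), and the class and method names are hypothetical. It shows that the encoding is fixed by the OutputStreamWriter when it is constructed, not by the PrintWriter:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class Utf8WriterDemo {
    // Writes text through a PrintWriter whose OutputStreamWriter encodes as UTF-8,
    // and returns the raw bytes that reached the underlying stream.
    static byte[] writeUtf8(String text) {
        // ByteArrayOutputStream is a stand-in for xx.getOutputStream()
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter pw = new PrintWriter(
                new OutputStreamWriter(sink, StandardCharsets.UTF_8), true);
        pw.print(text);
        pw.flush(); // autoflush only applies to println/printf/format, so flush explicitly
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is encoded by UTF-8 as two bytes, 0xC3 0xA9
        byte[] bytes = writeUtf8("\u00e9");
        System.out.println(bytes.length); // prints 2
    }
}
```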
    

    So basically what I want to do is take pita and convert its output into an OutputStream so that I can pass it to a new PrintWriter.

    Anyone any clue on how to do this?


Comments

  • Closed Accounts Posts: 7 turncorner


    Read the OP last night, but couldn't make much sense of it. Occurred to me later what might be going on.

    The main problem you are having is that a PrintWriter doesn't "accept UTF-8". It accepts Strings, which are Java objects and basically independent of encoding. When you want to write a String to disk or across a network, or turn it into a stream of bytes for any reason, you need to pick an encoding. When you pass a String to a PrintWriter method like println, the writer converts the String to a sequence of bytes using the encoding it was created with, which is the default for your platform unless a charset was specified. The resulting bytes might be different if you run it on a Windows box, a Mac or a Unix machine.
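To make the point concrete, the sketch below encodes one String two ways and counts the bytes. ISO-8859-1 is used as a stand-in for a Windows-style default such as Cp1252 (the two agree on these particular characters); the class and method names are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Number of bytes the given charset produces for the string
    static int byteCount(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // "Gruesse" with u-umlaut and sharp s, written with Unicode escapes
        // so the source file's own encoding doesn't matter
        String s = "Gr\u00fc\u00dfe";
        // Same String, different byte sequences depending on the chosen encoding
        System.out.println("UTF-8:      " + byteCount(s, StandardCharsets.UTF_8) + " bytes");      // 7
        System.out.println("ISO-8859-1: " + byteCount(s, StandardCharsets.ISO_8859_1) + " bytes"); // 5
        System.out.println("This JVM's default charset: " + Charset.defaultCharset());
    }
}
```

The umlaut and the sharp s each take two bytes in UTF-8 but one byte in the single-byte charset, which is exactly the kind of difference that turns output into a mess when writer and reader disagree.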

    If you are getting an OutputStream back from some other method, and you want to write Strings to it converted to bytes using UTF-8, the PrintWriter(OutputStream) constructor won't let you choose the encoding; you have to wrap the stream in an OutputStreamWriter first. A PrintStream, though, lets you choose whatever encoding you want directly:
    OutputStream os = xx.getOutputStream();
    // throws UnsupportedEncodingException if the encoding name isn't supported
    PrintStream ps = new PrintStream(os, true, "UTF-8");
    ps.println("Hello, world.");
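A runnable version of the PrintStream approach, again with a ByteArrayOutputStream standing in for xx.getOutputStream() (an assumption) and hypothetical class and method names. The checked UnsupportedEncodingException from the String-named constructor is rethrown unchecked to keep the example short:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;

public class PrintStreamDemo {
    // Prints text through a UTF-8 PrintStream and returns the bytes produced.
    static byte[] printUtf8(String text) {
        ByteArrayOutputStream os = new ByteArrayOutputStream(); // stand-in for xx.getOutputStream()
        try {
            // The encoding name is checked here; an unknown name throws UnsupportedEncodingException
            PrintStream ps = new PrintStream(os, true, "UTF-8");
            ps.print(text);
            ps.flush();
        } catch (UnsupportedEncodingException e) {
            throw new UncheckedIOException(e);
        }
        return os.toByteArray();
    }

    public static void main(String[] args) {
        // The euro sign U+20AC is encoded by UTF-8 as three bytes: 0xE2 0x82 0xAC
        System.out.println(printUtf8("\u20ac").length); // prints 3
    }
}
```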
    

    If I interpreted your question correctly, that should help.


  • Registered Users Posts: 21,264 ✭✭✭✭Hobbes


    Nope.

    What is wrong is that I have a class that returns a PrintWriter to the data I want to get at.

    However, the data coming back is UTF-8, but the class didn't set up the PrintWriter it sends back as UTF-8, so it's all a mess.

    What you describe is what I described above.


  • Closed Accounts Posts: 7 turncorner


    Okay I obviously picked that up wrong then.

    You say you are calling a method that "returns a printWriter to the data I want to get at". The problem here is that you can't get data from a writer; you get data from a reader. If the method is returning a PrintWriter to you, it is inviting you to write data to it, not read data from it.
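A small sketch of that direction of flow: text goes out through a writer into a byte sink, and getting it back means reading the sink with a Reader. The in-memory streams are assumptions standing in for wherever the real PrintWriter writes, and the names are hypothetical. The Reader's charset has to match the one the writer used, otherwise the text comes back garbled:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Writes text through a PrintWriter that encodes with writeCs, then reads the
    // bytes back through a Reader that decodes with readCs.
    static String roundTrip(String text, Charset writeCs, Charset readCs) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintWriter pw = new PrintWriter(new OutputStreamWriter(sink, writeCs))) {
            pw.print(text);
        } // closing the writer flushes its buffered characters into sink
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(sink.toByteArray()), readCs)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            // in-memory streams shouldn't fail, but Reader.read() declares IOException
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "Gr\u00fc\u00dfe";
        // Matching charsets: the text comes back unchanged
        System.out.println(roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(original));      // true
        // Mismatched charsets: each two-byte UTF-8 sequence decodes as two Latin-1 characters
        System.out.println(roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals(original)); // false
    }
}
```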

    Unless...

    Do you have a block of code that gets the PrintWriter and sends data to it, and then you look at some other bit of your system that the PrintWriter is writing to? In that case, you are at the mercy of whatever code created the PrintWriter for you. If that code built the PrintWriter directly on an OutputStream or file without naming a charset, then it is going to be using the "default character encoding", according to the Javadocs. That's just the way the SDK works. The default encoding on my Windows box is 'Cp1252', as reported by the system property 'file.encoding'. There isn't any way of changing the encoding a PrintWriter is using after it has been created. It is possible (but I haven't tried it) that setting 'file.encoding' to UTF-8 might do it for you. For example:
    java -Dfile.encoding=UTF8 MyApp
    

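    or try setting it in the program code, although the Javadoc for Charset.defaultCharset() says the default charset is determined during virtual-machine startup, so changing the property at runtime may have no effect:
    // has to run before the PrintWriter is created, and may be ignored if the default charset is already fixed
    System.setProperty("file.encoding", "UTF8");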
    

    Be warned, though, that this change could have very widespread consequences (that's global variables for you), so it might not be advisable.
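To see which encoding you are actually getting, a check like the one below prints the file.encoding property alongside Charset.defaultCharset() and compares the bytes from a PrintWriter built without a charset against an explicit encode with the default. The ByteArrayOutputStream is an assumption standing in for whatever stream the real class writes to, and the names are hypothetical. Note that since JDK 18 the default charset is UTF-8 unless overridden (JEP 400), so the result depends on the JDK version as well as the platform.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.util.Arrays;

public class DefaultCharsetDemo {
    // Bytes produced by a PrintWriter built straight on an OutputStream,
    // i.e. without naming a charset.
    static byte[] viaDefaultPrintWriter(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter pw = new PrintWriter(sink); // no charset given: the JVM default is used
        pw.print(text);
        pw.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        String s = "Gr\u00fc\u00dfe";
        // Expected to be true, since the writer encoded with the default charset
        boolean same = Arrays.equals(viaDefaultPrintWriter(s), s.getBytes(Charset.defaultCharset()));
        System.out.println(same);
    }
}
```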

    If you step back and look at your design, you might find that UTF-8 encoding the output isn't appropriate. By the sounds of it, whoever wrote the class you are using (the one that passes you back a PrintWriter) thought that it wasn't. They might be wrong, but even if they are, they've left you having to fight the tool you are using, which is never an enviable position to be in.

    Here's a final crazy workaround, which is messy, but you might consider it. Instead of outputting normal strings as UTF-8, use some escaping scheme. For example, use org.apache.commons.lang.StringEscapeUtils.escapeXml to turn tricky characters like umlauts and crazy mathematical symbols into XML character escapes. Do this before you pass the strings to the PrintWriter. The output will then not be raw text, but you can retrieve the original strings faithfully using the corresponding unescape method (org.apache.commons.lang.StringEscapeUtils.unescapeXml). That might be completely useless to you (maybe you can't post-process the output that comes from the PrintWriter), but it's just a thought.
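The idea can be sketched without any library. The escape and unescape helpers below are hypothetical, hand-rolled stand-ins for the Commons Lang calls (not their actual implementation), so the example doesn't depend on which Commons Lang version escapes which characters. Every non-ASCII code point and every XML-special character becomes a numeric reference like &#252;, so the escaped text is pure ASCII and survives any ASCII-compatible default encoding (Cp1252, ISO-8859-1, UTF-8):

```java
public class NumericEscapeDemo {
    // Hypothetical stand-in for an XML escaper: replaces non-ASCII code points and
    // the five XML-special characters with &#NNN; decimal references.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp > 0x7F || cp == '&' || cp == '<' || cp == '>' || cp == '"' || cp == '\'') {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    // Hypothetical inverse: turns each &#NNN; reference back into its code point.
    // Assumes well-formed input produced by escape(), where every literal '&' was escaped.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("&#", i)) {
                int semi = s.indexOf(';', i);
                out.appendCodePoint(Integer.parseInt(s.substring(i + 2, semi)));
                i = semi + 1;
            } else {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // u-umlaut, sharp s, the "for all" symbol U+2200, and an ampersand
        String original = "Gr\u00fc\u00dfe \u2200x & y";
        String escaped = escape(original);
        System.out.println(escaped);                            // Gr&#252;&#223;e &#8704;x &#38; y
        System.out.println(unescape(escaped).equals(original)); // true
    }
}
```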

