
regular expression to remove all non-printable characters

  • 28-01-2013 5:16pm
    #1
    Moderators, Science, Health & Environment Moderators, Social & Fun Moderators, Society & Culture Moderators Posts: 60,092 Mod ✭✭✭✭


    I wish to remove all non-printable ASCII characters from a string while retaining invisible ones. I thought this would work because whitespace, \n, \r and \b are invisible characters but not non-printable? Basically I am getting a byte array with � characters in it (\uFFFD) and I don't want them in it. So I am trying to convert it to a string and remove the � characters before using it as a byte array again.

    With the code below they are removed, but so are any occurrences of \r, \n and \b. What would be the correct regex to retain these as well? Or is there a better way than what I am doing?
    public void write(byte[] bytes, int offset, int count)
    {
        try {
            // Decode the incoming bytes as UTF-8.
            String str = new String(bytes, "UTF-8");
            // Strip everything that is not printable ASCII, a tab or a newline.
            String str2 = str.replaceAll("[^\\p{Print}\t\n]", "");
            GraphicsTerminalActivity.sendOverSerial(str2.getBytes("UTF-8"));
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }


    Do I have to add some clause for ascii control characters?
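
For reference, one way to answer the question above is to list the control characters to keep explicitly inside the negated class. This is only a minimal sketch, not the poster's final code: the class name `PrintableFilter` and the method `clean` are hypothetical, and it assumes the default `java.util.regex` behaviour, where `\p{Print}` covers printable ASCII only (so U+FFFD falls outside it). Backspace is written as `\x08` to avoid any ambiguity with the word-boundary escape.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class PrintableFilter {
    // Matches anything that is NOT printable ASCII, tab, LF, CR or backspace (0x08).
    private static final Pattern UNWANTED =
            Pattern.compile("[^\\p{Print}\\t\\n\\r\\x08]");

    // Returns the input with every unwanted character removed.
    static String clean(String input) {
        return UNWANTED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // U+FFFD and BEL (U+0007) should go; CR, LF, tab and backspace should stay.
        String raw = "ok\uFFFD\r\n\tdone\b\u0007";
        System.out.println(clean(raw).equals("ok\r\n\tdone\b"));
    }
}
```

The regex is compiled once as a constant because `String.replaceAll` recompiles the pattern on every call, which adds up if `write` is called often.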


Comments

  • Registered Users Posts: 1,931 ✭✭✭PrzemoF




  • Moderators, Science, Health & Environment Moderators, Social & Fun Moderators, Society & Culture Moderators Posts: 60,092 Mod ✭✭✭✭Tar.Aldarion


    Thanks, I tried that earlier and I get the exact same functionality as I do at the moment. Although I was trying replaceAll("\\p{C}", ""); due to ? listing all options in the terminal.


  • Moderators, Science, Health & Environment Moderators, Social & Fun Moderators, Society & Culture Moderators Posts: 60,092 Mod ✭✭✭✭Tar.Aldarion


    I tried [^\\x00-\\x7F], which matches everything outside the range of ASCII characters... but then the � symbols still get through, weird.
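
On the "is there a better way" part of the original question, a possible alternative is to skip the String round trip and filter the bytes directly. This is a sketch under assumptions: it assumes the � characters come from bytes that are not valid UTF-8 (decoding such bytes typically yields U+FFFD), and that the serial side only needs printable ASCII plus backspace, tab, LF and CR. The class name `AsciiByteFilter` and the method `filter` are hypothetical. Note that it also drops every legitimate multi-byte UTF-8 character.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper, for illustration only.
class AsciiByteFilter {
    // Keeps printable ASCII (0x20-0x7E) plus backspace, tab, LF and CR;
    // every other byte in [offset, offset + count) is dropped.
    static byte[] filter(byte[] bytes, int offset, int count) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(count);
        for (int i = offset; i < offset + count; i++) {
            int b = bytes[i] & 0xFF; // treat the byte as unsigned
            boolean printable = b >= 0x20 && b <= 0x7E;
            boolean keptControl = b == 0x08 || b == 0x09 || b == 0x0A || b == 0x0D;
            if (printable || keptControl) {
                out.write(b);
            }
        }
        return out.toByteArray();
    }
}
```

Inside `write`, the result of `AsciiByteFilter.filter(bytes, offset, count)` could then be passed straight to `sendOverSerial`, which also respects the `offset` and `count` arguments.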

