
Stripping HTML elements from user-submitted data

  • 18-10-2008 7:40pm
    #1
    Closed Accounts Posts: 2,300 ✭✭✭


    I'm doing some work on a Java/JSP site.

    Users will submit reviews which will be posted straight up, so I need to strip out < > elements. Within reason, is that all I need to do?

    SQL injection is a whole other kettle of fish AFAIK (isn't it?), and I'm not too worried about that, though. Passwords are encrypted for starters, nothing very sensitive is stored, etc.


Comments

  • Registered Users Posts: 569 ✭✭✭none


    Check here (http://www.comp.lancs.ac.uk/computing/research/cseg/projects/ariadne/ihe/web/chars.html and http://www.w3.org/TR/WD-html40-970708/sgml/entities.html) for the list of reserved characters. In most cases they won't break the HTML layout even if you don't encode them. What you really need to worry about is your Java and JavaScript strings and, as you said yourself, the SQL backend, since they are all much more sensitive to reserved characters than HTML is.
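To illustrate what encoding the reserved characters means in Java terms, here is a minimal sketch that replaces &, <, >, " and ' with HTML entities before the text is written into a page. The names HtmlEscaper and escapeHtml are hypothetical. The sketch assumes the text goes into element content or a quoted attribute. Libraries such as Apache Commons Text offer a ready-made equivalent.

```java
// Minimal sketch (hypothetical class): encode HTML-reserved characters as
// entities instead of deleting anything from the user's text.
final class HtmlEscaper {

    private HtmlEscaper() {}

    /** Returns text with &, <, >, " and ' replaced by HTML entities. */
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break; // otherwise "&lt;" typed by a user would render as "<"
                case '<':  out.append("&lt;");   break; // can no longer open a tag
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break; // matters inside double-quoted attributes
                case '\'': out.append("&#39;");  break; // matters inside single-quoted attributes
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String review = "Great <b>value</b> & \"cheap\"";
        System.out.println(escapeHtml(review));
    }
}
```

The trade-off against stripping is that the review stays readable exactly as typed. With the first regex later in the thread, a review like "5 < 10 and 3 > 2" loses everything between the < and the >, whereas escaping displays it unchanged.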


  • Closed Accounts Posts: 2,300 ✭✭✭nice1franko


    All I really want them to be able to enter is plain text, so no HTML or script elements at all.

    The reviews will only be about 300 characters max.
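    public class StripTags {
        public static void main(String[] args) {
            String str = "<tr align='center'><td>This product is great..<br style='line-height:20px;'></td></tr><script>alert('im a hacker me')</script>:D";

            // Remove every <...> run, matching non-greedily up to the first '>'
            str = str.replaceAll("<.*?>", "");

            System.out.println(str);
        }
    }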
    

    outputs:
    This product is great..alert('im a hacker me'):D
    

    Is that good enough, do ya reckon?
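One way to answer that is to probe the regex with awkward input. The sketch below uses a hypothetical class name, StripTagsProbe. It applies the same non-greedy pattern to an ordinary tag and to a tag whose attributes are split across a line break. In java.util.regex, . does not match line terminators unless the DOTALL flag is set, so the second tag comes through untouched.

```java
// Sketch (hypothetical class): probe the non-greedy tag-stripping regex.
final class StripTagsProbe {

    private StripTagsProbe() {}

    // Same non-greedy pattern as in the post: '<', as few chars as possible, '>'
    static String stripTags(String input) {
        return input.replaceAll("<.*?>", "");
    }

    public static void main(String[] args) {
        String[] samples = {
            "<b>bold</b> text",              // ordinary tags: removed
            "<img src=x\nonerror=alert(1)>"  // '.' cannot cross '\n', so this tag survives intact
        };
        for (String s : samples) {
            System.out.println("[" + stripTags(s) + "]");
        }
    }
}
```

Prefixing the pattern with (?s) makes . match line breaks as well. However, any < with no later > still passes through unchanged, so stripping on its own is hard to make airtight.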


  • Closed Accounts Posts: 2,300 ✭✭✭nice1franko


    or possibly this one:
    str = str.replaceAll("</?\\w++[^>]*+>", "");
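For comparison, here is a sketch that wraps this second pattern in a hypothetical stripTags method. The result has to be assigned back, since Java Strings are immutable and replaceAll returns a new String. In this pattern [^>] also matches line breaks, so a tag split across lines is removed. Anything whose first character after < (or </) is not a word character is left in place, and that includes HTML comments.

```java
// Sketch (hypothetical class): the second pattern from the thread.
final class StripTagsAlt {

    private StripTagsAlt() {}

    // '<', optional '/', a tag name of one or more word chars (possessive),
    // then everything up to the next '>'. [^>] matches line breaks too.
    static String stripTags(String input) {
        return input.replaceAll("</?\\w++[^>]*+>", "");
    }

    public static void main(String[] args) {
        // Multi-line tag is removed
        System.out.println(stripTags("<img src=x\nonerror=alert(1)>ok"));
        // Comment is not removed: '!' is not a word character
        System.out.println(stripTags("a <!-- note --> b"));
    }
}
```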
    


  • Registered Users Posts: 569 ✭✭✭none


    I thought your question was about where their input might cause problems. Obviously, for the end user it is almost always just plain text, but for the computer it can be an issue, and that is what you have to watch out for. For Java and JavaScript it's mostly quote, slash and CR/LF characters, but your main concern may be your SQL backend, which can have other restrictions.
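As a rough illustration of the Java/JavaScript point, here is a sketch that escapes the characters mentioned above before user text is placed inside a JavaScript string literal. It covers quotes, backslash and forward slash, and CR/LF. The JsStringEscaper name is hypothetical. It does not cover every case; for example, U+2028/U+2029 are treated as line terminators by older JavaScript engines. On the SQL side, the usual Java approach is a java.sql.PreparedStatement with ? placeholders rather than hand-escaping. That needs a live database, so it isn't shown here.

```java
// Sketch (hypothetical class): escape user text for use inside a single- or
// double-quoted JavaScript string literal.
final class JsStringEscaper {

    private JsStringEscaper() {}

    static String escapeJs(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break; // a lone backslash would start an escape sequence
                case '\'': out.append("\\'");  break; // would end a single-quoted literal
                case '"':  out.append("\\\""); break; // would end a double-quoted literal
                case '/':  out.append("\\/");  break; // stops "</script>" in the text closing the script block
                case '\n': out.append("\\n");  break; // a raw line break inside a literal is a syntax error
                case '\r': out.append("\\r");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String review = "It's \"great\"\nbuy it </script>";
        System.out.println("var review = '" + escapeJs(review) + "';");
    }
}
```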

