
Any regular expression experts?

  • 20-08-2007 4:36pm
    #1
    Moderators, Science, Health & Environment Moderators Posts: 8,950 Mod ✭✭✭✭


    I use a thing called widgEditor for HTML editing in the browser, mainly because it's unobtrusive and degrades nicely. I've decided to allow tables to be pasted into it. The widgEditor code automatically cleans up content pasted from Word or Excel and does a nice job, but it also removes tables, so I have stopped it doing that. This bit:
    theHTML = theHTML.replace(/(<[^\/]>|<[^\/][^>]*[^\/]>)\s*<\/[^>]*>/g, "");

    strips out all empty tags. Now I need to strip all empty tags except empty td tags, since those are allowed. Does any regex expert know what to change here?


Comments

  • Closed Accounts Posts: 4,943 ✭✭✭Mutant_Fruit


    This might help: http://www.regular-expressions.info/conditional.html

    Basically what you want to do is something like:
    if(tag is not td) then (remove tag if it's empty)

    I'd attempt to write it, but the actual regex syntax changes between languages, so I'd be wasting my time ;) Plus, it'd take me a while. If you can't get it figured out, I'll write it up for ya later, and you should be able to translate it then (if it needs translating).
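
    For what it's worth, the if/then idea above can also be written in JavaScript without conditional regex syntax, by letting a replace callback make the decision. This is only a minimal sketch: the function name stripEmptyTagsExceptTd is hypothetical and is not part of widgEditor, it only handles open/close pairs (not self-closing tags like <b />), and it makes a single pass, so nested empty tags would need it run again.

    ```javascript
    // Hypothetical helper (not part of widgEditor): drop empty elements unless they are <td>.
    function stripEmptyTagsExceptTd(html) {
      // <name attrs> + optional whitespace + </name>; \1 makes the closing tag match the opening one.
      var emptyPair = /<([a-z][a-z0-9]*)\b[^>]*>\s*<\/\1\s*>/gi;
      return html.replace(emptyPair, function (match, tagName) {
        // The "if" from the pseudocode: keep an empty td, remove any other empty tag.
        return tagName.toLowerCase() === "td" ? match : "";
      });
    }

    console.log(stripEmptyTagsExceptTd('<tr><td></td><b id="x"></b><i>foo</i><p> </p></tr>'));
    // prints: <tr><td></td><i>foo</i></tr>
    ```

    Doing the td check in the callback keeps the pattern itself simple, at the cost of one function call per match.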


  • Registered Users Posts: 2,931 ✭✭✭Ginger


    I find this very handy

    http://tools.osherove.com/CoolTools/Regulazy/tabid/182/Default.aspx

    I use it to verify any regexes I need to build.


  • Subscribers Posts: 4,076 ✭✭✭IRLConor


    Try:
    theHTML = theHTML.replace(/(?:<(?!td)[^>\/]*>\s*<\/(?!td)[^>\/]*>|<(?!td)[^>\/]*\s*\/>)/g, "");
    

    I used the following code to test it:
    #!/usr/bin/perl
    use strict;
    use warnings;
    
    # Sample with empty and self-closing tags; only the td variants should survive.
    my $sample = "<tr id=\"foo\"><td><i>foo</i><b id=\"bar\"></b><b /></td><td /><td id=\"baz\"></td></tr><tr id=\"quux\" /><tr></tr>\n";
    
    print $sample;
    $sample =~ s/(?:<(?!td)[^>\/]*>\s*<\/(?!td)[^>\/]*>|<(?!td)[^>\/]*\s*\/>)//g;
    print $sample;
    

    Edit: I know that Perl's regex dialect is different from JavaScript's (which is what I'm assuming you're using), but I think I've only used features that are also in the JS dialect.
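
    As a quick check of that assumption, the same pattern and sample can be run directly in JavaScript. This is just a sketch: the variable names are made up for illustration, and the second pattern (with the i flag added) is a suggested variation rather than something from the post above. It matters if the editor hands back upper-case tags such as <TD>, because the (?!td) lookahead is case-sensitive without it.

    ```javascript
    // Same sample string as the Perl test, minus the trailing newline.
    var sample = '<tr id="foo"><td><i>foo</i><b id="bar"></b><b /></td><td /><td id="baz"></td></tr><tr id="quux" /><tr></tr>';

    // The pattern exactly as posted: negative lookahead (?!td) skips td tags.
    var emptyNonTd = /(?:<(?!td)[^>\/]*>\s*<\/(?!td)[^>\/]*>|<(?!td)[^>\/]*\s*\/>)/g;
    var cleaned = sample.replace(emptyNonTd, "");
    console.log(cleaned);
    // prints: <tr id="foo"><td><i>foo</i></td><td /><td id="baz"></td></tr>

    // Assumed variation: add the i flag so upper-case <TD></TD> is protected too.
    var emptyNonTdAnyCase = /(?:<(?!td)[^>\/]*>\s*<\/(?!td)[^>\/]*>|<(?!td)[^>\/]*\s*\/>)/gi;
    var upperStripped = "<TD></TD>".replace(emptyNonTd, "");     // empty string: the td was removed
    var upperKept = "<TD></TD>".replace(emptyNonTdAnyCase, "");  // "<TD></TD>": the td was kept
    ```

    One thing to watch: [^>\/]* stops at any slash, so an empty tag whose attributes contain a "/" (a URL in an href, say) will not be matched by this pattern.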


  • Moderators, Science, Health & Environment Moderators Posts: 8,950 Mod ✭✭✭✭mewso


    Thanks Conor. Works a charm. Was about to sit down and try my hand at this myself but I've always hated regular expressions. Thanks again. Thanks for the links guys. Next time I have a regex problem I promise I'll try it myself :)


  • Subscribers Posts: 4,076 ✭✭✭IRLConor


    No problem. Glad to be of help.

    Mastering Regular Expressions by Jeffrey Friedl is your friend if you need to use regular expressions a lot. Well worth the investment.

