ASP.NET Regex and fada characters

  • 02-03-2010 4:46pm
    #1
    Registered Users Posts: 2,791 ✭✭✭


    Hi,

    Anyone know how to get a Regex validator to accept fada characters? I have to modify an existing email address regex to accept them but I've no idea.

    (?i)^([a-zA-Z][\w\.-]*[a-zA-Z0-9]@mycompanydomain\.ie)$

    Thanks,
    John


Comments

  • Registered Users Posts: 9,579 ✭✭✭Webmonkey


    I've never done it before, but I presume you need Unicode. If the Unicode characters can't be saved reliably in your source file, you may need to use their hexadecimal values instead (e.g. \u00FA for ú).

    What sort of email address has fadas anyway? It's bound to cause problems with other systems.


  • Registered Users Posts: 2,781 ✭✭✭amen


    You need to read http://tools.ietf.org/html/rfc5322, but I don't think a fada is a valid character for an email address.

    Why would you want to do that anyway?

    If the user insists, show them the above RFC and tell them it's not supported.


  • Registered Users Posts: 515 ✭✭✭NeverSayDie


    As per the other posts, you probably shouldn't be getting fadas in email addresses.

    As a general note on this kind of thing, the ASP.NET regex validator should have full Unicode support (as does the rest of the Framework generally), and as far as I recall, the \w catch-all is mapped onto various Unicode ranges for word characters as well as the A-Za-z range. This will cover characters with fadas and a wide range of other ordinary letters in different alphabets.

    Where you might run into problems is with the client-side (JavaScript) aspect of the validator, which is enabled by default. This seems to have trouble with Unicode (in JavaScript, \w only covers A-Z, a-z, 0-9 and underscore), and it may also depend on the browser and the OS. You may need to tweak your regexes to suit, or just disable the client-side validation for the particular validator and let it fall back on the server-side one.
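
A minimal C# sketch of the server/client difference described above. `RegexOptions.ECMAScript` is used here only as a stand-in for how a JavaScript engine treats `\w`; it is not what the validator itself runs on the client. The class and method names are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class WordCharDemo
{
    // Default .NET behaviour: \w covers Unicode letters, so a name with a fada passes.
    public static bool MatchesUnicode(string input)
    {
        return Regex.IsMatch(input, @"^\w+$");
    }

    // ECMAScript-compatible behaviour: \w is limited to [a-zA-Z0-9_],
    // which approximates what a client-side script would accept.
    public static bool MatchesEcmaScript(string input)
    {
        return Regex.IsMatch(input, @"^\w+$", RegexOptions.ECMAScript);
    }
}
```

With these two helpers, "Seán" should pass `MatchesUnicode` but fail `MatchesEcmaScript`, which is the kind of mismatch that can stop the postback before the server-side check ever runs.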


  • Registered Users Posts: 4,277 ✭✭✭km991148


    I would agree about fadas in emails: probably not a good idea, as loads of systems out there won't handle them, but...
    Another thing with the server-side stuff is that regexes (especially for emails, where there are many differing regexes in use) should probably be configurable, so you may get away with putting something like:

    <appSettings>
      <add key="myRegex" value="ú"/>
    </appSettings>

    in the config and then matching as:

    // needs: using System.Configuration; using System.Text.RegularExpressions;
    string fadaInput = "fada char ú";
    // ConfigurationManager replaces the obsolete ConfigurationSettings class
    Regex reg = new Regex(ConfigurationManager.AppSettings["myRegex"]);
    Match match = reg.Match(fadaInput); // then check match.Success

    for example.

    (Obviously you need to place the fada stuff in beside the [a-zA-Z] part of the regex, but the above works OK.)
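
As a rough sketch of what "placing the fada stuff in beside the [a-zA-Z] part" could look like for the pattern in the first post: the fada vowels are written as `\uXXXX` escapes (which the .NET regex engine understands) so the pattern does not depend on how the source or config file is encoded. `FadaEmailPattern` and its members are hypothetical names, and the set of accepted characters (á é í ó ú in both cases) is an assumption about what the requirement means.

```csharp
using System.Text.RegularExpressions;

public static class FadaEmailPattern
{
    // á é í ó ú and Á É Í Ó Ú as regex \uXXXX escapes.
    private const string Fada =
        @"\u00E1\u00E9\u00ED\u00F3\u00FA\u00C1\u00C9\u00CD\u00D3\u00DA";

    // The original pattern with the fada set added to each character class.
    public static readonly string Pattern =
        @"(?i)^([a-zA-Z" + Fada + @"][\w.\-" + Fada + @"]*[a-zA-Z0-9" + Fada +
        @"]@mycompanydomain\.ie)$";

    public static bool IsValid(string email)
    {
        return Regex.IsMatch(email, Pattern);
    }
}
```

The value of `FadaEmailPattern.Pattern` is an ordinary string, so it could equally be stored in the XML config file and loaded at runtime.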


  • Registered Users Posts: 2,791 ✭✭✭John_Mc


    Thanks very much, will try validating on the server side and see if it works.

    Thanks for your reply. Yeah, the regex and its corresponding error message are stored in an XML config file for the customer, so it's configurable.

    I don't think we should be allowing fadas in the email addresses either, but I'm being told it is a requirement :rolleyes:


  • Registered Users Posts: 9,579 ✭✭✭Webmonkey


    John_Mc wrote: »
    I don't think we should be allowing fadas in the email addresses either, but I'm being told it is a requirement :rolleyes:

    I'd argue with them, to be honest. Tell them it should not be a requirement. It *will* cause problems. Link them to the RFC document.


  • Registered Users Posts: 515 ✭✭✭NeverSayDie


    John_Mc wrote: »
    Thanks very much, will try validating on the server side and see if it works.

    Well, you already are validating on the server side, it just may not be getting that far because the client-side part is preventing the postback :)

    There's an attribute on the Validators you can use to disable the client-side part (which is enabled by default): it's "EnableClientScript", just set that to false. You don't have to do that everywhere btw (it's best to leave it on as a convenience for the users, and to help keep pointless postbacks down), just on this particular Validator.

    And re the spec, I'd agree with Webmonkey, an email address with a fada character in it is of little use to anyone - you probably won't be able to send email to it and you probably won't be able to get email from it, so there's really no point in storing it :) (that's with fadas in the local part - the bit to the left of the "@"; the domain may be more flexible, if I recall recent changes right)


  • Registered Users Posts: 2,426 ✭✭✭ressem


    It's not straightforward.

    For the domain name, if you try a number of German domains, you will find that several allow you to add umlauts and the like, and these work in Firefox etc.
    The characters allowed depend on the top-level domain.
    See some sample lists at
    http://de.wikipedia.org/wiki/Internationalizing_Domain_Names_in_Applications

    Your application can use a Punycode translator, which transforms the domain into what is often an unpronounceable ASCII muddle that can be registered and safely used.

    See RFC 3490, RFC 3491 and RFC 3492 (the last contains a sample implementation).

    http://www.microsoft.com/downloads/details.aspx?familyid=AD6158D7-DDBA-416A-9109-07607425A815&displaylang=en
    Contains a DLL with Microsoft's helper functions.
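
A minimal sketch of the domain translation in C#. It uses the framework's own `System.Globalization.IdnMapping` class rather than the downloadable DLL linked above, so treat it as a substitute for illustration, not a description of that download. `IdnHelper` and `ToAsciiDomain` are hypothetical names, and only the domain is converted; the local part is passed through untouched.

```csharp
using System;
using System.Globalization;

public static class IdnHelper
{
    // Converts only the domain part of an address to its ASCII (Punycode) form.
    // The local part (left of the last '@') is returned unchanged.
    public static string ToAsciiDomain(string email)
    {
        int at = email.LastIndexOf('@');
        if (at < 1 || at == email.Length - 1)
            throw new ArgumentException("Expected local@domain", "email");

        string local = email.Substring(0, at);
        string domain = email.Substring(at + 1);
        return local + "@" + new IdnMapping().GetAscii(domain);
    }
}
```

For example, an address at the domain bücher.de would come back with the domain rewritten as xn--bcher-kva.de, while a plain ASCII domain is returned as-is.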

    ---

    There is an experimental proposal from 2008 for UTF-8 support in email addresses, especially the mailbox element.
    http://tools.ietf.org/html/rfc5336
    But I don't yet know of any mainstream clients that support it.


    Even Sendmail and Postfix need to be patched to add the UTF8SMTP extension (check whether your server lists UTF8SMTP in its response to EHLO). Exchange 2010 does not support UTF8SMTP. And no fallback has been defined yet (e.g. an ALT-ADDRESS mail header).
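
The EHLO check can be sketched as a small parser over the server's reply text. This assumes you have already captured the reply (for example from a telnet session or your own SMTP client code); the class name, method name and the sample reply in the usage note are all hypothetical.

```csharp
using System;

public static class EhloCheck
{
    // Returns true if a multi-line EHLO reply lists the given extension keyword.
    // Reply lines look like "250-KEYWORD [params]", or "250 KEYWORD" on the last line.
    public static bool Advertises(string ehloReply, string keyword)
    {
        string[] lines = ehloReply.Split(
            new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            if (line.Length < 4 || !line.StartsWith("250", StringComparison.Ordinal))
                continue;

            // Skip the "250-" or "250 " prefix, then compare the first token.
            string first = line.Substring(4).Trim().Split(' ')[0];
            if (string.Equals(first, keyword, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }
}
```

Calling `EhloCheck.Advertises(reply, "UTF8SMTP")` on a made-up reply such as "250-mail.example.com", "250-PIPELINING", "250 8BITMIME" would return false, which is what most servers are likely to give you for now.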

    So anyone who uses a mailbox with UTF-8 characters should expect sending and receiving mail to fail almost always for perhaps the next 5-10 years, and your application should expect the same.

    Support for other characters in the display part of the email address is much easier. And it usually keeps marketing people happy, as they hardly ever see the actual address in Outlook.
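
One way to do the display-name approach in .NET, sketched under the assumption that mail is sent through `System.Net.Mail`: the address itself stays plain ASCII and the fadas go in the display name only. `DisplayNameDemo.Build` is a hypothetical helper, and the address and name in the usage note are made up.

```csharp
using System.Net.Mail;
using System.Text;

public static class DisplayNameDemo
{
    // Keeps the address plain ASCII and carries the fadas in the display name,
    // which is encoded as UTF-8 when the message headers are written.
    public static MailAddress Build(string asciiAddress, string displayName)
    {
        return new MailAddress(asciiAddress, displayName, Encoding.UTF8);
    }
}
```

For instance, `DisplayNameDemo.Build("sean.obriain@mycompanydomain.ie", "Seán Ó Briain")` gives an address object that can be used as the From or To of a `MailMessage`.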

    ---

    So how can you proceed?
    Make up a comprehensive list of all the points after a mail leaves your program that will likely fail, and draw up very clear warning and error dialogs which can be displayed when DNS and SMTP attempts fail with no fallback?

    If this application is on a hosted site, then you can point out that the mail will fail to leave the system for a few years yet.
    And you have to write your code knowing that UTF-8 strings are coming down the road this decade.

