
Unicode compliance

  • 01-02-2003 9:25pm
    #1
    Registered Users, Registered Users 2 Posts: 1,853 ✭✭✭


    Xian told me that he had sent something in Unicode to the boards but it did not get through. As I am Irish national representative to ISO/IEC JTC1/SC2/WG2, and a member of the Unicode book committee, I thought I had better check. I'm using Safari under OS X. Some tests:

    The Irish alphabet
    Aa Áá Bb ?? Cc ?? Dd ?? Ee Éé Ff ?? Gg ?? Hh Ii Íí Jj Kk Ll Mm ?? Nn Oo Óó Pp ?? Qq Rr Ss ?? Tt ?? Uu Úú Vv Ww Xx Yy Zz & ? (the 7-shaped agus)

    A little Icelandic: Það var enn þau skiptaði tungu á Englalondi. ('That was before they changed language in England', a reference to Icelanders' recognition that Middle English was very different from Old English.)

    A little Russian: ??? ????? ?????????? ?? ???????? ('My nipples explode with delight')

    Some Greek: ? ???????? ????? ???? ??? ??????.

    And of course some Sanskrit. ??? ??? ??? ???? ??
    If you are using a crap browser like Internet Explorer you haven't a hope in hell of seeing most of this correctly without ????? throughout. On OS X Safari and OmniWeb do a fine job of handling Unicode.

    Now, let's press send and see what happens.


Comments

  • Registered Users, Registered Users 2 Posts: 1,853 ✭✭✭Yoda


    Everything not in Latin-1 displays as a ? in Safari, even though it displayed correctly while I was composing the message. That means that Icelandic Þþ and Ðð make it safely, but it means that something is corrupting the Unicode. This Sucks.
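
The symptom described above (Þ and ð survive, everything else becomes ?) is what you would expect if some stage forces the text through ISO-8859-1 and substitutes anything it cannot represent. A minimal sketch, assuming that is what happens here; the function name and sample strings are made up for illustration and say nothing about how the forum software is actually written:

```python
# Sketch only: simulate a pipeline stage that stores text as ISO-8859-1
# (Latin-1) and replaces anything it cannot encode with '?'.
# store_as_latin1 is a hypothetical helper, not part of any forum software.

def store_as_latin1(text: str) -> str:
    """Round-trip text through Latin-1, substituting '?' where needed."""
    return text.encode("latin-1", errors="replace").decode("latin-1")

icelandic = "\u00dea\u00f0 var"          # "Það var": thorn and eth are in Latin-1
irish = "\u1e02\u1e03 \u010a\u010b"      # dotted B/b and C/c: not in Latin-1
russian = "\u041f\u0440\u0438\u0432\u0435\u0442"  # Cyrillic: not in Latin-1

for sample in (icelandic, irish, russian):
    print(repr(sample), "->", repr(store_as_latin1(sample)))
```

The Icelandic sample comes back unchanged, while each dotted consonant and each Cyrillic letter is reduced to a single question mark, matching the pattern in the first post.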


  • Closed Accounts Posts: 931 ✭✭✭ozpass


    I have the same problem getting the phrase

    How Now Brown Cow

    to display correctly on www.boards.tw


  • Registered Users, Registered Users 2 Posts: 1,853 ✭✭✭Yoda


    Perhaps you don't know what Unicode is, Ozpass.

    This is a Boards issue. Consider http://www.boards.ie/vbulletin/showthread.php?s=&threadid=76419 which can't be read because there is no character set information available. If Boards were Unicode-compliant, there wouldn't be a need for that.


  • Closed Accounts Posts: 5,564 ✭✭✭Typedef


    Never use eight bits to display a character when sixteen bits will do.

    or was that the other way around?


  • Registered Users, Registered Users 2 Posts: 35,524 ✭✭✭✭Gordon


    How would boards go unicode compliant Yoda?


  • Registered Users, Registered Users 2 Posts: 1,853 ✭✭✭Yoda


    No, it isn't like that. UTF-8 is the recommended encoding for the web.


  • Registered Users, Registered Users 2 Posts: 1,853 ✭✭✭Yoda


    Originally posted by Gordon
    How would boards go unicode compliant Yoda?

    Well I'm not sure. I'm an expert in writing systems and spend my time adding characters and scripts to the standard. (You've seen evertype.com haven't you?) I'm not a programmer. The thing I would do though is get in touch with whoever supplies the boards' software and ask them about it.


  • Registered Users, Registered Users 2 Posts: 2,281 ✭✭✭DeadBankClerk


    I vote for Yoda to translate boards.ie into Strict XHTML 1.0 with CSS level 2 compliancy:

    That means losing the silly internet explorer coloured bars :P
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
        <head>
            <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
        </head>
    </html>
    


  • Banned (with Prison Access) Posts: 16,659 ✭✭✭✭dahamsta


    There's a lot of discussion about Unicode on their forums:

    http://www.vbulletin.com/forum/search.php?searchid=11433

    I don't know enough about Unicode to decide which are relevant.

    adam


  • Registered Users, Registered Users 2 Posts: 1,853 ✭✭✭Yoda


    Originally posted by DeadBankClerk
    I vote for Yoda to translate boards.ie into Strict XHTML 1.0 with CSS level 2 compliancy:
    Why do people always have to be snotty?

    Unicode is important, whether you think so or not, and it is in the nation's interest that we not lag behind in implementing it. As I pointed out, even on the boards there are users who wish to communicate in Japanese, and they cannot do so.
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
    </head>

    Replace with charset=UTF-8


  • Closed Accounts Posts: 6,275 ✭✭✭Shinji


    As I pointed out, even on the boards there are users who wish to communicate in Japanese, and they cannot do so.

    Really? I can read that post perfectly happily.

    Perhaps your Macintosh wonder-browser doesn't understand Shift-JIS? :)


  • Registered Users, Registered Users 2 Posts: 2,281 ✭✭✭DeadBankClerk


    Originally posted by Yoda
    Why do people always have to be snotty?
    Because the feature that you have requested is a huge one to implement?


  • Closed Accounts Posts: 9,314 ✭✭✭Talliesin


    The problem isn't that we are using ISO-8859-1 in the encoding, it's that we aren't using it correctly. ISO-8859-1 can be used to transmit the entire UCS because this is HTML and hence we can use entity encodings.
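
The entity approach mentioned above can be sketched as follows. This assumes the page is served as ISO-8859-1 and that characters outside it are written as HTML numeric character references; the helper name is hypothetical, and a real implementation would also have to escape a literal "&" in user text, which this sketch ignores:

```python
import html

# Sketch only: emit ISO-8859-1 bytes, writing any character outside Latin-1
# as a decimal numeric character reference (&#NNNN;).
# to_latin1_with_ncrs is a hypothetical helper name.

def to_latin1_with_ncrs(text: str) -> bytes:
    return text.encode("latin-1", errors="xmlcharrefreplace")

sample = "\u00de \u1e02 \u0416"   # thorn, B with dot above, Cyrillic Zhe
wire_bytes = to_latin1_with_ncrs(sample)
print(wire_bytes)

# A browser resolves the references back to the original characters;
# html.unescape stands in for that step here.
recovered = html.unescape(wire_bytes.decode("latin-1"))
print(recovered == sample)
```

Thorn goes out as a single Latin-1 byte, while the other two characters travel as plain ASCII references, so the byte stream stays valid ISO-8859-1 throughout.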

    The problem is we aren't using it correctly. This isn't just at the level of what gets sent, but also what gets read and what gets stored.

    It's worse than you think, it isn't even correct ISO-8859-1! (That's why the XML feed we're experimenting with b0rks sometimes).

    Maybe UTF-8 is the best way to go, though I would probably favour ISO-8859-1 used well for most of the site, with Shift-JIS used well for the Japanese board.

    Supporting the entire UCS is something we want to do, and will do* (indeed have to do for the sake of the poor bastards trying to use foreign languages on the language boards). Expect it to occur in usual boards timescales (everyone moans for ages, and then one day it'll just be there!).

    BTW IE's support for Unicode is quite good; it's normally a font problem if a correctly encoded character doesn't render correctly.
    (You've seen evertype.com haven't you?)
    Of course! With all those language-registrations originating from Michael Everson (is that you, or do you work for him?) surely everyone has.

    *Well, it's something I want to do, and I sit close enough to some of the Admins to throw stuff at them.


  • Business & Finance Moderators, Entertainment Moderators Posts: 32,387 Mod ✭✭✭✭DeVore


    What Talliesin said re: Timescales and other stuff.

    Yoda, I think people were being snotty cos you arrived and pointed it out in what read as a bit of a smartarse way and then when asked for a possible solution you said "I dunno, I just point out problems" effectively :)

    We'll look into it and it will happen sometime...

    DeV.


  • Registered Users, Registered Users 2 Posts: 11,446 ✭✭✭✭amp


    I think I'm going to have to design a Gathering card for Yoda. Something like:

    Is it a bird? Is it a plane? No! It's the highly unlikely appearance of a flying humanoid wearing a lycra costume and who is defying gravity through some very suspect physics involving Red Giants. In fact I think this is all nonsense and am going to complain to Stan Lee....

    Mind you the use of the phrase 'My nipples explode with delight' disappoints me as it suggests that you may actually have a sense of humour :)


  • Registered Users, Registered Users 2 Posts: 1,853 ✭✭✭Yoda


    Originally posted by Shinji
    Really? I can read that post perfectly happily.

    Perhaps your Macintosh wonder-browser doesn't understand Shift-JIS? :)

    If I manually select a Japanese encoding it turns up in Japanese. This is why Unicode is what everyone should be migrating to. No need for language-specific encodings.
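
Why the manual selection is needed can be sketched like this: the bytes of a Shift-JIS post are fine, but a reader that assumes another charset decodes them into nonsense. The sample word is an assumption for illustration, not the text of the Japanese post being discussed:

```python
# Sketch only: the same bytes read under the right and the wrong charset.
japanese = "\u65e5\u672c\u8a9e"           # "Japanese language" (3 kanji)
raw = japanese.encode("shift_jis")         # what a Shift-JIS post looks like on the wire

as_latin1 = raw.decode("latin-1")          # never fails, but produces mojibake
as_shift_jis = raw.decode("shift_jis")     # what manual selection achieves

print(len(raw), "bytes")
print("read as Latin-1:  ", repr(as_latin1))
print("read as Shift-JIS:", repr(as_shift_jis))

# With UTF-8 declared by the page, no per-language guess is needed.
print(japanese.encode("utf-8").decode("utf-8") == japanese)
```

Decoding as Latin-1 raises no error because every byte value maps to some Latin-1 character, which is exactly why the failure shows up as garbage on screen rather than as an error message.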


  • Registered Users, Registered Users 2 Posts: 1,853 ✭✭✭Yoda


    Originally posted by DeadBankClerk
    Because the feature that you have requested is a huge one to implement?
    That doesn't mean I wasn't right to propose its implementation.


  • Registered Users, Registered Users 2 Posts: 1,853 ✭✭✭Yoda


    Originally posted by DeVore
    Yoda, I think people were being snotty cos you arrived and pointed it out in what read as a bit of a smartarse way and then when asked for a possible solution you said "I dunno, I just point out problems" effectively
    I pointed it out with examples which failed. Why is that smartarse? I did say I'm not a programmer or systems network geek (as so many of you :) are) so I didn't have the solution, though I was specific about UTF-8....

    I don't know what a Gathering card is so I have no idea what Amp is on about.


  • Registered Users, Registered Users 2 Posts: 1,853 ✭✭✭Yoda


    Originally posted by Talliesin
    The problem isn't that we are using ISO-8859-1 in the encoding, it's that we aren't using it correctly. ISO-8859-1 can be used to transmit the entire UCS because this is HTML and hence we can use entity encodings.
    What, the &#xXXXX; entities? Ick. UTF-8 is so much nicer and can be edited in WYSIWYG.
    Maybe UTF-8 is the best way to go, though I would probably favour ISO-8859-1 used well for most of the site, with Shift-JIS used well for the Japanese board.
    And the Russian board? They might use KOI-8, or WinCyrillic, or Mac WorldScript. We have Latvians in Ireland now. Baltic Rim? Something else? So it's either put up with all the differing encodings, or move to Unicode. We have gamers who might want to write in Runic and Ogham :D . There are no standard encodings for them. All of this chaos, of course, is what Unicode is supposed to cure.
    Supporting the entire UCS is something we want to do [...] one day it'll just be there!
    I am delighted.
    BTW IE's support for Unicode is quite good; it's normally a font problem if a correctly encoded character doesn't render correctly.
    On the Mac IE chokes where OmniWeb and Safari do far better.
    With all those language-registrations originating from Michael Everson (is that you, or do you work for him?) surely everyone has.
    Oh, my. Thanks. Yes, I am he.
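
The point about KOI-8, Windows Cyrillic, Baltic, Runic and Ogham can be checked with a small sketch. The sample characters and the list of codecs are assumptions chosen for illustration (Python's koi8_r, cp1251 and iso8859_13 standing in for the legacy encodings named in the post):

```python
# Sketch only: which encodings can represent one sample letter per script?
samples = {
    "Russian": "\u0416",   # CYRILLIC CAPITAL LETTER ZHE
    "Latvian": "\u0101",   # LATIN SMALL LETTER A WITH MACRON
    "Ogham": "\u1681",     # OGHAM LETTER BEITH
    "Runic": "\u16a0",     # RUNIC LETTER FEHU FEOH FE F
}
codecs_to_try = ["koi8_r", "cp1251", "iso8859_13", "utf_8"]

def can_encode(text: str, codec: str) -> bool:
    """Return True if the codec has a representation for every character."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

support = {
    script: [c for c in codecs_to_try if can_encode(ch, c)]
    for script, ch in samples.items()
}
for script, codecs in support.items():
    print(f"{script}: {codecs}")
```

Each legacy codec covers only its own region, the two Cyrillic codecs disagree on the byte used for the same letter, and only UTF-8 handles all four samples.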


  • Registered Users, Registered Users 2 Posts: 11,446 ✭✭✭✭amp


    Originally posted by Yoda
    I pointed it out with examples which failed. Why is that smartarse? I did say I'm not a programmer or systems network geek (as so many of you :) are) so I didn't have the solution, though I was specific about UTF-8....

    I don't know what a Gathering card is so I have no idea what Amp is on about.

    Soz, it's a bit of an In Joke:
    injoke.jpg

    Incidentally shouldn't this thread be in Bugs/Suggestions?

    Or am I just being pedantic?


  • Closed Accounts Posts: 9,314 ✭✭✭Talliesin


    Originally posted by Yoda
    What, the &#xXXXX; entities? Ick. UTF-8 is so much nicer and can be edited in WYSIWYG.

    Well I wouldn't suggest that the user have to code it; users should be able to write in any IANA-registered encoding. On output, though, there are disadvantages as well as advantages to using UTF-8. (Theoretically the only real choice is between UTF-8, UTF-16 and UCS-4, but in practice we have legacy issues with those.)
    And the Russian board? They might use KOI-8, or WinCyrillic, or Mac WorldScript. We have Latvians in Ireland now. Baltic Rim? Something else? So it's either put up with all the differing encodings. We have gamers who might want to write in Runic and Ogham :D . There are no standard encodings for them.
    Certainly the runes and ogham would have their place on the Paganism board (and even Enochian characters).
    Still though, a solution to the current situation will be invisible to users. Besides which, even if we do go with UTF-8 we won't be able to force posts to arrive in UTF-8, so there is still work to do around that.
    All of this chaos, of course, is what Unicode is supposed to cure.

    Unicode != UTF-8. All HTML is in Unicode. This page is in Unicode (it's just encoded wrong, so it's wrong Unicode, but it's still Unicode! :) )
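
The distinction drawn above, between the character set (Unicode/UCS code points) and the encoding used to serialise it, can be sketched with one character written out three ways. The choice of an Ogham letter is just an example:

```python
# Sketch only: one code point, three encoding forms.
beith = "\u1681"                      # OGHAM LETTER BEITH
code_point = ord(beith)

forms = {
    name: beith.encode(name)
    for name in ("utf-8", "utf-16-be", "utf-32-be")
}
for name, data in forms.items():
    # The bytes differ per encoding, but each decodes to the same character.
    print(f"U+{code_point:04X} as {name}: {data.hex()}")
    assert data.decode(name) == beith
```

The code point stays U+1681 throughout; only the byte sequence changes (three bytes in UTF-8, two in UTF-16, four in UTF-32). A numeric character reference in HTML names that same code point regardless of which encoding the page is served in.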


  • Closed Accounts Posts: 9,314 ✭✭✭Talliesin


    Originally posted by amp
    Soz, it's a bit of an In Joke:
    See Yoda, if you've been wondering why I'm so poor at staying firmly on-topic on the ietf-languages list now you know that this site isn't good for developing that skill.

