Le Forgeron wrote:
> On 01.11.2007 20:00, Orchid XP v7 wrote:
>> No - it's Unicode. 24 bits per character. ;-)
>
> Unicode is just a bunch of tables of glyphs (lots of tables, lots of
> glyphs).
> It is past 24 bits since a few... (even past 32 bits!!!)
> The real thing is how you encode all these.
> UTF-8 is one way (the popular one these days),
> UTF-16 another... and raw storage the worst idea ever!
>
> UTF-8 is about 8 bits for the ASCII range, usually goes up to 24 bits
> (3 x 8) for classical Japanese, 16 for most French variants...
I was under the impression that these encodings apply to *strings*, not
individual characters by themselves...
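
For what it's worth, the per-character sizes quoted above are easy to check by encoding one character at a time. This is only a quick sketch in Python (not Haskell), and `utf8_bits` is a name I made up for the illustration:

```python
def utf8_bits(ch: str) -> int:
    """Return how many bits UTF-8 needs to encode a single character."""
    return 8 * len(ch.encode("utf-8"))

# One sample per size class: ASCII, accented Latin, kanji, and a
# character outside the Basic Multilingual Plane.
samples = {
    "ASCII letter": "A",
    "e with acute": "\u00e9",
    "kanji": "\u65e5",
    "emoji": "\U0001F600",
}

for name, ch in samples.items():
    print(f"U+{ord(ch):04X} ({name}): {utf8_bits(ch)} bits")
```

So the encoding is defined per code point; a string is just the concatenation of those byte sequences.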
>> But yes, the standard Haskell string type is geared to flexibility, not
>> performance. See my ByteString comments...
>
> Fixed-size Unicode... if only they would stop adding more tables!
Oh, ByteString (as the name somewhat implies) only supports the first
256 Unicode code-points. ;-)
Somebody should really fix that eventually...
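
To make the 256-code-point limit concrete, here is a rough analogy in Python. It is not the ByteString API, just a model of a string type that stores one byte per character by keeping only the low 8 bits of each code point; `pack_8bit` is a hypothetical helper name:

```python
def pack_8bit(text: str) -> bytes:
    """Keep only the low 8 bits of each code point (hypothetical helper)."""
    return bytes(ord(ch) & 0xFF for ch in text)

ascii_packed = pack_8bit("abc")      # code points below 128 survive intact
latin_packed = pack_8bit("\u00e9")   # U+00E9 still fits in a single byte
kanji_packed = pack_8bit("\u65e5")   # U+65E5 loses its high byte

print(ascii_packed, latin_packed, kanji_packed)
```

Anything above U+00FF comes out as a different character, which is why a byte-per-character type cannot hold general Unicode text without an explicit encoding such as UTF-8.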