Orchid XP v7 shared these insights on 2007/11/01 17:15:
> Warp wrote:
>
>> UTF-8 encoding "wastes" some bits (in order to use less bits for the
>> most used western characters) and requires at most 4 bytes per character
>> (even though the characters requiring more than 3 bytes are very rarely
>> used).
>
> And thus, like any decent variable-length encoding scheme, it tries to
> assign short codes to common symbols. (Although UTF-8 probably fails
> horribly for, say, Japanese text. I don't actually know...)
For Japanese and Chinese, it averages around 3 bytes per character. That's not so
bad after all, as each character in those languages represents a whole word, and some
even represent a whole phrase or a complex concept.
A 1,000-glyph text in Chinese would correspond, roughly, to a 1,000 to 5,000 word text
in English!
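For example, a quick check in Python (the sample characters are just ones I picked
to illustrate the point):

  # Every CJK ideograph in the Basic Multilingual Plane takes 3 bytes in UTF-8.
  for ch in "日本語":
      print(ch, len(ch.encode("utf-8")))   # prints 3 for each character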
>
>>> UTF-16 another... and raw storage the worst idea ever!
>>
>> Why would raw storage be the worst idea? There are several advantages.
>> The disadvantage is, of course, an increased memory requirement.
>
> You win some, you lose some. Programming is all about these kinds of
> compromises. :-)
All possible characters for all European languages fit in 1 or 2 bytes, and I
think that also includes Arabic and Cyrillic. The Asian glyphs use the bulk
of the 3- and 4-byte codes.
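If you want to see the code lengths for yourself, here's a rough sketch in Python
(the sample characters are arbitrary picks from each script):

  # One example character per script, with its UTF-8 encoded length.
  samples = [
      ("a", "basic Latin"),         # U+0061  -> 1 byte
      ("é", "Latin w/ diacritic"),  # U+00E9  -> 2 bytes
      ("Ж", "Cyrillic"),            # U+0416  -> 2 bytes
      ("ض", "Arabic"),              # U+0636  -> 2 bytes
      ("中", "CJK ideograph"),      # U+4E2D  -> 3 bytes
      ("𠀀", "CJK Extension B"),    # U+20000 -> 4 bytes
  ]
  for ch, script in samples:
      print(f"{script:20} U+{ord(ch):04X} -> {len(ch.encode('utf-8'))} byte(s)")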
--
Alain
-------------------------------------------------
If you're ever about to be mugged by a couple of clowns, don't hesitate - go for
the juggler.