Orchid XP v7 brought us these insights on 2007/11/01 15:03:
> Warp wrote:
>> Alain <ele### [at] netscapenet> wrote:
>>> It's about 12 BITS per character, on average, not BYTES! That's
>>> using UTF8 encoding. About 16 BITS per character if using UTF16
>>> encoding.
>>> UTF8 is only 7 BITS per character if you stick to only the standard
>>> ASCII character set, but it gets bigger if you also use extended
>>> ASCII or characters from foreign alphabets.
>>
>> Except that if each single character is indeed garbage-collected, that
>> requires quite a lot of memory per character (compared to the size of the
>> character).
>
> I can't actually find documentation to hand to clarify whether it's 12
> bits or 12 bytes per character. (My strong suspicion is that it *is* 12
> *bytes* - since, after all, a single Unicode code point is 24 bits
> already.)
>
> The situation is actually worse than it looks. All this lazy evaluation
> magic is implemented by storing program state around the place, so an
> "unevaluated" string probably takes up more space still...
Using UTF8 encoding, a single character can be 1, 2, 3 or 4 BYTES long.
Standard ASCII characters take 1 byte; extended ASCII (Latin-1) and some other
characters take 2 bytes. The high bits of the lead byte tell you the length:
110xxxxx starts a 2-byte code, 1110xxxx a 3-byte code, and 11110xxx a 4-byte
code, while every continuation byte starts with 10xxxxxx.
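
Here's a rough Haskell sketch of that lead-byte logic (my own illustration,
not from any library mentioned in this thread; utf8Length is a made-up name):

-- Reads the sequence length straight off the high bits of the lead byte.
import Data.Bits (shiftR)
import Data.Word (Word8)

utf8Length :: Word8 -> Maybe Int
utf8Length b
  | b `shiftR` 7 == 0x00 = Just 1   -- 0xxxxxxx: plain ASCII, 1 byte
  | b `shiftR` 5 == 0x06 = Just 2   -- 110xxxxx: 2-byte sequence
  | b `shiftR` 4 == 0x0E = Just 3   -- 1110xxxx: 3-byte sequence
  | b `shiftR` 3 == 0x1E = Just 4   -- 11110xxx: 4-byte sequence
  | otherwise            = Nothing  -- 10xxxxxx: continuation byte, not a lead byte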
Using UTF16 encoding, characters in the Basic Multilingual Plane are 2 BYTES
long, giving 65536 possible code points, not all of them printable; characters
outside that plane are encoded as a surrogate pair and take 4 bytes.
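
And a similarly rough sketch for UTF-16 (utf16Units is likewise a made-up name):

-- Number of 16-bit code units UTF-16 needs for a given character.
import Data.Char (ord)

utf16Units :: Char -> Int
utf16Units c
  | ord c < 0x10000 = 1   -- Basic Multilingual Plane: one unit, 2 bytes
  | otherwise       = 2   -- above U+FFFF: surrogate pair, 4 bytes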
--
Alain
-------------------------------------------------
Error in operator: add beer