Warp wrote:
> Alain <ele### [at] netscapenet> wrote:
>> It's about 12 BITS per character, on average, not BYTES! That's using UTF8
>> encoding. About 16 BITS per character if using UTF16 encoding.
>> UTF8 is only 7 BITS per character if you stick to the standard ASCII
>> character set, but it gets bigger if you also use extended ASCII or
>> characters from foreign alphabets.
>
> Except that if each single character is indeed garbage-collected, that
> requires quite a lot of memory per character (compared to the size of the
> character).
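(As an aside, the per-character encoding sizes are easy enough to check
directly. A quick sketch, assuming GHC with the text and bytestring
packages - my own illustration, not anything from this thread:

import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.ByteString as BS

main :: IO ()
main = do
  let ascii   = T.pack "hello"   -- plain ASCII
      accents = T.pack "héllo"   -- one non-ASCII character (e-acute)
  print (BS.length (TE.encodeUtf8 ascii))       -- 5: one byte per ASCII char
  print (BS.length (TE.encodeUtf8 accents))     -- 6: e-acute takes two bytes
  print (BS.length (TE.encodeUtf16LE accents))  -- 10: two bytes per BMP char
)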
I can't actually find any documentation to hand that clarifies whether
it's 12 bits or 12 bytes per character. (My strong suspicion is that it
*is* 12 *bytes* - since, after all, a single Unicode code point can
already need 21 bits, which is three bytes in practice.)
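For what it's worth, here is where a "12 bytes" figure could plausibly
come from - a back-of-the-envelope sketch, assuming GHC's usual
representation of String (my reconstruction, not something I've found
documented):

-- In GHC, String is simply a linked list of boxed characters:
type String' = [Char]   -- data [] a = [] | a : [a]
-- Each character therefore costs one cons cell: a header word plus two
-- pointer fields (head and tail), i.e. 3 words. On a 32-bit machine
-- that's 12 bytes per cons cell alone, before you count the Char box
-- itself (2 words, though GHC shares the boxes for code points 0-255).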
The situation is actually worse than it looks. All this lazy-evaluation
magic is implemented by storing suspended program state (thunks) on the
heap, so an "unevaluated" string probably takes up even more space still...