>>> - The "char" type works with Unicode. Well done. Oh, but wait... It only
>>> stores 16 bits, and yet Unicode actually requires 21 bits to represent a
>>> single code-point. So this "Unicode character" only actually covers the
>>> Basic Multilingual Plane. FAIL!
>>
>> Oh great. Apparently "char" doesn't store a code-point at all, it stores
>> a code-unit.
>>
>> For anything in the BMP, these are effectively the same thing. For
>> anything outside that range, *you* must manually write the code to
>> decode UTF-16 into actual code-points (which then do not fit into a
>> "char").
>
> Uh... why does this come as a surprise to you?

I guess I'm used to using a programming language where a Char is...
well... any valid Unicode code-point, and once you set the encoding of a
file handle, the library does all necessary encoding and decoding,
whether it's UTF-8, UTF-16, Latin-1 or whatever.

Still, I suppose it's better than char = unsigned byte. :-P
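
For what it's worth, here is roughly what the manual decoding I was complaining about looks like. This is only a sketch, and it assumes a Java-style 16-bit "char" that holds one UTF-16 code-unit; the class name CodePoints and the decode helper are made up for illustration. Anything in the BMP passes straight through, and a surrogate pair gets combined into one code-point, which then has to live in an int because it no longer fits in a "char".

```java
import java.util.ArrayList;
import java.util.List;

public class CodePoints {
    // Hypothetical helper: turn UTF-16 code-units into code-points by hand.
    static List<Integer> decode(String s) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char hi = s.charAt(i);
            boolean pair = Character.isHighSurrogate(hi)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1));
            if (pair) {
                char lo = s.charAt(i + 1);
                // Each surrogate carries 10 bits of the offset above U+FFFF.
                out.add(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00));
                i += 2;
            } else {
                // BMP character (or an unpaired surrogate, passed through as-is).
                out.add((int) hi);
                i += 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "A\uD835\uDD38"; // 'A' followed by U+1D538, outside the BMP
        System.out.println(s.length());      // counts code-units, not code-points
        System.out.println(decode(s).size()); // counts code-points
    }
}
```

To be fair, Java's standard library does already provide this (String.codePointAt, String.codePointCount and Character.toCodePoint), so you don't have to write the arithmetic yourself. You just have to remember that indexing by "char" is not indexing by character.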