On 23.02.2012 14:54, Warp wrote:
> clipka<ano### [at] anonymous org> wrote:
>> (2) The editor does not expect a leading BOM in UTF-8; in that case, it
>> /must/ treat it according to the Unicode standard, which explicitly
>> states that the BOM is actually a perfectly valid normal character,
>> which just happens to be one of the many space characters, non-breaking
>> in this case, with zero width; so you're perfectly safe here as well,
>> unless you accidentally strip it from the very beginning of the file.
>
> Does that mean that xFEFF is the zero-width nbsp in both UTF-16 and UTF-8?
If you're talking about the codepoint, then obviously yes.
If you're talking about the encoded byte sequence, then no; in UTF-16, it
would be encoded as xFE xFF (big-endian) or xFF xFE (little-endian), while
in UTF-8 it would always be encoded as xEF xBB xBF.
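
To make the codepoint-versus-bytes distinction concrete, here is a small Python sketch (my choice of language; nothing in the thread depends on it). It encodes the single codepoint U+FEFF three ways using the standard codecs. The variable names are just for illustration.

```python
# U+FEFF is one codepoint; only its encoded byte sequence differs.
bom = "\ufeff"

# The "-be" / "-le" codec variants fix the byte order explicitly and
# do not prepend an additional BOM of their own.
utf16_be = bom.encode("utf-16-be")
utf16_le = bom.encode("utf-16-le")
utf8 = bom.encode("utf-8")

print(utf16_be.hex(" "))  # fe ff
print(utf16_le.hex(" "))  # ff fe
print(utf8.hex(" "))      # ef bb bf
```

Decoding any of the three byte sequences with the matching codec gives back the same one-character string.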
> Also: If the byte order happened to be the reverse of what the editor
> expects (assuming the editor does not support the BOM), wouldn't the
> multi-byte characters be garbage then?
I would be surprised to find an editor supporting UTF-16 but not the
BOM. As for UTF-8, the byte order for multi-byte characters (i.e.
codepoints x0080 and above) is unambiguously defined by the standard
(using a big-endian-ish encoding); as UTF-8 requires bit shifting
anyway, a byte-reversed encoding would provide no benefit for
little-endian machines and therefore doesn't exist.
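
As a rough illustration of the bit shifting involved, here is a hand-rolled UTF-8 encoder in Python. `utf8_encode` is a hypothetical helper written for this post, not a library function, and it is only a sketch: it does no validation (for example, it does not reject surrogates or values above x10FFFF). The point is that the most significant bits always go into the first byte, whatever the machine's native byte order.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one codepoint to UTF-8 by hand (illustrative, no validation)."""
    if cp < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


print(utf8_encode(0xFEFF).hex(" "))  # ef bb bf
```

Since every byte is assembled from shifted and masked bits, the same code yields the same bytes on big-endian and little-endian machines alike.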