On 23/02/2012 14:54, Warp wrote:
> Does that mean that xFEFF is the zero-width nbsp in both UTF-16 and UTF-8?
>
Tsss... no cake for you.
0xFEFF is the UTF-16 encoding of the BOM (byte order mark), U+FEFF. It is
used to signal endianness with UTF-16: read with the wrong byte order it
shows up as 0xFFFE, and U+FFFE is a noncharacter that will never be
assigned, so the two cannot be confused.
Encoding U+FEFF in UTF-8:
* serves no byte-order purpose, since UTF-8 has no endianness to detect
(but it is legal to have a BOM in UTF-8)
* takes 3 bytes: 0xEF 0xBB 0xBF
A BOM can also be useful with UTF-32 (and with other esoteric encodings
of Unicode, such as UTF-7, UTF-EBCDIC, UTF-1 (misnamed, IMHO), ...).
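If you want to check the byte sequences yourself, here is a quick sketch in Python 3 (my choice of language, nothing in the thread depends on it). It encodes U+FEFF with the explicit-endianness codecs, which do not add a BOM of their own, so what you see is just the character itself. The names `BOM`, `CODECS` and `encoded` are only for illustration.

```python
# Encode U+FEFF under several Unicode encodings and show the raw bytes.
BOM = "\ufeff"

# Explicit-endianness codecs: they never prepend their own BOM.
CODECS = ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le")

encoded = {name: BOM.encode(name) for name in CODECS}

for name, raw in encoded.items():
    # Print each byte as two hex digits, e.g. "EF BB BF".
    print(name, " ".join(f"{b:02X}" for b in raw))
```

Note how the UTF-16 LE form (FF FE) is a prefix of the UTF-32 LE form (FF FE 00 00), which matters if you ever sniff for these bytes.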
Note that using U+FEFF as a zero-width no-break space is deprecated.
You should use U+2060 (word joiner: zero width, non-breaking)
and/or U+200B (zero width space, which does allow a break). At least as of
Unicode 6.1.
> Also: If the byte order happened to be the reverse of what the editor
> expects (assuming the editor does not support the BOM), wouldn't the
> multi-byte characters be garbage then?
>
Yes. That's the reason UTF-16/32 need a BOM for automatic detection.
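To make that concrete, here is a minimal sketch of BOM sniffing in Python 3, assuming the data really does start with a BOM when it has one. `sniff_bom` and `BOMS` are hypothetical names of mine, not a library API; only the `codecs.BOM_*` constants come from the standard library. The last lines show what happens when the byte order is guessed wrong.

```python
import codecs

# Test the 4-byte UTF-32 marks before the 2-byte UTF-16 ones:
# the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

def sniff_bom(data: bytes):
    """Return (codec name, BOM length), or (None, 0) if no BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

text = "héllo"

# Little-endian UTF-16 with a BOM in front: detection works.
raw = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
encoding, skip = sniff_bom(raw)
decoded = raw[skip:].decode(encoding)
print(encoding, decoded)

# No BOM and the wrong byte order assumed: every code unit is byte-swapped,
# so the result decodes without error but is garbage.
garbled = text.encode("utf-16-le").decode("utf-16-be")
print(repr(garbled))
```

Without a BOM this sketch cannot tell you anything, so a real editor would have to fall back on a default or a heuristic.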