  Re: The trouble with XSLT  
From: Le Forgeron
Date: 23 Feb 2012 10:11:01
Message: <4f465705$1@news.povray.org>
On 23/02/2012 14:54, Warp wrote:

>   Does that mean that xFEFF is the zero-width nbsp in both UTF-16 and UTF-8?
> 

Tsss... no cake for you.

0xFEFF is the UTF-16 code unit for the BOM (byte order mark, U+FEFF).
It is used to signal endianness in UTF-16: read with the wrong byte
order it shows up as 0xFFFE, and U+FFFE is a noncharacter that will
never be assigned, so the two cases cannot be confused.
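
To make that concrete, here is a small Python sketch (my own illustration, nothing more): it encodes U+FEFF with each byte order and then misreads the little-endian bytes as big-endian. The variable names are made up for the example.

```python
import codecs

BOM_CHAR = "\ufeff"

# The explicit-endianness codecs do not prepend a BOM of their own,
# so the result is exactly the encoding of U+FEFF itself.
be_bytes = BOM_CHAR.encode("utf-16-be")  # big-endian:    FE FF
le_bytes = BOM_CHAR.encode("utf-16-le")  # little-endian: FF FE

print(be_bytes.hex(), le_bytes.hex())

# Reading the little-endian bytes with the wrong byte order yields
# the code unit 0xFFFE, which is a noncharacter.
misread = int.from_bytes(le_bytes, "big")
print(f"U+{misread:04X}")

# The standard library exposes the same byte sequences as constants.
assert be_bytes == codecs.BOM_UTF16_BE
assert le_bytes == codecs.BOM_UTF16_LE
```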

Encoding U+FEFF in UTF-8:
 * serves no byte-order purpose, since UTF-8 has no endianness to detect
(but it is legitimate to have a BOM in UTF-8)
 * takes 3 bytes: 0xEF 0xBB 0xBF
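
The three bytes can be checked quickly in Python. This is only a sketch; the sample string "hello" and the variable names are assumptions for illustration. It also shows the difference between the `utf-8` codec, which keeps a leading BOM as a character, and `utf-8-sig`, which strips it.

```python
import codecs

# U+FEFF encoded as UTF-8 is always the same three bytes.
encoded = "\ufeff".encode("utf-8")
as_hex = " ".join(f"{b:02X}" for b in encoded)
print(as_hex)

# A UTF-8 file that starts with a BOM.
data = codecs.BOM_UTF8 + "hello".encode("utf-8")

plain = data.decode("utf-8")         # BOM survives as the first character
stripped = data.decode("utf-8-sig")  # leading BOM is removed, if present

print(repr(plain), repr(stripped))
```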

A BOM can also be useful with UTF-32 (and with other, more esoteric
encodings of Unicode, such as UTF-7, UTF-EBCDIC, UTF-1 (misnamed, IMHO), ...).

Note that using U+FEFF as a zero-width no-break space is deprecated.
You should use U+2060 (word joiner: zero width, non-breaking)
and/or U+200B (zero width space, which does allow a break), at least
as of Unicode 6.1.
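
If you want to see the official names of these three code points, Python's `unicodedata` module will list them. This is just a lookup sketch; the exact Unicode version behind it depends on your Python build.

```python
import unicodedata

# Look up the character names of the three code points discussed above.
names = {cp: unicodedata.name(chr(cp)) for cp in (0xFEFF, 0x2060, 0x200B)}

for cp, name in names.items():
    print(f"U+{cp:04X}  {name}")
```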

>   Also: If the byte order happened to be the reverse of what the editor
> expects (assuming the editor does not support the BOM), wouldn't the
> multi-byte characters be garbage then?
> 
Yes. That's the reason UTF-16 and UTF-32 need a BOM for automatic
detection of the byte order.
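
A rough sketch of what such automatic detection can look like, in Python. The function names `sniff_bom` and `decode_with_bom` and the UTF-8 fallback are my own assumptions, not how any particular editor does it. One known limitation: UTF-16-LE text whose first character after the BOM is U+0000 would be taken for UTF-32-LE.

```python
import codecs

# Check the longer signatures first: the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) when no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode using the BOM if there is one, else fall back to `default`."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or default)


# Without a BOM, guessing the wrong byte order gives garbage:
# little-endian "hi" read as big-endian becomes two CJK characters.
garbled = "hi".encode("utf-16-le").decode("utf-16-be")

# With a BOM, the byte order is recovered whichever way it was written.
from_le = decode_with_bom(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le"))
from_be = decode_with_bom(codecs.BOM_UTF16_BE + "hi".encode("utf-16-be"))
print(repr(garbled), from_le, from_be)
```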


