POV-Ray: Newsgroups: povray.off-topic: The trouble with XSLT: Re: The trouble with XSLT

From: clipka
Date: 23 Feb 2012 11:49:39
Message: <4f466e23$1@news.povray.org>

On 23.02.2012 14:54, Warp wrote:
> clipka<ano### [at] anonymousorg>  wrote:
>> (2) The editor does not expect a leading BOM in UTF-8; in that case, it
>> /must/ treat it according to the Unicode standard, which explicitly
>> states that the BOM is actually a perfectly valid normal character,
>> which just happens to be one of the many space characters, non-breaking
>> in this case, with zero width; so you're perfectly safe here as well,
>> unless you accidentally strip it from the very beginning of the file.
>
>    Does that mean that xFEFF is the zero-width nbsp in both UTF-16 and UTF-8?

If you're talking about the codepoint, then obviously yes.

If you're talking about the encoded byte sequence, then no; in UTF-16, it 
would be encoded as xFE xFF (big-endian) or xFF xFE (little-endian), 
while in UTF-8 it would always be encoded as xEF xBB xBF.
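
To see the difference for yourself, here is a quick sketch using Python's built-in codecs (the variable names are just mine, for illustration). It encodes the single codepoint xFEFF three ways:

```python
# One codepoint, three byte sequences.
bom = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE, a.k.a. the byte order mark

utf16_be = bom.encode("utf-16-be")  # big-endian UTF-16, no extra BOM added
utf16_le = bom.encode("utf-16-le")  # little-endian UTF-16, no extra BOM added
utf8 = bom.encode("utf-8")          # UTF-8 has only one possible byte order

print(utf16_be.hex(" "))  # fe ff
print(utf16_le.hex(" "))  # ff fe
print(utf8.hex(" "))      # ef bb bf
```

Note that I use the explicit "utf-16-be" / "utf-16-le" codecs here; the plain "utf-16" codec would prepend a BOM of its own and muddy the picture.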

>    Also: If the byte order happened to be the reverse of what the editor
> expects (assuming the editor does not support the BOM), wouldn't the
> multi-byte characters be garbage then?

I would be surprised to find an editor that supports UTF-16 but not the 
BOM. As for UTF-8, the byte order of multi-byte characters (i.e. 
codepoints x0080 and above) is unambiguously defined by the standard 
(using a big-endian-ish encoding, most significant bits first). Since 
UTF-8 requires bit shifting anyway, a byte-reversed encoding would 
provide no benefit for little-endian machines and therefore doesn't exist.
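
To illustrate the bit shifting, here is a minimal hand-rolled encoder in Python. Take it as a sketch only: the function name utf8_encode is made up for this post, and it does no validation (it doesn't reject surrogates or codepoints beyond x10FFFF). The point is that the most significant bits always go into the first byte, and every byte has to be assembled by shifting and masking regardless of the machine's endianness.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one codepoint as UTF-8 by hand (illustrative, no validation)."""
    if cp < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


print(utf8_encode(0xFEFF).hex(" "))  # ef bb bf
```

The leading byte carries the top bits of the codepoint and the continuation bytes follow in descending significance, which is what I mean by "big-endian-ish".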
