On 23.02.2012 10:11, Invisible wrote:
>>> I wonder how widely implemented this undocumented feature is?
>>
>> Most likely more widespread than you think.
>
> I got the distinct impression that this is a Windows-specific
> convention. (Doesn't Linux do something strange with using environment
> variables to define the "system locale"?)
From the current XML spec:
"Entities encoded in UTF-16 MUST and entities encoded in UTF-8 MAY begin
with the Byte Order Mark described by Annex H of [ISO/IEC 10646:2000],
section 16.8 of [Unicode] (the ZERO WIDTH NO-BREAK SPACE character,
#xFEFF)."
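To make the quoted passage concrete, here is a small Python sketch (my own illustration, not part of the spec) showing which bytes the single character U+FEFF turns into under each encoding. The variable names are made up for this example; the `codecs` constants come from the Python standard library.

```python
import codecs

BOM_CHAR = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE, used as the byte order mark

# The same character yields a different byte signature per encoding.
utf8_bom = BOM_CHAR.encode("utf-8")
utf16_be_bom = BOM_CHAR.encode("utf-16-be")
utf16_le_bom = BOM_CHAR.encode("utf-16-le")

print(utf8_bom.hex(), utf16_be_bom.hex(), utf16_le_bom.hex())

# The standard library exposes the same signatures as constants.
print(utf8_bom == codecs.BOM_UTF8)
```

In UTF-16 the byte order of the signature tells the reader the endianness, which is why the spec makes it mandatory there; in UTF-8 there is no byte order to resolve, so it is only optional.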
As for this being supported by editors, there are two possible cases:
(1) The editor treats UTF-8 with a leading BOM as a special encoding; in
that case, it will strip the BOM from the character stream upon reading,
and prepend it upon writing, so you're perfectly safe here.
(2) The editor does not expect a leading BOM in UTF-8; in that case, it
/must/ treat it according to the Unicode standard, which explicitly
states that the BOM is actually a perfectly valid normal character,
which just happens to be one of the many space characters, non-breaking
in this case, with zero width; so you're perfectly safe here as well,
unless you accidentally strip it from the very beginning of the file.
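The two cases can be sketched in Python, purely as an illustration: I am assuming here that the `utf-8-sig` codec stands in for a BOM-aware editor and the plain `utf-8` codec for one that does not expect a BOM. The sample text and variable names are invented for the example.

```python
import codecs

# A file as it sits on disk: UTF-8 BOM followed by the actual content.
raw = codecs.BOM_UTF8 + "hello".encode("utf-8")

# Case (1): a BOM-aware reader strips the BOM on reading
# and prepends it again on writing.
text_aware = raw.decode("utf-8-sig")
roundtrip_aware = text_aware.encode("utf-8-sig")

# Case (2): a plain UTF-8 reader keeps the BOM as an ordinary
# character (U+FEFF) at the start of the text and writes it back unchanged.
text_plain = raw.decode("utf-8")
roundtrip_plain = text_plain.encode("utf-8")

# The failure mode: the leading U+FEFF is deleted before saving.
stripped = text_plain[1:].encode("utf-8")

print(repr(text_aware))
print(repr(text_plain))
print(roundtrip_aware == raw, roundtrip_plain == raw, stripped == raw)
```

Both round trips reproduce the original bytes; only the last variant, where the first character is removed, produces a file without the BOM.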