On 23.02.2012 11:57, Invisible wrote:
>> You also need a unicode-compatible editor...
>
> I've got that. The problem isn't the editor handling Unicode, it's
> figuring out which encoding an arbitrary text file happens to use. It
> seems as soon as you use any encoding other than the Windows default
> (whatever the hell that is), things get messy, rapidly.
That's because figuring out the encoding of an extended-ASCII text file
is, in fact, virtually impossible unless you know details about the
contents (e.g. you can recognize the encoding if you know that it's an
HTML file), because none of these encodings has a standardized file
signature. The sole exception is UTF-8 with a leading BOM, where the
BOM character can double as such a signature.
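
To illustrate the point about the BOM acting as a signature, here is a minimal Python sketch. The function names `has_utf8_bom` and `sniff_encoding` are made up for this example; only `codecs.BOM_UTF8` comes from the standard library. Note that it deliberately gives up (returns `None`) when there is no BOM, since without a signature the encoding can only be guessed.

```python
import codecs
from typing import Optional


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def sniff_encoding(data: bytes) -> Optional[str]:
    """Hypothetical helper: report an encoding only when a signature exists.

    Returns "utf-8-sig" (Python's codec name for UTF-8 with BOM) if the
    data starts with a UTF-8 BOM, and None otherwise, because legacy
    8-bit encodings carry no signature to look for.
    """
    if has_utf8_bom(data):
        return "utf-8-sig"
    return None


if __name__ == "__main__":
    print(sniff_encoding(codecs.BOM_UTF8 + b"hello"))  # BOM present
    print(sniff_encoding(b"caf\xe9"))                  # no signature
```
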
A leading BOM in UTF-8 files can cause problems with files such as shell
scripts or C/C++ source code, because it is in fact part of the
character stream rather than a mere file signature (strictly speaking,
the same is true for UTF-16 as well). At the beginning of XML files,
however, it is explicitly allowed.
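
The shell script case can be sketched in Python as follows. The script content is invented for the example. Decoding with the plain `utf-8` codec keeps the BOM as the character U+FEFF at the start of the text, so the file no longer begins with `#!`; Python's `utf-8-sig` codec strips it on decoding.

```python
import codecs

# A made-up shell script that some editor saved with a leading UTF-8 BOM.
script = codecs.BOM_UTF8 + b"#!/bin/sh\necho hi\n"

# The plain "utf-8" codec treats the BOM as an ordinary character (U+FEFF).
plain = script.decode("utf-8")

# The "utf-8-sig" codec drops a leading BOM, if there is one.
stripped = script.decode("utf-8-sig")

if __name__ == "__main__":
    print(repr(plain[:3]))
    print(repr(stripped[:3]))
    # At the byte level the file does not start with "#!" either, which is
    # why such a file can trip up tools that expect those exact bytes.
    print(script.startswith(b"#!"))
```
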