clipka <ano### [at] anonymousorg> wrote:
> That's /one/ reason why XML requires UTF-16 files to start with a byte
> order mark: If the file starts with "FE FF" or "FF FE", it's either
> UTF-16, malformed XML (not starting with optional whitespace followed by
> "<?"), or indeed garbage, so UTF-16 is a safe bet in that case.
> Otherwise rely on the file format being backward compatible with ASCII,
> and treat it as UTF-8 (which is ASCII-compatible as well) until an
> encoding declaration tells you otherwise.
OTOH, it would be quite trivial to guess that an HTML/XML file is
UTF-16-encoded: If every other byte is 0 and the remaining bytes form valid
HTML/XML (at least up to the header section that specifies the encoding),
then it's a safe bet that it's UTF-16-encoded.
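
A rough sketch of that heuristic in Python, for a file without a byte order mark. The function name and the 64-byte probe size are made up for illustration, and "valid HTML/XML" is reduced here to "optional whitespace followed by '<'", which is far weaker than real parsing. It also only works as long as the header contains nothing but ASCII characters.

```python
def looks_like_utf16_markup(data: bytes, probe: int = 64):
    """Guess 'utf-16-le' or 'utf-16-be' for BOM-less markup, else None.

    Hypothetical helper: checks only the first `probe` bytes.
    """
    head = data[:probe]
    head = head[:len(head) - len(head) % 2]  # keep whole 16-bit units only
    if len(head) < 4:
        return None

    even, odd = head[0::2], head[1::2]
    # Little-endian ASCII text puts the zero byte second, big-endian first.
    for zeros, text, name in ((odd, even, "utf-16-le"),
                              (even, odd, "utf-16-be")):
        if any(zeros):
            continue  # this half is not all zero bytes
        # Crude stand-in for "forms valid HTML/XML": whitespace, then '<'.
        if text.lstrip(b" \t\r\n").startswith(b"<"):
            return name
    return None


sample = "<?xml version='1.0'?><root/>"
print(looks_like_utf16_markup(sample.encode("utf-16-le")))
print(looks_like_utf16_markup(sample.encode("utf-16-be")))
print(looks_like_utf16_markup(sample.encode("utf-8")))
```

A file that does start with "FE FF" or "FF FE" falls through to None here, since the byte order mark already answers the question and would be checked first.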
--
- Warp