On 02.06.2018 at 10:14, Le_Forgeron wrote:
> On 01/06/2018 at 19:29, clipka wrote:
>> - Non-ASCII characters in string literals: This I will also set aside
>> for now, until I get a clearer picture of whether the current
>> scene-global `charset` mechanism is even used to any extent worth
>> supporting, as I think it may be easier and cleaner to throw it
>> overboard (or at least ditch the `sys` setting) in favour of a per-file
>> mechanism.
>>
>
> 1. Is there, in our modern world, a need for something else than utf-8 ?
I'm primarily thinking of legacy files, or files created by legacy software.
> 2. I hope you do not expect editors to always insert a BOM header
No, of course not. I'm sticking to the current UCS specs there, according to
which the signature is optional in the UTF-8 encoding scheme.
Having or not having a signature BOM /may/ have side effects though --
most notably because without a signature it is impossible to distinguish
the format from ASCII or classic extended ASCII until the first
non-ASCII character is encountered (and even then it is a guess whether
it's really UTF-8), or some other means of specifying the encoding is
used. Such has been the case in v3.7, where a signature BOM was taken to
imply `global_settings { charset utf8 }`, while absence of both
signature BOM and `charset` caused UTF-8 files to be interpreted as
ASCII with unrecognized characters (quietly replaced with blanks, IIRC).
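
To illustrate the ambiguity, here is a rough Python sketch of the kind of decision a parser faces. This is not POV-Ray's actual code: the function name `guess_encoding`, the `declared` parameter (standing in for an explicit `charset` setting) and the fallback order are all assumptions made for the example.

```python
import codecs
from typing import Optional

def guess_encoding(data: bytes, declared: Optional[str] = None) -> str:
    """Hypothetical helper: pick an encoding for the raw bytes of a file."""
    # A signature BOM is the only unambiguous in-band hint for UTF-8.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Otherwise an explicit declaration (assumed here) takes precedence.
    if declared is not None:
        return declared
    # Pure ASCII is byte-identical in ASCII, extended ASCII and UTF-8,
    # so nothing can be told apart until a non-ASCII byte shows up.
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    # Valid UTF-8 sequences make UTF-8 likely, but this is still a guess.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback: some legacy 8-bit encoding.
        return "latin-1"
```

With this sketch, `guess_encoding(b"abc")` yields `"ascii"`, and the same text with a leading signature yields `"utf-8-sig"`; only the first non-ASCII byte forces a guess.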