  Re: New version of new tokenizer  
From: clipka
Date: 2 Jun 2018 05:22:19
Message: <5b1261cb@news.povray.org>
On 02.06.2018 at 10:14, Le_Forgeron wrote:
> On 01/06/2018 at 19:29, clipka wrote:
>> - Non-ASCII characters in string literals: This I will also set aside
>> for now, until I get a clearer picture of whether the current
>> scene-global `charset` mechanism is even used to any extent worth
>> supporting, as I think it may be easier and cleaner to throw it
>> overboard (or at least ditch the `sys` setting) in favour of a per-file
>> mechanism.
>>
> 
> 1. Is there, in our modern world, a need for something else than utf-8 ?

I'm primarily thinking of legacy files, or files created by legacy software.

> 2. I hope you do not expect editors to always insert a BOM header

No, of course not. I'm sticking to the current UCS specs there, according
to which the signature is optional in the UTF-8 encoding scheme.

Having or not having a signature BOM /may/ have side effects though --
most notably because without a signature it is impossible to distinguish
the format from ASCII or classic extended ASCII until the first
non-ASCII character is encountered (and even then it is a guess whether
it's really UTF-8), or some other means of specifying the encoding is
used. This was the case in v3.7, where a signature BOM was taken to
imply `global_settings { charset utf8 }`, while the absence of both a
signature BOM and a `charset` setting caused UTF-8 files to be
interpreted as ASCII, with unrecognized characters quietly replaced with
blanks (IIRC).
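
To illustrate the ambiguity, here is a rough sketch in Python. It is not the POV-Ray tokenizer and not what v3.7 does; `guess_encoding` and its return labels are made up for this example. It only shows why a file without a signature cannot be told apart from ASCII until a non-ASCII byte turns up, and why the result is still a guess after that.

```python
# Hypothetical sketch only; not POV-Ray code.
UTF8_SIGNATURE = b"\xef\xbb\xbf"  # the optional UTF-8 signature (BOM)


def guess_encoding(data: bytes) -> str:
    """Guess how a scene file's bytes are encoded (illustrative labels)."""
    if data.startswith(UTF8_SIGNATURE):
        # A signature settles the question up front.
        return "utf-8-sig"
    try:
        data.decode("ascii")
        # No non-ASCII byte seen: indistinguishable from plain ASCII.
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        # Valid UTF-8, but a legacy 8-bit file could look the same.
        return "utf-8 (guess)"
    except UnicodeDecodeError:
        return "legacy 8-bit (guess)"


print(guess_encoding(UTF8_SIGNATURE + b"sphere"))   # signature present
print(guess_encoding(b"sphere"))                    # pure ASCII
print(guess_encoding("\u00e9".encode("utf-8")))     # UTF-8 without signature
print(guess_encoding("\u00e9".encode("latin-1")))   # legacy 8-bit byte
print(guess_encoding("\u00c3\u00a9".encode("latin-1")))  # legacy text that is also valid UTF-8
```

The last call is the awkward case: the two Latin-1 characters happen to form a valid UTF-8 sequence, so the guess comes out as UTF-8 even though the file was written by legacy software.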


