  Re: POV-Ray v3.7 charset behaviour  
From: clipka
Date: 26 Jun 2018 09:41:26
Message: <5b324286$1@news.povray.org>
On 26.06.2018 at 05:12, Kenneth wrote:

> Going forward:
> When you have finished your work on restructuring the parser, and someone wants
> to write an #include file using UTF-8 encoding (with or without a BOM): Which of
> the following two constructs is the proper way to code the scene file/#include
> file combo:
> 
> A)
> Scene file:
> global_settings{... charset utf8}
> #include "MY FILE.txt" // encoded as UTF-8 but with no charset keyword
> 
> OR B):
> scene file:
> global_settings{...} // no charset keyword
> #include "MY FILE.txt" // encoded as UTF-8 and with its *own*
>                        // global_settings{charset utf8}
> 
> I'm still a bit confused as to which is correct-- although B) looks like the
> logical choice(?). The documentation about 'charset' seems to imply this.

When I have finished my work?

Probably neither. The `global_settings { charset FOO }` mechanism isn't
really ideal, and I'm pretty sure I'll be deprecating it and introducing
something different, possibly along the following lines:

(1) A signature-based mechanism to auto-detect UTF-8 with signature
(BOM), and maybe also UTF-16 and/or UTF-32 (either endian variant).

(2) A `#charset STRING_LITERAL` directive to explicitly specify the
encoding on a per-file basis. This setting would explicitly apply only
to the respective file itself, and would probably have to appear at the
top of the file (right alongside the initial `#version` directive).

(3a.1) An INI setting `Charset_Autodetect=BOOL` to specify whether
POV-Ray should attempt to auto-detect UTF-8 without signature (and maybe
certain other encodings) based on the first non-ASCII byte sequence in
the file.

(3a.2) An INI setting `Charset_Default=STRING` to specify what character
set should be presumed for files that have neither a signature, nor a
`#charset` statement, nor can be recognized based on the first non-ASCII
byte sequence.

-or-

(3b) An INI setting `Charset_Autodetect=STRING_LIST` to specify a list
of character sets, in order of descending preference, to try to
auto-detect based on the first non-ASCII byte sequence in the file.
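
To illustrate the lookup order such a scheme implies, here is a rough sketch in Python (not POV-Ray code, and not an implementation plan). It follows variant (3b) with a fallback default as in (3a.2). The function name `detect_charset`, the exact `#charset "NAME"` syntax, and the 1024-byte window used to look for the directive "at the top of the file" are all assumptions made for the example.

```python
import codecs
import re

# Signatures (BOMs). UTF-32 is tested before UTF-16 because the UTF-32 LE
# signature (FF FE 00 00) begins with the UTF-16 LE signature (FF FE).
SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# ASSUMPTION: hypothetical syntax for the proposed directive, e.g.
#   #charset "ISO-8859-1"
CHARSET_DIRECTIVE = re.compile(
    rb'^[ \t]*#charset[ \t]+"([A-Za-z0-9_.:-]+)"', re.MULTILINE
)

# ASSUMPTION: how far into the file the directive is searched for.
DIRECTIVE_WINDOW = 1024


def detect_charset(data, autodetect=("utf-8",), default="ascii"):
    """Guess the encoding of the raw bytes of one scene or include file."""
    # (1) A signature at the very start of the file wins.
    for bom, name in SIGNATURES:
        if data.startswith(bom):
            return name

    # (2) Otherwise, an explicit per-file directive near the top.
    match = CHARSET_DIRECTIVE.search(data[:DIRECTIVE_WINDOW])
    if match:
        return match.group(1).decode("ascii").lower()

    # (3b) Otherwise, try each candidate, in order of preference, against
    # the first run of non-ASCII bytes. Single-byte encodings such as
    # latin-1 accept any byte sequence, so they belong at the end.
    run = re.search(rb"[\x80-\xff]+", data)
    if run is not None:
        for name in autodetect:
            try:
                run.group(0).decode(name)
            except UnicodeDecodeError:
                continue
            return name

    # (3a.2) Nothing matched: fall back to the configured default.
    return default
```

In this sketch the result applies to the one file being examined, so a scene file and each of its include files would be checked independently.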



Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.