POV-Ray: Newsgroups: povray.beta-test: POV-Ray v3.7 charset behaviour

From: clipka
Date: 5 Jun 2018 17:28:11
Message: <5b17006b$1@news.povray.org>
POV-Ray v3.7 (for Windows) behaves as follows with respect to different
#version/charset settings (as tested with German locale):


Common
======

- `\uXXXX` escape sequences are technically always interpreted as UCS-2
character codes. Note however that depending on the context in which a
string is used, the /effective/ interpretation may vary.

- `asc()` and `chr()` functions technically always operate according to
UCS-2 character encoding. Note however that depending on the context in
which a string is used, the /effective/ encoding may vary.


3.0 / ascii
===========

- Non-ASCII octets in strings are technically decoded according to ISO
8859-1 (Latin-1). For many characters this matches the display in the
editor, the exception being codes hex 80-9F; I presume the editor
display varies with the system's locale, while the Latin-1 decoding is
invariant. Note however that depending on the context in which a string
is used, the /effective/ decoding may vary.

- When used in file names or debug output, strings are effectively
subject to re-interpretation of the UCS-2 character codes as
Windows-1252 codes (matching the display in the editor; presumably this
varies with the system's locale), with character codes above hex FF
interpreted modulo 256.

- When used in text primitives, non-ASCII characters in strings are
typically garbled, depending on the font used; for instance, with
Microsoft's Arial font the text appears to be subject to
re-interpretation of the UCS-2 character codes as Macintosh Roman codes,
while with POV-Ray's `cyrvetic.ttf` the re-interpretation seems to be as
Windows-1251 (Cyrillic) codes. Character codes above hex FF are treated
according to obscure rules in all cases. (In some cases,
re-interpretation may happen to match the display in the editor.)
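
The re-interpretations described above can be imitated outside POV-Ray.
The following Python sketch is an illustration only, not POV-Ray's
actual code: `reinterpret()` is a hypothetical helper, and the codec
names are Python's. It decodes source octets as Latin-1, then treats
each resulting code, modulo 256, as an octet in another 8-bit code page.
For text primitives it only covers codes up to hex FF, since the rules
for higher codes are not known.

```python
def reinterpret(text: str, codec: str) -> str:
    """Treat each character code, modulo 256, as one octet in `codec`.

    Hypothetical helper that imitates the observed behaviour; octets the
    codec leaves undefined become U+FFFD.
    """
    octets = bytes(ord(ch) % 256 for ch in text)
    return octets.decode(codec, errors="replace")


# Source octets hex E4 and 80, decoded as Latin-1 (3.0 / ascii).
decoded = b"\xe4\x80".decode("latin-1")

# File names and debug output: the codes are read as Windows-1252, so
# hex 80 shows up as the euro sign rather than a Latin-1 control code.
as_file_name = reinterpret(decoded, "cp1252")

# Codes above hex FF wrap around: U+20AC becomes hex AC.
wrapped = reinterpret("\u20ac", "cp1252")

# Text primitives, codes up to hex FF only: Arial behaves like
# Macintosh Roman, cyrvetic.ttf like Windows-1251.
with_arial = reinterpret("\xe4", "mac_roman")
with_cyrvetic = reinterpret("\xe4", "cp1251")
```

Here `as_file_name` reproduces what the editor shows on a Windows-1252
system, while `with_arial` and `with_cyrvetic` give two different
characters for the same code hex E4.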


3.0 / utf8
==========

- Non-ASCII octets in strings are technically decoded according to
UTF-8. Note however that depending on the context in which a string is
used, non-ASCII characters may effectively become garbled.

- When used in file names or debug output, strings are effectively
subject to re-interpretation of the UCS-2 codes as Windows-1252 codes
(presumably this varies with the system's locale), with codes above hex
FF interpreted modulo 256.

- When used in text primitives, non-ASCII characters in strings may or
may not be garbled, depending on the font used; for instance, with
Microsoft's Arial font the text is displayed as expected for UCS-2
encoded text, while with POV-Ray's `cyrvetic.ttf` it appears to be
subject to re-interpretation of the UCS-2 codes as Macintosh Cyrillic
codes with codes above hex FF being treated according to obscure rules.
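
The difference between the two v3.0 charset settings comes down to how
the same source octets are decoded. A minimal Python sketch, assuming a
scene file saved as UTF-8; `decode_source()` is a hypothetical helper
that models "ascii" as Latin-1 and "utf8" as UTF-8, not POV-Ray's own
code:

```python
def decode_source(octets: bytes, charset: str) -> str:
    """Decode string-literal octets the way v3.0 mode appears to.

    Hypothetical helper: "ascii" is modelled as Latin-1, "utf8" as UTF-8.
    """
    codec = {"ascii": "latin-1", "utf8": "utf-8"}[charset]
    return octets.decode(codec)


# U+00E4 followed by Cyrillic U+044F, as stored in a UTF-8 encoded
# scene file: octets hex C3 A4 D1 8F.
octets = "\u00e4\u044f".encode("utf-8")

as_ascii = decode_source(octets, "ascii")  # four codes: C3, A4, D1, 8F
as_utf8 = decode_source(octets, "utf8")    # two codes: U+00E4, U+044F

# In file names and debug output, codes are then taken modulo 256 and
# read as Windows-1252, so U+044F ends up as hex 4F, a plain 'O'.
file_name = bytes(ord(ch) % 256 for ch in as_utf8).decode("cp1252")
```

So even with correct UTF-8 decoding, anything outside the first 256
codes is lost once the string is used as a file name.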


3.7 / ascii
===========

- Non-ASCII octets in strings are decoded as blanks (ASCII hex 20;
non-ASCII characters can still be inserted via `\uXXXX` escape sequences
or the `chr()` function though).

- When used in file names or debug output, non-ASCII characters in
strings (entered via `\uXXXX` or `chr()`) are substituted with blanks.

- When used in text primitives, non-ASCII characters in strings (entered
via `\uXXXX` or `chr()`) are typically garbled, depending on the font
used; for instance, with Microsoft's Arial font the text appears to be
subject to re-interpretation of the codes as Macintosh Roman codes,
while with POV-Ray's `cyrvetic.ttf` the re-interpretation seems to be as
Windows-1251 (Cyrillic) codes. Character codes above hex FF are treated
according to obscure rules in all cases.
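
The decoding rule for this mode is simple enough to sketch in Python.
This is an imitation under the assumption that each non-ASCII octet
yields exactly one blank; `decode_ascii_37()` is a hypothetical name,
not a POV-Ray function.

```python
def decode_ascii_37(octets: bytes) -> str:
    """Keep ASCII octets, turn every other octet into a blank (hex 20)."""
    return "".join(chr(b) if b < 0x80 else " " for b in octets)


# "Gr\u00fc\u00dfe" saved as Latin-1: two non-ASCII octets, two blanks.
from_latin1 = decode_ascii_37("Gr\u00fc\u00dfe".encode("latin-1"))

# The same word saved as UTF-8: four non-ASCII octets, four blanks.
from_utf8 = decode_ascii_37("Gr\u00fc\u00dfe".encode("utf-8"))
```

Under that assumption, the number of blanks depends on the encoding the
scene file was saved in, not on the number of characters typed.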


3.7 / utf8
==========

- Non-ASCII octets in strings are decoded according to UTF-8.

- When used in file names or debug output, non-ASCII characters in
strings are substituted with blanks.

- When used in text primitives, non-ASCII characters in strings may or
may not be garbled, depending on the font used; for instance, with
Microsoft's Arial font the text is displayed as expected for UCS-2
encoded text, while with POV-Ray's `cyrvetic.ttf` it appears to be
subject to re-interpretation of the UCS-2 codes as Windows-1251
(Cyrillic) codes, again with codes above hex FF being treated according
to obscure rules.
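
For file names and debug output, the net effect in this mode can be
imitated as follows. The Python sketch assumes one blank per non-ASCII
character (the tests above do not distinguish this from other
possibilities), and `blank_non_ascii()` is a hypothetical helper.

```python
def blank_non_ascii(text: str) -> str:
    """Replace every character above hex 7F with a blank (hex 20)."""
    return "".join(ch if ord(ch) < 0x80 else " " for ch in text)


# A file name from a UTF-8 encoded scene file, decoded correctly first
# (the octets spell "Gr", U+00FC, U+00DF, "e.png").
name = b"Gr\xc3\xbc\xc3\x9fe.png".decode("utf-8")

# What the file name or debug output would effectively contain.
effective = blank_non_ascii(name)
```

The string itself still holds the correct characters, so the same value
can render properly in a text primitive with a suitable font even
though the file name comes out with blanks.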