POV-Ray: Newsgroups: povray.beta-test: POV-Ray v3.7 charset behaviour: Re: POV-Ray v3.7 charset behaviour

From: Thorsten Froehlich
Date: 6 Jun 2018 13:15:01
Message: <web.5b18164a4a827126535efa580@news.povray.org>
Note that TrueType font decoding used to be rather buggy before 3.6 (or was
it 3.5?), which is why, e.g., in 3.0 it matched the MacRoman font tables before
ever getting to the other tables.

clipka <ano### [at] anonymousorg> wrote:
> POV-Ray v3.7 (for Windows) behaves as follows with respect to different
> #version/charset settings (as tested with German locale):
>
>
> Common
> ======
>
> - "\uXXXX" escape sequences are technically always interpreted as UCS-2
> character codes. Note however that depending on the context in which a
> string is used, the /effective/ interpretation may vary.
>
> - `asc()` and `chr()` functions technically always operate according to
> UCS-2 character encoding. Note however that depending on the context in
> which a string is used, the /effective/ encoding may vary.
>
>
> 3.0 / ascii
> ===========
>
> - Non-ASCII octets in strings are technically decoded according to ISO
> 8859-1 (Latin-1; matching the display in the editor for many characters,
> except for codes hex 80-9F, though I presume the editor display may vary
> with the system's locale, while the Latin-1 decoding is invariant). Note
> however that depending on the context in which a string is used, the
> /effective/ decoding may vary.
>
> - When used in file names or debug output, strings are effectively
> subject to re-interpretation of the UCS-2 character codes as
> Windows-1252 codes (matching the display in the editor; presumably this
> varies with the system's locale), with character codes above hex FF
> interpreted modulo 256.
>
> - When used in text primitives, non-ASCII characters in strings are
> typically garbled, depending on the font used; for instance, with
> Microsoft's Arial font the text appears to be subject to
> re-interpretation of the UCS-2 character codes as Macintosh Roman codes,
> while with POV-Ray's `cyrvetic.ttf` the re-interpretation seems to be as
> Windows-1251 (Cyrillic) codes. Character codes above hex FF are treated
> according to obscure rules in all cases. (In some cases,
> re-interpretation may happen to match the display in the editor.)
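
A minimal Python sketch of the re-interpretation described for 3.0 / ascii, assuming it amounts to taking each character code modulo 256 and reading the result as one byte of the target code page. The helper name `reinterpret` and the sample string are made up for illustration; this mimics the observed behaviour and is not POV-Ray's actual code path.

```python
def reinterpret(text: str, codec: str) -> str:
    """Read each character code, modulo 256, as one byte of `codec`."""
    raw = bytes(ord(ch) % 256 for ch in text)
    # Bytes the code page leaves undefined become U+FFFD instead of raising.
    return raw.decode(codec, errors="replace")


# "Gruesse" with u-umlaut and sharp s, as saved by a Windows-1252 editor:
# u-umlaut is octet 0xFC, sharp s is octet 0xDF.
source_octets = b"Gr\xfc\xdfe"

# Step 1: the technical decoding of the source octets (ISO 8859-1).
decoded = source_octets.decode("latin-1")

# Step 2: the effective interpretation depends on where the string is used.
as_file_name = reinterpret(decoded, "cp1252")      # same glyphs as the editor
as_arial_text = reinterpret(decoded, "mac_roman")  # garbled Latin text
as_cyrvetic_text = reinterpret(decoded, "cp1251")  # Cyrillic letters

# File names / debug output only: codes above hex FF wrap around, so
# U+20AC (euro sign) is read as byte 0xAC, the "not" sign in Windows-1252.
wrapped = reinterpret("\u20ac", "cp1252")
```

With these inputs, `as_file_name` round-trips to the original text, while the two text-primitive variants show why the same scene can look right in the editor and wrong in the render. The "obscure rules" for codes above hex FF in text primitives are deliberately not modelled.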
>
>
> 3.0 / utf8
> ==========
>
> - Non-ASCII octets in strings are technically decoded according to
> UTF-8. Note however that depending on the context in which a string is
> used, non-ASCII characters may effectively become garbled.
>
> - When used in file names or debug output, strings are effectively
> subject to re-interpretation of the UCS-2 codes as Windows-1252 codes
> (presumably this varies with the system's locale), with codes above hex
> FF interpreted modulo 256.
>
> - When used in text primitives, non-ASCII characters in strings may or
> may not be garbled, depending on the font used; for instance, with
> Microsoft's Arial font the text is displayed as expected for UCS-2
> encoded text, while with POV-Ray's `cyrvetic.ttf` it appears to be
> subject to re-interpretation of the UCS-2 codes as Macintosh Cyrillic
> codes with codes above hex FF being treated according to obscure rules.
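
To see why the modulo-256 step matters more under utf8 than under ascii, here is a small Python sketch, again assuming the file name / debug path simply wraps each UCS-2 code to one Windows-1252 byte. The function name `wrap_to_cp1252` and the sample text are hypothetical.

```python
def wrap_to_cp1252(text: str) -> str:
    """Wrap each character code to 0..255 and read it as Windows-1252."""
    raw = bytes(ord(ch) % 256 for ch in text)
    return raw.decode("cp1252", errors="replace")


# UTF-8 octets of the two Cyrillic letters U+0434 and U+0430.
utf8_octets = b"\xd0\xb4\xd0\xb0"

# Technical decoding: proper UTF-8, giving two code points.
decoded = utf8_octets.decode("utf-8")
codes = [hex(ord(ch)) for ch in decoded]

# Effective result in a file name: 0x434 wraps to 0x34, 0x430 to 0x30,
# which are plain ASCII digits.
effective = wrap_to_cp1252(decoded)

# Characters in the Latin-1 range (here U+00FC) survive the wrap unchanged.
latin = wrap_to_cp1252("\u00fc")
```

Under this assumption, text outside the first 256 code points does not merely turn into odd symbols; it can silently become unrelated ASCII characters.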
>
>
> 3.7 / ascii
> ===========
>
> - Non-ASCII octets in strings are decoded as blanks (ASCII hex 20;
> non-ASCII characters can still be inserted via `\uXXXX` escape sequences
> or the `chr()` function though).
>
> - When used in file names or debug output, non-ASCII characters in
> strings (entered via "\uXXXX" or `chr()`) are substituted with blanks.
>
> - When used in text primitives, non-ASCII characters in strings (entered
> via "\uXXXX" or `chr()`) are typically garbled, depending on the font
> used; for instance, with Microsoft's Arial font the text appears to be
> subject to re-interpretation of the codes as Macintosh Roman codes,
> while with POV-Ray's `cyrvetic.ttf` the re-interpretation seems to be as
> Windows-1251 (Cyrillic) codes. Character codes above hex FF are treated
> according to obscure rules in all cases.
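
The 3.7 / ascii rules for source octets and for file names / debug output can be mimicked in a few lines of Python. This is a sketch of the observed behaviour only; `decode_source_ascii` and `blank_non_ascii` are hypothetical names, not POV-Ray functions.

```python
def decode_source_ascii(octets: bytes) -> str:
    """Decode source octets, turning every non-ASCII octet into a blank."""
    return "".join(chr(b) if b < 0x80 else " " for b in octets)


def blank_non_ascii(text: str) -> str:
    """Substitute blanks for non-ASCII characters (file names, debug output)."""
    return "".join(ch if ord(ch) < 0x80 else " " for ch in text)


# The same word saved as Windows-1252 (2 non-ASCII octets) and as UTF-8
# (4 non-ASCII octets) yields a different number of blanks.
from_cp1252 = decode_source_ascii(b"Gr\xfc\xdfe")
from_utf8 = decode_source_ascii(b"Gr\xc3\xbc\xc3\x9fe")

# Characters entered via chr() or \uXXXX survive decoding ...
escaped = "Gr" + chr(0xFC) + "\u00dfe"
# ... but are blanked again when the string reaches debug output.
in_debug = blank_non_ascii(escaped)
```

Note that the blanks are per octet, so a UTF-8 encoded scene file loses two positions per accented Latin character rather than one.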
>
>
> 3.7 / utf8
> ==========
>
> - Non-ASCII octets in strings are decoded according to UTF-8.
>
> - When used in file names or debug output, non-ASCII characters in
> strings are substituted with blanks.
>
> - When used in text primitives, non-ASCII characters in strings may or
> may not be garbled, depending on the font used; for instance, with
> Microsoft's Arial font the text is displayed as expected for UCS-2
> encoded text, while with POV-Ray's `cyrvetic.ttf` it appears to be
> subject to re-interpretation of the UCS-2 codes as Windows-1251
> (Cyrillic) codes, again with codes above hex FF being treated according
> to obscure rules.