POV-Ray: Newsgroups: povray.beta-test: POV-Ray v3.7 charset behaviour: Re: POV-Ray v3.7 charset behaviour

POV-Ray : Newsgroups : povray.beta-test : POV-Ray v3.7 charset behaviour : Re: POV-Ray v3.7 charset behaviour		Server Time 15 Jul 2025 17:45:27 EDT (-0400)

From: Thorsten Froehlich
Date: 6 Jun 2018 13:30:07
Message: <web.5b1819544a827126535efa580@news.povray.org>

clipka <ano### [at] anonymousorg> wrote:
> 3.7 / ascii
> ===========
>
> - Non-ASCII octets in strings are decoded as blanks (ASCII hex 20;
> non-ASCII characters can still be inserted via `\uXXXX` escape sequences
> or the `chr()` function though).
>
> - When used in file names or debug output, non-ASCII characters in
> strings (entered via "\uXXXX" or `chr()`) are substituted with blanks.
>
> - When used in text primitives, non-ASCII characters in strings (entered
> via "\uXXXX" or `chr()`) are typically garbled, depending on the font
> used; for instace, with Microsoft's Arial font the text appears to be
> subject to re-interpretation of the codes as Macintosh Roman codes,
> while with POV-Ray's `cyrvetic.ttf` the re-interpretation seems to be as
> Windows-1251 (Cyrillic) codes. Character codes above hex FF are treated
> according to obscure rules in all cases.
>
>
> 3.7 / utf8
> ==========
>
> - Non-ASCII octets in strings are decoded according to UTF-8.
>
> - When used in file names or debug output, non-ASCII characters in
> strings are substituted with blanks.
>
> - When used in text primitives, non-ASCII characters in strings may or
> may not be garbled, depending on the font used; for instace, with
> Microsoft's Arial font the text is displayed as expected for UCS-2
> encoded text, while with POV-Ray's `cyrvetic.ttf` it appears to be
> subject to re-interpretation of the UCS-2 codes as Windows-1251
> (Cyrillic) codes, again with codes above hex FF being treated according
> to obscure rules.

I won't dispute your results because they don't surprise me, but I dispute the
fact that the code behaves unpredictably: The code is in OpenFontFile, and it
does pick the Unicode tables as top priority if Unicode is specified as text
format. It only falls back to MacRoman (which next to all TTFs contain for
legacy reasons) if it cannot find any of the Unicode formats that it does
support. So the real question is if maybe in the 18 years since this code was
introduced in 3.5 based on TTFs shipped with Mac OS 9 and Windows 2000, some
other TTF Unicode formats were introduced that the code does not support...

Post a reply to this message