POV-Ray: Newsgroups: povray.beta-test: v3.8 character set handling: Re: v3.8 character set handling

From: clipka
Date: 12 Jan 2019 20:54:02
Message: <5c3a9a3a@news.povray.org>

On 12.01.2019 at 19:09, jr wrote:

>> To simplify conversion of old scenes, the text primitive syntax will be
>> extended with a syntax allowing for more control over the lookup process:
>>
>>       #declare MyText = "a\u20ACb";
>>       text { ttf "sym.ttf" cmap { 3,0 charset windows1252 } MyText }
> 
> with alpha.10008988, and same code as in other thread modified to read:
> 
> #version 3.8;
> ......
> text { ttf "arialbd.ttf" cmap { 1,0 charset utf8 S }
> ......
> 
> I get the following error:
> 
> File 'pav-patt.pov' line 61: Parse Warning: Text primitive 'cmap' extension is
>   experimental and may be subject to future changes.
> File 'pav-patt.pov' line 61: Parse Error: Expected 'numeric expression', utf8
>   found instead
> Fatal error in parser: Cannot parse input.
> Render failed
> 
> 
> same for 'ascii'.

Yes, change of plan, sorry. Specify `charset FLOAT` here, with `FLOAT` 
being one of the following values:

     0       No remapping (effectively UCS4)
     1200    UCS2 character set (16-bit subset of UCS, aka BMP)
     1251    Windows-1251 character set (aka "ANSI Cyrillic")
     1252    Windows-1252 character set (aka "ANSI Latin")
     10000   Mac OS Roman
     12000   UCS4 character set
     28591   ISO-8859-1 character set (aka Latin-1)
     -1      Special remapping for legacy Microsoft symbol fonts

Note that these are character sets (collections of characters with an 
associated mapping to integral values, aka code points), _not_ character 
encoding schemes (a character set plus an associated scheme for storing 
character sequences as byte streams). Since UTF-8 is an encoding 
scheme, there's no dedicated value for it - use the value for UCS4 
instead, which is the character set that UTF-8 encodes.

There is no specific value for ASCII, but any of the above values except 
-1 will do, as they're all supersets of ASCII.
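Applied to the line from the report, the fix is to replace `utf8` with the UCS4 value 12000. The following is a minimal, self-contained scene sketch built on some assumptions. The original string `S` is replaced by a plain ASCII placeholder, so per the paragraph above any value except -1 should behave the same. The thickness/offset, camera, light and pigment are arbitrary. `arialbd.ttf` must be found on the library path. The sketch also puts `S` after the closing brace of the `cmap` block, matching the syntax quoted at the top, rather than inside it. The "experimental" parse warning is still expected.

```povray
#version 3.8;

global_settings { assumed_gamma 1.0 }
camera { location <1.5, 0.5, -4> look_at <1.5, 0.5, 0> }
light_source { <-10, 10, -20> color rgb 1 }

// Placeholder for the reporter's S: plain ASCII, so any charset value
// except -1 should give the same glyphs.
#declare S = "POV-Ray";

text {
  ttf "arialbd.ttf"            // font from the report; must be on the library path
  cmap { 1,0 charset 12000 }   // numeric charset instead of `utf8`; 12000 = UCS4
  S                            // the string follows the cmap block, not inside it
  0.1, 0                       // thickness and offset (placeholder values)
  pigment { color rgb <1, 0.8, 0.2> }
}
```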

We could also probably do without values 1200 (UCS2 being a subset of 
UCS4) and 28591 (ISO-8859-1 being a subset of both UCS2 and 
Windows-1252), but I happen to have implemented them anyway.


I concede that the numeric values aren't easy to memorize, but this 
could be solved by supplying an include file that defines some common 
macros for the entire CMAP block, and/or variables (or maybe even a 
dictionary with string keys) for the charset numeric values.
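An include file along those lines could look roughly like the sketch below. The file name `charsets.inc`, the `CHARSET_*` variable names and the `Cmap` macro are all hypothetical, and nothing like this ships with POV-Ray. The values are the ones from the table above. The sketch assumes a macro invocation may appear inside a `text` statement in place of the `cmap` block, which is how POV-Ray macros are normally expanded but has not been tested with this experimental syntax.

```povray
// charsets.inc: hypothetical include file, not part of the POV-Ray distribution.
// Intended usage (also hypothetical):
//   #include "charsets.inc"
//   text { ttf "arialbd.ttf" Cmap(1, 0, CHARSET_UCS4) "POV-Ray" 0.1, 0 }
#ifndef (CHARSETS_INC)
#declare CHARSETS_INC = 1;

// Values for the experimental `cmap { ... charset N }` syntax.
#declare CHARSET_NONE         = 0;      // no remapping (effectively UCS4)
#declare CHARSET_UCS2         = 1200;   // 16-bit subset of UCS (BMP)
#declare CHARSET_WIN_CYRILLIC = 1251;   // Windows-1251
#declare CHARSET_WIN_LATIN    = 1252;   // Windows-1252
#declare CHARSET_MAC_ROMAN    = 10000;  // Mac OS Roman
#declare CHARSET_UCS4         = 12000;  // also the value to use for UTF-8 text
#declare CHARSET_ISO_8859_1   = 28591;  // Latin-1
#declare CHARSET_MS_SYMBOL    = -1;     // legacy Microsoft symbol fonts

// Emits a complete cmap block for the given platform ID, encoding ID and charset.
#macro Cmap(PlatformId, EncodingId, CharsetId)
  cmap { PlatformId, EncodingId charset CharsetId }
#end

#end
```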


Also, as the first warning message already mentions, stay tuned for 
future changes to this feature. I'm still not happy with it - ideas for 
improvement continue to be highly welcome - and integration of the 
FreeType library may also necessitate modifications.
