POV-Ray: Newsgroups: povray.beta-test: v3.8 character set handling: Re: v3.8 character set handling

From: jr
Date: 13 Jan 2019 04:50:00
Message: <web.5c3b093dd2c7bd4748892b50@news.povray.org>

hi,

clipka <ano### [at] anonymousorg> wrote:
> On 12.01.2019 at 19:09, jr wrote:
>
> >> To simplify conversion of old scenes, the text primitive syntax will be
> >> extended with a syntax allowing for more control over the lookup process:
> >>
> >>       #declare MyText = "a\u20ACb";
> >>       text { ttf "sym.ttf" cmap { 3,0 charset windows1252 } MyText }
> >
> > with alpha.10008988, and same code as in other thread modified to read:
> >
> > #version 3.8;
> > ......
> > text { ttf "arialbd.ttf" cmap { 1,0 charset utf8 S }
> > ......
> >
> > I get the following error:
> >
> > File 'pav-patt.pov' line 61: Parse Warning: Text primitive 'cmap' extension is
> >   experimental and may be subject to future changes.
> > File 'pav-patt.pov' line 61: Parse Error: Expected 'numeric expression', utf8
> >   found instead
> > Fatal error in parser: Cannot parse input.
> > Render failed
> >
> >
> > same for 'ascii'.
>
> Yes, change of plan, sorry. Specify `charset FLOAT` here, with `FLOAT`
> being one of the following values:
>
>      0       No remapping (effectively UCS4)
>      1200    UCS2 character set (16-bit subset of UCS, aka BMP)
>      1251    Windows-1251 character set (aka "ANSI Cyrillic")
>      1252    Windows-1252 character set (aka "ANSI Latin")
>      10000   Mac OS Roman
>      12000   UCS4 character set
>      28591   ISO-8859-1 character set (aka Latin-1)
>      -1      Special remapping for legacy Microsoft symbol fonts
>
> Note that these are character sets (collections of characters with an
> associated mapping to integral values, aka code points), _not_ character
> encoding schemes (character set with an associated scheme for storing
> character sequences as byte streams). So with UTF-8 being an encoding
> scheme, there's no dedicated value for it - use the value for UCS4
> instead, which is the character set used in UTF-8.
>
> There is no specific value for ASCII, but any of the above values except
> -1 will do, as they're all supersets of ASCII.
>
> We could also probably do without values 1200 (UCS2 being a subset of
> UCS4) and 28591 (ISO-8859-1 being a subset of both UCS2 and
> Windows-1252), but I happen to have implemented them anyway.
>
>
> I concede that the numeric values aren't easy to memorize, but this
> could be solved by supplying an include file that defines some common
> macros for the entire CMAP block, and/or variables (or maybe even a
> dictionary with string keys) for the charset numeric values.
>
>
> Also, as the first warning message already mentions, stay tuned for
> future changes to this feature. I'm still not happy with it - ideas for
> improvement continue to be highly welcome - and integration of the
> FreeType library may also necessitate modifications.
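
For reference, the earlier example rewritten with the numeric form would presumably read as below. This is only a sketch and has not been tested against the alpha: the 'cmap' syntax is experimental, 'sym.ttf' is just the font name from the earlier example, and the thickness and offset values are assumptions added to complete the text statement.

```povray
#version 3.8;

// Same string as in the earlier example: 'a', EURO SIGN, 'b'.
#declare MyText = "a\u20ACb";

text {
  ttf "sym.ttf"               // font name taken from the earlier example
  cmap { 3,0 charset 1252 }   // 1252 = Windows-1252, per the table quoted above
  MyText
  0.1, 0                      // thickness and offset (assumed values)
}
```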

I think a dictionary (provided in 'charsets.inc'?) with keys like 'utf8',
'ascii', etc. sounds OK.
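
Something along these lines, perhaps. It is a rough sketch only, assuming the v3.8 dictionary syntax; the file name, the identifier 'Charsets' and the key names are all made up for illustration, and the numeric values are simply those from the table quoted above.

```povray
// charsets.inc (hypothetical file name and contents)

#declare Charsets = dictionary {
  ["ucs4"]:        12000,
  ["utf8"]:        12000,  // UTF-8 is an encoding of UCS4, so it gets the UCS4 value
  ["ucs2"]:        1200,
  ["windows1251"]: 1251,
  ["windows1252"]: 1252,
  ["macroman"]:    10000,
  ["latin1"]:      28591,
  ["ascii"]:       0,      // no dedicated value; any value except -1 would do
  ["symbol"]:      -1      // legacy Microsoft symbol fonts
}
```

A scene would then include the file and write, e.g., `cmap { 3,0 charset Charsets["windows1252"] }` instead of the bare number.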


regards, jr.
