POV-Ray: Newsgroups: povray.beta-test: v3.8 character set handling

From: clipka
Subject: Re: v3.8 character set handling
Date: 12 Jan 2019 20:54:02
Message: <5c3a9a3a@news.povray.org>

On 12 Jan 2019 at 19:09, jr wrote:

>> To simplify conversion of old scenes, the text primitive syntax will be
>> extended with a syntax allowing for more control over the lookup process:
>>
>>       #declare MyText = "a\u20ACb";
>>       text { ttf "sym.ttf" cmap { 3,0 charset windows1252 } MyText }
> 
> with alpha.10008988, and same code as in other thread modified to read:
> 
> #version 3.8;
> ......
> text { ttf "arialbd.ttf" cmap { 1,0 charset utf8 S }
> ......
> 
> I get the following error:
> 
> File 'pav-patt.pov' line 61: Parse Warning: Text primitive 'cmap' extension is
>   experimental and may be subject to future changes.
> File 'pav-patt.pov' line 61: Parse Error: Expected 'numeric expression', utf8
>   found instead
> Fatal error in parser: Cannot parse input.
> Render failed
> 
> 
> same for 'ascii'.

Yes, change of plan, sorry. Specify `charset FLOAT` here, with `FLOAT` 
being one of the following values:

     0       No remapping (effectively UCS4)
     1200    UCS2 character set (16-bit subset of UCS, aka BMP)
     1251    Windows-1251 character set (aka "ANSI Cyrillic")
     1252    Windows-1252 character set (aka "ANSI Latin")
     10000   Mac OS Roman
     12000   UCS4 character set
     28591   ISO-8859-1 character set (aka Latin-1)
     -1      Special remapping for legacy Microsoft symbol fonts

Note that these are character sets (collections of characters with an 
associated mapping to integral values, aka code points), _not_ character 
encoding schemes (character set with an associated scheme for storing 
character sequences as byte streams). So with UTF-8 being an encoding 
scheme, there's no dedicated value for it - use the value for UCS4 
instead, which is the character set used in UTF-8.
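
To make the distinction concrete, here is a small sketch in Python (not POV-Ray SDL; it is used purely as an illustration and assumes Python's standard codecs). It takes the euro sign from the `"a\u20ACb"` example and shows that it is a single code point in UCS, that the UTF-8 encoding scheme stores that code point as three bytes, and that the Windows-1252 character set assigns the same character a different value. The variable names are made up for this example.

```python
# Illustration only: character set (code points) vs. encoding scheme (bytes).
euro = "\u20ac"  # the euro sign from the "a\u20ACb" example

# Character set view: one character, one integer code point in UCS.
ucs_code_point = ord(euro)

# Encoding scheme view: UTF-8 stores that same code point as three bytes.
utf8_bytes = euro.encode("utf-8")

# A different character set gives the same character a different value:
# Windows-1252 places the euro sign at 0x80.
cp1252_value = euro.encode("cp1252")[0]

print(hex(ucs_code_point), utf8_bytes.hex(), hex(cp1252_value))
```

This is why `charset` takes a character set value: the text primitive looks up characters by number, and how a scene file happens to store its bytes is a separate question.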

There is no specific value for ASCII, but any of the above values except 
-1 will do, as they're all supersets of ASCII.

We could also probably do without values 1200 (UCS2 being a subset of 
UCS4) and 28591 (ISO-8859-1 being a subset of both UCS2 and 
Windows-1252), but I happen to have implemented them anyway.
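
The superset and subset relations mentioned above can be checked outside POV-Ray. The following is a rough sketch in Python, assuming that Python's codecs `cp1251`, `cp1252`, `mac_roman` and `latin_1` correspond to the 8-bit character sets in the list; that pairing and the helper name `agrees_with_ucs` are assumptions made for this example, not something POV-Ray defines.

```python
# Illustration only: check the superset/subset claims with Python's codecs.
# Pairing these codec names with the charset values is an assumption.
CODECS = {1251: "cp1251", 1252: "cp1252", 10000: "mac_roman", 28591: "latin_1"}


def agrees_with_ucs(codec, values):
    """True if each byte value decodes to the UCS code point of the same number."""
    return all(bytes([v]).decode(codec) == chr(v) for v in values)


# Each 8-bit set in the list agrees with ASCII on the values 0..127.
ascii_ok = {code: agrees_with_ucs(name, range(128)) for code, name in CODECS.items()}

# ISO-8859-1 coincides with the first 256 UCS code points ...
latin1_in_ucs = agrees_with_ucs("latin_1", range(256))

# ... and Windows-1252 agrees with it on the range 0xA0..0xFF.
latin1_in_cp1252 = agrees_with_ucs("cp1252", range(0xA0, 0x100))

print(ascii_ok, latin1_in_ucs, latin1_in_cp1252)
```

The last check deliberately skips 0x80..0x9F, where ISO-8859-1 has control codes and Windows-1252 has printable characters, so the subset relation is meant in terms of printable characters.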


I concede that the numeric values aren't easy to memorize, but this 
could be solved by supplying an include file that defines some common 
macros for the entire CMAP block, and/or variables (or maybe even a 
dictionary with string keys) for the charset numeric values.
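
As a rough idea of what such a name table might look like, here is a sketch in Python rather than SDL, since no include file exists yet. Only the numeric values come from the list above; the key names, the `utf8` and `ascii` aliases, and the helper `charset_value` are hypothetical.

```python
# Hypothetical name table; only the numeric values come from the list above.
CHARSETS = {
    "none": 0,            # no remapping (effectively UCS4)
    "ucs2": 1200,
    "windows1251": 1251,
    "windows1252": 1252,
    "macosroman": 10000,
    "ucs4": 12000,
    "iso88591": 28591,
    "symbol": -1,         # legacy Microsoft symbol fonts
}

# Assumed aliases: UTF-8 encodes the UCS4 character set, and ASCII is
# covered by any value other than -1, so both point at UCS4 here.
CHARSETS["utf8"] = CHARSETS["ucs4"]
CHARSETS["ascii"] = CHARSETS["ucs4"]


def charset_value(name):
    """Look up a charset number by name, ignoring case, '-', '_' and spaces."""
    key = name.lower().replace("-", "").replace("_", "").replace(" ", "")
    if key not in CHARSETS:
        raise KeyError(f"unknown charset name: {name!r}")
    return CHARSETS[key]


print(charset_value("UTF-8"), charset_value("Windows-1252"))
```

A lookup like this keeps the numbers out of scene code while leaving the underlying `charset FLOAT` syntax untouched.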


Also, as the first warning message already mentions, stay tuned for 
future changes to this feature. I'm still not happy with it - ideas for 
improvement continue to be highly welcome - and integration of the 
FreeType library may also necessitate modifications.

From: jr
Subject: Re: v3.8 character set handling
Date: 13 Jan 2019 04:50:00
Message: <web.5c3b093dd2c7bd4748892b50@news.povray.org>

hi,

clipka wrote:
> On 12 Jan 2019 at 19:09, jr wrote:
>
> >> To simplify conversion of old scenes, the text primitive syntax will be
> >> extended with a syntax allowing for more control over the lookup process:
> >>
> >>       #declare MyText = "a\u20ACb";
> >>       text { ttf "sym.ttf" cmap { 3,0 charset windows1252 } MyText }
> >
> > with alpha.10008988, and same code as in other thread modified to read:
> >
> > #version 3.8;
> > ......
> > text { ttf "arialbd.ttf" cmap { 1,0 charset utf8 S }
> > ......
> >
> > I get the following error:
> >
> > File 'pav-patt.pov' line 61: Parse Warning: Text primitive 'cmap' extension is
> >   experimental and may be subject to future changes.
> > File 'pav-patt.pov' line 61: Parse Error: Expected 'numeric expression', utf8
> >   found instead
> > Fatal error in parser: Cannot parse input.
> > Render failed
> >
> >
> > same for 'ascii'.
>
> Yes, change of plan, sorry. Specify `charset FLOAT` here, with `FLOAT`
> being one of the following values:
>
>      0       No remapping (effectively UCS4)
>      1200    UCS2 character set (16-bit subset of UCS, aka BMP)
>      1251    Windows-1251 character set (aka "ANSI Cyrillic")
>      1252    Windows-1252 character set (aka "ANSI Latin")
>      10000   Mac OS Roman
>      12000   UCS4 character set
>      28591   ISO-8859-1 character set (aka Latin-1)
>      -1      Special remapping for legacy Microsoft symbol fonts
>
> Note that these are character sets (collections of characters with an
> associated mapping to integral values, aka code points), _not_ character
> encoding schemes (character set with an associated scheme for storing
> character sequences as byte streams). So with UTF-8 being an encoding
> scheme, there's no dedicated value for it - use the value for UCS4
> instead, which is the character set used in UTF-8.
>
> There is no specific value for ASCII, but any of the above values except
> -1 will do, as they're all supersets of ASCII.
>
> We could also probably do without values 1200 (UCS2 being a subset of
> UCS4) and 28591 (ISO-8859-1 being a subset of both UCS2 and
> Windows-1252), but I happen to have implemented them anyway.
>
>
> I concede that the numeric values aren't easy to memorize, but this
> could be solved by supplying an include file that defines some common
> macros for the entire CMAP block, and/or variables (or maybe even a
> dictionary with string keys) for the charset numeric values.
>
>
> Also, as the first warning message already mentions, stay tuned for
> future changes to this feature. I'm still not happy with it - ideas for
> improvement continue to be highly welcome - and integration of the
> FreeType library may also necessitate modifications.

I think a dictionary (provided in 'charsets.inc'?) with keys like 'utf8' and
'ascii' etc. sounds ok.


regards, jr.

From: jr
Subject: Re: v3.8 character set handling
Date: 13 Jan 2019 07:20:00
Message: <web.5c3b2c0bd2c7bd4748892b50@news.povray.org>

hi,

clipka wrote:
> >>       text { ttf "sym.ttf" cmap { 3,0 charset windows1252 } MyText }
>
>      0       No remapping (effectively UCS4)
>      1200    UCS2 character set (16-bit subset of UCS, aka BMP)
>      1251    Windows-1251 character set (aka "ANSI Cyrillic")
>      1252    Windows-1252 character set (aka "ANSI Latin")
>      10000   Mac OS Roman
>      12000   UCS4 character set
>      28591   ISO-8859-1 character set (aka Latin-1)
>      -1      Special remapping for legacy Microsoft symbol fonts
>

can you confirm that I'm using the correct syntax?  because the new alpha gives
me the same error.

Script started on Sun 13 Jan 2019 12:05:40 GMT
jr@crow:1:pave$ c### [at] pav-pattpov
// Background
#version 3.8;
global_settings {assumed_gamma 1}
  ...
    text { ttf "arialbd.ttf" cmap { 1,0 charset 0 } S }
  ...

jr@crow:2:pave$ pov38 +a0.1 +ipa### [at] tpov
Persistence of Vision(tm) Ray Tracer Version 3.8.0-alpha.10011104.unofficial
 (g++ -std=gnu++11 4.8.2 @ x86_64-slackware-linux-gnu)
  ...
==== [Parsing...] ==========================================================
File 'pav-patt.pov' line 61: Parse Warning: Text primitive 'cmap' extension is
 experimental and may be subject to future changes.
File 'pav-patt.pov' line 61: Parse Error: Expected 'numeric expression', } found
 instead
Fatal error in parser: Cannot parse input.
Render failed

regards, jr.

From: clipka
Subject: Re: v3.8 character set handling
Date: 13 Jan 2019 08:35:18
Message: <5c3b3e96@news.povray.org>

On 13 Jan 2019 at 13:16, jr wrote:

> can you confirm that I'm using the correct syntax?  because the new alpha gives
> me the same error.

To be precise, it gives you the same error /message/.

It's not my usual style, but for the sake of maximum user experience 
I'll say no more, except that no nits were picked in the making of this 
post ;)

(Took me a while, too.)

From: jr
Subject: Re: v3.8 character set handling
Date: 13 Jan 2019 09:00:00
Message: <web.5c3b4410d2c7bd4748892b50@news.povray.org>

hi,

clipka wrote:
> On 13 Jan 2019 at 13:16, jr wrote:
> > can you confirm that I'm using the correct syntax?  because the new alpha gives
> > me the same error.
>
> To be precise, it gives you the same error /message/.

syntax correct, then.  on to the next alpha..  :-)

> It's not my usual style, but for the sake of maximum user experience
> I'll say no more, except that no nits were picked in the making of this
> post ;)
>
> (Took me a while, too.)

(I blame the binge-watching.  :-))


regards, jr.
