On 04.01.2019 at 19:18, Alain wrote:
> Will it be possible to directly use UTF-8 characters ?
> After all, if you can directly enter characters like à é è ô ç (direct
> access) or easily like €(altchar+e) ñ(altchar+ç,n) from your keyboard as
> I just did, you should be able to use them instead of the cumbersome codes.
Short answer: The `\uXXXX` notation won't be necessary. I just used it
to avoid non-ASCII characters in my post.
Looooong answer:
It depends on what you're talking about.
First, let's get an elephant - or should I say mammoth - out of the
room: The editor component of the Windows GUI. It's old and crappy, and
doesn't support UTF-8 at all. It does support Windows-1252 though (at
least on my system; I guess it may depend on what locale you have
configured in Windows), which has all the characters you mentioned.
Now if you are using a different editor, using verbatim "UTF-8
characters" should be no problem: Enter the characters, save the file as
UTF-8, done.
The characters will be encoded directly as UTF-8, and the parser will
work with them just fine (provided you're only using them in string
literals or comments); no need for `\uXXXX` notation.
Alternatively, you could enter the same characters in the same editor,
and save the file as "Windows-1252" (or maybe called "ANSI" or
"Latin-1"), or enter them in POV-Ray for Windows and just save the file
without specifying a particular encoding (because you can't).
In that case the characters will be encoded as Windows-1252, and in most
cases the parser will also work with them just fine (again, string
literals or comments only); again no need for `\uXXXX` notation.
What the parser will do in such a case is first convert the
Windows-1252-encoded characters to Unicode, and then proceed in just the
same way.
For example:
#declare MyText = "a€b"; // a Euro sign between `a` and `b`
will create a string containing `a` (U+0061) followed by a Euro sign
(U+20AC) followed by `b` (U+0062), no matter whether the file uses UTF-8
encoding or Windows-1252 encoding. In both cases, the parser will
interpret the thing between `a` and `b` as U+20AC, even though in a
UTF-8 encoded file that thing is represented by the byte sequence hex
E2,82,AC while in a Windows-1252 encoded file it is represented by the
single byte hex 80.
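The byte-level difference can be checked outside POV-Ray. A minimal Python sketch follows; it uses Python's own `utf-8` and `cp1252` codecs as stand-ins for the parser's conversion, which is an assumption about equivalence and not POV-Ray code:

```python
# 'a', Euro sign (U+20AC), 'b' -- the same string as in the example above.
text = "a\u20acb"

# The same characters stored under two different file encodings.
utf8_bytes = text.encode("utf-8")
cp1252_bytes = text.encode("cp1252")

print(utf8_bytes.hex(" "))    # 61 e2 82 ac 62
print(cp1252_bytes.hex(" "))  # 61 80 62

# Decoding each byte sequence with its matching codec gives back
# the same sequence of Unicode code points.
decoded_utf8 = utf8_bytes.decode("utf-8")
decoded_cp1252 = cp1252_bytes.decode("cp1252")
code_points = [f"U+{ord(c):04X}" for c in decoded_utf8]
print(code_points)            # ['U+0061', 'U+20AC', 'U+0062']
```

Either way the Euro sign ends up as U+20AC once decoded; only the bytes on disk differ.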
in news:5c3003db$1@news.povray.org clipka wrote:
> what you suggest would still require
> at least some baseline maintenance on all the old versions we want to
> support
Where does the "break-even point" lie? How much does staying backwards
compatible cost, compared with that baseline maintenance, in terms of the
freedom to take bigger or different development steps? Now, I know
you can't put a percentage on that ;) Just me wondering, looking at what
happened in the Python world with 2 & 3. Yesterday I 'broke' my mesh
macros that are also in 3.7 by adding a dictionary and by changing the
way resolution is set...
Ingo
On 19-01-04 at 21:04, clipka wrote:
> On 04.01.2019 at 19:18, Alain wrote:
>
>> Will it be possible to directly use UTF-8 characters ?
>> After all, if you can directly enter characters like à é è ô ç (direct
>> access) or easily like €(altchar+e) ñ(altchar+ç,n) from your keyboard
>> as I just did, you should be able to use them instead of the
>> cumbersome codes.
>
> Short answer: The `\uXXXX` notation won't be necessary. I just used it
> to avoid non-ASCII characters in my post.
Nice.
On 03.01.2019 at 20:49, clipka wrote:
> (5) Text primitives will use UCS encoding unless specified otherwise.
...
> To simplify conversion of old scenes, the text primitive syntax will be
> extended with a syntax allowing for more control over the lookup process:
>
> #declare MyText = "a\u20ACb";
> text { ttf "sym.ttf" cmap { 3,0 charset windows1252 } MyText }
I think I will change that as follows:
text { ttf "sym.ttf" cmap { 3,0 charset 1252 } MyText }
with a few select charset numbers defined; most notably:
1252 Windows code page 1252
(for obvious reasons)
10000 Mac OS Roman
(because Windows supports this as code page 10000)
61440 MS legacy symbol font (Wingdings etc.) remapping to
Unicode Private Use Area U+F000..U+F0FF
(because 61440 = hex F000)
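The arithmetic behind the 61440 value can be sketched in Python. The helper `remap_symbol` is hypothetical, and the plain "add F000" rule is an assumption about what the remapping amounts to, not a description of POV-Ray's implementation:

```python
# 61440 decimal is the start of the Private Use Area range named above.
SYMBOL_BASE = 0xF000


def remap_symbol(code: int) -> int:
    """Hypothetical helper: shift an 8-bit character code into U+F000..U+F0FF."""
    if not 0 <= code <= 0xFF:
        raise ValueError(f"expected an 8-bit code, got {code}")
    return SYMBOL_BASE + code


print(SYMBOL_BASE)                      # 61440
print(f"U+{remap_symbol(0x41):04X}")    # U+F041
```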
hi,
clipka <ano### [at] anonymousorg> wrote:
> To simplify conversion of old scenes, the text primitive syntax will be
> extended with a syntax allowing for more control over the lookup process:
>
> #declare MyText = "a\u20ACb";
> text { ttf "sym.ttf" cmap { 3,0 charset windows1252 } MyText }
with alpha.10008988, and same code as in other thread modified to read:
#version 3.8;
.....
text { ttf "arialbd.ttf" cmap { 1,0 charset utf8 S }
.....
I get the following error:
File 'pav-patt.pov' line 61: Parse Warning: Text primitive 'cmap' extension is
experimental and may be subject to future changes.
File 'pav-patt.pov' line 61: Parse Error: Expected 'numeric expression', utf8
found instead
Fatal error in parser: Cannot parse input.
Render failed
same for 'ascii'.
regards, jr.
On 12.01.2019 at 19:09, jr wrote:
>> To simplify conversion of old scenes, the text primitive syntax will be
>> extended with a syntax allowing for more control over the lookup process:
>>
>> #declare MyText = "a\u20ACb";
>> text { ttf "sym.ttf" cmap { 3,0 charset windows1252 } MyText }
>
> with alpha.10008988, and same code as in other thread modified to read:
>
> #version 3.8;
> ......
> text { ttf "arialbd.ttf" cmap { 1,0 charset utf8 S }
> ......
>
> I get the following error:
>
> File 'pav-patt.pov' line 61: Parse Warning: Text primitive 'cmap' extension is
> experimental and may be subject to future changes.
> File 'pav-patt.pov' line 61: Parse Error: Expected 'numeric expression', utf8
> found instead
> Fatal error in parser: Cannot parse input.
> Render failed
>
>
> same for 'ascii'.
Yes, change of plan, sorry. Specify `charset FLOAT` here, with `FLOAT`
being one of the following values:
0 No remapping (effectively UCS4)
1200 UCS2 character set (16-bit subset of UCS, aka BMP)
1251 Windows-1251 character set (aka "ANSI Cyrillic")
1252 Windows-1252 character set (aka "ANSI Latin")
10000 Mac OS Roman
12000 UCS4 character set
28591 ISO-8859-1 character set (aka Latin-1)
-1 Special remapping for legacy Microsoft symbol fonts
Note that these are character sets (collections of characters with an
associated mapping to integral values, aka code points), _not_ character
encoding schemes (character set with an associated scheme for storing
character sequences as byte streams). So with UTF-8 being an encoding
scheme, there's no dedicated value for it - use the value for UCS4
instead, which is the character set used in UTF-8.
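The distinction between a character set and an encoding scheme can be illustrated in Python, using only the standard codecs (nothing here is POV-Ray specific). The code point is a property of the character set; the bytes depend on the encoding scheme:

```python
euro = "\u20ac"

# Character set view: one character, one code point in UCS.
ucs_code_point = ord(euro)
print(f"U+{ucs_code_point:04X}")  # U+20AC

# Encoding scheme view: the same code point stored as different byte streams.
encoded = {
    name: euro.encode(name).hex(" ")
    for name in ("utf-8", "utf-16-be", "utf-32-be")
}
print(encoded["utf-8"])      # e2 82 ac
print(encoded["utf-16-be"])  # 20 ac
print(encoded["utf-32-be"])  # 00 00 20 ac

# A different character set assigns the same character a different number.
cp1252_code = euro.encode("cp1252")[0]
print(cp1252_code)           # 128 (hex 80)
```

All three UTF encodings carry the same UCS code point, which is why a single charset value covers them.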
There is no specific value for ASCII, but any of the above values except
-1 will do, as they're all supersets of ASCII.
We could also probably do without values 1200 (UCS2 being a subset of
UCS4) and 28591 (ISO-8859-1 being a subset of both UCS2 and
Windows-1252), but I happen to have implemented them anyway.
I concede that the numeric values aren't easy to memorize, but this
could be solved by supplying an include file that defines some common
macros for the entire CMAP block, and/or variables (or maybe even a
dictionary with string keys) for the charset numeric values.
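The lookup idea can be sketched as follows. This is Python rather than an actual POV-Ray include file, and the key names and the helper `charset_number` are hypothetical; only the numeric values come from the list above:

```python
# Hypothetical name -> charset number table.
CHARSETS = {
    "ucs4": 12000,
    "utf8": 12000,       # UTF-8 is an encoding of UCS4, so it shares the value
    "ucs2": 1200,
    "windows1251": 1251,
    "windows1252": 1252,
    "macroman": 10000,
    "latin1": 28591,
    "ascii": 0,          # arbitrary pick: any value except -1 would do
    "symbol": -1,
}


def charset_number(name: str) -> int:
    """Look up a charset number, ignoring case, hyphens and underscores."""
    key = name.lower().replace("-", "").replace("_", "")
    try:
        return CHARSETS[key]
    except KeyError:
        raise KeyError(f"unknown charset name: {name!r}") from None


print(charset_number("UTF-8"))         # 12000
print(charset_number("Windows-1252"))  # 1252
```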
Also, as the first warning message already mentions, stay tuned for
future changes to this feature. I'm still not happy with it - ideas for
improvement continue to be highly welcome - and integration of the
FreeType library may also necessitate modifications.
hi,
clipka <ano### [at] anonymousorg> wrote:
> I concede that the numeric values aren't easy to memorize, but this
> could be solved by supplying an include file that defines some common
> macros for the entire CMAP block, and/or variables (or maybe even a
> dictionary with string keys) for the charset numeric values.
I think a dictionary (provided in 'charsets.inc'?) with keys like 'utf8' and
'ascii' etc. sounds ok.
regards, jr.
hi,
clipka <ano### [at] anonymousorg> wrote:
> >> text { ttf "sym.ttf" cmap { 3,0 charset windows1252 } MyText }
>
> 0 No remapping (effectively UCS4)
> 1200 UCS2 character set (16-bit subset of UCS, aka BMP)
> 1251 Windows-1251 character set (aka "ANSI Cyrillic")
> 1252 Windows-1252 character set (aka "ANSI Latin")
> 10000 Mac OS Roman
> 12000 UCS4 character set
> 28591 ISO-8859-1 character set (aka Latin-1)
> -1 Special remapping for legacy Microsoft symbol fonts
>
can you confirm that I'm using the correct syntax? because the new alpha gives
me the same error.
Script started on Sun 13 Jan 2019 12:05:40 GMT
jr@crow:1:pave$ c### [at] pav-pattpov
// Hintergrund
#version 3.8;
global_settings {assumed_gamma 1}
...
text { ttf "arialbd.ttf" cmap { 1,0 charset 0 } S }
...
jr@crow:2:pave$ pov38 +a0.1 +ipa### [at] tpov
Persistence of Vision(tm) Ray Tracer Version 3.8.0-alpha.10011104.unofficial
(g++ -std=gnu++11 4.8.2 @ x86_64-slackware-linux-gnu)
...
==== [Parsing...] ==========================================================
File 'pav-patt.pov' line 61: Parse Warning: Text primitive 'cmap' extension is
experimental and may be subject to future changes.
File 'pav-patt.pov' line 61: Parse Error: Expected 'numeric expression', } found
instead
Fatal error in parser: Cannot parse input.
Render failed
regards, jr.
On 13.01.2019 at 13:16, jr wrote:
> can you confirm that I'm using the correct syntax? because the new alpha gives
> me the same error.
To be precise, it gives you the same error /message/.
It's not my usual style, but for the sake of maximum user experience
I'll say no more, except that no nits were picked in the making of this
post ;)
(Took me a while, too.)
hi,
clipka <ano### [at] anonymousorg> wrote:
> On 13.01.2019 at 13:16, jr wrote:
> > can you confirm that I'm using the correct syntax? because the new alpha gives
> > me the same error.
>
> To be precise, it gives you the same error /message/.
syntax correct, then. on to the next alpha.. :-)
> It's not my usual style, but for the sake of maximum user experience
> I'll say no more, except that no nits were picked in the making of this
> post ;)
>
> (Took me a while, too.)
(I blame the binge-watching. :-))
regards, jr.