POV-Ray: Newsgroups: povray.beta-test: v3.8 character set handling: Re: v3.8 character set handling

POV-Ray : Newsgroups : povray.beta-test : v3.8 character set handling : Re: v3.8 character set handling		Server Time 12 Jul 2025 14:33:03 EDT (-0400)

From: clipka
Date: 4 Jan 2019 21:04:41
Message: <5c3010b9@news.povray.org>

Am 04.01.2019 um 19:18 schrieb Alain:

> Will it be possible to directly use UTF-8 characters ?
> After all, if you can directly enter characters like à é è ô ç (direct 
> access) or easily like €(altchar+e) ñ(altchar+ç,n) from your keyboard as 
> I just did, you should be able to use them instead of the cumbersome codes.

Short answer: The `\uXXXX` notation won't be necessary. I just used it 
to avoid non-ASCII characters in my post.


Looooong answer:


It depends on what you're taling about.

First, let's get an elephant - or should I say mammoth - out of the 
room: The editor component of the Windows GUI. It's old and crappy, and 
doesn't support UTF-8 at all. It does support Windows-1252 though (at 
least on my system; I guess it may depend on what locale you have 
configured in Windows), which has all the characters you mentioned.


Now if you are using a different editor, using verbatim "UTF-8 
characters" should be no problem: Enter the characters, save the file as 
UTF-8, done.

The characters will be encoded directly as UTF-8, and the parser will 
work with them just fine (provided you're only using them in string 
literals or comments); no need for `\uXXXX` notation.


Alternatively, you could enter the same characters in the same editor, 
and save the file as "Windows-1252" (or maybe called "ANSI" or 
"Latin-1"), or enter them in POV-Ray for Windows and just save the file 
without specifying a particular encoding (because you can't).

In that case the characters will be encoded as Windows-1252, and in most 
cases the parser will also work with them just fine (again, string 
literals or comments only); again no need for `\uXXXX` notation.

What the parser will do in such a case is first convert the 
Windows-1252-enoded characters to Unicode, and then proceed in just the 
same way.


For example:

     #declare MyText = "a€b"; // a Euro sign between `a` and `b`

will create a string containing `a` (U+0061) followed by a Euro sign 
(U+20AC) followed by `b` (U+0062), no matter whether the file uses UTF-8 
encoding or Windows-1252 encoding. In both cases, the parser will 
interpret the thing between `a` and `b` as U+20AC, even though in a 
UTF-8 encided file that thing is represented by the byte sequence hex 
E2,82,AC while in a Windows-1252 encoded file it is represented by the 
single byte hex 80.

Post a reply to this message