POV-Ray: Newsgroups: povray.beta-test: v3.8 character set handling: Re: v3.8 character set handling

From: Alain
Date: 6 Jan 2019 12:08:35
Message: <5c323613$1@news.povray.org>

On 19-01-04 at 21:04, clipka wrote:
> On 04.01.2019 at 19:18, Alain wrote:
> 
>> Will it be possible to directly use UTF-8 characters ?
>> After all, if you can directly enter characters like à é è ô ç (direct 
>> access) or easily like €(altchar+e) ñ(altchar+ç,n) from your keyboard 
>> as I just did, you should be able to use them instead of the 
>> cumbersome codes.
> 
> Short answer: The `\uXXXX` notation won't be necessary. I just used it 
> to avoid non-ASCII characters in my post.
> 
> 
> Looooong answer:
> 
> 
> It depends on what you're talking about.
> 
> First, let's get an elephant - or should I say mammoth - out of the 
> room: The editor component of the Windows GUI. It's old and crappy, and 
> doesn't support UTF-8 at all. It does support Windows-1252 though (at 
> least on my system; I guess it may depend on what locale you have 
> configured in Windows), which has all the characters you mentioned.
> 
> 
> Now if you are using a different editor, using verbatim "UTF-8 
> characters" should be no problem: Enter the characters, save the file as 
> UTF-8, done.
> 
> The characters will be encoded directly as UTF-8, and the parser will 
> work with them just fine (provided you're only using them in string 
> literals or comments); no need for `\uXXXX` notation.
> 
> 
> Alternatively, you could enter the same characters in the same editor, 
> and save the file as "Windows-1252" (or maybe called "ANSI" or 
> "Latin-1"), or enter them in POV-Ray for Windows and just save the file 
> without specifying a particular encoding (because you can't).
> 
> In that case the characters will be encoded as Windows-1252, and in most 
> cases the parser will also work with them just fine (again, string 
> literals or comments only); again no need for `\uXXXX` notation.
> 
> What the parser will do in such a case is first convert the 
> Windows-1252-encoded characters to Unicode, and then proceed in just the 
> same way.
> 
> 
> For example:
> 
>      #declare MyText = "a€b"; // a Euro sign between `a` and `b`
> 
> will create a string containing `a` (U+0061) followed by a Euro sign 
> (U+20AC) followed by `b` (U+0062), no matter whether the file uses UTF-8 
> encoding or Windows-1252 encoding. In both cases, the parser will 
> interpret the thing between `a` and `b` as U+20AC, even though in a 
> UTF-8 encoded file that thing is represented by the byte sequence hex 
> E2,82,AC while in a Windows-1252 encoded file it is represented by the 
> single byte hex 80.

Nice.
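
For anyone who wants to check the byte values quoted above, here is a minimal Python sketch. It is not POV-Ray's parser code; it only uses Python's standard `utf-8` and `cp1252` codecs (the variable names are mine) to show that the same three characters are stored differently in the two encodings, yet decode to the same Unicode code points.

```python
# Illustration only: this mirrors the example from the quoted post,
# it is not how POV-Ray's parser is implemented.
text = "a\u20acb"  # 'a', Euro sign (U+20AC), 'b'

# Encode the same string the way an editor would when saving the file.
utf8_bytes = text.encode("utf-8")      # Euro sign becomes three bytes
cp1252_bytes = text.encode("cp1252")   # Euro sign becomes a single byte

print(utf8_bytes.hex(" "))    # 61 e2 82 ac 62
print(cp1252_bytes.hex(" "))  # 61 80 62

# Decoding each byte sequence with its own encoding gives the same string.
from_utf8 = utf8_bytes.decode("utf-8")
from_cp1252 = cp1252_bytes.decode("cp1252")

code_points = [f"U+{ord(ch):04X}" for ch in from_utf8]
print(code_points)               # ['U+0061', 'U+20AC', 'U+0062']
print(from_utf8 == from_cp1252)  # True
```

So the file on disk differs (five bytes versus three), but once decoded the string is identical either way.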
