On 04.01.2019 at 19:18, Alain wrote:
> Will it be possible to directly use UTF-8 characters ?
> After all, if you can directly enter characters like à é è ô ç (direct
> access) or easily like €(altchar+e) ñ(altchar+ç,n) from your keyboard as
> I just did, you should be able to use them instead of the cumbersome codes.
Short answer: The `\uXXXX` notation won't be necessary. I just used it
to avoid non-ASCII characters in my post.
Looooong answer:
It depends on what you're talking about.
First, let's get an elephant - or should I say mammoth - out of the
room: The editor component of the Windows GUI. It's old and crappy, and
doesn't support UTF-8 at all. It does support Windows-1252 though (at
least on my system; I guess it may depend on what locale you have
configured in Windows), which has all the characters you mentioned.
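As a quick sanity check of that claim, here is a small Python sketch (Python has nothing to do with POV-Ray itself; it is only used for illustration, and the variable names are made up). It tries to encode each character from the question as Windows-1252, assuming Python's `cp1252` codec is a faithful stand-in for what Windows does:

```python
# Characters quoted in the question above.
chars = "àéèôç€ñ"

# Python's "cp1252" codec implements Windows-1252; encode() raises
# UnicodeEncodeError for any character the code page does not contain.
encoded = {c: c.encode("cp1252") for c in chars}

# Show the Unicode code point next to its single-byte Windows-1252 value.
for c, b in encoded.items():
    print(f"U+{ord(c):04X} -> 0x{b.hex().upper()}")
```

If every character encodes without an error, and each one comes out as a single byte, the code page covers it.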
Now if you are using a different editor, using verbatim "UTF-8
characters" should be no problem: Enter the characters, save the file as
UTF-8, done.
The characters will be encoded directly as UTF-8, and the parser will
work with them just fine (provided you're only using them in string
literals or comments); no need for `\uXXXX` notation.
Alternatively, you could enter the same characters in the same editor
and save the file as "Windows-1252" (some editors may call it "ANSI" or
"Latin-1" instead), or enter them in POV-Ray for Windows and just save the
file without specifying a particular encoding (because you can't).
In that case the characters will be encoded as Windows-1252, and in most
cases the parser will also work with them just fine (again, string
literals or comments only); again no need for `\uXXXX` notation.
What the parser will do in such a case is first convert the
Windows-1252-encoded characters to Unicode, and then proceed in just the
same way.
For example:
#declare MyText = "a€b"; // a Euro sign between `a` and `b`
will create a string containing `a` (U+0061) followed by a Euro sign
(U+20AC) followed by `b` (U+0062), no matter whether the file uses UTF-8
encoding or Windows-1252 encoding. In both cases, the parser will
interpret the thing between `a` and `b` as U+20AC, even though in a
UTF-8 encoded file that thing is represented by the three-byte sequence
hex E2 82 AC, while in a Windows-1252 encoded file it is represented by
the single byte hex 80.
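To see the two byte representations side by side, here is a minimal Python sketch. It only illustrates the encodings and says nothing about how the POV-Ray parser is actually implemented; the variable names are hypothetical, and `cp1252` is Python's name for Windows-1252.

```python
# Same content as MyText: 'a', Euro sign (U+20AC), 'b'.
text = "a\u20acb"

# Encode the string the way each kind of file would store it.
utf8_bytes = text.encode("utf-8")
cp1252_bytes = text.encode("cp1252")

print(" ".join(f"{b:02X}" for b in utf8_bytes))    # 61 E2 82 AC 62
print(" ".join(f"{b:02X}" for b in cp1252_bytes))  # 61 80 62

# Decoding each byte sequence with its matching codec yields the same
# Unicode code points, which is the point made in the text.
decoded_utf8 = utf8_bytes.decode("utf-8")
decoded_cp1252 = cp1252_bytes.decode("cp1252")
code_points = [f"U+{ord(c):04X}" for c in decoded_utf8]
print(code_points)  # ['U+0061', 'U+20AC', 'U+0062']
```

The files differ on disk, but once decoded with the right encoding, both hold the same three characters.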