On Mon, 13 Sep 1999 01:04:10 -0700, Jon A. Cruz wrote:
>There was some talk of wanting to at least allow different languages in the
>comments. The main problem is that the text display is fairly isolated, whereas
>the parsing is more of all over.
The part you'd want to change, though, is mostly a smallish case statement
in tokenize.c. At the moment, it treats a-z, A-Z, and _ as the beginning of
a symbol. You'd want to make it treat anything with the high bit set as a
symbol as well. You'd also need to modify the Read_Symbol code to recognize
and correctly parse characters with the high bit set. This would allow
high-bit characters to be used inside of declared symbols, which includes
macro names and arguments. There'd probably also be a few small modifications
needed to some error-reporting code, unless you're comfortable with sending
UTF-8 to the error stream in the event of an undefined symbol or the like.
Believe it or not, the tokenizer can already deal with high-bit UTF-8 characters
inside of comments and literal strings. The built-in editor in the Windows
version can't, of course, but I'd expect someone using UTF-8 to use something
more suited to Unicode, such as Unipad.
Parsing UCS-2 or UCS-4 would be a lot more difficult, of course.