On Mon, 13 Sep 1999 01:04:10 -0700, Jon A. Cruz wrote:
>There was some talk of wanting to at least allow different languages in the
>comments. The main problem is that the text display is fairly isolated, whereas
>the parsing is more of all over.
The part you'd want to change, though, is mostly a smallish case statement
in tokenize.c. At the moment, it treats a-z, A-Z, and _ as the beginning of
a symbol. You'd want to make it treat anything with the high bit set as a
symbol as well. You'd also need to modify the Read_Symbol code to recognize
and correctly parse characters with the high bit set. This would allow
high-bit characters to be used inside of declared symbols, which includes
macro names and arguments. There'd probably also be a few small modifications
needed to some error-reporting code, unless you're comfortable with sending
UTF-8 to the error stream in the event of an undefined symbol or the like.
Believe it or not, the tokenizer can already deal with high-bit UTF-8 characters
inside of comments and literal strings. The built-in editor in the Windows
version can't, of course, but I'd expect someone using UTF-8 to use something
more suited to Unicode, such as Unipad.
Parsing UCS-2 or UCS-4 would be a lot more difficult, of course.