  Double-byte font trouble (Message 11 to 12 of 12)  
From: Ron Parker
Subject: Re: Double-byte font trouble
Date: 25 Jan 1999 10:13:26
Message: <36ac8a16.0@news.povray.org>
On Sat, 23 Jan 1999 01:15:08 -0800, Jon A. Cruz <jon### [at] geocitiescom> wrote:
>"Ronald L. Parker" wrote:
>My suggestion is to go ahead and drop MBCS and switch to UTF-8 for Unicode.
>That allows specifying Unicode chars, but ASCII chars 0-0x7f stay the same.
>
>That way any standard POV files will work untouched, and only files edited
>for a MBCS patched version would need converting. That can also eliminate a
>need for a flag to signal Unicode mode.

For the record, I really like this idea and will probably switch to 
UTF-8 in the next Superpatch (whenever THAT is...)  I might also add 
a switch to allow you to use plain ol' UCS-2, just to make it easier 
to do a character or two from a Unicode font using the NT Character 
Map application or the charts at unicode.org.  

There'll be a problem for those of you who regularly use the 8-bit 
characters in ISO Latin-1, though.  If anyone who regularly uses 
8-bit characters has an idea how this can be implemented such that 
it's backward-compatible with the official version of POV, please 
speak up.  I think I can detect that a given string of characters 
is not valid UTF-8 and fall back to the 8-bit CMAP table, but 
this doesn't work in 100% of cases.  Some perfectly valid but 
unlikely strings of 8-bit characters could map to weird glyphs.
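
A minimal sketch of the fallback idea described above, written in Python rather than in POV-Ray's own source and assuming the string arrives as raw bytes. The name decode_pov_string is hypothetical (it is not part of POV-Ray or the Superpatch); it only illustrates "try UTF-8, otherwise use the 8-bit table".

```python
def decode_pov_string(raw: bytes) -> str:
    """Decode as UTF-8 when the bytes are valid UTF-8, else as ISO Latin-1.

    Hypothetical helper for illustration only.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so treat each byte as one Latin-1 character.
        return raw.decode("iso-8859-1")


plain = decode_pov_string(b"abc")              # ASCII is unchanged either way
latin1 = decode_pov_string(b"caf\xe9")         # lone 0xE9 is invalid UTF-8
ambiguous = decode_pov_string(b"\xc3\xa9")     # valid UTF-8, so read as one character
```

The last call shows the failure mode: if the author meant the two Latin-1 characters 0xC3 and 0xA9, the detector cannot tell, and the string comes back as a single "é".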

You would need three high-bit characters in a row. The first one 
has to be one of the sixteen characters that has a high nybble of 
$E, and the other two each have to have a high nybble of $8 
through $B (mostly symbols of one form or 
another, unlikely to be inside or at the end of a word, particularly
in combination, though there are conceivable exceptions.)  Some of 
the combinations thus formed might even be valid in the font you're 
using (particularly if you're using a Unicode font on NT, most of 
which include Arabic and Hebrew script.)
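
To make the three-character case concrete, here is a small Python sketch of the byte pattern: a lead byte with high nybble $E followed by two bytes with a high nybble of $8 through $B. The function names are made up for this example, and the check is only the nybble pattern; a strict UTF-8 decoder also rejects a few sequences that match it (overlong forms and surrogates).

```python
def is_three_byte_pattern(b: bytes) -> bool:
    """True if b is a lead byte 0xE0-0xEF plus two bytes in 0x80-0xBF."""
    return (
        len(b) == 3
        and b[0] >> 4 == 0xE
        and all(0x8 <= c >> 4 <= 0xB for c in b[1:])
    )


def three_byte_code_point(b: bytes) -> int:
    """Combine the 4 + 6 + 6 payload bits of a three-byte sequence."""
    return ((b[0] & 0x0F) << 12) | ((b[1] & 0x3F) << 6) | (b[2] & 0x3F)


sample = b"\xe9\xa9\xae"                   # three Latin-1 characters: e-acute, (c), (R)
as_latin1 = sample.decode("iso-8859-1")    # three separate glyphs
as_utf8 = sample.decode("utf-8")           # one character, U+9A6E
```

The same three bytes are either three Latin-1 symbols or a single character far outside Latin-1, depending on which interpretation the parser picks.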



From: Ron Parker
Subject: Re: Double-byte font trouble
Date: 25 Jan 1999 12:35:01
Message: <36acab45.0@news.povray.org>
On 25 Jan 1999 10:13:26 -0500, Ron Parker <par### [at] my-dejanewscom> wrote:
>You would need three high-bit characters in a row. The first one 
>has to be one of the sixteen characters that has a high nybble of 
>$E, and the other two each have to have a high nybble of $8 
>through $B (mostly symbols of one form or 
>another, unlikely to be inside or at the end of a word, particularly
>in combination, though there are conceivable exceptions.)  Some of 
>the combinations thus formed might even be valid in the font you're 
>using (particularly if you're using a Unicode font on NT, most of 
>which include Arabic and Hebrew script.)

Erg.  It's worse than I thought.  You can also have sequences of
two high-bit characters, where the first has a high nybble of
0xC or 0xD and the second has a high nybble of 0x8 through 0xB.
These combinations encode characters in the range 0x80 through
0x7ff, which includes the entire set of high-bit characters.  
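So if you use a character from the range 0xC0 through 0xC3 
followed by one from the range 0x80 through 0xBF (which includes 
the inverted question and exclamation marks that are the 
trademarks of e.g. Spanish words!), you're guaranteed to have a 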
valid UTF-8 representation for a character that's probably 
represented in your font.  
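
The two-byte arithmetic can be checked with a short Python sketch. The helper name two_byte_code_point is an assumption for this example. One caveat that is not in the post: strict modern decoders reject lead bytes 0xC0 and 0xC1 as overlong forms, so the sketch only enumerates 0xC2 and 0xC3, which between them already cover every high-bit character.

```python
def two_byte_code_point(lead: int, trail: int) -> int:
    """Combine the 5 + 6 payload bits of a two-byte UTF-8 sequence."""
    return ((lead & 0x1F) << 6) | (trail & 0x3F)


# Every code point reachable from lead bytes 0xC2/0xC3 and any trail byte.
covered = {
    two_byte_code_point(lead, trail)
    for lead in (0xC2, 0xC3)
    for trail in range(0x80, 0xC0)
}

top_of_range = two_byte_code_point(0xDF, 0xBF)   # largest two-byte value
collision = bytes([0xC3, 0xBF]).decode("utf-8")  # two Latin-1 chars read as one
```

Here covered comes out as exactly 0x80 through 0xFF, top_of_range is 0x7FF, and the pair 0xC3 0xBF decodes to the single character 0xFF.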

Maybe I need to add a "charset" keyword with allowed values of 
US-ASCII, ISO-8859-1, UTF-8, and ISO-10646-UCS-2 with a default 
of ISO-8859-1.  (All of these are IANA names for the various
charsets, so don't blame me for the huge name UCS-2 has. :) )
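
As a rough illustration of what such a keyword would select, here is a Python sketch that maps the four IANA names to decoders. Everything in it is an assumption: decode_with_charset and CODECS are hypothetical names, no POV-Ray syntax is implied, and UCS-2 is approximated with Python's big-endian UTF-16 codec, which agrees with UCS-2 only for characters in the Basic Multilingual Plane.

```python
# Hypothetical mapping from the proposed charset values to Python codecs.
CODECS = {
    "US-ASCII": "ascii",
    "ISO-8859-1": "iso-8859-1",
    "UTF-8": "utf-8",
    "ISO-10646-UCS-2": "utf-16-be",  # assumption: big-endian, BMP only
}


def decode_with_charset(raw: bytes, charset: str = "ISO-8859-1") -> str:
    """Decode raw string bytes under an explicitly declared charset."""
    return raw.decode(CODECS[charset.upper()])


default_read = decode_with_charset(b"\xc3\xa9")                  # two characters
utf8_read = decode_with_charset(b"\xc3\xa9", "UTF-8")            # one character
ucs2_read = decode_with_charset(b"\x00\xe9", "ISO-10646-UCS-2")  # one character
```

With an explicit declaration there is nothing to guess: the same two bytes give two Latin-1 characters by default and a single "é" only when UTF-8 is requested.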

Any thoughts? Anyone? Does anyone but me and Jon care anymore?



Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.