Fredrik Eriksson <fe79}--at--{yahoo}--dot--{com> wrote:
> When dealing with just one character at a time, the application receives
> UTF-16 code points and must identify and deal with surrogates if needed.
Hmm, I'm not exactly sure what you mean by "UTF-16 code point".
According to the unicode.org glossary, a "code point" is a value in the
Unicode codespace, i.e. a value in the range between 0 and 10FFFF. (Or what
I often refer to as "raw Unicode value".)
UTF-16 is a translation format between Unicode code points and bytes.
In other words, the raw Unicode value is taken and encoded into one or two
16-bit code units (2 or 4 bytes, depending on the value) using a certain
algorithm (values above FFFF are split into a "surrogate pair" of code units
taken from the reserved range D800-DFFF, so that they can never be confused
with ordinary characters).
Decoding from UTF-16 back to a Unicode code point is the reverse operation.
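  Just to make the algorithm concrete, here's a rough sketch in Python (the
function names are my own, and real code would also have to reject unpaired
surrogates and out-of-range values):

```python
def utf16_encode(cp):
    # Code points up to FFFF (outside the surrogate range) fit in a
    # single 16-bit code unit.
    if cp < 0x10000:
        return [cp]
    # Larger values become a surrogate pair: subtract 0x10000, then
    # spread the remaining 20 bits over two code units taken from the
    # reserved ranges D800-DBFF (high) and DC00-DFFF (low).
    cp -= 0x10000
    return [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]

def utf16_decode(units):
    # The reverse operation: recombine a surrogate pair into one
    # Unicode code point.
    if len(units) == 2:
        return 0x10000 + ((units[0] - 0xD800) << 10) + (units[1] - 0xDC00)
    return units[0]
```

For example, U+1D11E (the musical G clef) encodes to the pair D834 DD1E,
and decoding that pair gives back 1D11E.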
Thus I'm not exactly sure what you mean by "UTF-16 code point", as it
seems to be mixing the two things into one concept.
Anyways, if the Unicode-aware program requests a Unicode character from
the system, and the system returns it UTF-16-encoded, I suppose that means
that the program must decode it to a Unicode code point before it can use
it (unless it specifically handles UTF-16 strings directly, of course).
Hmm, that sounds like a hindrance. Couldn't the system return raw Unicode
code points directly?
> For an application that does not need to deal with "exotic" alphabets
> (e.g. Chinese), one can typically get away just fine with treating the
> UTF-16 code points as if they were UCS-2.
OTOH, if a program is Unicode-aware, it should really be prepared to
handle any characters in the entire Unicode codespace.
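  That basically just means watching for the surrogate range while walking
the code-unit sequence. A rough self-contained sketch (again my own naming,
and without error handling for malformed input):

```python
def codepoints(units):
    # Walk a sequence of 16-bit UTF-16 code units, combining surrogate
    # pairs (D800-DBFF followed by DC00-DFFF) into full code points.
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < len(units):
            lo = units[i + 1]
            yield 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00)
            i += 2
        else:
            # A BMP character: the code unit *is* the code point, which
            # is why treating the data as UCS-2 works right up until a
            # surrogate shows up.
            yield u
            i += 1
```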
--
- Warp