POV-Ray: Newsgroups: povray.off-topic: Unicode

From: Warp
Subject: Re: Unicode
Date: 31 Jan 2010 17:10:45
Message: <4b65ffe4@news.povray.org>

Fredrik Eriksson <fe79}--at--{yahoo}--dot--{com> wrote:
> >  Alt+<+>+<xxxx>, where xxxx is the hexadecimal Unicode code point,  
> > generates a Unicode-encoded (UTF-16) character."

  If Windows takes the raw Unicode value (as entered by the user) and
encodes it in UTF-16 before passing it to the application, that would
mean that the application must understand UTF-16 as its input from (what
it perceives as) the keyboard.

  Which programs actually support that? (And why UTF-16, of all possible
Unicode encoding formats?)

  (An alternative to this is that the person who wrote that sentence has
no idea what he's talking about, unless I'm misunderstanding the whole
process involved.)

-- 
                                                          - Warp

From: Fredrik Eriksson
Subject: Re: Unicode
Date: 31 Jan 2010 17:54:54
Message: <op.u7e5xqel7bxctx@toad.bredbandsbolaget.se>

On Sun, 31 Jan 2010 23:10:45 +0100, Warp <war### [at] tagpovrayorg> wrote:
>
>   If Windows takes the raw Unicode value (as entered by the user) and
> encodes it in UTF-16 before passing it to the application, that would
> mean that the application must understand UTF-16 as its input from (what
> it perceives as) the keyboard.

UTF-16 is the internal system encoding in all NT-based versions of  
Windows. It is also the encoding in which programs receive incoming  
character codes.

>   Which programs actually support that?

I think most Windows programs are capable of handling Unicode these days.  
Those that do not will either ignore non-ASCII characters or display them  
wrong (typically as boxes or question marks).

-- 
FE

From: Warp
Subject: Re: Unicode
Date: 31 Jan 2010 18:21:33
Message: <4b66107d@news.povray.org>

Fredrik Eriksson <fe79}--at--{yahoo}--dot--{com> wrote:
> On Sun, 31 Jan 2010 23:10:45 +0100, Warp <war### [at] tagpovrayorg> wrote:
> >
> >   If Windows takes the raw Unicode value (as entered by the user) and
> > encodes it in UTF-16 before passing it to the application, that would
> > mean that the application must understand UTF-16 as its input from (what
> > it perceives as) the keyboard.

> UTF-16 is the internal system encoding in all NT-based versions of  
> Windows. It is also the encoding in which programs receive incoming  
> character codes.

  I don't understand how that can work. Do programs convert the UTF-16
encoded characters back to raw Unicode values before using them? (Does
this perhaps happen inside the system call which is used to read the
character?)

  UTF-encoded text (regardless of which UTF encoding scheme is used)
cannot easily be used directly in programs because characters cannot
be easily indexed (because UTF is a variable-length encoding scheme;
in the case of UTF-16 each character is either 2 or 4 bytes long,
depending on the Unicode value.) I have a hard time believing that programs
are internally handling the text in UTF-16 format, so I'm assuming that
a decoding to raw Unicode is done first.
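
To make the "2 or 4 bytes" point concrete, here is a minimal sketch in Python (not part of the original post; the function name is made up for illustration). It only relies on Python's built-in "utf-16-le" codec to count the bytes needed for a single code point.

```python
# Illustration only: how many bytes UTF-16 needs for one code point.
def utf16_byte_length(code_point: int) -> int:
    """Return the size in bytes of the UTF-16 encoding of a single code point."""
    # "utf-16-le" is used so that no byte-order mark is added to the output.
    return len(chr(code_point).encode("utf-16-le"))


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x1D11E):  # A, e-acute, euro sign, musical G clef
        print(f"U+{cp:04X}: {utf16_byte_length(cp)} bytes")
```

Code points up to U+FFFF take 2 bytes and anything above takes 4, which is why the n-th character of a UTF-16 string cannot in general be found by simple indexing.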

> >   Which programs actually support that?

> I think most Windows programs are capable of handling Unicode these days.  

  But how do they handle the characters? Do they ask the system for a
(Unicode) character and get a raw Unicode wide char?

-- 
                                                          - Warp

From: Fredrik Eriksson
Subject: Re: Unicode
Date: 31 Jan 2010 18:49:36
Message: <op.u7e8gwys7bxctx@toad.bredbandsbolaget.se>

On Mon, 01 Feb 2010 00:21:33 +0100, Warp <war### [at] tagpovrayorg> wrote:
> Fredrik Eriksson <fe79}--at--{yahoo}--dot--{com> wrote:
>> UTF-16 is the internal system encoding in all NT-based versions of
>> Windows. It is also the encoding in which programs receive incoming
>> character codes.
>
>   I don't understand how that can work. Do programs convert the UTF-16
> encoded characters back to raw Unicode values before using them? (Does
> this perhaps happen inside the system call which is used to read the
> character?)

>> I think most Windows programs are capable of handling Unicode these  
>> days.
>
>   But how do they handle the characters? Do they ask the system for a
> (Unicode) character and get a raw Unicode wide char?

When dealing with just one character at a time, the application receives  
UTF-16 code points and must identify and deal with surrogates if needed.  
For display purposes, one can offload some of the work on the Uniscribe  
API. Mostly though, strings are passed around as a whole (i.e. as lists of  
code points), in which case it "just works"; all standard widgets and  
API-functions fully support supplementary characters.

For an application that does not need to deal with "exotic" alphabets  
(e.g. Chinese), one can typically get away just fine with treating the  
UTF-16 code points as if they were UCS-2.
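
As a rough sketch of what "identify and deal with surrogates" can look like, the Python below combines 16-bit values from a UTF-16 string into Unicode code points. It is an illustration under stated assumptions, not the Windows API: the function name is hypothetical and the input is assumed to be a plain list of integers.

```python
def decode_utf16_units(units):
    """Turn a sequence of 16-bit UTF-16 values into a list of code points."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Surrogate pair: each half carries 10 bits of the value above 0x10000.
            low = units[i + 1]
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP character (or an unpaired surrogate, passed through unchanged).
            code_points.append(unit)
            i += 1
    return code_points
```

For example, decode_utf16_units([0x0041, 0xD834, 0xDD1E]) gives the two code points U+0041 and U+1D11E. A program that treats the input as UCS-2 would instead see three separate values.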

-- 
FE

From: Warp
Subject: Re: Unicode
Date: 31 Jan 2010 19:13:34
Message: <4b661cae@news.povray.org>

Fredrik Eriksson <fe79}--at--{yahoo}--dot--{com> wrote:
> When dealing with just one character at a time, the application receives  
> UTF-16 code points and must identify and deal with surrogates if needed.  

  Hmm, I'm not exactly sure what you mean by "UTF-16 code point".

  According to the unicode.org glossary, a "code point" is a value in the
Unicode codespace, ie. a value in the range between 0 and 10FFFF. (Or what
I often refer to as "raw Unicode value".)

  UTF-16 is a translation format between Unicode code points and bytes.
In other words, the raw unicode value is taken and encoded into a series
of bytes (2 or 4 of them, depending on the value) using a certain algorithm
(this encoding algorithm is designed to prevent any code point with a value
larger than 127 from producing a byte with a value smaller than that).
Decoding from UTF-16 back to a Unicode code point is the reverse operation.
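
The encoding direction can be sketched in a few lines of Python. This is an illustration only (the function name is hypothetical); it produces 16-bit values rather than bytes, since the byte order is a separate choice.

```python
def encode_utf16_units(code_point: int) -> list:
    """Return the UTF-16 encoding of one code point as a list of 16-bit values."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if code_point < 0x10000:
        # Basic Multilingual Plane: the value is stored as-is (2 bytes).
        return [code_point]
    # Above U+FFFF: subtract 0x10000 and split the remaining 20 bits in two (4 bytes).
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]
```

For instance, U+1D11E becomes the pair 0xD834, 0xDD1E. Note that U+0141 becomes the single value 0x0141, whose two bytes (0x01 and 0x41) are both below 128; the guarantee about never producing bytes below 128 is a property of UTF-8, not of UTF-16.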

  Thus I'm not exactly sure what you mean by "UTF-16 code point", as it
seems to be mixing the two things into one concept.

  Anyways, if the Unicode-aware program requests a Unicode character from
the system, and the system returns it UTF-16-encoded, I suppose that means
that the program must decode it to a Unicode code point before it can use
it (unless it specifically handles UTF-16 strings directly, of course).

  Hmm, that sounds like a hindrance. Couldn't the system return raw Unicode
code points directly?

> For an application that does not need to deal with "exotic" alphabets  
> (e.g. Chinese), one can typically get away just fine with treating the  
> UTF-16 code points as if they were UCS-2.

  OTOH, if a program is Unicode-aware, it should really be prepared to
handle any characters in the entire Unicode codespace.

-- 
                                                          - Warp

From: Fredrik Eriksson
Subject: Re: Unicode
Date: 31 Jan 2010 19:28:45
Message: <op.u7e995gr7bxctx@toad.bredbandsbolaget.se>

On Mon, 01 Feb 2010 01:13:34 +0100, Warp <war### [at] tagpovrayorg> wrote:
>
>   Hmm, I'm not exactly sure what you mean by "UTF-16 code point".

I mean a single 16-bit value from a UTF-16 string.

>   Thus I'm not exactly sure what you mean by "UTF-16 code point", as it
> seems to be mixing the two things into one concept.

Yeah, I should have used "code unit" instead.



>   Anyways, if the Unicode-aware program requests a Unicode character from
> the system, and the system returns it UTF-16-encoded, I suppose that  
> means that the program must decode it to a Unicode code point before it
> can use it (unless it specifically handles UTF-16 strings directly, of
> course).

Yes, unless...

>   Hmm, that sounds like a hindrance. Couldn't the system return raw  
> Unicode code points directly?

Starting with Windows XP, it can.



-- 
FE

From: TC
Subject: Re: Unicode
Date: 31 Jan 2010 21:10:32
Message: <4b663818$1@news.povray.org>

Hope the following helps:

Internally, .NET stores characters as Unicode (UTF-16). There is an "Encoding" 
class that helps you deal with encodings. The .NET API gives you the option 
to use many different character encodings.

Here are some excerpts from the MS online help:

Dim fileReader As String
fileReader = My.Computer.FileSystem.ReadAllText("C:\test.txt", System.Text.Encoding.UTF32)
MsgBox(fileReader)

Writing goes something like this:

My.Computer.FileSystem.WriteAllText(file ,text ,append ,encoding)

There are a lot of different encodings supported by .net:

      ASCIIEncoding
     Represents an ASCII character encoding of Unicode characters.

      UnicodeEncoding
     Represents a UTF-16 encoding of Unicode characters.

      UTF32Encoding
     Represents a UTF-32 encoding of Unicode characters.

      UTF7Encoding
     Represents a UTF-7 encoding of Unicode characters.

      UTF8Encoding
     Represents a UTF-8 encoding of Unicode characters.
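
To show how these encodings differ in practice, here is a small sketch in Python rather than .NET (an assumption made purely for illustration; the codec names are Python's, not the .NET class names). It reports how many bytes each encoding needs for the same text.

```python
# Python codec names standing in for the .NET encodings listed above.
CODECS = ["ascii", "utf-16-le", "utf-32-le", "utf-7", "utf-8"]


def encoded_sizes(text: str) -> dict:
    """Map each codec name to the encoded size in bytes, or None if it cannot encode."""
    sizes = {}
    for name in CODECS:
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            # ASCII cannot represent anything above code point 127.
            sizes[name] = None
    return sizes


if __name__ == "__main__":
    print(encoded_sizes("\u00b1"))  # the plus/minus sign
```

The same character can take a different number of bytes in each encoding, and ASCII cannot represent it at all, which is why the encoding has to be stated when reading or writing a file.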



Chr uses the Encoding class in the System.Text namespace to determine if the 
current thread is using a single-byte character set (SBCS) or a double-byte 
character set (DBCS). It then takes CharCode as a code point in the 
appropriate set. The range can be 0 through 255 for SBCS characters 
and -32768 through 65535 for DBCS characters.

The returned value depends on the code page for the current thread, which is 
contained in the ANSICodePage property of the TextInfo class in the 
System.Globalization namespace. You can obtain ANSICodePage by specifying 
System.Globalization.CultureInfo.CurrentCulture.TextInfo.ANSICodePage.

ChrW takes CharCode as a Unicode code point. The range is independent of the 
culture and code page settings for the current thread. Values from -32768 
through -1 are treated the same as values in the range +32768 through 
+65535.

Numbers from 0 through 31 are the same as standard nonprintable ASCII codes. 
For example, Chr(10) returns a line feed character.
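
The rule that ChrW treats -32768 through -1 like +32768 through +65535 amounts to reading the value as an unsigned 16-bit number. A minimal Python sketch of that mapping follows; chrw_like is a hypothetical stand-in written for illustration, not the real ChrW.

```python
def chrw_like(char_code: int) -> str:
    """Hypothetical stand-in for the ChrW behaviour described in the excerpt above."""
    if not -32768 <= char_code <= 65535:
        raise ValueError("char_code must be in the range -32768..65535")
    # Masking to 16 bits maps -32768..-1 onto 32768..65535 and leaves 0..65535 unchanged.
    return chr(char_code & 0xFFFF)
```

With this sketch, chrw_like(-1) and chrw_like(65535) return the same character, and chrw_like(10) returns a line feed.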

From: scott
Subject: Re: Unicode
Date: 1 Feb 2010 02:38:03
Message: <4b6684db$1@news.povray.org>

> OK, so here's a question: How do you actually type in Unicode characters 
> that aren't on your keyboard?

Hold down left alt and press some numbers on the keypad.  The only one I can 
remember is the one for the plus/minus symbol (Alt+0177) as I use it a lot 
and it's not on my keyboard.
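
As a quick check of the Alt+0177 example, the Python sketch below looks up byte 177 in the Windows-1252 code page. It assumes that Alt codes typed with a leading zero are interpreted in the system's ANSI code page and that this code page is Windows-1252; the function name is made up for illustration.

```python
def alt_code_char(code: int, code_page: str = "cp1252") -> str:
    """Return the character that a one-byte value maps to in the given code page."""
    # Assumption: Alt+0nnn selects byte nnn of the ANSI code page (Windows-1252 here).
    return bytes([code]).decode(code_page)


if __name__ == "__main__":
    print(alt_code_char(177))
```

Byte 177 in Windows-1252 is the plus/minus sign, which is also Unicode code point U+00B1 (decimal 177).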

> It's nice that Unicode exists, and a tiny fraction of software in 
> existence even supports it,

Funny I found that most software does support it, otherwise I would be 
continually screwed in my job not being able to open documents or understand 
email from Japanese people.  Here (Vista Business 64bit) I can even create a 
file/folder in explorer with Japanese characters, then zip it with WinZip 
and everything works fine.

> and approximately 3 fonts in the world have the appropriate glyphs in 
> them.

I know when I get Japanese documents, they normally use a special font like 
"MS Mincho". I suspect it would be a bit of a wasted effort to make every 
font contain every single Unicode character.  There also seem to be fonts 
like "Arial Unicode MS" that maybe contain all (most?) of them (it at least 
contains all the Japanese characters).

From: Tim Attwood
Subject: Re: Unicode
Date: 1 Feb 2010 07:02:01
Message: <4b66c2b9$1@news.povray.org>

In Word you can use some keyboard shortcuts...

(ctrl ~)n
(ctrl :)u
(ctrl ^)a

they follow that pattern, so it shouldn't be too
hard to remember, but it's a three key press,
ctrl, shift, whatever.

From: scott
Subject: Re: Unicode
Date: 1 Feb 2010 07:35:18
Message: <4b66ca86$1@news.povray.org>

> In Word you can use some keyboard shortcuts...
>
> (ctrl ~)n
> (ctrl :)u
> (ctrl ^)a

On my German keyboard pressing ^ by itself does that, annoying if you're 
trying to type "^error" or something (often in C++ .net) because it comes 

to go above other letters.
