POV-Ray: Newsgroups: povray.advanced-users: Non ascii characters: Re: Non ascii characters


From: Joel Yliluoma
Date: 5 Dec 2005 07:48:52
Message: <slrndp8dpk.joc.bisqwit@bisqwit.iki.fi>

On Sat, 26 Nov 2005 14:01:29 -0700, Patrick Elliott wrote:
> How can it be, given that it doesn't allow access to all unicode 
> characters? It by definition can't, since true unicode requires a 
> 'section' code, followed by a 'character' code, of *always* two bytes. 

You must be confusing something.

Unicode is a character set that maps integers (code points) in the range
U+0000 to U+10FFFF to different characters.
You can use http://bisqwit.iki.fi/japtools/unicodemap.php to browse it,
for example.

For example, the character U+05E1 always means the Hebrew letter samekh,
and nothing else. There is no "section code" (whatever that means).
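The fixed mapping from a code point to a single character can be checked with Python's standard `unicodedata` module. A minimal sketch; no encoding is involved here, only the code point itself:

```python
import unicodedata

# A code point is just an integer; chr() turns it into a one-character string.
samekh = chr(0x05E1)

# The Unicode Character Database gives that code point exactly one name.
print(unicodedata.name(samekh))  # HEBREW LETTER SAMEKH

# Going back from the character to its integer code point.
print(f"U+{ord(samekh):04X}")    # U+05E1
```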

In UTF-8, Unicode characters are encoded in a varying number of bytes (one to four).

UTF-8 encoding:

bytes	bits	representation
1	7	0bbbbbbb
2	11	110bbbbb 10bbbbbb
3	16	1110bbbb 10bbbbbb 10bbbbbb
4	21	11110bbb 10bbbbbb 10bbbbbb 10bbbbbb

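Each b represents a bit of the character's code point. The leading bits of
the first byte tell how many bytes the sequence has, and every following
byte starts with 10.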
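The table translates almost line for line into code. Below is a minimal Python sketch, assuming the input is a single integer code point; the function name `utf8_encode` is hypothetical, and it follows RFC 3629, which limits UTF-8 to U+10FFFF. Its output is compared with Python's built-in codec.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point following the UTF-8 table (hypothetical helper)."""
    # RFC 3629 caps UTF-8 at U+10FFFF; surrogates (U+D800..U+DFFF) are
    # not rejected in this sketch, unlike Python's built-in codec.
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp < 0x80:        # 1 byte, 7 bits:  0bbbbbbb
        return bytes([cp])
    if cp < 0x800:       # 2 bytes, 11 bits: 110bbbbb 10bbbbbb
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes, 16 bits: 1110bbbb 10bbbbbb 10bbbbbb
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes, 21 bits: 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Compare against Python's own UTF-8 encoder for one sample of each length.
for cp in (0x41, 0x05E1, 0x20AC, 0x1F600):
    assert utf8_encode(cp) == chr(cp).encode("utf-8")

print(utf8_encode(0x05E1).hex(" "))  # d7 a1
```

Each continuation byte carries six payload bits (`& 0x3F`) under the `10` marker (`0x80 |`), which is why the number of bits grows by five or six per added byte in the table.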

So, the character U+05E1, which is 10111100001 in binary (11 bits, so it
needs the 2-byte form), is encoded as 11010111 10100001, that is, D7 A1.
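The same worked example, done step by step on bit strings in Python so each stage of the 2-byte template is visible. A small sketch; the 5/6 split comes straight from the `110bbbbb 10bbbbbb` row of the table:

```python
cp = 0x05E1

# 11 payload bits for the 2-byte form.
bits = format(cp, "011b")
print(bits)                     # 10111100001

# 5 bits go into the lead byte, 6 into the continuation byte.
high, low = bits[:5], bits[5:]
lead = "110" + high
cont = "10" + low
print(lead, cont)               # 11010111 10100001

encoded = bytes([int(lead, 2), int(cont, 2)])
print(encoded.hex(" ").upper())  # D7 A1

# Python's built-in encoder produces the same two bytes.
assert encoded == chr(cp).encode("utf-8")
```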

The character U+0041, the Latin capital letter "A",
is encoded as 01000001, that is, 41, which, not coincidentally,
is exactly the same as "A" in ASCII.
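A quick Python check of that ASCII compatibility, and of the companion property that makes it safe: every byte of a multi-byte sequence has its high bit set, so it can never be mistaken for an ASCII character. The sample string is arbitrary:

```python
text = "A POV-Ray scene"

# Pure-ASCII text gives identical bytes in ASCII and in UTF-8.
print(text.encode("utf-8") == text.encode("ascii"))  # True
print(hex(text.encode("utf-8")[0]))                  # 0x41

# Both bytes of samekh (D7 A1) are >= 0x80, outside the ASCII range.
samekh_bytes = "\u05e1".encode("utf-8")
print(all(b >= 0x80 for b in samekh_bytes))          # True
```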

-- 
Joel Yliluoma - http://bisqwit.iki.fi/
: comprehension = 1 / (2 ^ precision)
: Try to be as precise as can be and no one will comprehend what you mean.
: Say nothing, and everybody will understand.
