POV-Ray : Newsgroups : povray.off-topic : Haskell raving
  Re: Haskell raving  
From: Joel Yliluoma
Date: 19 Nov 2007 05:31:53
Message: <slrnfk2pgp.529.bisqwit@bisqwit.iki.fi>
On Thu, 01 Nov 2007 21:13:15 +0000, Orchid XP v7 wrote:
> I was under the impression that these encodings apply to *strings*, not 
> individual characters by themselves...

The encoding applies to individual characters; the encoded
string is simply those characters' encodings, one after another.
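
For example, the Finnish word for "dwarf", kääpiö, would be encoded like this:

  code point character   utf8 encoding
   U+006B     k            6B
   U+00E4     ä            C3 A4
   U+00E4     ä            C3 A4
   U+0070     p            70
   U+0069     i            69
   U+00F6     ö            C3 B6
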
So the UTF-8 encoding of the string becomes 9 bytes long in total.

Similarly, the Czech word for "cat" would be encoded like this:

  code point character   utf8 encoding
   U+006B     k            6B
   U+006F     o            6F
   U+010D     ?            C4 8D
   U+006B     k            6B
   U+0061     a            61

(Note: I'm posting in ISO-8859-1, which cannot express
the third character of the word, a "c" with a hacek,
so I've substituted "?" for it. The same goes for the
Japanese characters below.)

And the Japanese word for Japan would be:

  code point character   utf8 encoding
   U+65E5     ?            E6 97 A5
   U+672C     ?            E6 9C AC 
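
The per-character rule behind these tables can be sketched in Python. This is a minimal hand-written encoder, assuming the RFC 3629 form of UTF-8 (at most four bytes per character, code points up to U+10FFFF). The function names `encode_code_point`, `encode_string` and `print_table` are hypothetical, chosen for this illustration, and Python's built-in `str.encode("utf-8")` is used only as a cross-check.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (RFC 3629: up to U+10FFFF)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:                      # 0xxxxxxx: plain ASCII, one byte
        return bytes([cp])
    if cp < 0x800:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),   # 11110xxx + three continuation bytes
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def encode_string(s: str) -> bytes:
    """A string's UTF-8 form is its characters' encodings, concatenated."""
    return b"".join(encode_code_point(ord(ch)) for ch in s)


def print_table(word: str) -> None:
    """Print a code point / character / bytes table like the ones above."""
    print("  code point character   utf8 encoding")
    for ch in word:
        hex_bytes = " ".join(f"{b:02X}" for b in encode_code_point(ord(ch)))
        print(f"   U+{ord(ch):04X}     {ch}            {hex_bytes}")
    print(f"  total: {len(encode_string(word))} bytes")


if __name__ == "__main__":
    for word in ["kočka", "日本"]:
        # Cross-check the hand-rolled encoder against Python's built-in codec.
        assert encode_string(word) == word.encode("utf-8")
        print_table(word)
```

Each character is encoded on its own and the results are simply concatenated, so "kočka" comes out as the six bytes 6B 6F C4 8D 6B 61, matching the Czech table.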

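UTF-8 achieves a few clever design goals:
- Backwards compatibility with ASCII: ASCII text is already valid UTF-8
- ASCIIbetical sorting still works: sorting the bytes gives the same order as sorting by code point
- Seeking forward and backward in a string is possible without desynchronization, because continuation bytes (10xxxxxx) can never be mistaken for the start of a character
- Little space wasted: ASCII characters still take a single byte
- Room to extend naturally if the Unicode set grows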
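
A few of those properties can be checked with a short Python sketch. The helpers `char_start` and `next_char` are hypothetical names for this illustration; the trick they rely on is that every continuation byte has the bit pattern 10xxxxxx, which no lead byte or ASCII byte ever has, so a reader dropped at any offset can find the nearest character boundary.

```python
def char_start(data: bytes, i: int) -> int:
    """Back up from byte offset i to the first byte of its character.

    Continuation bytes always look like 10xxxxxx, so skipping them
    backwards resynchronizes on the character's lead byte.
    """
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    return i


def next_char(data: bytes, i: int) -> int:
    """Return the offset of the character following the one starting at i."""
    i += 1
    while i < len(data) and (data[i] & 0xC0) == 0x80:
        i += 1
    return i


if __name__ == "__main__":
    text = "kočka 日本"
    data = text.encode("utf-8")

    # ASCII compatibility: pure-ASCII text encodes to the very same bytes.
    assert "kocka".encode("utf-8") == "kocka".encode("ascii")

    # Byte-wise sorting of UTF-8 strings matches sorting by code point.
    words = ["日本", "kočka", "kocka", "Z", "é"]
    assert sorted(words) == sorted(words, key=lambda w: w.encode("utf-8"))

    # Land in the middle of 本 (bytes E6 9C AC) and find its start again.
    middle = len(data) - 2
    start = char_start(data, middle)
    assert data[start:next_char(data, start)].decode("utf-8") == "本"
    print(start, data[start:].decode("utf-8"))
```

Run as a script, it prints `10 本`: starting inside 本, `char_start` backs up to offset 10, where that character's three-byte sequence begins.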

-- 
Joel Yliluoma - http://iki.fi/bisqwit/


