  Haskell raving (Message 43 to 52 of 92)  
From: Orchid XP v7
Subject: Re: Haskell raving
Date: 1 Nov 2007 14:03:10
Message: <472a22ee@news.povray.org>
Warp wrote:
> Alain <ele### [at] netscapenet> wrote:
>> It's about 12 BITS per character, on average, not BYTES! That's using UTF8 
>> encoding. About 16 BITS per character if using UTF16 encoding.
>> UTF8 is only 7 BITS per character if you stick to the standard ASCII 
>> character set, but it gets bigger if you also use extended ASCII or characters 
>> from foreign alphabets.
> 
>   Except that if each single character is indeed garbage-collected, that
> requires quite a lot of memory per character (compared to the size of the
> character).

I can't find any documentation to hand that clarifies whether it's 12 
bits or 12 bytes per character. (My strong suspicion is that it *is* 12 
*bytes* - since, after all, a single Unicode code point is 24 bits already.)
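
A back-of-envelope sketch of where a figure like 12 bytes could come from. The heap layout below is an assumption about GHC (a list cell as one header word plus two pointers, a boxed Char as one header word plus one payload word, with common characters shared rather than allocated per element); nothing in this thread confirms it, and the names are made up for illustration.

```haskell
-- Rough String cost per character.
-- ASSUMPTION (not confirmed in the thread): GHC's usual heap layout, where
-- a (:) cell is a header word plus two pointer words, and a boxed Char is
-- a header word plus one payload word.
wordsPerCons :: Int
wordsPerCons = 3

wordsPerBoxedChar :: Int
wordsPerBoxedChar = 2

-- | Bytes per list element for a given machine word size in bytes.
-- 'sharedChar' is True when the Char closure is a shared, preallocated
-- one, so it adds no per-element cost.
bytesPerChar :: Int -> Bool -> Int
bytesPerChar wordBytes sharedChar =
  wordBytes * (wordsPerCons + (if sharedChar then 0 else wordsPerBoxedChar))

main :: IO ()
main = do
  print (bytesPerChar 4 True)   -- 32-bit words, shared Char
  print (bytesPerChar 4 False)  -- 32-bit words, separately allocated Char
  print (bytesPerChar 8 False)  -- 64-bit words, separately allocated Char
```

Under those assumptions the 32-bit, shared-Char case works out to 12 bytes per character, which matches the "bytes" reading rather than the "bits" one.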

The situation is actually worse than it looks. All this lazy evaluation 
magic is implemented by storing program state around the place, so an 
"unevaluated" string probably takes up more space still...



From: Orchid XP v7
Subject: Re: Haskell raving
Date: 1 Nov 2007 14:04:23
Message: <472a2337@news.povray.org>
Warp wrote:

>   But hey, that's the fad in modern programming: Memory usage and speed
> are irrelevant. Processors are getting faster and memory amounts are
> doubling every couple of years, so who cares if something like a string
> takes 12 times as much memory as it could?

Well, no. There are people working in the Haskell community to fix this 
and related problems. (Can't comment on other programming languages...)

The nice thing about Haskell is that it *allows* you to come up with a 
better implementation and slip it in there later.



From: Orchid XP v7
Subject: Re: Haskell raving
Date: 1 Nov 2007 14:06:13
Message: <472a23a5$1@news.povray.org>
Warp wrote:
> Orchid XP v7 <voi### [at] devnull> wrote:
>> Simply let lazy evaluation read the file as required, and the GC can 
>> delete the data you've already processed transparently behind you.
> 
>   That's not possible if the file is indeed read into one single string.

Nope, linked list. Each link in the chain can be read separately when 
accessed.

(As a matter of fact, I very much doubt it reads a single character at 
once. It probably reads much more than that, for efficiency's sake...)
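
A minimal sketch of that idea, assuming the Prelude's `readFile` (which returns its contents lazily). The file name `lazy-demo.txt` and the helper `firstLine` are made up for illustration.

```haskell
-- | First line of a (possibly huge, lazily produced) string.
-- takeWhile stops at the first newline, so the rest is never demanded.
firstLine :: String -> String
firstLine = takeWhile (/= '\n')

main :: IO ()
main = do
  let path = "lazy-demo.txt"        -- hypothetical scratch file
  writeFile path (unlines (map show [1 .. 100000 :: Int]))
  contents <- readFile path         -- lazy: the list is produced on demand
  putStrLn (firstLine contents)     -- only the start of the file is needed
```

Only the characters up to the first newline are ever demanded from `contents`; how much the runtime actually reads from disk per step depends on its buffering.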



From: Le Forgeron
Subject: Re: Haskell raving
Date: 1 Nov 2007 14:26:03
Message: <472a284b@news.povray.org>
Le 01.11.2007 20:00, Orchid XP v7 nous fit lire :
> No - it's Unicode. 24 bits per character. ;-)

Unicode is just a bunch of tables of glyphs (lots of tables, lots of
glyphs).
It is past 24 bits since a few...  (even past 32 bits!!!)
The real thing is how you encode all these.
UTF-8 is one way (the popular one these days),
UTF-16 another...  and raw storage the worst idea ever!

UTF-8 takes 8 bits for the ASCII range, usually goes up to 24 bits (3 x
8) for classical Japanese, and 16 for most French variants...

> 
> But yes, the standard Haskell string type is geared to flexibility, not
> performance. See my ByteString comments...

Fixed-size Unicode... if only they would stop adding more tables!

-- 
The superior man understands what is right;
the inferior man understands what will sell.
-- Confucius



From: Warp
Subject: Re: Haskell raving
Date: 1 Nov 2007 15:17:12
Message: <472a3448@news.povray.org>
Orchid XP v7 <voi### [at] devnull> wrote:
> This is why the "ByteString" library was developed.

  It just shows that the original article which spawned this thread is
naive. *In theory* you can have all kinds of fancy high-level uber-abstract
constructs which abstract away all the dirty internal details. *In practice*,
however, you still need those dirty details if you want any efficiency.
The fancy theories may be good in a big bunch of programs, but like so many
abstractions before, it's not the final silver bullet of programming.

-- 
                                                          - Warp



From: Warp
Subject: Re: Haskell raving
Date: 1 Nov 2007 15:18:15
Message: <472a3487@news.povray.org>
Orchid XP v7 <voi### [at] devnull> wrote:
> Notice that a string is a *single-linked* list. (I.e., there are "next" 
> pointers but no "prev" pointers.)

  You can't random-access a string in haskell? You can just go through it
once and that's it? That would be quite limiting...

-- 
                                                          - Warp
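
On the random-access question, a minimal sketch: a String is an ordinary list, so it can be indexed and traversed as often as needed, but reaching position n means walking n links. `charAt` is a made-up helper; `!!` is the standard list indexing operator.

```haskell
-- | Safe indexing into a String. It walks n cons cells, so the cost
-- grows with the index (unlike an array lookup).
charAt :: Int -> String -> Maybe Char
charAt n s
  | n < 0     = Nothing
  | otherwise = case drop n s of
      (c : _) -> Just c
      []      -> Nothing

main :: IO ()
main = do
  let s = "haskell"
  print (s !! 3)        -- built-in list indexing, also linear in the index
  print (charAt 3 s)
  print (charAt 99 s)   -- out of range gives Nothing instead of an error
  print (reverse s)     -- the same value can be traversed again
```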



From: Warp
Subject: Re: Haskell raving
Date: 1 Nov 2007 15:19:34
Message: <472a34d6@news.povray.org>
Orchid XP v7 <voi### [at] devnull> wrote:
> The nice thing about Haskell is that it *allows* you to come up with a 
> better implementation and slip it in there later.

  Is that really so? Can you refactor an existing solution (eg. in a library)
so that it will be more efficient (eg. memorywise) without breaking any
existing programs?

-- 
                                                          - Warp
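
One way this can work, sketched under assumptions: if a library exposes a type only through functions (the constructor is not exported), its representation can change without touching callers. The `Name` type and its functions are hypothetical; `Data.ByteString.Char8` is the real module from the bytestring package.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical abstract type. In a real library the module would export
-- Name, mkName, nameLength and nameText, but not the constructor.
-- An earlier version might have been: newtype Name = Name String
newtype Name = Name B.ByteString

-- | Build a Name. Note: B.pack keeps only the low 8 bits of each Char,
-- so this sketch is only suitable for Latin-1 text.
mkName :: String -> Name
mkName = Name . B.pack

nameLength :: Name -> Int
nameLength (Name b) = B.length b

nameText :: Name -> String
nameText (Name b) = B.unpack b

main :: IO ()
main = do
  let n = mkName "alice@example.org"
  print (nameLength n)
  putStrLn (nameText n)
```

Callers that use only `mkName`, `nameLength` and `nameText` compile unchanged whichever representation sits inside `Name`. Code that pattern-matches directly on a plain `String` would not get this for free.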



From: somebody
Subject: Re: Haskell raving
Date: 1 Nov 2007 15:21:15
Message: <472a353b@news.povray.org>
"Darren New" <dne### [at] sanrrcom> wrote
> Warp wrote:

> Depends on the size of the string, really. If my 100-byte string holding
> an email address turns into 1200 bytes, I'm not really going to worry
> about it much. :-)

Ah yes. Next thing you know, somebody else (or you, in a couple of weeks) is
re-using that piece of code to manage an address book, and later a mailing
list. It's never a bad idea for your code to be scalable. OOP in general,
and bottom-up thinking, has been a minor disaster in this respect, since many
a programmer fails to think about optimized data structures for the overall
picture any more, and things quickly get out of hand memory- and
performance-wise.



From: Warp
Subject: Re: Haskell raving
Date: 1 Nov 2007 15:23:37
Message: <472a35c8@news.povray.org>
Orchid XP v7 <voi### [at] devnull> wrote:
> (My strong suspicion is that it *is* 12 
> *bytes* - since, after all, a single Unicode code point is 24 bits already.)

  That depends on whether it's a "raw" Unicode value (i.e. in practice an
integer) or e.g. a UTF-8-encoded Unicode character (which would make each
character of variable size).

> The situation is actually worse than it looks. All this lazy evaluation 
> magic is implemented by storing program state around the place, so an 
> "unevaluated" string probably takes up more space still...

  I have always wondered about that. Lazy evaluation can indeed be of
great help optimizing many things. For example, if you read a file into
a string and then just read a small part of it, the lazy evaluation will
automatically skip reading the rest of the file.

  However, in situations where lazy evaluation does not actually help
much (or at all), which isn't a very uncommon case, it feels like a
useless waste of memory. After all, the haskell interpreter/compiler
has to *somehow* store the info about what is still to be done. This must
take some extra memory. If the laziness was of absolutely no help, then
that memory is wasted.

-- 
                                                          - Warp
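
A small sketch of the cost being described, using the standard `foldl` and `Data.List.foldl'`. The function names are made up; whether the lazy version really builds up deferred work depends on the compiler and optimisation level, so treat this as an illustration rather than a measurement.

```haskell
import Data.List (foldl')

-- | Lazy left fold: the accumulator may pile up as a chain of suspended
-- additions (thunks) that is only collapsed when the result is demanded.
sumLazy :: [Integer] -> Integer
sumLazy = foldl (+) 0

-- | Strict left fold: the accumulator is evaluated at every step, so no
-- chain of suspended work is kept around.
sumStrict :: [Integer] -> Integer
sumStrict = foldl' (+) 0

main :: IO ()
main = do
  print (sumStrict [1 .. 1000000])
  print (sumLazy [1 .. 1000000])  -- same answer, possibly more memory
```

Both give the same result. The difference is only in how much bookkeeping is held in memory along the way, which is the overhead in question when laziness buys nothing.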



From: Warp
Subject: Re: Haskell raving
Date: 1 Nov 2007 15:32:03
Message: <472a37c3@news.povray.org>
Le Forgeron <jgr### [at] freefr> wrote:
> It is past 24 bits since a few...  (even past 32 bits!!!)

  That's not true. The current Unicode standard defines about 100000
characters. Thus raw Unicode values require only 17 bits.

  UTF-8 encoding "wastes" some bits (in order to use fewer bits for the
most used western characters) and requires at most 4 bytes per character
(even though the characters requiring more than 3 bytes are very rarely
used).

> The real thing is how you encode all these.
> UTF-8 is one way (the popular one these days),
> UTF-16 another...  and raw storage the worst idea ever!

  Why would raw storage be the worst idea? There are several advantages.
For instance, each character takes the same amount of space (instead of
taking a variable amount like with the encodings), which means that you
can directly index the nth character in a string (in a UTF-8-encoded string,
if you want the nth character you have to actually traverse the entire
string up to that point and decode it along the way). It's also the most
efficient way (speedwise) of handling the characters because you don't
need to be doing conversions back and forth between the encoding and
the raw values.
  The disadvantage is, of course, an increased memory requirement.

-- 
                                                          - Warp
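
To put numbers on the sizes discussed here, a minimal sketch. `utf8Len` and `bitsNeeded` are made-up helper names; the byte ranges are the standard UTF-8 ones, and surrogate code points (which UTF-8 does not encode) are ignored for simplicity.

```haskell
import Data.Char (ord)

-- | Number of bytes UTF-8 uses for one character, by code point range.
utf8Len :: Char -> Int
utf8Len c
  | n < 0x80    = 1   -- ASCII
  | n < 0x800   = 2   -- e.g. accented Latin letters
  | n < 0x10000 = 3   -- rest of the Basic Multilingual Plane, e.g. kana
  | otherwise   = 4   -- supplementary planes
  where
    n = ord c

-- | Number of bits needed to write a non-negative integer in binary.
bitsNeeded :: Int -> Int
bitsNeeded n = length (takeWhile (> 0) (iterate (`div` 2) n))

main :: IO ()
main = do
  print (map utf8Len ['a', '\xE9', '\x3042', '\x1F600'])
  print (bitsNeeded 100000)          -- enough to count ~100000 characters
  print (bitsNeeded (ord maxBound))  -- largest code point, U+10FFFF
```

Counting about 100000 characters fits in 17 bits, but the code point values themselves run up to U+10FFFF, which takes 21 bits. A raw fixed-width representation therefore usually rounds up to 32 bits per character, while UTF-8 takes between 1 and 4 bytes.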



Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.