POV-Ray: Newsgroups: povray.off-topic: Haskell raving

POV-Ray : Newsgroups : povray.off-topic : Haskell raving		Server Time 13 Dec 2025 12:36:49 EST (-0500)

<<< Previous 10 Messages

Goto Latest 10 Messages

Next 10 Messages >>>

From: Warp
Subject: Re: Haskell raving
Date: 1 Nov 2007 15:23:37
Message: <472a35c8@news.povray.org>

Orchid XP v7 <voi### [at] devnull> wrote:
> (My strong suspecion is that it *is* 12 
> *bytes* - since, after all, a single Unicode code point is 24 bits already.)

  That depends on whether it's a "raw" unicode value (ie. in practice an
integer) or eg. an UTF8-encoded unicode character (which would make each
character of variable size).

> The situation is actually worse than it looks. All this lazy evaluation 
> magic is implemented by storing program state around the place, so an 
> "unevaluated" string probably takes up more space still...

  I have always wondered about that. Lazy evaluation can indeed be of
great help optimizing many things. For example, if you read a file into
a string and then just read a small part of it, the lazy evaluation will
automatically skip reading the rest of the file.

  However, in situations where lazy evaluation does not actually help
much (or at all), which isn't a very uncommon case, it feels like a
useless waste of memory. After all, the haskell interpreter/compiler
has to *somehow* store the info about what is still to be done. This must
take some extra memory. If the laziness was of absolutely no help, then
that memory is wasted.

-- 
                                                          - Warp

Post a reply to this message

From: Warp
Subject: Re: Haskell raving
Date: 1 Nov 2007 15:32:03
Message: <472a37c3@news.povray.org>

Le Forgeron <jgr### [at] freefr> wrote:
> It is past 24 bits since a few...  (even past 32 bits!!!)

  That's not true. The current unicode standard defines about 100000
characters. Thus raw unicode values require only 17 bits

  UTF-8 encoding "wastes" some bits (in order to use less bits for the
most used western characters) and requires at most 4 bytes per character
(even though the characters requiring more than 3 bytes are very rarely
used).

> The real thing is how you encode all these.
> UTF-8 is one way (the popular one these days),
> UTF-16 another...  and raw storage the worst idea ever!

  Why would raw storage be the worst idea? There are several advantages.
For instance, each character takes the same amount of space (instead of
taking a variable amount like with the encodings), which means that you
can directly index the nth character in a string (in an utf8-encoded string
if you want the nth character you have to actually traverse the entire
string up to that point and decode it along the way). It's also the most
efficient way (speedwise) of handling the characters because you don't
need to be doing conversions back and forth between the encoding and
the raw values.
  The disadvantage is, of course, an increased memory requirement.

-- 
                                                          - Warp

Post a reply to this message

From: Orchid XP v7
Subject: Re: Haskell raving
Date: 1 Nov 2007 15:59:10
Message: <472a3e1e@news.povray.org>

Warp wrote:
> Orchid XP v7 <voi### [at] devnull> wrote:
>> Notice that a string is a *single-linked* list. (I.e., there are "next" 
>> pointers but no "prev" pointers.)
> 
>   You can't random-access a string in haskell? You can just go through it
> once and that's it? That would be quite limiting...

Not unless you transform it into something else first.

(This "ByteString" thing I keep harping on about gives you something 
like O(n/k) random access time, for some implementation-dependent k.)

Typically, if you read a text string, you're probably going to feed it 
into some kind of parser, and then manipulate the resulting parse tree. 
And parsing only requires linear access.

Post a reply to this message

From: Orchid XP v7
Subject: Re: Haskell raving
Date: 1 Nov 2007 16:00:19
Message: <472a3e63$1@news.povray.org>

Warp wrote:
> Orchid XP v7 <voi### [at] devnull> wrote:
>> This is why the "ByteString" library was developed.
> 
>   It just shows that the original article which spawned this thread is
> naive. *In theory* you can have all kinds of fancy high-level uber-abstract
> constructs which abstract away all the dirty internal details. *In practice*,
> however, you still need those dirty details if you want any efficiency.
> The fancy theories may be good in a big bunch of programs, but like so many
> abstractions before, it's not the final silver bullet of programming.

The point being, ByteString is implemented as an array, yet still *looks 
like* a normal linked-list. So you get all the nice fancy theory *and* 
the efficient runtime behaviour, all at once. So I'm not sure it is 
"flawed"...

Post a reply to this message

From: Orchid XP v7
Subject: Re: Haskell raving
Date: 1 Nov 2007 16:05:00
Message: <472a3f7c$1@news.povray.org>

Warp wrote:
> Orchid XP v7 <voi### [at] devnull> wrote:
>> The nice thing about Haskell is that is *allows* you to come up with a 
>> better implementation and slip it in there later.
> 
>   Is that really so? Can you refactor an existing solution (eg. in a library)
> so that it will be more efficient (eg. memorywise) without breaking any
> existing programs?

It depends on the library, and how carefully it was thought out in the 
first place. (I imagine this applies in most languages.)

For example, there is a current project to introduce 
trippy-new-killer-feature "stream fusion". This involves rewriting 
almost *all* the standard Haskell list processing functions (of which 
there are a fairly mind-blowing number) so that they work completely 
differently internally.

Obviously, these are the most-used functions in all of Haskell [three 
guesses why they're being optimised?], and any change to the interface 
would break practically every Haskell program ever written. And breaking 
almost every Haskell program ever written would be a Very Bad Thing(tm). 
Fortunately, they haven't needed to...

Can the same be said for *every* library? Well, that is a harder 
question to answer. My guess would be "many of them", but I don't have 
any really scientific proof of that.

Post a reply to this message

From: Orchid XP v7
Subject: Re: Haskell raving
Date: 1 Nov 2007 16:10:47
Message: <472a40d7$1@news.povray.org>

Warp wrote:
> Orchid XP v7 <voi### [at] devnull> wrote:
>> (My strong suspecion is that it *is* 12 
>> *bytes* - since, after all, a single Unicode code point is 24 bits already.)
> 
>   That depends on whether it's a "raw" unicode value (ie. in practice an
> integer) or eg. an UTF8-encoded unicode character (which would make each
> character of variable size).

My understanding is that single characters are stored as 32-bit 
integers. (Because that's simple, and they fit nicely into CPU 
registers.) I could be mistaken though...

>> The situation is actually worse than it looks. All this lazy evaluation 
>> magic is implemented by storing program state around the place, so an 
>> "unevaluated" string probably takes up more space still...
> 
>   I have always wondered about that. Lazy evaluation can indeed be of
> great help optimizing many things. For example, if you read a file into
> a string and then just read a small part of it, the lazy evaluation will
> automatically skip reading the rest of the file.
> 
>   However, in situations where lazy evaluation does not actually help
> much (or at all), which isn't a very uncommon case, it feels like a
> useless waste of memory.

And time - don't forget time. (Takes time to build a closure, takes time 
to execute it. All of which can be avoided if you know you don't need it.)

For this reason, we have the following facts:

1. There is a special compiler pass for detecting places where laziness 
is *not* necessary, and automatically removing it. (E.g., an inner loop 
where you can *see* that the value is obviously used in the next loop.) 
When you compile a library, the compiler even leaves a note to itself 
telling it how strict each function is with its arguments.

2. There are mechanisms for the programmer to explicitly turn off 
laziness where not required. (Unlike the compiler's automatic detection 
stuff, this operation actually changes the behaviour of the program - 
which is why you, the programmer, must say this is OK.)

Post a reply to this message

From: Orchid XP v7
Subject: Re: Haskell raving
Date: 1 Nov 2007 16:13:08
Message: <472a4164$1@news.povray.org>

Le Forgeron wrote:
> Le 01.11.2007 20:00, Orchid XP v7 nous fit lire :
>> No - it's Unicode. 24 bits per character. ;-)
> 
> Unicode is just a bunch of tables of glyphs (lot of tables, lot of
> glyphs).
> It is past 24 bits since a few...  (even past 32 bits!!!)
> The real thing is how you encode all these.
> UTF-8 is one way (the popular one these days),
> UTF-16 another...  and raw storage the worst idea ever!
> 
> UTF-8 is about 8 bits for ascii range, usually go up to 24 bits (3 x
> 8) for classical japanese, 16 for most french variants...

I was under the impression that these encodings apply to *strings*, not 
individual characters by themselves...

>> But yes, the standard Haskell string type is geared to flexibility, not
>> performance. See my ByteString comments...
> 
> Fixed size unicode... if only they stop adding more tables!

Oh, ByteString (as the name somewhat implies) only supports the first 
256 Unicode code-points. ;-)

Somebody should really fix that eventually...

Post a reply to this message

From: Orchid XP v7
Subject: Re: Haskell raving
Date: 1 Nov 2007 16:15:33
Message: <472a41f5$1@news.povray.org>

Warp wrote:

>   UTF-8 encoding "wastes" some bits (in order to use less bits for the
> most used western characters) and requires at most 4 bytes per character
> (even though the characters requiring more than 3 bytes are very rarely
> used).

And thus, like any decent variable-length encoding scheme, it tries to 
assign short codes to common symbols. (Although UTF-8 probably fails 
horribly for, say, Japanese text. I don't actually know...)

>> UTF-16 another...  and raw storage the worst idea ever!
> 
>   Why would raw storage be the worst idea? There are several advantages.
>   The disadvantage is, of course, an increased memory requirement.

You win some, you loose some. Programming is all about these kinds of 
compromises. :-)

Post a reply to this message

From: Warp
Subject: Re: Haskell raving
Date: 1 Nov 2007 16:21:30
Message: <472a435a@news.povray.org>

Orchid XP v7 <voi### [at] devnull> wrote:
> The point being, ByteString is implemented as an array, yet still *looks 
> like* a normal linked-list. So you get all the nice fancy theory *and* 
> the efficient runtime behaviour, all at once. So I'm not sure it is 
> "flawed"...

  Then why do you need any other type of string?

-- 
                                                          - Warp

Post a reply to this message

From: Tim Attwood
Subject: Re: Haskell raving
Date: 1 Nov 2007 17:37:34
Message: <472a552e$1@news.povray.org>

>> Yeah, it does take a lot, and it uses unicode for characters too,
>> I think I heard it takes about 12 bytes per character that way.
>> It's perty bad on big files.
>
>  But hey, that's the fad in modern programming: Memory usage and speed
> are irrelevant. Processors are getting faster and memory amounts are
> doubling every couple of years, so who cares if something like a string
> takes 12 times as much memory as it could?

Wasn't that part of  Leo Brodie's point, it's more important that a
solution be adequate, and correct than simple?

Post a reply to this message

<<< Previous 10 Messages

Goto Latest 10 Messages

Next 10 Messages >>>