POV-Ray: Newsgroups: povray.off-topic: Haskell raving

From: Invisible
Subject: Re: Haskell raving
Date: 2 Nov 2007 11:13:46
Message: <472b4cba@news.povray.org>

Warp wrote:

>   Is that really so? Can you refactor an existing solution (eg. in a library)
> so that it will be more efficient (eg. memorywise) without breaking any
> existing programs?

I guess the key to a question like this is to think about what kinds of 
things would possibly *prevent* such a refactor.

I have never written a nontrivial program in C, but in Pascal I 
frequently found that the type system was one such reason. In Pascal 
(and AFAIK in plain C) you *must* expose all implementation details of a 
type in order for anybody else to be able to touch it. This is one of 
the Big Problems that OOP is supposed to fix. (And, IMHO, does fix very 
well indeed.)

Haskell does quite well too in this regard. I can write a module in 
Haskell that exports a type, without telling you a damn thing about what 
the type is. It might just be an alias for some existing type, or it 
could be something really complex. You don't know. You don't care. And I 
can *change* what that type actually is, and you *still* don't need to 
care. All you see is that there exists a type with a specific name, and 
some functions that can do things with it.

We could summarise all that by simply saying that hiding unnecessary 
detail is a Good Thing. So perhaps we should look at where details can 
leak out in Haskell...

One place is I/O. Functions that do I/O have to be "marked" as such. 
This is by design; normal functions can be run in any order without 
consequence, whereas I/O operations *usually* need to be run in a very 
precise order. This information is obviously very important, and so is 
exposed at the type level.

What this means is that you can't flip a normal function into an 
I/O-performing function without altering at least the function's type. 
But then, you're turning an "I can run this at any time" function into a 
"it might well matter when I run this" function. Arguably this demands 
that the client code change anyway, in which case the enforced interface 
change is arguably good. Either way, this kind of code change seems 
quite rare in practice. (Certainly rare for performance tuning.)

What other information is leaked? Well, a function's type tells you 
exactly what inputs it depends on. (Unless it does I/O anyway...) So if 
you change the algorithm to take more information into account, the type 
signature must change.

Let us take an example: Suppose you have a function that decodes 
different character sets. You give it a byte, and it translates it to a 
Unicode code-point using some kind of rule.

Now suppose you want to add UTF-8 decoding. Oops! That's a 
variable-length encoding. To decode a byte, you might need to know what 
the previous byte was. But in Haskell, you can't do that.

Now, in C you could just store some status information in a global 
variable somewhere. So in C you can add UTF-8 without having to change 
your public interface. Yay!

Now suppose some client decides to call the function from multiple 
threads processing several input streams. OOPS!! Chaos ensues... The 
*only* way you can fix this is if you alter your public interface. There 
are several possible fixes, but all involve client-visible change.

On the other hand, Haskell would force you to change the public 
interface in the first place. (And again you have several options here.)

To summarise: In both C and Haskell you end up changing the interface if 
you want the thing to work properly. Haskell just forces you to do it 
right at the start, whereas C potentially does strange things until you 
realise what's up. And in both cases, the key to success is to design 
the public interface properly in the first place! (I.e., write a 
function that converts whole strings, not individual octets.)
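
As a rough sketch of the "state has to live somewhere the client can see" point (in Python rather than C or Haskell, purely for illustration): the standard `codecs` module exposes incremental decoders as objects, so each input stream owns its own partial-sequence state. The helper names `make_decoder` and `decode_interleaved` are made up for this example.

```python
import codecs


def make_decoder():
    """Return a fresh UTF-8 decoder that carries its own state (hypothetical helper)."""
    return codecs.getincrementaldecoder("utf-8")()


def decode_interleaved(stream_a: bytes, stream_b: bytes):
    """Feed two byte streams one byte at a time, alternating between them.

    Each stream has its own decoder object, so a half-finished multi-byte
    sequence from one stream never gets mixed up with the other stream.
    """
    dec_a, dec_b = make_decoder(), make_decoder()
    out_a, out_b = [], []
    for i in range(max(len(stream_a), len(stream_b))):
        if i < len(stream_a):
            out_a.append(dec_a.decode(stream_a[i:i + 1]))
        if i < len(stream_b):
            out_b.append(dec_b.decode(stream_b[i:i + 1]))
    # final=True flushes the decoder and raises if a sequence was left incomplete
    out_a.append(dec_a.decode(b"", final=True))
    out_b.append(dec_b.decode(b"", final=True))
    return "".join(out_a), "".join(out_b)
```

A single decoder shared between both streams (the moral equivalent of the global variable) would see the bytes interleaved and produce garbage or an error; giving each stream its own decoder is exactly the kind of client-visible interface change discussed above.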

I said above that leaking unnecessary detail is bad. My arguments since 
then could essentially be read as saying "this is leaking detail, but it 
is *necessary* detail, so that's OK". So it's really whether you 
consider these things "necessary" or not.

Another potential problem area is laziness. Being lazy can speed 
programs up no end. It can also slow them down tremendously. 
Deliberately turning off laziness is a fairly common way to improve the 
efficiency of Haskell code. However, this is often an observable 
change. Code that works with the lazy version of a library might well 
lock up given a stricter version.

So here is an implementation change that is totally invisible at the 
interface level, yet might cause client code to stop working properly.
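
A loose analogy, sketched in Python (which is strict by default, so generators stand in for Haskell's laziness here; the names `squares_lazy`, `squares_strict` and `first_three` are invented for the example): two implementations with the same call signature, where the client only works with the lazy one.

```python
from itertools import count, islice


def squares_lazy(numbers):
    """Lazy version: produces results one at a time, only on demand."""
    return (n * n for n in numbers)


def squares_strict(numbers):
    """Strict version: builds the whole result before returning anything."""
    return [n * n for n in numbers]


def first_three(squares):
    """Client code: feeds in an endless input and looks at three results only."""
    return list(islice(squares(count(1)), 3))
```

`first_three(squares_lazy)` returns promptly, while `first_three(squares_strict)` would try to square an endless sequence before handing anything back and so never finishes in practice. Both functions work fine on finite input, which is why the difference does not show up in the interface.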

(It's actually fairly common to provide libraries with two versions, so 
the client can choose. And because suddenly changing a library's 
implementation can cause problems...)

From: Alain
Subject: Re: Haskell raving
Date: 2 Nov 2007 14:14:04
Message: <472b76fc$1@news.povray.org>

Warp enlightened us on 2007/11/01 14:40:
> Alain <ele### [at] netscapenet> wrote:
>> It's about 12 BITS per character, on average, not BYTES! That's using UTF8 
>> encoding. About 16 BITS per character if using UTF16 encoding.
>> UTF8 is only 7 BITS per character if you stick to only the standard ASCII 
>> character set, but it gets bigger if you also use extended ASCII or characters 
>> from foreign alphabets.
> 
>   Except that if each single character is indeed garbage-collected, that
> requires quite a lot of memory per character (compared to the size of the
> character).
> 
Garbage collection is BAAAAD!!!
Any implementation that permits you not to use it is GOOD.

I got hit real bad by garbage collection, ONCE! The program was running really 
well, no bugs at all, then it just stopped there, nothing at all was happening, 
the system was totally unresponsive, utterly frozen, then it resumed after around 
15 minutes. It was garbage collecting.
I changed the program logic to prevent any future garbage collection.

-- 
Alain
-------------------------------------------------
You know you've been raytracing too long when you're starting to find these 
quotes more unsettling than funny.
     -- Alex McLeod a.k.a. Giant Robot Messiah

From: Nicolas Alvarez
Subject: Re: Haskell raving
Date: 2 Nov 2007 14:16:44
Message: <472b779c$1@news.povray.org>

Alain wrote:
> Garbage collection is BAAAAD!!!
> Any implementation that permits you not to use it is GOOD.
> 
> I got hit real bad by garbage collection, ONCE! The program was running 
> really well, no bugs at all, then it just stopped there, nothing at all 
> was happening, the system was totally unresponsive, utterly frozen, then it 
> resumed after around 15 minutes. It was garbage collecting.
> I changed the program logic to prevent any future garbage collection.
> 

http://www.virtualdub.org/blog/pivot/entry.php?id=176

From: Alain
Subject: Re: Haskell raving
Date: 2 Nov 2007 14:20:20
Message: <472b7874@news.povray.org>

Orchid XP v7 enlightened us on 2007/11/01 15:03:
> Warp wrote:
>> Alain <ele### [at] netscapenet> wrote:
>>> It's about 12 BITS per character, on average, not BYTES! That's 
>>> using UTF8 encoding. About 16 BITS per character if using UTF16 
>>> encoding.
>>> UTF8 is only 7 BITS per character if you stick to only the standard 
>>> ASCII character set, but it gets bigger if you also use extended 
>>> ASCII or characters from foreign alphabets.
>>
>>   Except that if each single character is indeed garbage-collected, that
>> requires quite a lot of memory per character (compared to the size of the
>> character).
> 
> I can't actually find documentation to hand to clarify whether it's 12 
> bits or 12 bytes per character. (My strong suspicion is that it *is* 12 
> *bytes* - since, after all, a single Unicode code point is 24 bits 
> already.)
> 
> The situation is actually worse than it looks. All this lazy evaluation 
> magic is implemented by storing program state around the place, so an 
> "unevaluated" string probably takes up more space still...
Using UTF8 encoding, a single character can be 1, 2, 3 or 4 BYTES long. 
Standard ASCII characters are 1 BYTE. Extended ASCII and some more use 2 bytes.
BITS 6 and 7 set to 1 in the first byte = 2-byte codes. BITS 5, 6 and 7 set 
to 1 = 3-byte codes. BITS 4, 5, 6 and 7 set to 1 = 4-byte codes.
Using UTF16 encoding, any character is 2 BYTES long, for a grand total of 65536 
possible characters, not all of them being printable.
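
The UTF-8 byte counts are easy to check. In UTF-8 the first byte of a sequence starts with `0` for a 1-byte code and with `110`, `1110` or `11110` for 2-, 3- and 4-byte codes. A small Python sketch (the helper name `utf8_info` is made up for this example):

```python
def utf8_info(ch: str):
    """Return (number of bytes, first byte in binary) for one character in UTF-8."""
    encoded = ch.encode("utf-8")
    return len(encoded), format(encoded[0], "08b")


# One sample each for the 1-, 2-, 3- and 4-byte cases.
for ch in ["A", "é", "€", "😀"]:
    print(repr(ch), utf8_info(ch))
```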

-- 
Alain
-------------------------------------------------
Error in operator: add beer

From: Le Forgeron
Subject: Re: Haskell raving
Date: 2 Nov 2007 14:31:50
Message: <472b7b26$1@news.povray.org>

On 01.11.2007 22:15, Orchid XP v7 wrote:
>>> UTF-16 another...  and raw storage the worst idea ever!
>>
>>   Why would raw storage be the worst idea? There are several advantages.
>>   The disadvantage is, of course, an increased memory requirement.
> 
> You win some, you lose some. Programming is all about these kinds of
> compromises. :-)

Raw storage is the worst, because they are still adding glyphs to
Unicode...
When it started, you could have used only 16 bits... now you can't
anymore.
It's not about programming, but about "selling" the concept. Raw storage
means no encoding, no signaling of the actual number of bits used...

On one side, you have billions of people who are perfectly happy with
only 7 bits. Maybe even 6 bits would be enough (you do not need
every letter twice, do you? Ten digits (the Romans did it with
6 or 7 glyphs) and a few punctuation signs.)
Then there are other billions who could probably be fine with a fixed 16
bits... maybe not, they are still adding!
And there are the people in the middle (the Japanese recommend 1945...
and all the other countries...)
And there are the few who want to be able to mix everything at the
same time in the same document.

UTF-8 is interesting for the 6-bit people, as they do not lose too
much storage and it's simple
(far simpler than all the old tricks of "active page" and similar encodings).
But UTF-8 nowadays can already extend to 4 x 8 bits (or worse?).
What would have happened if the basic byte was 9 bits long instead
of 8? Would they go up to 4, or would 3 have been enough?
Obviously all Latin-European countries would have fit on the same
page of 512, and maybe we would still be playing with code pages?
(No thanks!)

-- 
The superior man understands what is right;
the inferior man understands what will sell.
-- Confucius

From: Alain
Subject: Re: Haskell raving
Date: 2 Nov 2007 14:32:06
Message: <472b7b36$1@news.povray.org>

Orchid XP v7 enlightened us on 2007/11/01 17:15:
> Warp wrote:
> 
>>   UTF-8 encoding "wastes" some bits (in order to use less bits for the
>> most used western characters) and requires at most 4 bytes per character
>> (even though the characters requiring more than 3 bytes are very rarely
>> used).
> 
> And thus, like any decent variable-length encoding scheme, it tries to 
> assign short codes to common symbols. (Although UTF-8 probably fails 
> horribly for, say, Japanese text. I don't actually know...)
For Japanese and Chinese, it averages around 3 bytes per character. It's not so 
bad after all, as each character in those represents a whole word; some even 
represent a whole phrase or some complex concept.
A 1000-glyph text in Chinese would be, roughly, a 1000 to 5000 word text in 
English!
> 
>>> UTF-16 another...  and raw storage the worst idea ever!
>>
>>   Why would raw storage be the worst idea? There are several advantages.
>>   The disadvantage is, of course, an increased memory requirement.
> 
> You win some, you lose some. Programming is all about these kinds of 
> compromises. :-)
All possible characters for all European languages fit in 1 or 2 bytes, and I 
think that this also includes Arabic and Cyrillic. The Asian glyphs use the bulk 
of the 3 and 4 byte codes.

-- 
Alain
-------------------------------------------------
If you're ever about to be mugged by a couple of clowns, don't hesitate - go for 
the juggler.

From: Orchid XP v7
Subject: Re: Haskell raving
Date: 2 Nov 2007 14:33:56
Message: <472b7ba4@news.povray.org>

Alain wrote:

> Garbage collection is BAAAAD!!!
> Any implementation that permits you not to use it is GOOD.

Gee, that's not much of a wild generalisation or anything like that...

> I got hit real bad by garbage collection, ONCE! The program was running 
> really well, no bugs at all, then it just stopped there, nothing at all 
> was happening, the system was totally unresponsive, utterly frozen, then it 
> resumed after around 15 minutes. It was garbage collecting.

Yeah, old GC algorithms used to do this. Research has been done, 
solutions have been found, etc.

From: Warp
Subject: Re: Haskell raving
Date: 2 Nov 2007 15:14:18
Message: <472b851a@news.povray.org>

Alain <ele### [at] netscapenet> wrote:
> Using UTF16 encoding, any character is 2 BYTES long, for a grand total of 65536 
> possible characters, not all of them being printable.

  Wrong. UTF16 encoding results in either 2-byte or 4-byte characters,
depending on the Unicode value.

  Perhaps you are confusing it with UCS2?
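
  A quick way to see the 2-byte versus 4-byte split, sketched in Python (the helper name `utf16_units` is invented for this example): code points up to U+FFFF take one 16-bit unit, anything above takes a surrogate pair.

```python
def utf16_units(ch: str):
    """Return the 16-bit code units of one character in UTF-16 (big-endian, no BOM)."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# U+0041 and U+65E5 fit in one unit; U+1F600 needs a surrogate pair.
for ch in ["A", "日", "😀"]:
    print(repr(ch), [hex(u) for u in utf16_units(ch)])
```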

-- 
                                                          - Warp

From: Warp
Subject: Re: Haskell raving
Date: 2 Nov 2007 16:10:17
Message: <472b9238@news.povray.org>

Alain <ele### [at] netscapenet> wrote:
> > And thus, like any decent variable-length encoding scheme, it tries to 
> > assign short codes to common symbols. (Although UTF-8 probably fails 
> > horribly for, say, Japanese text. I don't actually know...)
> For Japanese and Chinese, it averages around 3 bytes per character. It's not so 
> bad after all, as each character in those represents a whole word; some even 
> represent a whole phrase or some complex concept.

  UTF16 is better because it uses 2 bytes for the vast majority of the most
commonly used kanjis and other symbols used in Japanese.
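
  The size difference can be measured directly. A minimal Python sketch (the helper name `encoded_sizes` is made up; the byte-order mark is left out by using the `utf-16-le` codec):

```python
def encoded_sizes(text: str):
    """Return the encoded length in bytes of `text` in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # little-endian, no BOM
    }


print(encoded_sizes("日本語"))  # three kanji
print(encoded_sizes("hello"))   # plain ASCII
```

  For the three kanji this gives 9 bytes in UTF-8 against 6 in UTF-16, while for the ASCII word the ratio flips to 5 against 10.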

-- 
                                                          - Warp

From: Darren New
Subject: Re: Haskell raving
Date: 2 Nov 2007 19:01:33
Message: <472bba5d$1@news.povray.org>

Invisible wrote:
> (and AFAIK in plain C) you *must* expose all implementation details of a 
> type in order for anybody else to be able to touch it. 

Nah. Just use a forward-declared struct and pass pointers to it. You 
know, like fopen/fread/fwrite/fclose.  Note those haven't changed since 
the first edition of K&R.

-- 
   Darren New / San Diego, CA, USA (PST)
     Remember the good old days, when we
     used to complain about cryptography
     being export-restricted?
