  Re: Quote of the day  
From: Invisible
Date: 3 Jul 2012 06:49:21
Message: <4ff2ce31@news.povray.org>
On 03/07/2012 05:40 AM, Warp wrote:

>    I don't understand at which point Haskell decides to stop laziness and
> start actually computing results. It cannot be lazy forever.

This is an important question, and one which I haven't seen explained 
properly anywhere. The only explanations I have come across are 
complicated mathematical formalisms which rigorously expound the 
rules... and those are not the sort of thing your average Joe 
Programmer is likely to comprehend. Yet comprehending this stuff is 
really rather important if you want to write code with good 
performance!



The rules aren't especially complicated. They just don't appear to be 
written down anywhere in an easily digestible form.

1. Execution starts from "main".

2. I/O actions execute sequentially, in the order written.

3. Any arguments to an I/O action get executed when the I/O action 
executes. (Usually.)

4. If there is a conditional branch in the I/O sequence, the condition 
is evaluated so we can decide which branch to take.

Together, these rules tell us when I/O actions execute, and when that 
causes pure expressions to execute. When a pure expression executes, we 
walk down the expression tree in a certain order, recursively executing 
the sub-expressions.
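
To make rules 1-4 a bit more concrete, here is a small sketch (the 
function names and strings are just made up for illustration):

   main :: IO ()
   main = do
       -- Rule 2: these I/O actions run in the order written.
       putStrLn (greeting "world")    -- Rule 3: the argument gets
                                      -- executed when putStrLn executes.
       line <- getLine
       if null line                   -- Rule 4: the condition is
           then putStrLn "(no input)" -- evaluated so we know which
           else putStrLn line         -- branch to take.

   -- A pure function; nothing in here runs until an I/O action
   -- actually demands its result.
   greeting :: String -> String
   greeting name = "Hello, " ++ name ++ "!"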

For the most part, there are two rules to pure code:

1. If the result of a function is another expression, execute that 
expression.

2. If the result of a function depends on a conditional branch, execute 
the conditional so we can decide which branch to take.

An example may help to clarify:

   or1 False False = False
   or1 False True  = True
   or1 True  False = True
   or1 True  True  = True

   or2 False x = x
   or2 True  x = True

The first function /obviously/ implements the truth-table of the 
logical-OR function. It might not be immediately obvious, but for any 
True/False combination, or2 produces the same result as or1. (Try it!)

Given this, you might think that these are two equivalent ways of 
writing the same function. But actually, they are not. or1 takes a 
conditional branch based on the values of /both arguments/. Therefore, 
whenever or1 is called, /both arguments/ will be computed. or2, on the 
other hand, takes a conditional branch based on /one/ argument only. If 
that argument is False, then the second argument is returned as-is (and 
gets computed whenever the caller demands the result). However, if the 
first argument is True, then the second argument will /never/ be computed.
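
You can see the difference directly by feeding in a second argument 
that explodes if anybody tries to compute it. (A quick GHCi sketch; 
"undefined" throws an exception the moment it is computed.)

   -- or2 never inspects its second argument when the first is True:
   ghci> or2 True undefined
   True

   -- or1 pattern-matches on both arguments, so this blows up:
   ghci> or1 True undefined
   *** Exception: Prelude.undefined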

Hopefully that gives you some idea how pure code implicitly defines 
execution order. Admittedly it's quite subtle; even the experts get this 
stuff wrong occasionally. Arguably that's Haskell's greatest weakness. 
(Even some of the original language architects seem ambivalent about 
whether default-lazy was the right choice.)

As well as conditional constructs, there are various ways that you can 
/force/ stuff to execute Right Now, rather than at some later time. The 
main one is the "seq" function. seq is magical, in that it doesn't 
"need" its first argument, and yet the language spec guarantees that it 
will always be executed. For example,

   length list = work 0 list
     where
       work n (head : tail) = work (n+1) tail
       work n []            = n

This counts the number of elements in a list. However, what it actually 
does is fill "n" with a giant unevaluated expression that looks like 
(((0+1)+1)+1)+... And then, right at the end, it returns n, forcing this 
giant expression to be executed. If the input list is large enough, 
evaluating it easily overflows the stack.

To fix this, we need to do two things. First, we create a local variable 
for the new, updated count:

   work (n+1) tail

becomes

   let n1 = n+1 in work n1 tail

By itself, this does nothing. But now we can apply the seq function:

   let n1 = n+1 in n1 `seq` work n1 tail

This /forces/ n1 to be executed /before/ the recursive call to work is 
executed. Since this happens every time round the loop, n always ends up 
holding an evaluated integer.
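
Putting both steps together, the whole function might look like this (a 
sketch, using the same names as above):

   length list = work 0 list
     where
       -- Force the new count before recursing, so "n" never becomes a
       -- giant chain of unevaluated (+1)s.
       work n (head : tail) = let n1 = n+1 in n1 `seq` work n1 tail
       work n []            = n

(With GHC's BangPatterns extension you could get much the same effect 
by writing "work !n (head : tail) = work (n+1) tail", but seq is the 
standard, portable way to do it.)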

In fact, now the compiler can /see/ that n is always evaluated, and it 
starts applying optimisations like treating n as a normal 
machine-integer rather than a heap object, producing a tight loop like a 
Java or C programmer would write.

seq is used for other reasons too; for example, when doing parallel 
programming, you /want/ the parallel threads to compute stuff, even 
though they don't "do" anything with the result. (You may remember that 
the magical "par" function forces something to be evaluated in parallel, 
much like "seq" forces it in series.)

It's also important to remember that values are not simply "evaluated" 
or "unevaluated". It's not black and white, it's shades of grey. Each 
individual heap object is either black or white; a given expression's 
result might consist of hundreds of individual heap objects, hence the 
shades of grey. Anything atomic like an integer or a Bool is black or 
white; things like lists or strings or whatever have shades of grey.

(Actually, saying that, it's not inconceivable that an /arbitrary 
precision/ integer might consist of more than one heap object...)
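
For example, a list can have its spine fully evaluated while the 
elements inside it are still untouched. (Another quick GHCi sketch:)

   -- length only walks the spine; the undefined elements are never
   -- computed, so this happily answers 3 instead of exploding.
   ghci> length (take 3 (repeat undefined))
   3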



Actually, having written Logic Box, one of the next projects I was 
thinking about attacking was writing an interactive simulator that shows 
you how the computer runs Haskell code. I actually ended up with a mild 
case of analysis-paralysis, but I think it's tractable. Would that be 
something that interests you? (Or anybody else here...?)

