  Re: Reach for the SKI  
From: Orchid Win7 v1
Date: 21 Jul 2015 14:51:20
Message: <55ae94a8$1@news.povray.org>
On 20/07/2015 10:03 PM, Orchid Win7 v1 wrote:

> The lambda calculus is a programming language which is *so simple* that
> it doesn't even have arithmetic built-in. In order to do trivial
> calculations, you first have to manually *define* what arithmetic "is".
> For every single program. Needless to say, programs in the lambda
> calculus are... quite verbose.

The lambda calculus has a single data type: functions. Everything in the 
language is basically an infinite-argument function, because each 
function takes a function as input and returns another function as 
output. The notion of "curried functions" originates here; we can 
represent a 2-argument function as actually being a 1-argument function 
that returns a 1-argument function that returns the actual result [which 
is again a function, but hey].

It goes without saying that trying to describe "functions that take 
functions and construct functions that return functions for constructing 
functions that make functions" gets confusing rather rapidly!

So how the hell do you "do" anything with the lambda calculus? Well, 
think about it this way: To do anything with a digital computer, you 
have to take whatever it is you want processed and *encode* it into 
binary. You then run a program --- which is *also* binary --- which 
produces a result [again binary]. You then *decode* the result binary to 
get back something meaningful.

For example, to add two numbers, you encode each of them as a series of 
bits. The computer then has a hardware-accelerated operation for 
transforming the bits in such a way that when you decode the result, it 
represents the sum of the original two numbers. Phrased like that, it 
sounds really complicated. But in practice it isn't so bad.

The lambda calculus is similar. Except that there's no 
hardware-accelerated "add two numbers" operation built-in. You have to 
define one yourself. But I'm getting ahead of myself.

The syntax of the lambda calculus goes like this. A lambda expression is 
always one of three things:

* A variable reference.

* An anonymous function.

* A function call.

You reference a variable by simply writing its name. Likewise, you call 
a function by writing the function to call followed by the argument to 
call it with. An anonymous function consists of the Greek letter lambda 
("λ"), followed by a variable name, followed by an arrow, followed by 
the function body.

In symbols:

* <var name>

* λ <var name> → <expression>

* <expression> <expression>
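
This three-case grammar is easy to write down as a data type. Here's a 
minimal Haskell sketch (the names are my own, not anything standard):

   data Expr
     = Var String        -- a variable reference
     | Lam String Expr   -- an anonymous function: λ <var> → <body>
     | App Expr Expr     -- a function call: <function> <argument>
     deriving Show

   -- (λ x → x x) foo, written as a tree:
   example :: Expr
   example = App (Lam "x" (App (Var "x") (Var "x"))) (Var "foo")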

You "execute" lambda expressions via a process of "beta reductions". For 
example:

   (λ x → x x) foo

To "execute" this, take the body of the lambda function (the bit after 
the arrow) and replace every "x" with "foo". The result is obviously

   foo foo

As another example,

   (λ x → x y) foo

This becomes

   foo y

The thing on the right doesn't have to be a variable, of course. It can 
be another function. For example,

   (λ x → x x) (λ x → x x)

Now we have a potential confusion: the "x" on the left is a *different* 
variable from the "x" on the right. Each is only in scope between its 
own lambda and the matching close-bracket. This kind of thing gets 
confusing quickly; it's one of the reasons I went with the SKI calculus 
for my cryptic interpreter.
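
If you want to play with this mechanically, here's a sketch of a 
one-step beta reducer over the Expr type above. (Caution: the 
substitution here respects shadowing but doesn't rename variables, so 
it can "capture" free variables in tricky cases; a real interpreter 
would alpha-rename.)

   -- replace every free "x" in a term with "arg"
   subst :: String -> Expr -> Expr -> Expr
   subst x arg (Var y) | x == y    = arg
                       | otherwise = Var y
   subst x arg (Lam y b) | x == y    = Lam y b   -- inner x shadows ours
                         | otherwise = Lam y (subst x arg b)
   subst x arg (App f a) = App (subst x arg f) (subst x arg a)

   -- one beta step at the top: (λ x → body) arg  ⇒  body[x := arg]
   betaStep :: Expr -> Maybe Expr
   betaStep (App (Lam x body) arg) = Just (subst x arg body)
   betaStep _                      = Nothing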



So let's try to implement some stuff in the lambda calculus. We'll start 
with something really trivial: Bool functions. Now before we can do 
that, we need to figure out a way to turn Bools into functions, and 
functions back into Bools. In other words, we need an *encoding*.

According to the Church encoding [named after Alonzo Church], we have

   True  =   (λ t → (λ f → t))
   False =   (λ t → (λ f → f))

In other words, "True" is represented by any function that accepts 2 
arguments and returns the 1st argument, unmodified. Similarly, "False" 
is represented by any function that accepts 2 arguments and returns the 
2nd argument unmodified.

With this way of thinking, if "b" is a variable which contains [a 
function that's meant to represent] a Bool, then we can do

   b abc xyz

If b = True, then the result of this expression is abc, otherwise it is 
xyz. So it's a little like writing

   if (b == true) then return abc; else return xyz;

Suppose we have a "b", and we want the logical-NOT of that. How can we 
compute it? Well, actually that's easy:

   b False True

Or rather, in proper lambda syntax,

   b (λ t → (λ f → f)) (λ t → (λ f → t))

If we want a NOT *function*, we can write

   NOT = λ b → b (λ t → (λ f → f)) (λ t → (λ f → t))

Now we have a 1-argument function that accepts a Bool and returns the 
logical-NOT of it.

First: that was easy! Second: man, that's really verbose! This makes 
*Java* look succinct. (!)
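
If you want to try this in Haskell, a newtype plus the RankNTypes 
extension makes the types work out. This is just a sketch, with my own 
names:

   {-# LANGUAGE RankNTypes #-}

   newtype CB = CB (forall a. a -> a -> a)

   true', false' :: CB
   true'  = CB (\t _ -> t)
   false' = CB (\_ f -> f)

   -- NOT is literally "b False True", as above:
   not' :: CB -> CB
   not' (CB b) = b false' true'

   -- decode back to an ordinary Bool
   toBool :: CB -> Bool
   toBool (CB b) = b True False

Here toBool (not' true') gives False, as you'd hope.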

How about a logical-AND?

   AND = λ a → (λ b → a b (λ t → (λ f → f)))

If you're not expert in Boolean algebra, this might not be immediately 
clear, but take a look at the truth table for the AND operation:

   a | b | a & b
  ---+---+-------
   F | F |   F
   F | T |   F
   T | F |   F
   T | T |   T

If you study this table, you will notice that when a=T then a & b = b. 
So in our lambda expression, if a is true, we return b. You will also 
notice that if a=F then a & b = F, utterly regardless of b. So if a is 
false, we return false [which is the (λ t → (λ f → f)) part at the end].

The OR function is fairly similar:

   OR = λ a → (λ b → a (λ t → (λ f → t)) b)

This time, if a=T we return T unconditionally, otherwise we return b.

Given the above, we can implement XOR as

   a XOR b = (a OR b) AND (NOT (a AND b))

which is wordy in words, but as a lambda expression it's

   XOR =
     (λ A → (λ B →
       (λ a → (λ b → a b (λ t → (λ f → f))))
         (A (λ t → (λ f → t)) B)
         ((A B (λ t → (λ f → f))) (λ t → (λ f → f)) (λ t → (λ f → t)))
       )
     )

That is a LOOONG way to write A XOR B! Perhaps a better way is to attack 
the truth table directly. If you study it, you'll notice that when A=F 
then A XOR B = B, otherwise A XOR B = NOT B.

   XOR = (λ A → (λ B → A (B (λ t → (λ f → f)) (λ t → (λ f → t))) B))

That's a bit shorter.
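
Continuing the Haskell sketch from above, AND, OR and the short XOR 
are all one-liners:

   and', or', xor' :: CB -> CB -> CB
   and' (CB a) b = a b false'      -- a=T gives b; a=F gives F
   or'  (CB a) b = a true' b       -- a=T gives T; a=F gives b
   xor' (CB a) b = a (not' b) b    -- a=T gives NOT b; a=F gives b

And you can check the whole truth table in one go:

   map toBool [xor' x y | x <- [false', true'], y <- [false', true']]
     -- ⇒ [False, True, True, False]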

It should be clear by this point that figuring out what some random 
chunk of lambda expression does is quite fiddly. You can also see why 
I'm looking into building automated tools for constructing this stuff. 
If it takes multiple lines of code just to XOR two variables, imagine 
trying to implement something actually complex!



In fact, the Church encoding offers a general way to encode any 
algebraic data type. Suppose, for example, that an IO Mode has three 
possible values: ReadOnly, ReadWrite, and Append. You can represent an 
IO Mode as a 3-argument function that returns one of its arguments 
unchanged. Which one tells you what of the three possible values it 
represents.

   ReadOnly  = λ r → (λ w → (λ a → r))
   ReadWrite = λ r → (λ w → (λ a → w))
   Append    = λ r → (λ w → (λ a → a))

It goes without saying that you can't "tell" what a particular function 
is supposed to represent just by looking at it --- much like you can't 
tell what the bit-sequence 11010011 is meant to be. (Is that unsigned 
binary? Ones complement? Twos complement? Excess-N? Sign and magnitude? 
Fixed point? Floating-point? Scaled integer? ASCII code? Some other 
random code...?)
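
The same sketch works in Haskell (again with RankNTypes; the IOMode 
name here is mine):

   newtype IOMode = IOMode (forall r. r -> r -> r -> r)

   readOnly, readWrite, append :: IOMode
   readOnly  = IOMode (\r _ _ -> r)
   readWrite = IOMode (\_ w _ -> w)
   append    = IOMode (\_ _ a -> a)

   -- "pattern matching" is just supplying one result per constructor:
   describe :: IOMode -> String
   describe (IOMode m) = m "read only" "read/write" "append"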

For data records, you can use the following trick. Let's say we want to 
store a data pair. "5" is not a valid lambda expression, but for 
simplicity let's pretend it is. To represent the pair (5, 7), you could 
write a function

   (λ f → f 5 7)

This function takes a function as input, and passes the two elements of 
the pair to that function as arguments. Using this, you can easily write 
a "get X" or "get Y" function:

   GetX = λ p → p (λ x → (λ y → x))
   GetY = λ p → p (λ x → (λ y → y))

That is, we take a pair ("p") as input, and call it, passing a function 
that accepts two values ("x" and "y") as input and returns the desired 
one. We could also easily do some calculation on the two coordinates 
before returning them. It's also easy to see how we could extend this 
scheme to deal with more than just 2 items.
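
In Haskell, the same trick looks like this (RankNTypes again; names 
mine):

   newtype Pair a b = Pair (forall r. (a -> b -> r) -> r)

   pair :: a -> b -> Pair a b
   pair x y = Pair (\f -> f x y)

   getX :: Pair a b -> a
   getX (Pair p) = p (\x _ -> x)

   getY :: Pair a b -> b
   getY (Pair p) = p (\_ y -> y)

So getX (pair 5 7) is 5, and getY (pair 5 7) is 7.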

Let us consider the Haskell programmer's favourite ADT:

   data List x = Null | Cons x (List x)

We can turn this into lambda expressions:

   null     = λ n → (λ c → n)
   cons h t = λ n → (λ c → c h t)

So a "list" is a 2-argument function. If it's an end-of-list marker, it 
returns the first argument. Otherwise it takes the second argument and 
calls it with the head and tail of the list as arguments. Now we can 
easily write a function to (say) get the first element out of a list:

   head = λ l → l ??? (λ h → (λ t → h))

The "???" stands for the fact that if the list is empty, then there *is* 
no "first element" to return, so we'd need to decide what to do here. 
(Perhaps return some sort of default value instead, depending on what 
this is a list *of*.)
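
Here's a Haskell rendering of this encoding (a sketch; the recursion in 
the type forces a newtype, and I've called the empty list "nil" to 
avoid clashing with the Prelude's null):

   newtype List x = List (forall r. r -> (x -> List x -> r) -> r)

   nil :: List x
   nil = List (\n _ -> n)

   cons :: x -> List x -> List x
   cons h t = List (\_ c -> c h t)

   -- "head", with an explicit default standing in for the ???:
   headOr :: x -> List x -> x
   headOr def (List l) = l def (\h _ -> h)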



One tricky issue is recursion. Since all functions are anonymous, it 
*looks like* it's impossible for a function to call itself recursively. 
However, it *is* in fact possible.

The basic idea is to write a non-recursive function, and pass a copy of 
the function in as an argument. For example:

   (λ x → x x) (λ f → ...my function ...)

This does *almost* what we want. The function on the left makes two 
copies of the function on the right, and passes one copy as the argument 
to the other. The function on the right accepts a copy of itself as 
input, and is therefore able to call itself.

But this isn't quite right. This only lets us do *one* recursive call. 
We only made *two* copies of the function. To allow unlimited recursion, 
we really need an unlimited number of copies somehow.

The function we search for is the so-called "Y combinator". It has the 
property that

   Y f = f (Y f) = f (f (Y f)) = f (f (f (Y f))) = ...

In short, if f expects a copy of itself as input, then Y f generates 
infinitely many copies of f, and we can have unlimited recursion.

Now, how to implement Y?

Wikipedia gives one implementation:

   Y = λ g → (λ x → g (x x)) (λ x → g (x x))

If we give it an argument:

   Y f = (λ g → (λ x → g (x x)) (λ x → g (x x))) f
       =       ((λ x → f (x x)) (λ x → f (x x)))
       =     f ((λ x → f (x x)) (λ x → f (x x)))
       =     f (Y f)

The first line is the definition of Y. On the second line, every g has 
become f. Taking the lambda on the left and replacing each x with the 
lambda on the right yields the third line. But if you look, the part in 
brackets is identical to the line above, which represents Y f. Which is 
what the final line says. So here we have unlimited recursion.
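
Haskell's type checker rejects (λ x → x x) outright, but you can 
smuggle the same trick past it with a recursive newtype. A sketch of 
this standard workaround:

   newtype Rec a = Rec (Rec a -> a)

   y :: (a -> a) -> a
   y f = g (Rec g)
     where g r@(Rec h) = f (h r)

   -- unfolding: y f = g (Rec g) = f (g (Rec g)) = f (y f), as required

   -- factorial, with no named recursion in the body itself:
   fact :: Integer -> Integer
   fact = y (\rec n -> if n <= 1 then 1 else n * rec (n - 1))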

Now we can do something like

   MAP =
     Y
     (λ map →
       (λ f → (λ list →
         (λ null → (λ cons →
           list null (λ head → (λ tail → cons (f head) (map f tail)))
         ))
       ))
     )

The Y combinator applies the inner function to itself, allowing it to 
recurse over a list. If the list is null, return null. Otherwise, apply 
f to the head of the list, and recursively apply map f to the tail.

Yeah, clear as mud, right? Trust me, it does work...
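
In fact, you can watch it work in Haskell. Using the List type and the 
y function from the sketches above, the same definition is rather more 
readable:

   mapL :: (a -> b) -> List a -> List b
   mapL = y (\rec f (List l) -> l nil (\h t -> cons (f h) (rec f t)))

For example, headOr 0 (mapL (*10) (cons 1 (cons 2 nil))) gives 10.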

> Let me say that again: The SKI combinator calculus is a programming
> language consisting only of three letters (and millions of brackets),
> and it's somehow Turing-complete.

First, let me define what these letters actually are:

   I = λ x → x
   K = λ x → (λ y → x)
   S = λ f → (λ g → (λ x → (f x) (g x)))

The I function ("identity") takes an argument and just returns it 
unchanged. The K function ("konstant") takes two arguments, and returns 
the first one. In other words, K x is a function that always returns x. 
Finally, the S function... is the complicated one. But basically it 
accepts two functions (f and g) and an input (x), and passes that input 
to both functions, before calling the result of one with the result of 
the other.
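
In Haskell, these are three one-liners (in fact, I is the Prelude's 
"id" and K is "const"):

   i :: a -> a
   i x = x

   k :: a -> b -> a
   k x _ = x

   s :: (a -> b -> c) -> (a -> b) -> a -> c
   s f g x = f x (g x)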

It should now be fairly obvious how the SKI interpreter rules come 
about, given the lambda definitions for what these functions actually do.

Amazingly, every possible lambda expression can be transformed into a 
lambda expression containing only S, K and I. [Provided there are no 
"free variables" to start with.] Not only that, but there is a very 
simple algorithm for doing precisely this. Let's write an expression in 
square brackets to denote converting it to SKI. The rules are then:

What [λ x → e] becomes depends on e:

* If e is just x, then we have [λ x → x] = I.

* If e doesn't contain x at all, then e never changes no matter what x 
is. So we can do [λ x → e] = K[e].

* If e is a function call, then e = l r

   * If r is just x, and x doesn't appear in l, then we have 
[λ x → l x], which is equivalent to [l]. (This is an "eta reduction".)

   * Otherwise, we can use S: [λ x → l r] = S[λ x → l][λ x → r]

* Finally, [λ x → (λ y → e)] = [λ x → [λ y → e]]. (That is, encode the 
inner lambda first, and then encode the other lambda.)
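
This algorithm is short enough to write out in Haskell. Here's a sketch 
over a small term type (the names are mine); "abstract x e" plays the 
role of [λ x → e]:

   data Term = Var String | Lam String Term | App Term Term
             | S | K | I
             deriving Show

   -- does x appear free in a term?
   free :: String -> Term -> Bool
   free x (Var y)   = x == y
   free x (Lam y b) = x /= y && free x b
   free x (App l r) = free x l || free x r
   free _ _         = False

   -- translate away every lambda, innermost first:
   convert :: Term -> Term
   convert (Lam x e) = abstract x (convert e)
   convert (App l r) = App (convert l) (convert r)
   convert e         = e

   -- [λ x → e], where e contains no lambdas of its own:
   abstract :: String -> Term -> Term
   abstract x (Var y) | x == y = I
   abstract x e | not (free x e) = App K e
   abstract x (App l (Var y))
     | x == y, not (free x l) = l    -- eta reduction
   abstract x (App l r) =
     App (App S (abstract x l)) (abstract x r)
   abstract _ e = e                  -- unreachable after convert

Running convert on the True expression below spits out K, exactly as 
the hand trace does.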

Let's try our luck with some simple expressions. Perhaps the True 
expression:

   λ t → (λ f → t)

OK, so how does this go down?

   [λ t → (λ f → t)]

Well, t definitely appears in the function body, so let's encode the 
inner lambda first.

   [λ t → [λ f → t]]

Well, f doesn't appear in the body at all, so the K rule applies:

   [λ t → Kt]

Now the eta reduction rule applies:

   [K]

The SKI encoding of K is just K. So

   True = K

How about False?

   [λ t → (λ f → f)]
   K[λ f → f]
   KI

That was easy!

So how about NOT?

   [λ b → b (λ t → (λ f → f)) (λ t → (λ f → t))]
   S[λ b → b (λ t → (λ f → f))][λ b → (λ t → (λ f → t))]
   S[λ b → b (λ t → (λ f → f))](K[λ t → (λ f → t)])
   S[λ b → b (λ t → (λ f → f))](K[λ t → [λ f → t]])
   S[λ b → b (λ t → (λ f → f))](K[λ t → Kt])
   S[λ b → b (λ t → (λ f → f))](KK)
   S(S[λ b → b][λ b → (λ t → (λ f → f))])(KK)
   S(SI[λ b → (λ t → (λ f → f))])(KK)
   S(SI(K[λ t → (λ f → f)]))(KK)
   S(SI(K(K[λ f → f])))(KK)
   S(SI(K(KI)))(KK)

Well, it's a lot of steps, but the end result isn't too bad. The hardest 
part is getting all the millions of brackets right! Fortunately, this 
stuff is easy for a computer to do. Indeed, I've written a small Haskell 
program where you type stuff into an HTML form, and the web server sends 
back a trace like the one above, pretty colour-coded as it does the 
encoding. (It also allows you to execute arbitrary lambda expressions, 
execute SKI expressions, and more besides.)

One of the most annoying things about carrying this process out by hand 
is when you see an expression like λ b → ...b..., where the variable 
*is* mentioned, but only in one place, and really deeply buried. So you 
end up with a trail of K values. And then, because there's another 
lambda around it, you've got to wrap all those Ks in Ss, and the 
expression ends up really big and complicated.

You can somewhat fix that by adding a few new functions:

   L = λ f → (λ g → (λ x → (f x) g))
   R = λ f → (λ g → (λ x → f (g x)))

These functions are both similar to S, but without the need to K the 
left or right sub-expression. Which is a tiny saving in itself, but it 
means that as you encode further lambdas wrapping the whole thing, the K 
doesn't accumulate a stream of other combinators. The end result can be 
quite a saving in text.
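
As Haskell functions, these turn out to be old friends: L is the 
Prelude's "flip", and R is function composition (the classic B 
combinator):

   l :: (a -> b -> c) -> b -> a -> c
   l f g x = f x g     -- like S, but the right-hand part ignores x

   r :: (b -> c) -> (a -> b) -> a -> c
   r f g x = f (g x)   -- like S, but the left-hand part ignores x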

   [λ b → b (λ t → (λ f → f)) (λ t → (λ f → t))]
   L[λ b → b (λ t → (λ f → f))][λ t → (λ f → t)]
   L[λ b → b (λ t → (λ f → f))][λ t → [λ f → t]]
   L[λ b → b (λ t → (λ f → f))][λ t → Kt]
   L[λ b → b (λ t → (λ f → f))]K
   L(L[λ b → b][λ t → (λ f → f)])K
   L(LI[λ t → (λ f → f)])K
   L(LI(K[λ f → f]))K
   L(LI(KI))K

As you can see, this is a bit smaller than the previous expression.

