POV-Ray : Newsgroups : povray.off-topic : Reach for the SKI : Reach for the SKI Server Time
8 Jul 2024 08:54:08 EDT (-0400)
  Reach for the SKI  
From: Orchid Win7 v1
Date: 20 Jul 2015 17:03:11
Message: <55ad620f$1@news.povray.org>
(OK, that was terrible.)



Over the weekend, I implemented an interpreter for the SKI combinator 
calculus. In Bash script.

I am a sick, sick individual.



"The who to the what now?" you ask. Well, around about the time Alan 
Turing was inventing Turing machines, a guy called Alonzo Church was 
inventing something called the lambda calculus. If a Turing machine is 
like an ultra-simplified computer, then the lambda calculus is like an 
ultra-simplified programming language.

(Of course, all the experts know that "really" they're neither of these 
things. They're actually abstract models of computation. But we'll 
cheerfully ignore that for a moment.)

The lambda calculus is a programming language which is *so simple* that 
it doesn't even have arithmetic built-in. In order to do trivial 
calculations, you first have to manually *define* what arithmetic "is". 
For every single program. Needless to say, programs in the lambda 
calculus are... quite verbose.

The lambda calculus is conceptually simple, but rather fiddly to parse, 
and even more awkward to execute. (In particular, there are some fun 
edge-cases to do with scoping rules.) However, every possible lambda 
expression can be encoded into an equivalent expression that contains 
only three pre-defined values, traditionally denoted by the letters S, K 
and I.

Let me say that again: The SKI combinator calculus is a programming 
language consisting only of three letters (and millions of brackets), 
and it's somehow Turing-complete.

The rules for executing SKI expressions are considerably simpler. And 
the language has much easier to parse. So I set about writing an 
interpreter for it. As a Unix shell script. It uses some features 
specific to the Bourne Again Shell ("Bash"), so it's not portable to 
other systems. But Bash is everybody's default shell anyway, so...



As you may know, any possible computable function can be computed by a 
suitable Turing machine. However, Turing machines have no notion of 
input/output operations, which is what we generally want real-world 
programs to have. Likewise, neither the lambda calculus nor the SKI 
calculus has a way to do I/O. So I added one.

It's based on the idea of how the original Haskell I/O system used to 
work. (I.e., before we got monads.) Initially, I made the system so that 
the SKI program accepts an arbitrary-length stream of code numbers as 
input, and returns an arbitrary-length stream of code numbers as output. 
I later changed it to be a stream of *bits*. My plan was that the 
interpreter would read these codes and do different I/O operations based 
on the "commands" it receives. I've since decided that since even the 
most trivial SKI program is quite hard enough, this extra layer isn't 
needed. (Yet?)

I've dubbed the resulting language "Pointless". So how do you execute a 
Pointless program? The rules are quite simple:

* The only legal input characters are: S K I ( )

* The input source code is a sequence of "items". An "item" consists of 
a single letter, or a pair of brackets enclosing ONE or more sub-items. 
(It is an error for there to be ZERO items in the brackets.)

* To "execute" the program, you apply a series of transformations to the 
source code, according to the rules below.

* First, append the items G, E, P to the right-hand end of the program.

* Now endlessly apply the transformation rules. Look at the LEFT-MOST 
item in the program, and select the applicable rule:

RULE #1: "(" - Delete the opening bracket and the MATCHING closing bracket.

Rule #2: "I" - Delete the I.

Rule #3: "K" - Delete the K, keep the next item, delete the item after that.

Rule #4: "S" - Delete the S. Number the following items as 1, 2, 3. 
Remove them, and replace with the sub-expression "1 3 (2 3)".

Rule #5: "G" - Read one bit of input. Delete the G and replace it with:

+ S(K(S(SI(K (KI) )))G   if the bit read was 0
+ S(K(S(SI(K (K ) )))G   if the bit read was 1

Rule #6: "E" - End of program. Stop executing.

Rule #7: "P" - Delete the P, keep the next item, insert O, Z after it.

Rule #8: "O" - Delete the O. Output a 1 bit.

Rule #9: "Z" - Delete the Z. Output a 0 bit.

The first 4 rules constitute the entire SKI calculus. The 5 subsequent 
rules specify how to perform I/O.

For example, consider the expression

   SSKKII

By rule #4, 1=S, 2=K, 3=K, and we get

   SK(KK)II

Applying rule #4 again, 1=K, 2=KK, 3=I and we have

   KI((KK)I)I

Now rule #3 applies, which deletes the ((KK)I) item:

   I

As a slightly longer example,

   S(S(KS)K)IZO
                        S: 1=(S(KS)K), 2=I, 3=Z

   (S(KS)K)Z(IZ)O
   ^      ^             Brackets

   S(KS)KZ(IZ)O
                        S: 1=(KS), 2=K, 3=Z

   (KS)Z(KZ)(IZ)O
   ^  ^                 Brackets

   KSZ(KZ)(IZ)O
                        K: 1=S, 2=Z

   S(KZ)(IZ)O
                        S: 1=(KZ), 2=(IZ), 3=O

   (KZ)O((IZ)O)
   ^  ^                 Brackets

   KZO((IZ)O)
                        K: 1=Z, 2=O

   Z((IZ)O)
                        Output 0

   ((IZ)O)
   ^     ^              Brackets

   (IZ)O
   ^  ^                 Brackets

   IZO
                        I

   ZO
                        Output 0

   O
                        Output 1

So, in the end, this program outputs 001.



It turns out you can write a parser that parses items and an execution 
engine that implements the 9 rules in less than 99 lines of shell 
script, as I shall shortly demonstrate.

My initial thought was that certainly the first 4 rules ought to be 
pretty simple to implement using sed (the Unix Stream EDitor). However, 
sed only understands regular expressions, and such a construct cannot 
handle *nested* brackets. For that, you need a context-sensitive grammar.

After many experiments, I hit upon a single idea: represent 
single-letter items as themselves, and represent compound items by ID 
numbers. So the shell script first runs the input through the "tr" 
command (TRanslate), which strips all characters except the ones we're 
interested in. (In particular, it strips all whitespace.) Then the 
script creates one file on disk for each compound item.

For example, the expression

   S(SIK)(KI(SS))

might be translated into a bunch of disk files that look like this:

   0001.item: S 2 3
   0002.item: S I K
   0003.item: K I 4
   0004.item: S S

Now bash has a "read" command, which reads one line of input and splits 
it into "words". By putting spaces around each item, this makes Bash 
automatically split up the words, and put them into separate variables. 
(It also means we can count higher than 9, of course...)

The parser is a fairly simple affair; once all spaces have been 
stripped, look at each character. If it's an open bracket, allocate a 
new item ID and start parsing items into that file. If it's a close 
bracket, return to the caller [presumably a bracket handler, which will 
then back out of the current bracket sequence].

Now ordinarily Bash tries to read lines. But according to Google [more 
exactly, Stack Overflow], the magical incantation

   IFS='' read -r -n1 CHAR

will read exactly one character into ${CHAR}. (IFS='' disables all 
delimiter characters, -r disables character escapes in the input string, 
-n1 means "read 1 char".)

Once everything is item files containing space-delimited tokens, a call like

   read TOKEN1 TOKEN2 TOKENS <Filename

will redirect stdin to "Filename", and execute the "read" command, each 
puts the first token in ${TOKEN1}, the second in ${TOKEN2}, and ALL 
remaining tokens into ${TOKENS}.

At this point, it's just a matter of reading the item representing the 
top-level program expression, figuring out what the first token is, and 
overwriting the item file with the new, modified expression. (In some 
cases, that means allocating new items, but for the most part it's just 
rearranging existing ones.)



So, without further ado, here is the actual shell script:

----------------------------------------------------------------------

#!/bin/bash

shopt -s extglob   # Allow regex in case patterns.

if [ ! -r "$1" ]; then echo "Can't open file '$1' for reading." ; exit 
10 ; fi

if [ -d Heap ]; then rm -r Heap ; fi
mkdir Heap
echo 1 >Heap/Next

function NewItem()
{
   local ITEM=$( cat Heap/Next )
   echo ${ITEM}
   let ITEM++
   echo ${ITEM} >Heap/Next
}

function ItemPath() { printf "Heap/%04d.item" $1 ; }

function Ingest()
{
   printf '('
   tr -cd 'SKI()' <"$1"
   printf 'GEP)'
}

function ParseItem()
{
   IFS='' read -r -n1 CHAR
   if [ "${CHAR}" == "(" ]; then ParseItems ; return 0 ; fi
   if [ "${CHAR}" == ")" ]; then              return 1 ; fi
   printf "${CHAR} " ; return 0
}

function ParseItems()
{
   local ITEM=$( NewItem )
   local FILE=$( ItemPath ${ITEM} )
   printf "${ITEM} "
   while ParseItem >>${FILE} ; do : ; done
}

function Decode()
{
   local ITEMS ITEM
   read  ITEMS <$( ItemPath $1 )
   for ITEM in ${ITEMS}
   do
     if [[ "${ITEM}" =~ [0-9]+ ]]
       then printf '(' ; Decode ${ITEM} ; printf ')'
       else printf "${ITEM}"
     fi
   done
}

function Exec()
{
   local FILE=$( ItemPath $1 )
   while true
   do
     Decode $1 && printf '\n'
     read CMD ITEM1 ITEM2 ITEM3 ITEMS <${FILE}
     case "${CMD}" in
       +([0-9]))
         read OTHERS <$( ItemPath ${CMD} )
         printf "${OTHERS} ${ITEM1} ${ITEM2} ${ITEM3} ${ITEMS}" >${FILE}
         ;;
       "I") printf "${ITEM1} ${ITEM2} ${ITEM3} ${ITEMS}" >${FILE} ;;
       "K") printf "${ITEM1}          ${ITEM3} ${ITEMS}" >${FILE} ;;
       "S")
         CHILD=$( NewItem )
         printf "${ITEM2} ${ITEM3}"                   >$( ItemPath 
${CHILD} )
         printf "${ITEM1} ${ITEM3} ${CHILD} ${ITEMS}" >${FILE}
         ;;
       "G")
         read BIT <&10
         case "${BIT}" in
           "")  printf "K          ${ITEM1} ${ITEM2} ${ITEM3} ${ITEMS}" 
 >${FILE} ;;
           "0") printf "${CONS0} G ${ITEM1} ${ITEM2} ${ITEM3} ${ITEMS}" 
 >${FILE} ;;
           "1") printf "${CONS1} G ${ITEM1} ${ITEM2} ${ITEM3} ${ITEMS}" 
 >${FILE} ;;
         esac
         ;;
       "E") echo "End of program." ; exit 0 ;;
       "P") printf "${ITEM1} O Z ${ITEM2} ${ITEM3} ${ITEMS}" >${FILE} ;;
       "Z") echo 0 >&11 ; printf "${ITEM1} ${ITEM2} ${ITEM3} ${ITEMS}" > 
${FILE} ;;
       "O") echo 1 >&11 ; printf "${ITEM1} ${ITEM2} ${ITEM3} ${ITEMS}" > 
${FILE} ;;
     esac
   done
}

CONS0=$( printf '(S(K(S(SI(K(KI))))))' | ParseItem )
CONS1=$( printf '(S(K(S(SI(KK)))))'    | ParseItem )
ROOT_ITEM=$( Ingest "$1" | ParseItem )
Exec ${ROOT_ITEM} 10<Input.rec 11>Output.rec

----------------------------------------------------------------------

Assumptions / limitations:

* The input bits must be stored in a text file [or conceivably a named 
pipe] called "Input.rec", which must contain one "0" or "1" per line. 
Even if the program being interpreted has no input commands!

* The output bits are stored in a similar file Output.rec, which will be 
created if it doesn't already exist.

* The shell script must be executed with the filename of the SKI program 
as its only input. Otherwise you'll get an error message about being 
unable to open the file.

* There is minimal error-checking. It isn't hard to add this, but it 
makes the shell script longer than 99 lines. I wanted to show how 
"simple" the job is, without a lot of error-handling clutter.

* In particular, mismatched brackets or incorrect SKI programs will 
probably provoke utterly baffling failures.

To try it out, you need some SKI programs to run. Possibly the simplest 
program is one that reads 0 bits of input and returns 0 bits of output:

   KK

Even simpler is a program who's output is identical to its input:

   I

Unimpressed, huh? OK, let's try something a bit more interesting. This 
program takes no input, and always produces 101 as its output:

   K
   (
     S(K(S(SI(K K ))))
     (
       S(K(S(SI(K (KI) ))))
       (
         S(K(S(SI(K K ))))
         K
       )
     )
   )

I said the lambda calculus is verbose, right? Well, the SKI calculus is 
even worse! (Wikipedia says something about  Θ(3^n) worse. Then again, 
Wikipedia also says [citation needed], so...) By adding more possible 
predefined letters, it's possible to reduce the bloat --- at the expense 
of making the interpreter more complicated, of course.

To do anything much less trivial than this... well, the programs start 
to get LARGE. I need to build myself a "compiler" to produce such 
programs. (But not in Bash - GOD NO!)

Still, this whole idea came about as I was thinking about a Bash script 
that contains a tarball that it then unzips, which mostly contains a 
giant SKI program and the SKI interpreter. Good luck to anybody trying 
to reverse-engineer that! ;-)


Post a reply to this message

Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.