|
![](/i/fill.gif) |
(OK, that was terrible.)
Over the weekend, I implemented an interpreter for the SKI combinator
calculus. In Bash script.
I am a sick, sick individual.
"The who to the what now?" you ask. Well, around about the time Alan
Turing was inventing Turing machines, a guy called Alonzo Church was
inventing something called the lambda calculus. If a Turing machine is
like an ultra-simplified computer, then the lambda calculus is like an
ultra-simplified programming language.
(Of course, all the experts know that "really" they're neither of these
things. They're actually abstract models of computation. But we'll
cheerfully ignore that for a moment.)
The lambda calculus is a programming language which is *so simple* that
it doesn't even have arithmetic built-in. In order to do trivial
calculations, you first have to manually *define* what arithmetic "is".
For every single program. Needless to say, programs in the lambda
calculus are... quite verbose.
The lambda calculus is conceptually simple, but rather fiddly to parse,
and even more awkward to execute. (In particular, there are some fun
edge-cases to do with scoping rules.) However, every possible lambda
expression can be encoded into an equivalent expression that contains
only three pre-defined values, traditionally denoted by the letters S, K
and I.
Let me say that again: The SKI combinator calculus is a programming
language consisting only of three letters (and millions of brackets),
and it's somehow Turing-complete.
The rules for executing SKI expressions are considerably simpler. And
the language has much easier to parse. So I set about writing an
interpreter for it. As a Unix shell script. It uses some features
specific to the Bourne Again Shell ("Bash"), so it's not portable to
other systems. But Bash is everybody's default shell anyway, so...
As you may know, any possible computable function can be computed by a
suitable Turing machine. However, Turing machines have no notion of
input/output operations, which is what we generally want real-world
programs to have. Likewise, neither the lambda calculus nor the SKI
calculus has a way to do I/O. So I added one.
It's based on the idea of how the original Haskell I/O system used to
work. (I.e., before we got monads.) Initially, I made the system so that
the SKI program accepts an arbitrary-length stream of code numbers as
input, and returns an arbitrary-length stream of code numbers as output.
I later changed it to be a stream of *bits*. My plan was that the
interpreter would read these codes and do different I/O operations based
on the "commands" it receives. I've since decided that since even the
most trivial SKI program is quite hard enough, this extra layer isn't
needed. (Yet?)
I've dubbed the resulting language "Pointless". So how do you execute a
Pointless program? The rules are quite simple:
* The only legal input characters are: S K I ( )
* The input source code is a sequence of "items". An "item" consists of
a single letter, or a pair of brackets enclosing ONE or more sub-items.
(It is an error for there to be ZERO items in the brackets.)
* To "execute" the program, you apply a series of transformations to the
source code, according to the rules below.
* First, append the items G, E, P to the right-hand end of the program.
* Now endlessly apply the transformation rules. Look at the LEFT-MOST
item in the program, and select the applicable rule:
RULE #1: "(" - Delete the opening bracket and the MATCHING closing bracket.
Rule #2: "I" - Delete the I.
Rule #3: "K" - Delete the K, keep the next item, delete the item after that.
Rule #4: "S" - Delete the S. Number the following items as 1, 2, 3.
Remove them, and replace with the sub-expression "1 3 (2 3)".
Rule #5: "G" - Read one bit of input. Delete the G and replace it with:
+ S(K(S(SI(K (KI) )))G if the bit read was 0
+ S(K(S(SI(K (K ) )))G if the bit read was 1
Rule #6: "E" - End of program. Stop executing.
Rule #7: "P" - Delete the P, keep the next item, insert O, Z after it.
Rule #8: "O" - Delete the O. Output a 1 bit.
Rule #9: "Z" - Delete the Z. Output a 0 bit.
The first 4 rules constitute the entire SKI calculus. The 5 subsequent
rules specify how to perform I/O.
For example, consider the expression
SSKKII
By rule #4, 1=S, 2=K, 3=K, and we get
SK(KK)II
Applying rule #4 again, 1=K, 2=KK, 3=I and we have
KI((KK)I)I
Now rule #3 applies, which deletes the ((KK)I) item:
I
As a slightly longer example,
S(S(KS)K)IZO
S: 1=(S(KS)K), 2=I, 3=Z
(S(KS)K)Z(IZ)O
^ ^ Brackets
S(KS)KZ(IZ)O
S: 1=(KS), 2=K, 3=Z
(KS)Z(KZ)(IZ)O
^ ^ Brackets
KSZ(KZ)(IZ)O
K: 1=S, 2=Z
S(KZ)(IZ)O
S: 1=(KZ), 2=(IZ), 3=O
(KZ)O((IZ)O)
^ ^ Brackets
KZO((IZ)O)
K: 1=Z, 2=O
Z((IZ)O)
Output 0
((IZ)O)
^ ^ Brackets
(IZ)O
^ ^ Brackets
IZO
I
ZO
Output 0
O
Output 1
So, in the end, this program outputs 001.
It turns out you can write a parser that parses items and an execution
engine that implements the 9 rules in less than 99 lines of shell
script, as I shall shortly demonstrate.
My initial thought was that certainly the first 4 rules ought to be
pretty simple to implement using sed (the Unix Stream EDitor). However,
sed only understands regular expressions, and such a construct cannot
handle *nested* brackets. For that, you need a context-sensitive grammar.
After many experiments, I hit upon a single idea: represent
single-letter items as themselves, and represent compound items by ID
numbers. So the shell script first runs the input through the "tr"
command (TRanslate), which strips all characters except the ones we're
interested in. (In particular, it strips all whitespace.) Then the
script creates one file on disk for each compound item.
For example, the expression
S(SIK)(KI(SS))
might be translated into a bunch of disk files that look like this:
0001.item: S 2 3
0002.item: S I K
0003.item: K I 4
0004.item: S S
Now bash has a "read" command, which reads one line of input and splits
it into "words". By putting spaces around each item, this makes Bash
automatically split up the words, and put them into separate variables.
(It also means we can count higher than 9, of course...)
The parser is a fairly simple affair; once all spaces have been
stripped, look at each character. If it's an open bracket, allocate a
new item ID and start parsing items into that file. If it's a close
bracket, return to the caller [presumably a bracket handler, which will
then back out of the current bracket sequence].
Now ordinarily Bash tries to read lines. But according to Google [more
exactly, Stack Overflow], the magical incantation
IFS='' read -r -n1 CHAR
will read exactly one character into ${CHAR}. (IFS='' disables all
delimiter characters, -r disables character escapes in the input string,
-n1 means "read 1 char".)
Once everything is item files containing space-delimited tokens, a call like
read TOKEN1 TOKEN2 TOKENS <Filename
will redirect stdin to "Filename", and execute the "read" command, each
puts the first token in ${TOKEN1}, the second in ${TOKEN2}, and ALL
remaining tokens into ${TOKENS}.
At this point, it's just a matter of reading the item representing the
top-level program expression, figuring out what the first token is, and
overwriting the item file with the new, modified expression. (In some
cases, that means allocating new items, but for the most part it's just
rearranging existing ones.)
So, without further ado, here is the actual shell script:
----------------------------------------------------------------------
#!/bin/bash
shopt -s extglob # Allow regex in case patterns.
if [ ! -r "$1" ]; then echo "Can't open file '$1' for reading." ; exit
10 ; fi
if [ -d Heap ]; then rm -r Heap ; fi
mkdir Heap
echo 1 >Heap/Next
function NewItem()
{
local ITEM=$( cat Heap/Next )
echo ${ITEM}
let ITEM++
echo ${ITEM} >Heap/Next
}
function ItemPath() { printf "Heap/%04d.item" $1 ; }
function Ingest()
{
printf '('
tr -cd 'SKI()' <"$1"
printf 'GEP)'
}
function ParseItem()
{
IFS='' read -r -n1 CHAR
if [ "${CHAR}" == "(" ]; then ParseItems ; return 0 ; fi
if [ "${CHAR}" == ")" ]; then return 1 ; fi
printf "${CHAR} " ; return 0
}
function ParseItems()
{
local ITEM=$( NewItem )
local FILE=$( ItemPath ${ITEM} )
printf "${ITEM} "
while ParseItem >>${FILE} ; do : ; done
}
function Decode()
{
local ITEMS ITEM
read ITEMS <$( ItemPath $1 )
for ITEM in ${ITEMS}
do
if [[ "${ITEM}" =~ [0-9]+ ]]
then printf '(' ; Decode ${ITEM} ; printf ')'
else printf "${ITEM}"
fi
done
}
function Exec()
{
local FILE=$( ItemPath $1 )
while true
do
Decode $1 && printf '\n'
read CMD ITEM1 ITEM2 ITEM3 ITEMS <${FILE}
case "${CMD}" in
+([0-9]))
read OTHERS <$( ItemPath ${CMD} )
printf "${OTHERS} ${ITEM1} ${ITEM2} ${ITEM3} ${ITEMS}" >${FILE}
;;
"I") printf "${ITEM1} ${ITEM2} ${ITEM3} ${ITEMS}" >${FILE} ;;
"K") printf "${ITEM1} ${ITEM3} ${ITEMS}" >${FILE} ;;
"S")
CHILD=$( NewItem )
printf "${ITEM2} ${ITEM3}" >$( ItemPath
${CHILD} )
printf "${ITEM1} ${ITEM3} ${CHILD} ${ITEMS}" >${FILE}
;;
"G")
read BIT <&10
case "${BIT}" in
"") printf "K ${ITEM1} ${ITEM2} ${ITEM3} ${ITEMS}"
>${FILE} ;;
"0") printf "${CONS0} G ${ITEM1} ${ITEM2} ${ITEM3} ${ITEMS}"
>${FILE} ;;
"1") printf "${CONS1} G ${ITEM1} ${ITEM2} ${ITEM3} ${ITEMS}"
>${FILE} ;;
esac
;;
"E") echo "End of program." ; exit 0 ;;
"P") printf "${ITEM1} O Z ${ITEM2} ${ITEM3} ${ITEMS}" >${FILE} ;;
"Z") echo 0 >&11 ; printf "${ITEM1} ${ITEM2} ${ITEM3} ${ITEMS}" >
${FILE} ;;
"O") echo 1 >&11 ; printf "${ITEM1} ${ITEM2} ${ITEM3} ${ITEMS}" >
${FILE} ;;
esac
done
}
CONS0=$( printf '(S(K(S(SI(K(KI))))))' | ParseItem )
CONS1=$( printf '(S(K(S(SI(KK)))))' | ParseItem )
ROOT_ITEM=$( Ingest "$1" | ParseItem )
Exec ${ROOT_ITEM} 10<Input.rec 11>Output.rec
----------------------------------------------------------------------
Assumptions / limitations:
* The input bits must be stored in a text file [or conceivably a named
pipe] called "Input.rec", which must contain one "0" or "1" per line.
Even if the program being interpreted has no input commands!
* The output bits are stored in a similar file Output.rec, which will be
created if it doesn't already exist.
* The shell script must be executed with the filename of the SKI program
as its only input. Otherwise you'll get an error message about being
unable to open the file.
* There is minimal error-checking. It isn't hard to add this, but it
makes the shell script longer than 99 lines. I wanted to show how
"simple" the job is, without a lot of error-handling clutter.
* In particular, mismatched brackets or incorrect SKI programs will
probably provoke utterly baffling failures.
To try it out, you need some SKI programs to run. Possibly the simplest
program is one that reads 0 bits of input and returns 0 bits of output:
KK
Even simpler is a program who's output is identical to its input:
I
Unimpressed, huh? OK, let's try something a bit more interesting. This
program takes no input, and always produces 101 as its output:
K
(
S(K(S(SI(K K ))))
(
S(K(S(SI(K (KI) ))))
(
S(K(S(SI(K K ))))
K
)
)
)
I said the lambda calculus is verbose, right? Well, the SKI calculus is
even worse! (Wikipedia says something about Θ(3^n) worse. Then again,
Wikipedia also says [citation needed], so...) By adding more possible
predefined letters, it's possible to reduce the bloat --- at the expense
of making the interpreter more complicated, of course.
To do anything much less trivial than this... well, the programs start
to get LARGE. I need to build myself a "compiler" to produce such
programs. (But not in Bash - GOD NO!)
Still, this whole idea came about as I was thinking about a Bash script
that contains a tarball that it then unzips, which mostly contains a
giant SKI program and the SKI interpreter. Good luck to anybody trying
to reverse-engineer that! ;-)
Post a reply to this message
|
![](/i/fill.gif) |