|
|
I haven't attempted to actually compile and run any Java code for about
10 years now. So for the past month, I've been writing Java code. I will
probably write about the actual coding experience later; right now I
want to talk about what I've produced.
It's a small program called Logic Box. It lets you do logic programming,
after a fashion.
More exactly, Logic Box solves predicates. For the purposes of this
discussion, a "predicate" is simply a logical statement which may be
true or false. For example, 2 = 2 is obviously true, while 5 = 7 is
clearly false.
Given such a statement, Logic Box can tell whether it is true or false.
But that's pretty easy, really. The interesting part is that a predicate
may contain /variables/. Such a predicate is now no longer simple "true"
or "false"; it depends on what the variables contain. So we can write
predicates which are /sometimes/ true and /sometimes/ false.
Given such a predicate, Logic Box will attempt to /make/ the predicate
true. That is, it tries to figure out what you need to set the variables
to in order to "solve" the predicate (i.e., make it true).
The simplest kind of predicate is the equal predicate. It compares two
expressions (not predicates) and is satisfied if both expressions are
/identical/, once any bound variables have been replaced with the values
they are bound to. Not all variables have to be bound to something; for
example, x = x is unconditionally true, regardless of what value x has.
There are currently three sorts of expression: an integer, a variable
name, or a "record". A record has a name and zero or more fields, which
are arbitrary expressions. (In other words, records can be nested to any
depth.) You can use the name of the record as the actual data. For
example, Foo[] and Bar[] are two distinct values. Or you can use the
name to indicate the "sort" of data, and the fields to represent the
actual data. For example, Date[28, 03, 1980] or Time[09, 27, 56]. (Both
of these are three integers, but they are not comparable because the
names are different.)
(Note carefully: "Foo" is a variable, "Foo[]" is a record.)
Solving a predicate such as 7 = x isn't hard. The solution is simply x
:= 7. (Logic Box uses the notation that "=" denotes a /test for/
equality between two arbitrary expressions, while ":=" denotes the
/assignment/ of a value to a variable, otherwise known as a "binding".)
But consider the predicate
Triple[a, x, y] = Triple[1, Pair[2, a], Pair[x, 7]]
Can this be solved? Actually yes, it can:
a := 1; x := Pair[2, 1]; y := Pair[Pair[2, 1], 7]
If you take the original predicate and replace each variable with the
corresponding binding, you will find that
Triple[1, Pair[2, 1], Pair[Pair[2, 1], 7] = Triple[1, Pair[2, 1],
Pair[Pair[2, 1], 7]
Clearly both sides are identical, so the predicate is solved.
Notice how variables are not limited to containing integers; they may
contain arbitrary expressions - even expressions involving other
variables. However, a variable can never be bound to an expression
containing itself. That would result in an infinite loop. So something
like x := Pair[1, x] is not allowed. (Imagine trying to replace every x
with Pair[1, x]. The process never halts.)
As well as the equal predicate, you can use the usual AND ("&") and OR
("|") operators. By default, the operator precedence is =, &, |, in that
order. You can use brackets to override this as usual. (There is a
slight glitch: brackets are not allowed inside expressions, only inside
predicates. So "Pair[(x), (y)] = z" is not allowed, neither is "(Pair[x,
y]) = z", but "(Pair[x, y] = z)" is fine.)
The meaning of AND is pretty obvious; the predicate is solved only if
the solution solves /both/ child predicates simultaneously. The meaning
of OR is equally obvious. What is perhaps /not/ obvious is that
predicates involving OR can have /multiple solutions/. Consider, for
example, x = 1 | x = 2. This has two solutions, x := 1 and x := 2. In
general, a given predicate can have an unlimited number of solutions,
and Logic Box will attempt to find all of them for you.
Perhaps most interestingly of all, you can define "named predicates",
which behave something like subroutines. In particular, a named
predicate can invoke itself recursively, creating a way to implement
loops. (This would otherwise be impossible.)
Logic Box comes with a small bunch of named predicates, mostly to do
with list processing. All the list predicates begin with "L". Some
predicates take a list, but treat it as a poor man's set. These begin
with "S".
As an example, the SMember{} predicate tests whether a given item is a
member of a set (actually a list). So, for example,
SMember{2, List[1, 2, 3]}
is true, while
SMember{5, List[1, 2, 3]}
is false.
Perhaps the most fascinating thing about logic programming is that you
can write a predicate like SMember{}, and once written, it works
forwards /and backwards/! For example,
SMember{5, List[1, x, 3]}
has one solution, namely x := 5. Logic Box doesn't tell you /if/ the
predicate is true; it tells you how to /make/ it true. In this case, by
inserting the 5 into the set. We haven't told Logic Box to do this; the
code for SMember{} doesn't say how to accomplish such a thing. It's just
an automatic consequence of the language.
We could similarly have written
SMember{5, List[x, y, z]}
This generates 3 solutions:
x := 5;
y := 5;
z := 5;
Clearly each of these solves the predicate. Alternatively,
SMember{x, List[1, 2, 3]}
has solutions
x := 1;
y := 2;
z := 3;
Here what we have done, in effect, is to make Logic Box loop over all
the elements of the set. So SMember{} can mean "/is/ this a member of
the set?", but it can also mean "/get/ a member of the set". You could
them write a second predicate involving x, join them with an AND, and
Logic Box will find any solutions there might be. For example,
SMember{x, List[1, 2, 3]} & SMember{x, List[2, 4, 6, 8]}
will find any elements in the intersection of the two sets. (In this
case, x := 2 is the only solution.) Replace the AND with an OR and you
get the union of the sets instead.
Another, more list-oriented predicate is LJoin{}. This works in a
slightly strange way. Given three lists, the predicate is satisfied if
joining the first two lists together yields the last list. For example,
LJoin{List[1, 2, 3], List[4, 5, 6], List[1, 2, 3, 4, 5, 6]}
is true, because List[1, 2, 3] + List[4, 5, 6] = List[1, 2, 3, 4, 5, 6].
In this form, the predicate is testing /if/ joining two lists yields a
third. But if we replace the third list with a variable name, Logic Box
will /make/ the predicate true, by computing what the join of the two
lists is:
LJoin{List[1, 2, 3], List[4, 5, 6], out}
out := List[1, 2, 3, 4, 5, 6]
So we can test /if/ this is the join of two lists, or we can /get/ the
join of two lists. But we can also run this backwards:
LJoin{List[1, 2, 3], y, List[1, 2, 3, 4, 5, 6}
y := List[4, 5, 6]
Logic Box has stripped the prefix 1, 2, 3 off of our long list. We
didn't tell it how to do that in the code for LJoin{}; again, it's
automatic. We can /even/ to this:
LJoin{x, y, List[1, 2, 3, 4, 5, 6]}
This has 7 solutions:
x := List[]; y := List[1, 2, 3, 4, 5, 6];
x := List[1]; y := List[2, 3, 4, 5, 6];
x := List[1, 2]; y := List[3, 4, 5, 6];
x := List[1, 2, 3]; y := List[4, 5, 6];
x := List[1, 2, 3, 4]; y := List[5, 6];
x := List[1, 2, 3, 4, 5]; y := List[6];
x := List[1, 2, 3, 4, 5, 6]; y := List[];
Again, Logic Box has performed a search and discovered every possible
way of satisfying the predicate - in other words, every possible way to
split a list in half. So the /joining/ predicate can be used for
/splitting/.
We can of course go still further, doing loopy things like
LJoin{List[x, y, z], w, List[1, 2, 3, 4, 5, 6]}
to fetch the first three elements of the list plus the remainder. You
could of course implement all of this functionality in a normal
programming language; logic programming just lets you do it all with one
small piece of code.
Beware that it's very easy to ask a question which has infinity answers.
For example,
LJoin{x, y, z}
has infinity solutions, as Logic Box attempts to construct all possible
lists of every combination of sizes. Also, sometimes a predicate
/doesn't/ work in a particular direction; for example LReverse{} works
forwards, but goes into an infinite search if run backwards. It depends
on the exact order in which Logic Box searches the solution space, which
depends on the exact way in which the predicate is phrased.
I should point out that the examples above don't actually work as
written. I describe expressions such as List[1, 2, 3] as "packed lists".
The language provides no way to iterate over such structures. Instead,
you have to use "unpacked lists", which look like
Node[1, Node[2, Node[3, End[]]]]
Clearly that's much more typing (although it allows easy iteration).
There's an LUnpack{} predicate which can unpack a list for you. (It uses
an internal hard-wired language primitive to achieve this feat.) There
ought to be one LPack{} predicate, but unfortunately we need a separate
LPack{} and LUnpack{}, due to issues to search space ordering. (The two
predicates have identical implementations, except for the ordering of
their terms.)
[In future, I might make LPack{} itself a hard-wired primitive, allowing
it to work in any direction. It's mildly annoying that this is necessary
though...]
As an example of use,
LUnpack{List[1, 2, 3], x} & LReverse{x, y} & LPack{out, y}
In both LPack{} and LUnpack{}, the first argument is a packed list, and
the second is an unpacked list. The difference is which one is the
"input" and which the "output". (Alternatively, you can just write out
the unpacked lists by hand. It's not /that/ hard to do...)
Now that we know about unpacked lists, let's look at the source code for
LJoin{}:
LJoin{xs, ys, zs}:
(xs = End[] & ys = zs) |
(~head ~tailX ~tailZ
xs = Node[head, tailX] &
zs = Node[head, tailZ] &
LJoin{tailX, ys, tailZ}
)
;
That's /it/. That's the whole thing. That's how we can join, strip and
split lists all at once. That's all there is to it.
So how does it work? Well, the first line says that the predicate can be
satisfied if the first list is empty, and the second two lists are
identical. (I.e., prefixing nothing does nothing.)
The notation "~x ..." means "there exists a value x such that ..."
Essentially what it does is to create a local variable. As you can see,
LJoin{} calls itself recursively, so we don't want variables from
different invocations clashing because they have the same name. (The
arguments to a predicate are automatically local, by a different mechanism.)
So, the equations are saying:
1. Both xs and zs are Node[] (not End[]).
2. There exists a value (head) which is the first element of xs and zs.
3. Recursively, join the tails of the lists.
Eventually we will reach the end of xs, terminating the loop. We see
that either xs = End[] or xs = Node[]. These two possibilities are
mutually exclusive; they cannot both occur. So - provided xs is defined
- eventually the loop must end.
What might /not/ be clear is how this generates multiple solutions in
cases such as LJoin{x, y, List[1, 2, 3]}. Where is the "searching"
happening? Well, that would be the OR operator. It tries both
alternatives. Look again at the solutions list: the /first/ solution is
xs = End[] and ys = zs, because that's the first line in LJoin{}.
Next, it tries xs with one element. It knows what element, because zs =
Node[head, tailZ], and zs is defined to begin with a 1, so xs must begin
with a 1. In this fashion, it can iterate over all the possible splits.
Eventually we reach the point where zs = End[] rather than Node[], so
the second case fails, and only the first case is applicable, ending the
loop.
You may find it entertaining to try to write some more predicates of
your own - e.g., to add a new element to a set-list if not already
present, to interleave to lists, etc.
So that's what Logic Box /does/. But how does it /work/?
Well, implementing Logic Box is about 50 lines of Haskell. It's a bit
more in Java, but it's not /that/ big. I spent about 90% of my
programming effort on making Logic Box /interactive/. It actually
/shows/ you what it's doing, as it does it.
(Unfortunately, for mildly complicated predicates, the result just ends
up being a giant wall of text which is still disappointingly hard to
follow. But I tried...)
So, let's start with the equal predicate. This compares two expressions.
If there are no variables involved, you simply compare the two
expressions, and either they are already identical, or they will never
be identical. So that's easy. But what if there are variables?
The general algorithm is apparently called "unification". It's
surprisingly simple, although it is quite fiddly to get right.
The general idea is to compare the two expressions. If they are
identical, you're done. If one of the expressions is a variable, you
bind it to the other expression. (I.e., you make the other expression
the variable's value.) But what if the variable /already/ has a binding?
Well, you /could/ try unifying the new value against the existing one,
but that gets kinda messy. There's a much simpler way.
What you do is you "cook" the expressions before unifying them. That is,
you replace any variables which already have bindings with the
expressions they're bound to. In this way, any variables which already
have bindings vanish from the input, and by unifying the inputs you're
automatically unifying the contents of those variables. It's much easier
to program that way.
There is one slight bit of trickiness. Consider the predicate
x = y & y = 5
Logic Box answers that the solution is
x := 5; y := 5;
which is true. However, when unifying the first equality, we get the
partial solution x := y. When we unify the second one, we add y := 5. So
at this point, we have
x := y; y := 5;
To get from this to the final result, we perform a "simplification".
This basically involves looking up every variable and cooking its
expression using the current solution. If the expression changes in this
process, set the cooked expression as the variable's new expression and
restart the loop. The loop terminates when no amount of cooking changes
any expression.
So, the complete algorithm is this:
1. Cook both expressions.
2. If both expressions are identical, success.
3. If one of the expressions is a variable:
A. If the variable appears in the other side as well, fail.
B. Bind the variable to the other expression.
C. Run the simplification loop.
D. Success.
4. If both expressions are records:
A. If the record names do not match, fail.
B. If the field counts do not match, fail
C. Unify each pair of fields in turn.
So unification works with a "current solution", which is gradually
extended as more data is processed. In particular, the final step above
performs unification recursively. The outer unification fails if any of
the inner ones fail. Otherwise, the updated output from each sub-step
feeds into the next sub-step, until no more steps remain.
The AND predicate is very simple on paper, but was the hardest one to
program. In theory, you run each predicate until it produces a result,
and then run the next predicate with the result from the previous step
as its initial solution.
Oh, wait, predicates can potentially produce /multiple/ solutions, remember?
This means that you have to try all possible combinations. Again, on
paper, that's simple:
out1 := run predicate 1 [initial solution]
for each solution1 in out1
out2 := run predicate 2 [solution1]
for each solution2 in out2
out3 := run predicate 3 [solution2]
for each solution3 in out3
...
Doing it interactively is damned fiddly. You have to be able to stop and
restart multiple nested loops. It took me a while to figure that out.
The algorithm I ultimately came up with is this:
1. Set the current predicate to predicate 1.
2. Run the current predicate.
3. If a solution is received:
A. If this is the last predicate, return the solution.
B. Pass the solution to the next predicate, and make that the current
one.
4. If EOF is received:
A. If this is the first predicate, return EOF.
B. Make the previous predicate the current one.
As far as I can tell, this algorithm works. But it took a heck of a lot
of trial and error to get to this point.
By contrast, the OR predicate is very easy. Just run each predicate in
turn until EOF is received. Note that unlike AND, with OR every
predicate gets an identical initial solution. (The initial solution for
OR itself.) So the rules are simple; run current predicate, pass on any
solutions received, move to the next predicate on EOF, end when you run
out of predicates.
The way the "exists" predicate is implemented is slightly odd. This
predicate consists of a variable name followed by a predicate. Each time
it's run, it generates a unique ID number, and then replaces all
occurrences of the local variable with this ID number. The resulting
"cooked" predicate is then run as usual.
These unique IDs are called "temporary variables" (because usually they
don't form part of the final solution, only the temporary internal
state). They print out as ?1?, ?2?, ?3?, etc. (You can't type them in;
they can only be created by exists.) You may from time to time see these
pop up in a solution. Well, now you know what they are.
Named predicate calls are also very simple. You take the predicate name,
look it up in a dictionary, check it's got the right number of
arguments, replace each argument variable with the supplied argument
value, and run the resulting predicate. Done.
As I mentioned, there's also a small number of named predicates who's
implementation is actually hard-wired into the program. (I.e., it runs
Java code rather than Logic Box code.) In particular, there's
PRIM_True{} and PRIM_False{}, which unconditionally succeed or fail.
Post a reply to this message
|
|