On 05/09/2011 07:45 PM, Warp wrote:
> Orchid XP v8<voi### [at] devnull> wrote:
>> insert :: Ord x => x -> MinHeap x -> MinHeap x
>> insert x' (Leaf ) = Node x' Leaf Leaf
>> insert x' (Node x h0 h1) = Node (min x x') (insert (max x x') h1) h0
>
> This is a good example of where it becomes confusing. Even after
> studying it for a few minutes I can't really figure out what is it
> that it's doing.
The first line is a type signature. It says
"For any type x which has ordering operations, this function takes an
x and a MinHeap x and returns a MinHeap x."
The next two lines define the function. Each line conceptually means "if
the function call looks like the thing on the left, the result looks
like the thing on the right".
Therefore, if I were to write
insert 5 Leaf
it should be just as if I had written
Node 5 Leaf Leaf
The general rule of Haskell is that any name that begins with a *lower
case* letter is a variable name. (This isn't merely a convention, it is
formally part of the actual language syntax.) This is why "x" is a
variable name, but "Leaf" is not.
(If you're paying attention, you'll notice that "insert" is a variable
name by these rules. In Haskell, a named function is simply a named
variable whose value just happens to be a function. Which is kinda
weird, but consistent.)
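For reference, the MinHeap type being matched on isn't shown in the post,
but the patterns imply a declaration something like this (the "deriving"
clause is my addition, just so GHCi can print and compare results):

```haskell
-- Implied by the patterns: a heap is either an empty Leaf, or a Node
-- holding one datum and two child heaps.
data MinHeap x = Leaf | Node x (MinHeap x) (MinHeap x)
  deriving (Show, Eq)

insert :: Ord x => x -> MinHeap x -> MinHeap x
insert x' Leaf           = Node x' Leaf Leaf
insert x' (Node x h0 h1) = Node (min x x') (insert (max x x') h1) h0
```

With that in scope, GHCi will happily confirm that insert 5 Leaf really
does evaluate to Node 5 Leaf Leaf.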
So, textually, insert 5 Leaf means Node 5 Leaf Leaf. Operationally, what
does that actually /mean/? Well, in more typical syntax it would be
something like
MinHeap insert(x)
{
    MinHeap out = new MinHeap_Node();
    out.datum = x;
    out.child0 = Leaf.singleton;
    out.child1 = Leaf.singleton;
    return out;
}
(Since "Leaf" has no data fields, it is essentially a global constant.)
In other words, we're constructing a new Node object and setting it to
point to the new datum we're inserting, and setting both of its children
to be the empty heap.
So how about that last line?
insert x' (Node x h0 h1) = ...
So if you're inserting into a heap that /isn't/ empty, then
x' is the thing you want to insert.
x is the datum in the root node of the heap.
h0 is the left child of the root node.
h1 is the right child of the root node.
Given that, the right-hand side says
... = Node (min x x') (insert (max x x') h1) h0
That parses basically as
... = new Node(min(x, x'), insert(max(x, x'), h1), h0);
In other words, create a new heap node, where the datum is x or x',
whichever is lowest. The left child is the result of running
insert(max(x, x'), h1) - in other words, recursively insert x or x'
(whichever is highest) into h1. The right child is h0.
So, we compare the existing root datum to the new datum, and keep the
lowest one in the new root node. We then recursively insert the other
datum into one of the child heaps.
Notice carefully how h0 and h1 get swapped around. This keeps the heap
balanced. (If you stare at delete_top, you'll see it also swaps the
children around - and deletes from the other side.) I haven't formally
proven it, but I /think/ this arrangement keeps the tree balanced for
all valid operation sequences. Certainly it does if you repeatedly
insert items; I'm not 100% sure about interleaved inserts and deletes.
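One way to spot-check the repeated-insert claim (the depth helper here is
mine, not part of the original code): a perfectly balanced tree of n
items has depth ceil(log2(n+1)), so after 15 inserts the depth should be
exactly 4, and after 16 it should tick over to 5.

```haskell
data MinHeap x = Leaf | Node x (MinHeap x) (MinHeap x)

insert :: Ord x => x -> MinHeap x -> MinHeap x
insert x' Leaf           = Node x' Leaf Leaf
insert x' (Node x h0 h1) = Node (min x x') (insert (max x x') h1) h0

-- Helper (my addition): length of the longest root-to-Leaf path.
depth :: MinHeap x -> Int
depth Leaf         = 0
depth (Node _ l r) = 1 + max (depth l) (depth r)

main :: IO ()
main = do
  print (depth (foldr insert Leaf [1..15 :: Int]))  -- minimum possible for 15 nodes
  print (depth (foldr insert Leaf [1..16 :: Int]))
```

(This isn't a proof, of course - just evidence that repeated insertion
keeps the tree as shallow as it can possibly be.)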
So really, there are two ways to interpret Haskell functions. You can view
each one as a textual substitution. Given pencil and paper, you can
expand out the work that each one does. For example:
insert 1 (insert 2 (insert 3 (insert 4 empty)))
insert 1 (insert 2 (insert 3 (insert 4 Leaf))) [Because empty = Leaf]
insert 1 (insert 2 (insert 3 (Node 4 Leaf Leaf)))
insert 1 (insert 2 (Node 3 (insert 4 Leaf) Leaf))
insert 1 (insert 2 (Node 3 (Node 4 Leaf Leaf) Leaf))
insert 1 (Node 2 (insert 3 Leaf) (Node 4 Leaf Leaf))
insert 1 (Node 2 (Node 3 Leaf Leaf) (Node 4 Leaf Leaf))
Node 1 (insert 2 (Node 4 Leaf Leaf)) (Node 3 Leaf Leaf)
Node 1 (Node 2 (insert 4 Leaf) Leaf) (Node 3 Leaf Leaf)
Node 1 (Node 2 (Node 4 Leaf Leaf) Leaf) (Node 3 Leaf Leaf)
It's not hard to throw together a little program that shunts text around
like this. (Although the brackets are fiddly if you want to do it right.)
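In fact, GHC itself can confirm the hand expansion above (the deriving
clause is added here just to make the comparison possible):

```haskell
data MinHeap x = Leaf | Node x (MinHeap x) (MinHeap x)
  deriving (Show, Eq)

insert :: Ord x => x -> MinHeap x -> MinHeap x
insert x' Leaf           = Node x' Leaf Leaf
insert x' (Node x h0 h1) = Node (min x x') (insert (max x x') h1) h0

empty :: MinHeap x
empty = Leaf

-- The same expression that was expanded by hand above:
result :: MinHeap Int
result = insert 1 (insert 2 (insert 3 (insert 4 empty)))

main :: IO ()
main = print (result == Node 1 (Node 2 (Node 4 Leaf Leaf) Leaf)
                               (Node 3 Leaf Leaf))
```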
If you draw a little graph, the final expression there looks like
           Node 1
          /      \
      Node 2    Node 3
      /    \    /    \
  Node 4  Leaf Leaf  Leaf
  /    \
Leaf  Leaf
Which is a balanced tree. (As balanced as a tree containing 4 items can
be, anyway.)
So textually, that's how to run Haskell. Operationally, the thing on the
left checks which "constructor" is present (i.e., whether a node is a
Leaf or a Node), and then binds variables to things. It then constructs
the value described on the right.
It's not shown in my example, but the notation is quite powerful. It
lets you do things like inspect multiple levels of the tree:
rotate :: MinHeap x -> MinHeap x
rotate (Node x1 (Node x2 c1 c2) c3) = Node x2 c1 (Node x1 c2 c3)
That's a tree rotation [which obviously isn't a valid thing to do to a
min-heap tree, but you get the idea]. Every time "Node" appears on the
left, it means "this part has to be a Node object". Every time it
appears on the right, it means "please construct a new Node object".
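Made self-contained, the rotation looks like the following. (Note that
the one-line version is a partial function - it crashes on any tree the
pattern doesn't match - so I've added a catch-all case here that leaves
other shapes alone. That second equation is my addition, not part of the
original.)

```haskell
data MinHeap x = Leaf | Node x (MinHeap x) (MinHeap x)
  deriving (Show, Eq)

rotate :: MinHeap x -> MinHeap x
rotate (Node x1 (Node x2 c1 c2) c3) = Node x2 c1 (Node x1 c2 c3)
rotate h                            = h  -- added: anything else is unchanged

main :: IO ()
main = print (rotate (Node 1 (Node 2 Leaf Leaf) Leaf :: MinHeap Int))
-- prints Node 2 Leaf (Node 1 Leaf Leaf)
```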
Once you get over the notation, it's actually a very clear and succinct
way to work. Consider the alternative:
function rotate(root)
{
    if (root.type == "node")
    {
        x1 = root.datum;
        c3 = root.child1;
        child = root.child0;
        if (child.type == "node")
        {
            x2 = child.datum;
            c1 = child.child0;
            c2 = child.child1;
            return new Node(x2, c1, new Node(x1, c2, c3));
        }
    }
}
All that code to do the same thing as the 1-liner above. Not only is the
Haskell version shorter, it's clearer: It says "my input must look like
THIS, and then I will change it to look like THAT". You don't have to
mentally execute a series of one-at-a-time baby steps to see what's
happening. It's a bit like
temp1 = 1 + 2;
temp2 = 3 + 4;
temp3 = temp1 * temp2;
temp4 = temp3 ^ 2;
versus
temp4 = ((1+2) * (3+4))^2;
The latter is not only shorter, but infinitely clearer.
...then again, maybe I'm just delusional...