POV-Ray: Newsgroups: povray.off-topic: A min-heap: A min-heap

POV-Ray : Newsgroups : povray.off-topic : A min-heap : A min-heap		Server Time 12 Jul 2025 16:34:53 EDT (-0400)
From: Invisible
Date: 12 Oct 2010 09:49:29
Message: <4cb46769$1@news.povray.org>
Ever since I read that book I got for Christmas, I've been carrying 
around this implementation of a minimum-heap ("min-heap") in my head. 
The design always seemed quite neat to me, and it seemed to have 
impressive properties. But today I sat down to actually put those 
properties to the test.

I've written a short program that randomly inserts and deletes elements 
from the heap and measures how badly unbalanced the heap tree is. As per 
my expectations, it seems that inserting produces nicely balanced trees. 
Somewhat surprisingly, I have discovered that deletions can produce 
arbitrarily unbalanced trees.

The actual algorithm can be succinctly described thus:

   data Heap x = Node x (Heap x) (Heap x) | Empty

   insert x' (Node x h0 h1) = Node (min x x') (insert (max x x') h1) h0
   insert x' (Empty       ) = Node x' Empty Empty

   get_min (Node x _ _) = x
   get_min (Empty     ) = error "empty heap"

   delete_min (Node _ h0 h1) = merge h0 h1
   delete_min (Empty       ) = error "empty heap"

   merge Empty h1    = h1
   merge h0    Empty = h0
   merge h0    h1    =
     if get_min h0 < get_min h1
       then Node (get_min h0) h1 (delete_min h0)
       else Node (get_min h1) h0 (delete_min h1)

For those who do not grok Haskell, let me summarise thus:

1. A heap is simply a binary tree where every node contains one datum 
and has between 0 and 2 children.

2. Insert:
   - Inserting into an empty heap yields a trivial 1-node tree.
   - Inserting into a non-empty heap proceeds as follows:
    A. Construct a new tree node containing the smaller of the existing 
root element and the new element to be inserted.
    B. The right child of the new node is the left child of the original 
root node.
    C. The left child of the new node is constructed by recursively 
inserting the larger of the two elements into the previous right subtree.

3. Get minimum element: Just return the datum from the root node of the 
tree.

4. Delete minimum element: Just take the two subtrees of the root node 
and heap-merge them.

5. Heap merge:
   - Merging a heap with an empty heap is no-op.
   - To marge two non-empty heaps:
     A. Determine which heap has the lowest datum in its root node.
     B. Put the lowest datum into the new root node, and recursively 
delete it from the subtree you got it from.

It might just be me, but I can't help noticing how the English 
description is vastly more wordy yet less precise than the source code. 
But anyway...

Notice how an insertion always swaps the two subtrees, and how it always 
inserts into the (old) right subtree [which becomes the new left 
subtree]. In this way, the left subtree of any node is always the larger 
than or the same size as the corresponding right subtree. In other 
words, insertions leave the tree balanced.

Notice also how a deletion (or rather, a merge) always puts the subtree 
you just deleted from as the right subtree, reinforcing the same 
invariant. The left subtree can never be smaller than the right subtree.

However, they *can* be wildly different sizes. If, by some misfortune, 
it happens that you have a node where *all* the elements in one subtree 
are strictly less than *all* of the elements in the other subtree, then 
deleting from this heap will cause it to become more and more 
unbalanced. And no amount of swapping subtrees around is going to fix that.

We have that deletion leaves the heap in a state where a subsequent 
insertion will definitely insert into the smaller subtree, which is 
good. In other words, inserting will always bring the tree back towards 
being balanced again. But that still doesn't change the fact that, for 
sufficiently pathological cases, deletions can make the tree unbalanced 
to an unlimited degree.

Since a heap is (quite deliberately) only *partially* sorted, there's 
nothing to prevent this kind of thing from happening. The only way we're 
gonna fix this is to start moving stuff from one subtree to the other if 
it gets too unbalanced. But can we detect this condition in a way that 
doesn't slow the whole thing down too much?

Insert has the invariant that the sizes of the left and right subtrees 
are related. At all times, we have either L = R or L = R + 1. The 
constant swapping of the two subtrees mean that they both fill at an 
equal rate.

Insert propagates items from the top downwards. Which way the item 
propagates is down to the insertion function. But deletion moves items 
up to the top. Which side the item comes from depends on which item is 
the lowest, which the deletion function has no control over. Hence the 
problem.

Deleting from the left subtree causes it to become the right subtree. If 
we always deleted from the left, the tree would always remain balanced, 
and there would be no problem. Deleting from the right subtree, however, 
can unbalance the tree.

Now obviously, if the lowest element is in the right subtree, then it's 
in the right subtree. There's nothing we can do about that. But this 
(and only this) condition looks like a suitable signal that some 
rebalancing is required. The question is, what do we do to rebalance?

One obvious possibility is to completely rebuild the entire subtree at 
this moment. This operation has O(n log n) complexity, and is obviously 
ludicrously expensive to do on (potentially) every single delete operation.

A better possibility is to move one element from the left subtree into 
the right subtree if it's going to become too small. Actually to do this 
reliably requires that we know whether L = R or L = R + 1. In the former 
case, we do nothing, and in the latter case we rebalance by moving a 
single element across. Here's the code:

   data Balance = Equal | Unequal

   invert Equal   = Unequal
   invert Unequal = Equal

   data Heap x = Node Balance x (Heap x) Heap x) | Empty

   insert x' (Node b x h0 h1) = Node (invert b) (min x x') (insert (max 
x x') h1) h0
   insert x' (Empty         ) = Node Equal x Empty Empty

   get_min (Node _ x _ _) = x
   get_min (Empty       ) = error "empty heap"

   delete_min (Node x _ h0 h1) = merge b h0 h1
   delete_min (Empty         ) = error "empty heap"

   merge _ Empty h1    = h1
   merge _ h0    Empty = h0
   merge _ h0    h1    =
     if get_min h0 < get_min h1
       then Node (invert b) (get_min h0) h1 (delete_min h0)
       else rebalance    b  (get_min h1) h0 (delete_min h1)

   rebalance Equal   x h0 h1 = Node Unequal x h0 h1
   rebalance Unequal x h0 h1 = let x' = get_min h0 in Node Equal x 
(delete_min h0) (insert x' h1)

Here the functions "invert", "merge" and "rebalance" are private 
functions not accessible to external clients.

I've run this code through my simulator, and after many minutes of 
number crunching it has failed to produce one single unbalanced tree. 
This of course is not proof, but it shows at least that this structure 
works somewhat better than the original. (And for very little cost in 
either code size of run-time complexity.)

Unfortunately, because we've added a new invariant to the data structure 
(namely that the balance factor is always exactly 0 or -1), we lose the 
ability to cheaply merge two heaps. (In the old version, this was 
already a very cheap way to make an arbitrarily unbalanced tree; just 
merge a size-1 tree with a size-1000 tree and watch how lopsided it now 
is!) You would now be forced to do some kind of elaborate rebalancing 
procedure to restore the new invariant on the subtree sizes.
Post a reply to this message