On 13/10/2010 06:06 PM, Warp wrote:
> Invisible<voi### [at] dev null> wrote:
>> However, now that I'm thinking about it, the thing that occurs to me is
>> that finding the minimum or maximum element of a binary search tree
>> presumably requires O(log N) operations, while for a min-heap you can
>> find the minimum in O(1) operations. (Similarly, you can find the
>> maximum of a max-heap in O(1) time.)
>
> Your tree data structure can have a reference/handle to the first and
> last elements for fast access (which is, in fact, what eg. std::set and
> std::map do, in order to be able to get the iterators to the first and
> last elements fast). In this case you get fast access to the smallest,
> largest and median elements (the root node is obviously the median in
> a balanced binary search tree).
Presumably the root is only the median if the tree is *perfectly* balanced?
At any rate, I guess you can include direct pointers to any elements
likely to be of interest. The question then becomes "how fast is it to
keep those pointers up to date?" At least in the case of min/max, if you
delete the present minimum, or insert a new, lower one, it should be
quite efficient to update the pointers.
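Something like this quick sketch of the idea (my own illustration, not how std::set actually does it internally): cache the minimum next to a Data.Set, so reading it is O(1). The cache costs one comparison per insert, plus an O(log n) recomputation when the cached element itself gets deleted.

  import qualified Data.Set as Set

  -- A set paired with its cached minimum.
  data MinSet a = MinSet (Maybe a) (Set.Set a)

  empty :: MinSet a
  empty = MinSet Nothing Set.empty

  insert :: Ord a => a -> MinSet a -> MinSet a
  insert x (MinSet m s) = MinSet (Just (maybe x (min x) m)) (Set.insert x s)

  -- O(1): just read the cache.
  findMin :: MinSet a -> Maybe a
  findMin (MinSet m _) = m

  -- Only deleting the current minimum forces an O(log n) recomputation.
  delete :: Ord a => a -> MinSet a -> MinSet a
  delete x (MinSet m s)
    | Just x == m = MinSet (if Set.null s' then Nothing
                            else Just (Set.findMin s')) s'
    | otherwise   = MinSet m s'
    where s' = Set.delete x s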
>> So if you're doing something like implementing a priority queue or
>> similar, a heap looks like being a better choice than a BST.
>
> A heap is at least much *simpler* to implement (and requires less
> ancillary data) than a balanced binary search tree.
I guess that's one of the reasons I ended up playing with it. It's a
nice simple algorithm that works well, which is kinda cool.
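Case in point: a complete pairing heap fits in a dozen lines of Haskell (a sketch; I'm ignoring the amortised complexity analysis), and findMin is trivially O(1) because the smallest element always sits at the root. Though, as you note below, every node here is allocated separately:

  data Heap a = Empty | Node a [Heap a]

  findMin :: Heap a -> Maybe a
  findMin Empty      = Nothing
  findMin (Node x _) = Just x

  merge :: Ord a => Heap a -> Heap a -> Heap a
  merge Empty h = h
  merge h Empty = h
  merge h1@(Node x hs1) h2@(Node y hs2)
    | x <= y    = Node x (h2 : hs1)
    | otherwise = Node y (h1 : hs2)

  insert :: Ord a => a -> Heap a -> Heap a
  insert x = merge (Node x [])

  deleteMin :: Ord a => Heap a -> Heap a
  deleteMin Empty       = Empty
  deleteMin (Node _ hs) = mergePairs hs
    where
      mergePairs []          = Empty
      mergePairs [h]         = h
      mergePairs (h1:h2:hs') = merge (merge h1 h2) (mergePairs hs')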
> However, the fact
> that you can construct a heap on an array without any ancillary data
> whatsoever is still, IMO, by far its biggest advantage as a data structure
> in such a case (ie. a priority queue). You lose much of that efficiency
> if you allocate all the nodes separately.
True enough I guess.
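To make it concrete, here's a sketch of Floyd's bottom-up heapify on a mutable unboxed array (my own illustration): the heap structure is pure index arithmetic, with the children of index i living at 2i+1 and 2i+2, and no per-node allocation at all.

  import Control.Monad (forM_, when)
  import Control.Monad.ST (ST, runST)
  import Data.Array.ST (STUArray, newListArray, readArray, writeArray, getElems)

  -- Rearrange a list into min-heap order, in place, on an unboxed array.
  heapify :: [Int] -> [Int]
  heapify xs = runST $ do
    let n = length xs
    arr <- newListArray (0, n - 1) xs :: ST s (STUArray s Int Int)
    forM_ [n `div` 2 - 1, n `div` 2 - 2 .. 0] (siftDown arr n)
    getElems arr

  -- Push the element at index i down until both children are larger.
  siftDown :: STUArray s Int Int -> Int -> Int -> ST s ()
  siftDown arr n i = do
    j <- smaller arr n i (2*i + 1)
    k <- smaller arr n j (2*i + 2)
    when (k /= i) $ do
      a <- readArray arr i
      b <- readArray arr k
      writeArray arr i b
      writeArray arr k a
      siftDown arr n k

  -- Of indices i and j, whichever holds the smaller value (an index
  -- past the end of the array loses automatically).
  smaller :: STUArray s Int Int -> Int -> Int -> Int -> ST s Int
  smaller arr n i j
    | j >= n    = return i
    | otherwise = do
        a <- readArray arr i
        b <- readArray arr j
        return (if b < a then j else i)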
> I assume that means that Haskell doesn't handle objects (if it even has
> such a concept) by value, only by reference (which, in fact, isn't very
> surprising to me).
Haskell defaults to handling everything by reference, yes. There are a
couple of reasons for that.
In Java, if you ask for an array of Pixel objects (i.e., "Pixel[]"),
what you get is an array of *pointers* to Pixel objects. Similarly, in
Haskell if you ask for an array of Pixel values, you get an array of
pointers. (This is regardless of whether you ask for an immutable array
or a mutable array.)
Reason #1: All pointers have the same structure, which means that the
person writing the array implementation code only needs to handle one
case - pointers.
Reason #2: Each "Pixel value" could in fact be an unevaluated
*expression* which will yield a Pixel value when actually evaluated. You
cannot store an expression in 24 bits of storage. It's impossible.
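This is easy to demonstrate with the boxed arrays from Data.Array: an element can sit there as an unevaluated thunk indefinitely.

  import Data.Array

  -- Element 1 is an unevaluated expression; nothing goes wrong unless
  -- something actually demands its value.
  lazyArr :: Array Int Integer
  lazyArr = listArray (0, 1) [42, error "never evaluated"]

  main :: IO ()
  main = print (lazyArr ! 0)   -- prints 42; the error never fires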
Of course, in Java you could ask for a byte[] instead of a Pixel[]. Then
you get an array of actual bytes. But now *you* must write the
boilerplate code for writing a Pixel into the appropriate place in a
byte[], and reading it back out again.
You can do a similar thing in Haskell. And it's similarly tedious.
Haskell adds some innovations that make it easier for *the client* of
this code, but it's still quite tedious to write the code in the first
place.
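For the sake of illustration, here's roughly what that boilerplate looks like (a sketch using immutable arrays; readPixel and writePixel are my own names, not a library API):

  import Data.Array.Unboxed
  import Data.Word (Word8)

  data Pixel = Pixel { red, green, blue :: Word8 }

  -- Pixel i occupies bytes 3i, 3i+1 and 3i+2, just as it would in a
  -- Java byte[].
  readPixel :: UArray Int Word8 -> Int -> Pixel
  readPixel arr i = Pixel (arr ! (3*i)) (arr ! (3*i + 1)) (arr ! (3*i + 2))

  -- (//) copies the whole array, so a real implementation would use a
  -- mutable array instead; this is just to show the index juggling.
  writePixel :: UArray Int Word8 -> Int -> Pixel -> UArray Int Word8
  writePixel arr i (Pixel r g b) =
    arr // [(3*i, r), (3*i + 1, g), (3*i + 2, b)]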
To use the Haskell jargon, most types are "boxed" (i.e., they are
pointers to "boxes" on the heap). Haskell (or rather, GHC) also has
support for "unboxed" types. (Things like integers, characters, etc.,
which really are just plain binary.) The unboxed types are necessarily
"unlifted" (i.e., they cannot represent unevaluated expressions, only
evaluated answers).
So, a "boxed array" is an array of pointers. An "unboxed array" is an
array of primitive values, such as an array of integers or an array of
characters. A side-effect of this is that the data in an unboxed array
is always fully evaluated; you cannot store unevaluated data in an
unboxed array (or any other unboxed type).
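In GHC's standard libraries the distinction is right there in the types (a minimal illustration):

  import qualified Data.Array as A           -- boxed arrays
  import qualified Data.Array.Unboxed as U   -- unboxed arrays
  import Data.Word (Word8)

  -- Each slot holds a pointer to a (possibly unevaluated) Integer.
  boxed :: A.Array Int Integer
  boxed = A.listArray (0, 2) [1, 2, 3]

  -- A flat buffer of raw bytes. Only primitive element types work, and
  -- every element is necessarily fully evaluated.
  unboxed :: U.UArray Int Word8
  unboxed = U.listArray (0, 2) [1, 2, 3]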
In languages such as C and Pascal, you can make an (unboxed) array of
any type, but Haskell does not provide this level of sophistication (yet).
I say "yet" because several people are working on libraries that are
supposed to make this sort of thing easier.
Interestingly, you *can* have unboxed record fields quite trivially. For
example, if I say
data Pixel = Pixel {red, green, blue :: Word8}
then each Pixel value contains three pointers to Word8 values. But if I say
data Pixel = Pixel {red, green, blue :: {-# UNPACK #-} !Word8}
then each Pixel value contains three unboxed Word8 values. In other
words, no pointers and no indirection.
(Amusingly, GHC stores Word8, Word16 and Word32 as 32-bit quantities,
and just implements the arithmetic on them differently. So declaring
something as Word8 doesn't actually reduce storage space at all; it just
changes the way overflow wraps around. Presumably because implementing
one storage type rather than three is less work...)
Not only can you unpack Word8 and similar primitives, but you can in
fact unpack almost any type you want. Automatically. (There are a few
restrictions. For example, you cannot unpack a type inside itself.)
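For example (a small sketch): if I unpack a field whose type is itself fully unpacked, the whole thing gets flattened.

  data Point = Point {-# UNPACK #-} !Double {-# UNPACK #-} !Double

  -- With optimisation turned on, GHC flattens each Point into the Rect
  -- constructor itself: one heap object holding four raw Double#
  -- fields, with no Point boxes left over.
  data Rect = Rect {-# UNPACK #-} !Point {-# UNPACK #-} !Point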
Since GHC is apparently perfectly capable of unboxing arbitrary types in
constructor fields, there's no particular reason, in principle, why it
can't do the same for arrays. It just doesn't.
Then again, contrary to my expectations, it seems that the entire GHC
tool-chain has approximately 3 developers working on it full-time, with
a large cloud of irregular contributors floating around the outside. No
wonder it takes so long to implement anything...
(On top of that, implementing sophisticated abstractions is *far* more
interesting than fixing mundane low-level problems. Thus we have things
like Haskell's floor function being about 5x slower than the C
equivalent, for no apparent reason other than that nobody has got round
to fixing it. The ticket has been open for years...)
> Sometimes low-level efficiency means you have to bypass fancy
> abstractions...
You said it!