On 13/10/2010 06:06 PM, Warp wrote:
> Invisible<voi### [at] dev null> wrote:
>> However, now that I'm thinking about it, the thing that occurs to me is
>> that finding the minimum or maximum element of a binary search tree
>> presumably requires O(log N) operations, while for a min-heap you can
>> find the minimum in O(1) operations. (Similarly, you can find the
>> maximum of a max-heap in O(1) time.)
>
> Your tree data structure can have a reference/handle to the first and
> last elements for fast access (which is, in fact, what eg. std::set and
> std::map do, in order to be able to get the iterators to the first and
> last elements fast). In this case you get fast access to the smallest,
> largest and median elements (the root node is obviously the median in
> a balanced binary search tree).
Presumably the root is only the median if the tree is *perfectly* balanced?
At any rate, I guess you can include direct pointers to any elements
likely to be of interest. The question then becomes "how fast is it to
keep those pointers up to date?" At least in the case of min/max, if you
delete the present minimum, or insert a new, lower one, it should be
quite efficient to update the pointers.
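Something like this quick sketch of the idea (my own illustration, not how std::set actually does it internally): cache the minimum next to a Data.Set, so reading it is O(1). The cache costs one comparison per insert, plus an O(log n) recomputation when the cached element itself gets deleted.

  import qualified Data.Set as Set

  -- A set paired with its cached minimum.
  data MinSet a = MinSet (Maybe a) (Set.Set a)

  empty :: MinSet a
  empty = MinSet Nothing Set.empty

  insert :: Ord a => a -> MinSet a -> MinSet a
  insert x (MinSet m s) = MinSet (Just (maybe x (min x) m)) (Set.insert x s)

  -- O(1): just read the cache.
  findMin :: MinSet a -> Maybe a
  findMin (MinSet m _) = m

  -- Only deleting the current minimum forces an O(log n) recomputation.
  delete :: Ord a => a -> MinSet a -> MinSet a
  delete x (MinSet m s)
    | Just x == m = MinSet (if Set.null s' then Nothing
                            else Just (Set.findMin s')) s'
    | otherwise   = MinSet m s'
    where s' = Set.delete x s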
>> So if you're doing something like implementing a priority queue or
>> similar, a heap looks like being a better choice than a BST.
>
> A heap is at least much *simpler* to implement (and requires less
> ancillary data) than a balanced binary search tree.
I guess that's one of the reasons I ended up playing with it. It's a
nice simple algorithm that works well, which is kinda cool.
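Case in point: a complete pairing heap fits in a dozen lines of Haskell (a sketch; I'm ignoring the amortised complexity analysis), and findMin is trivially O(1) because the smallest element always sits at the root. Though, as you note below, every node here is allocated separately:

  data Heap a = Empty | Node a [Heap a]

  findMin :: Heap a -> Maybe a
  findMin Empty      = Nothing
  findMin (Node x _) = Just x

  merge :: Ord a => Heap a -> Heap a -> Heap a
  merge Empty h = h
  merge h Empty = h
  merge h1@(Node x hs1) h2@(Node y hs2)
    | x <= y    = Node x (h2 : hs1)
    | otherwise = Node y (h1 : hs2)

  insert :: Ord a => a -> Heap a -> Heap a
  insert x = merge (Node x [])

  deleteMin :: Ord a => Heap a -> Heap a
  deleteMin Empty       = Empty
  deleteMin (Node _ hs) = mergePairs hs
    where
      mergePairs []          = Empty
      mergePairs [h]         = h
      mergePairs (h1:h2:hs') = merge (merge h1 h2) (mergePairs hs')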
> However, the fact
> that you can construct a heap on an array without any ancillary data
> whatsoever is still, IMO, by far its biggest advantage as a data structure
> in such a case (ie. a priority queue). You lose much of that efficiency
> if you allocate all the nodes separately.
True enough I guess.
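To make it concrete, here's a sketch of Floyd's bottom-up heapify on a mutable unboxed array (my own illustration): the heap structure is pure index arithmetic, with the children of index i living at 2i+1 and 2i+2, and no per-node allocation at all.

  import Control.Monad (forM_, when)
  import Control.Monad.ST (ST, runST)
  import Data.Array.ST (STUArray, newListArray, readArray, writeArray, getElems)

  -- Rearrange a list into min-heap order, in place, on an unboxed array.
  heapify :: [Int] -> [Int]
  heapify xs = runST $ do
    let n = length xs
    arr <- newListArray (0, n - 1) xs :: ST s (STUArray s Int Int)
    forM_ [n `div` 2 - 1, n `div` 2 - 2 .. 0] (siftDown arr n)
    getElems arr

  -- Push the element at index i down until both children are larger.
  siftDown :: STUArray s Int Int -> Int -> Int -> ST s ()
  siftDown arr n i = do
    j <- smaller arr n i (2*i + 1)
    k <- smaller arr n j (2*i + 2)
    when (k /= i) $ do
      a <- readArray arr i
      b <- readArray arr k
      writeArray arr i b
      writeArray arr k a
      siftDown arr n k

  -- Of indices i and j, whichever holds the smaller value (an index
  -- past the end of the array loses automatically).
  smaller :: STUArray s Int Int -> Int -> Int -> Int -> ST s Int
  smaller arr n i j
    | j >= n    = return i
    | otherwise = do
        a <- readArray arr i
        b <- readArray arr j
        return (if b < a then j else i)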
> I assume that means that Haskell doesn't handle objects (if it even has
> such a concept) by value, only by reference (which, in fact, isn't very
> surprising to me).
Haskell defaults to handling everything by reference, yes. There are a
couple of reasons for that.
In Java, if you ask for an array of Pixel objects (i.e., "Pixel[]"),
what you get is an array of *pointers* to Pixel objects. Similarly, in
Haskell if you ask for an array of Pixel values, you get an array of
pointers. (This is regardless of whether you ask for an immutable array
or a mutable array.)
Reason #1: All pointers have the same structure, which means that the
person writing the array implementation code only needs to handle one
case - pointers.
Reason #2: Each "Pixel value" could in fact be an unevaluated
*expression* which will yield a Pixel value when actually evaluated. You
cannot store an expression in 24 bits of storage. It's impossible.
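This is easy to demonstrate with the boxed arrays from Data.Array: an element can sit there as an unevaluated thunk indefinitely.

  import Data.Array

  -- Element 1 is an unevaluated expression; nothing goes wrong unless
  -- something actually demands its value.
  lazyArr :: Array Int Integer
  lazyArr = listArray (0, 1) [42, error "never evaluated"]

  main :: IO ()
  main = print (lazyArr ! 0)   -- prints 42; the error never fires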
Of course, in Java you could ask for a byte[] instead of a Pixel[]. Then
you get an array of actual bytes. But now *you* must write the
boilerplate code for writing a Pixel into the appropriate place in a
byte[], and reading it back out again.
You can do a similar thing in Haskell. And it's similarly tedious.
Haskell adds some innovations that make it easier for *the client* of
this code, but it's still quite tedious to write the code in the first
place.
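For the sake of illustration, here's roughly what that boilerplate looks like (a sketch using immutable arrays; readPixel and writePixel are my own names, not a library API):

  import Data.Array.Unboxed
  import Data.Word (Word8)

  data Pixel = Pixel { red, green, blue :: Word8 }

  -- Pixel i occupies bytes 3i, 3i+1 and 3i+2, just as it would in a
  -- Java byte[].
  readPixel :: UArray Int Word8 -> Int -> Pixel
  readPixel arr i = Pixel (arr ! (3*i)) (arr ! (3*i + 1)) (arr ! (3*i + 2))

  -- (//) copies the whole array, so a real implementation would use a
  -- mutable array instead; this is just to show the index juggling.
  writePixel :: UArray Int Word8 -> Int -> Pixel -> UArray Int Word8
  writePixel arr i (Pixel r g b) =
    arr // [(3*i, r), (3*i + 1, g), (3*i + 2, b)]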
To use the Haskell jargon, most types are "boxed" (i.e., they are
pointers to "boxes" on the heap). Haskell (or rather, GHC) also has
support for "unboxed" types. (Things like integers, characters, etc.,
which really are just plain binary.) The unboxed types are necessarily
"unlifted" (i.e., they cannot represent unevaluated expressions, only
evaluated answers).
So, a "boxed array" is an array of pointers. An "unboxed array" is an
array of primitive values, such as an array of integers or an array of
characters. A side-effect of this is that the data in an unboxed array
is always fully evaluated; you cannot store unevaluated data in an
unboxed array (or any other unboxed type).
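In GHC's standard libraries the distinction is right there in the types (a minimal illustration):

  import qualified Data.Array as A           -- boxed arrays
  import qualified Data.Array.Unboxed as U   -- unboxed arrays
  import Data.Word (Word8)

  -- Each slot holds a pointer to a (possibly unevaluated) Integer.
  boxed :: A.Array Int Integer
  boxed = A.listArray (0, 2) [1, 2, 3]

  -- A flat buffer of raw bytes. Only primitive element types work, and
  -- every element is necessarily fully evaluated.
  unboxed :: U.UArray Int Word8
  unboxed = U.listArray (0, 2) [1, 2, 3]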
In languages such as C and Pascal, you can make an (unboxed) array of
any type, but Haskell does not provide this level of sophistication (yet).
I say "yet" because several people are working on libraries that are
supposed to make this sort of thing easier.
Interestingly, you *can* have unboxed record fields quite trivially. For
example, if I say
data Pixel = Pixel {red, green, blue :: Word8}
then each Pixel value contains three pointers to Word8 values. But if I say
data Pixel = Pixel {red, green, blue :: {-# UNPACK #-} !Word8}
then each Pixel value contains three unboxed Word8 values. In other
words, no pointers and no indirection.
(Amusingly, GHC stores Word8, Word16 and Word32 as 32-bit quantities,
and just implements the arithmetic on them differently. So declaring
something as Word8 doesn't actually reduce storage space at all; it just
changes the way overflow wraps around. Presumably because implementing
one storage type rather than three is less work...)
Not only can you unpack Word8 and similar primitives, but you can in
fact unpack almost any type you want. Automatically. (There are a few
restrictions. For example, you cannot unpack a type inside itself.)
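For example (a small sketch): if I unpack a field whose type is itself fully unpacked, the whole thing gets flattened.

  data Point = Point {-# UNPACK #-} !Double {-# UNPACK #-} !Double

  -- With optimisation turned on, GHC flattens each Point into the Rect
  -- constructor itself: one heap object holding four raw Double#
  -- fields, with no Point boxes left over.
  data Rect = Rect {-# UNPACK #-} !Point {-# UNPACK #-} !Point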
Since GHC is apparently perfectly capable of unboxing arbitrary types in
constructor fields, there's no particular reason, in principle, why it
can't do the same for arrays. It just doesn't.
Then again, contrary to my expectations, it seems that the entire GHC
tool-chain has approximately 3 developers working on it full-time, with
a large cloud of irregular contributors floating around the outside. No
wonder it takes so long to implement anything...
(On top of that, implementing sophisticated abstractions is *far* more
interesting than fixing mundane low-level problems. Thus we have things
like Haskell's floor function being about 5x slower than the C
equivalent, for no apparent reason other than that nobody has got round
to fixing it. The ticket has been open for years...)
> Sometimes low-level efficiency means you have to bypass fancy
> abstractions...
You said it!