Darren New <dne### [at] san rr com> wrote:
> Warp wrote:
> > I assume that means that Haskell doesn't handle objects (if it even has
> > such a concept) by value, only by reference (which, in fact, isn't very
> > surprising to me).
> Semantically, this is a rather difficult question to answer for a purely
> functional language where the data can't be mutated.
Well, if eg. integers can be handled by value, why not user-defined
objects?
--
- Warp
Post a reply to this message
Warp wrote:
> Darren New <dne### [at] san rr com> wrote:
>> Warp wrote:
>>> I assume that means that Haskell doesn't handle objects (if it even has
>>> such a concept) by value, only by reference (which, in fact, isn't very
>>> surprising to me).
>
>> Semantically, this is a rather difficult question to answer for a purely
>> functional language where the data can't be mutated.
>
> Well, if eg. integers can be handled by value, why not user-defined
> objects?
There's no need to? Whether the language implements user objects as values
or as something pointed to makes no difference to the meaning of the
language. It's of course an efficiency trade-off, which is why I said it's
*semantically* difficult to answer.
Even if Haskell is passing around pointers to user objects, it's still not
pass-by-reference in the usual sense of the word, because the values are
immutable.
My comment might not have been 100% in line with what you were asking about.
I just thought it was an interesting point. In a language where you can't
modify memory once you've allocated it, is there any semantic difference
between allocating it on the heap or allocating it inline somewhere? I don't
think so.
--
Darren New, San Diego CA, USA (PST)
Serving Suggestion:
"Don't serve this any more. It's awful."
> My comment might not have been 100% in line with what you were asking
> about. I just thought it was an interesting point. In a language where
> you can't modify memory once you've allocated it, is there any semantic
> difference between allocating it on the heap or allocating it inline
> somewhere? I don't think so.
The Haskell language specification describes the syntax of the language
and approximately how to execute it. However, it is silent on
implementation details such as calling conventions. (This is not an
oversight.)
The language spec says nothing about what goes on the heap, what goes on
the stack, what goes in registers, etc. In Haskell, I can say "print (x
+ 1)". Now, whether the value of "x" is in a register, on the stack, on
the heap somewhere, and so on makes absolutely no difference to the code
I just wrote there. One Haskell implementation might do it one way, and
another might do it a completely different way, and the program will
still run the same way.
In practice, when people say "Haskell does X", what they really mean is
"the only currently available production-grade Haskell implementation
(i.e., GHC) does X". Which isn't quite the same statement.
The simplest and easiest way to implement Haskell is for *everything* to
be a heap-allocated dynamic object. Obviously this has quite laughable
performance, and so real Haskell implementations go out of their way to
be more efficient where possible. Haskell is particularly helpful in
this regard, since it has no specified calling convention that you have
to adhere to, and everything is immutable, so it doesn't matter where
it's stored or whether you copy it.
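As a concrete illustration of that freedom, here is a GHC-specific sketch (the type names are made up): the same pair of fields can be stored behind pointers or inlined directly into the constructor, with no change to what the program means.

```haskell
-- Two representations of the same data; GHC-specific sketch.
-- Boxed: each field is a pointer to a (possibly unevaluated) Double.
data Boxed = Boxed Double Double

-- Unpacked: GHC stores the raw doubles inline in the constructor.
data Unboxed = Unboxed {-# UNPACK #-} !Double {-# UNPACK #-} !Double

distB :: Boxed -> Double
distB (Boxed x y) = sqrt (x*x + y*y)

distU :: Unboxed -> Double
distU (Unboxed x y) = sqrt (x*x + y*y)

main :: IO ()
main = do
  print (distB (Boxed 3 4))    -- 5.0
  print (distU (Unboxed 3 4))  -- 5.0
```

Same answer either way; the pragma only changes where the bits live, which is exactly the kind of decision the language spec leaves open.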
Darren is quite right: Since [almost] everything is immutable, you can't
distinguish between a thing and a reference to a thing. You can't
distinguish between two pointers to the same thing and two pointers to
things that are identical. (I.e., there is no difference between
value-equal and reference-equal.) You can't distinguish copies from each
other. Heck, you can't even distinguish an evaluated result from an
unevaluated result!
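To make that concrete, a minimal sketch: whether two lists share one heap object or were built separately is unobservable from inside the language.

```haskell
main :: IO ()
main = do
  let a = [1, 2, 3] :: [Int]
      b = a           -- b (probably) shares a's heap object
      c = [1, 2, 3]   -- c is (probably) a separate construction
  -- (==) compares values; there is no pure way to ask
  -- "are these the same object?"
  print (b == c)  -- True either way
```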
This gives people trying to implement Haskell considerable design freedom.
In fact, consider this:
repeat x = x : repeat x
Semantically, this generates an infinite list with all elements equal to
x. But how is it implemented? Does it return a circular list, or does it
actually *execute* an infinite function-call loop?
The answer is... it doesn't matter. From within Haskell, it is
impossible to tell the difference.
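For instance, any finite observation terminates and gives the same answer under either implementation. (A minimal sketch; repeat' is just a local copy of repeat, renamed to avoid clashing with the Prelude.)

```haskell
-- Semantically an infinite list; the implementation may build a
-- circular structure or keep allocating fresh cells. We can't tell.
repeat' :: a -> [a]
repeat' x = x : repeat' x

main :: IO ()
main = putStrLn (take 5 (repeat' 'x'))  -- prints xxxxx
```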
Note that this is different from saying that there *is* no difference.
For example, a circular list clearly takes less storage space (and less
CPU time to manage) than an infinite loop that allocates another new
list node on each cycle. (To say nothing of cache locality and so on.)
In other words, there are a whole raft of design decisions that make
absolutely no difference to Haskell (i.e., your programs will still work
right), but potentially make a very big difference to run-time performance.
Since it does affect performance, GHC (and probably other
implementations) allows you some limited control over these
things using compiler pragmas, command-line switches and so on.
But for the most part, the compiler internally uses sophisticated
algorithms to attempt to determine the best choices automatically on a
case-by-case basis. (And if you doubt this, check out the extensive
published literature output of the GHC team.)
Final note: If I start dealing with mutable data structures, now it
*does* matter whether you're talking about value-equality or
reference-equality. Since, in a multi-threaded scenario, value-equality
could potentially change at any moment, all the mutable containers
implement reference-equality.
Comparing two things for equality is a pure operation, which you can do
anywhere, at any time. If you want to compare two mutable objects for
value-equality, you must do it by hand. This involves manually fetching
the contents of the container - and thereby fixing exactly which moment
in time the comparison occurs at, since fetching the contents of a
mutable structure is an impure operation which must be explicitly
sequenced with respect to other impure operations in the same thread.
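A minimal GHC sketch of both comparisons, using IORef as the mutable container:

```haskell
import Data.IORef

main :: IO ()
main = do
  a <- newIORef (42 :: Int)
  b <- newIORef 42
  -- Reference-equality: IORef's Eq instance compares identity,
  -- which is a pure comparison once you hold the two references.
  print (a == b)        -- False: two distinct containers
  -- Value-equality: fetch the contents by hand, which fixes the
  -- moment in time at which the comparison happens.
  va <- readIORef a
  vb <- readIORef b
  print (va == vb)      -- True at this instant
```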
In summary, once again Haskell's type system forces us to stop, think,
and do the semantically correct thing.
Invisible <voi### [at] dev null> wrote:
> In Java, if you ask for an array of Pixel objects (i.e., "Pixel[]"),
> what you get is an array of *pointers* to Pixel objects.
I think that in Java it's not completely correct to call them "pointers".
A "pointer" is usually understood as being a raw memory address pointing
to the memory location which contains the binary representation of the object
(its "value"). This is what a pointer is in C, C++ and some other languages
which have pointers.
In Java a reference might internally be a true pointer, or it might not.
(Instead, it could be something else, such as an index to a table which
contains information about the object.)
There is a difference. In C/C++ you can compare pointers with an ordering
comparison (less than, etc), which allows you to eg. sort pointers (ie. sort
them by the values of the pointers themselves rather than the objects they
are pointing to). Pointer arithmetic is also a "side-effect" of this: You
can add an offset to a pointer and get another pointer which points that
much further in memory (which is used for raw arrays).
In Java you can't sort references (because references might not have a
well-defined ordering) nor perform "reference arithmetic" (because it would
be nonsensical if the references are something else than raw memory
addresses).
Moreover, in C and C++ objects cannot be moved in memory behind the scenes
because there may be pointers pointing to them, and hence moving an object
would break the pointers (and thus the entire program). In Java, however,
objects can be moved in memory without breaking any existing references to
them, if the references are not raw memory addresses but something smarter
(such as indices to some table, or whatever; of course this introduces an
additional indirection level to accessing the object through the reference,
but it might be worth it in some cases).
--
- Warp
>> In Java, if you ask for an array of Pixel objects (i.e., "Pixel[]"),
>> what you get is an array of *pointers* to Pixel objects.
>
> I think that in Java it's not completely correct to call them "pointers".
OK, fair enough. My point was simply that you don't get an array of
Pixel structures, you get an array of some kind of indirect reference to
Pixel structures. (As you rightly point out, they might be actual
pointers, or object table indices, or something else entirely...) My
point was merely that there's (at least one) level of indirection present.
Warp wrote:
> Invisible <voi### [at] dev null> wrote:
>> In Java, if you ask for an array of Pixel objects (i.e., "Pixel[]"),
>> what you get is an array of *pointers* to Pixel objects.
>
> I think that in Java it's not completely correct to call them "pointers".
What Warp said is indeed the standard terminology in textbooks that talk
about the stuff.
Basically, an "address" is a location indicator the hardware understands.
(Phrased that way to make "disk address" and "virtual address" and "mapped
address" all fall under the name "address".)
A "pointer" is an address with a type. "int*" is a pointer, not an address.
A "reference" is an opaque pointer to an allocated object that may or may
not be an address.
> Moreover, in C and C++ objects cannot be moved in memory behind the scenes
> because there may be pointers pointing to them, and hence moving an object
> would break the pointers (and thus the entire program). In Java, however,
> objects can be moved in memory without breaking any existing references to
> them, if the references are not raw memory addresses but something smarter
This however is not *quite* true. C# and (I think) Java both store
references as pointers internally, without any additional per-reference
overhead. What lets objects be moved and GCed is the fact that the compiler
has generated information about where the pointers are in each type and that
information is reliable (i.e., no unions of references and integers, for
example). One doesn't need extra indirections. One simply needs to know
whether any given 4-byte field is a pointer or an integer, which can be
deduced from information left for the runtime by the compiler, or basically
what C++ might call RTTI.
The savings in C++ and C come from not storing pointer information for
each class (and in C there really isn't any corresponding bit of info),
not tracking auto pointers to the heap, and so on. I.e., there's less
compile-time space used, and C++ classes with no vtable don't need extra
info somewhere saying which bits are pointers.
The original versions of Smalltalk, however, did indeed use the extra
indirection. You could have 2^15 objects at a time, even if you had
megabytes of memory, and there was an extra table somewhere that held the
run-time pointer from the index to the object location. (The odd-numbered
pointers were for small integers, hence 2^15 instead of 2^16.)
--
Darren New, San Diego CA, USA (PST)
Serving Suggestion:
"Don't serve this any more. It's awful."
>> Moreover, in C and C++ objects cannot be moved in memory behind the
>> scenes
>> because there may be pointers pointing to them, and hence moving an
>> object
>> would break the pointers (and thus the entire program). In Java, however,
>> objects can be moved in memory without breaking any existing
>> references to
>> them, if the references are not raw memory addresses but something
>> smarter
>
> This however is not *quite* true. C# and (I think) Java both store
> references as pointers internally, without any additional per-reference
> overhead. What lets objects be moved and GCed is the fact that the
> compiler has generated information about where the pointers are in each
> type and that information is reliable (i.e., no unions of references and
> integers, for example). One doesn't need extra indirections. One simply
> needs to know whether any given 4-byte field is a pointer or an integer,
> which can be deduced from information left for the runtime by the
> compiler, or basically what C++ might call RTTI.
>
> The original versions of Smalltalk, however, did indeed use the extra
> indirection. You could have 2^15 objects at a time, even if you had
> megabytes of memory, and there was an extra table somewhere that held
> the run-time pointer from the index to the object location. (The
> odd-numbered pointers were for small integers, hence 2^15 instead of 2^16.)
As I understand it, Smalltalk did it by having an "object table". Each
reference to an object is just an object ID - an index into the object
table. The object table then stores the pointers to the actual objects.
That means you can easily move the objects around so long as you update
the object table. (And you can potentially store additional information
in the object table too, for example to help the GC.) The down-side is
an extra level of indirection.
I think before you can talk about what "Java" does, you'd have to
specify which Java implementation you mean; presumably each JVM can
implement this stuff any way they choose.
As far as I understand it, what GHC does is this: Each object reference
is a machine address. When the GC moves everything, it updates all the
references. (I think that's what Darren is trying to say that Java does.)
The slight disadvantage is that this means that the GC can only be run
at certain moments. (E.g., when the mutator *isn't* in the middle of
doing something with a particular reference.) And since GHC has the
limitation that all Haskell threads must be paused before the GC can
run, that can sometimes mean that one rogue thread can take forever to
reach a "safe point" where it can be paused, causing all CPU cores
except one to sit idle while we wait for a GC cycle.
As you can imagine, this is an efficient way to destroy parallel
performance. The GHC team are working on a concurrent GC engine that
doesn't require the mutator to be paused before it can run, which should
fix this particular issue. (Don't hold your breath though...)
Invisible wrote:
> As I understand it, Smalltalk did it by having an "object table". Each
> reference to an object is just an object ID - an index into the object
> table. The object table then stores the pointers to the actual objects.
> That means you can easily move the objects around so long as you update
> the object table. (And you can potentially store additional information
> in the object table too, for example to help the GC.) The down-side is
> an extra level of indirection.
Yes. I'm pretty sure the GC info was in the objects, tho. Having the object
table also made "become:" quite inexpensive. ( "A become: B" would swap
every reference to A to be one to B and vice versa. This was important in a
system where you could change code while things were running and you wanted
the new class declaration to replace the old one. It was also used to resize
arrays, which therefore didn't need the extra overhead of a hidden pointer
pointing to the "real" array.)
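A rough model of that scheme (hypothetical names throughout; the real Smalltalk object table lived inside the VM, not in user code): references are table indices, so moving an object or performing become: only touches the table, and every reference held anywhere else stays valid.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical sketch of an object table: references are IDs,
-- the table maps each ID to the object's current contents.
type ObjId    = Int
newtype Obj   = Obj String deriving (Eq, Show)
type ObjTable = Map.Map ObjId Obj

-- "A become: B": swap the two table entries. Every ObjId held
-- anywhere in the system now resolves to the other object.
become :: ObjId -> ObjId -> ObjTable -> ObjTable
become a b tbl =
  case (Map.lookup a tbl, Map.lookup b tbl) of
    (Just oa, Just ob) -> Map.insert a ob (Map.insert b oa tbl)
    _                  -> tbl  -- unknown ID: leave table unchanged
```

Dereferencing an ObjId costs one extra lookup, which is exactly the indirection trade-off described above.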
> I think before you can talk about what "Java" does, you'd have to
> specify which Java implementation you mean; presumably each JVM can
> implement this stuff any way they choose.
Yes, that's the problem. Altho they can't really add CoW, simply because
nobody would take advantage of it by making a bunch of duplicate references
that they didn't expect to share state. Nor can they make strings mutable.
> As far as I understand it, what GHC does is this: Each object reference
> is a machine address. When the GC moves everything, it updates all the
> references. (I think that's what Darren is trying to say that Java does.)
Yes. Or at least C#.
--
Darren New, San Diego CA, USA (PST)
Serving Suggestion:
"Don't serve this any more. It's awful."
On 18/10/2010 04:29 PM, Darren New wrote:
> Invisible wrote:
>> As I understand it, Smalltalk did it by having an "object table". Each
>> reference to an object is just an object ID - an index into the object
>> table. The object table then stores the pointers to the actual
>> objects. That means you can easily move the objects around so long as
>> you update the object table. (And you can potentially store additional
>> information in the object table too, for example to help the GC.) The
>> down-side is an extra level of indirection.
>
> Yes. I'm pretty sure the GC info was in the objects, tho.
Yeah, you might be right about that. I didn't look too closely. (I did
however spend an evening reading through all the GC documentation to see
how it has different areas for young objects, old objects, large
objects, persistent objects, object objects... It all seemed quite
sophisticated.)
> Having the
> object table also made "become:" quite inexpensive.
The thing that occurs to me is that it could potentially make it easy to
move objects to another address space, hell maybe even a different
physical machine. I've always thought that would be quite cool. (Of
course, as soon as you do this, you need to decide where the "best
place" for each object actually is.)
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Orchid XP v8 wrote:
> Yeah, you might be right about that. I didn't look too closely. (I did
> however spend an evening reading through all the GC documentation to see
> how it has different areas for young objects, old objects, large
> objects, persistent objects, object objects... It all seemed quite
> sophisticated.)
That was C#. I don't think Smalltalk-80 had any of that stuff. GC gets
really complicated when you start throwing in destructors, and even more so
when you start letting the user arbitrarily pin down bits of heap-allocated
memory and passing it to languages that use non-collectable pointers.
> move objects to another address space, hell maybe even a different
> physical machine. I've always thought that would be quite cool.
Yes. There were versions of Smalltalk that used library code to implement
demand-paging of objects, just catching the DoesNotUnderstand message to
page back in the object and re-issue the request. Paging out was creating a
small proxy and then become: on the proxy.
Lots of cool stuff you could do that way. Change the class declaration while
the program is running and suddenly all the instances have the new layout, etc.
--
Darren New, San Diego CA, USA (PST)
Serving Suggestion:
"Don't serve this any more. It's awful."