|
 |
>> Moreover, in C and C++ objects cannot be moved in memory behind the
>> scenes
>> because there may be pointers pointing to them, and hence moving an
>> object
>> would break the pointers (and thus the entire program). In Java, however,
>> objects can be moved in memory without breaking any existing
>> references to
>> them, if the references are not raw memory addresses but something
>> smarter
>
> This however is not *quite* true. C# and (I think) Java both store
> references as pointers internally, without any additional per-reference
> overhead. What lets objects be moved and GCed is the fact that the
> compiler has generated information about where the pointers are in each
> type and that information is reliable (i.e., no unions of references and
> integers, for example). One doesn't need extra indirections. One simply
> needs to know whether any given 4-byte field is a pointer or an integer,
> which can be deduced from information left for the runtime by the
> compiler, or basically what C++ might call RTTI.
>
> The original versions of Smalltalk, however, did indeed use the extra
> indirection. You could have 2^15 objects at a time, even if you had
> megabytes of memory, and there was an extra table somewhere that held
> the run-time pointer from the index to the object location. (The
> odd-numbered pointers were for small integers, hence 2^15 instead of 2^16.)
As I understand it, Smalltalk did it by having an "object table". Each
reference to an object is just an object ID - an index into the object
table. The object table then stores the pointers to the actual objects.
That means you can easily move the objects around so long as you update
the object table. (And you can potentially store additional information
in the object table too, for example to help the GC.) The down-side is
an extra level of indirection.
I think before you can talk about what "Java" does, you'd have to
specify which Java implementation you mean; presumably each JVM can
implement this stuff any way they choose.
As far as I understand it, what GHC does is this: Each object reference
is a machine address. When the GC moves everything, it updates all the
references. (I think that's what Darren is trying to say that Java does.)
The slight disadvantage is that this means that the GC can only be run
at certain moments. (E.g., when the mutator *isn't* in the middle of
doing something with a particular reference.) And since GHC has the
limitation that all Haskell threads must be paused before the GC can
run, that can sometimes mean that one rogue thread can take forever to
reach a "safe point" where it can be paused, causing all CPU cores
except one to sit idle while we wait for a GC cycle.
As you can imagine, this is an efficient way to destroy parallel
performance. The GHC team are working on a concurrent GC engine that
doesn't require the mutator to be paused before it can run, which should
fix this particular issue. (Don't hold your breath though...)
Post a reply to this message
|
 |