 |
 |
|
 |
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
>> In Java, if you ask for an array of Pixel objects (i.e., "Pixel[]"),
>> what you get is an array of *pointers* to Pixel objects.
>
> I think that in Java it's not completely correct to call them "pointers".
OK, fair enough. My point was simply that you don't get an array of
Pixel structures, you can an array of some kind of indirect reference to
Pixel structures. (As you rightly point out, they might be actual
pointers, or object table indices, or something else entirely...) My
point was merely that there's (at least one) level of indirection present.
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
Warp wrote:
> Invisible <voi### [at] dev null> wrote:
>> In Java, if you ask for an array of Pixel objects (i.e., "Pixel[]"),
>> what you get is an array of *pointers* to Pixel objects.
>
> I think that in Java it's not completely correct to call them "pointers".
What Warp said is indeed the standard terminology in textbooks that talk
about the stuff.
Basically, an "address" is a location indicator the hardware understands.
(Phrased that way to make "disk address" and "virtual address" and "mapped
address" all fall under the name "address".)
A "pointer" is an address with a type. "int*" is a pointer, not an address.
A "reference" is an opaque pointer to an allocated object that may or may
not be an address.
> Moreover, in C and C++ objects cannot be moved in memory behind the scenes
> because there may be pointers pointing to them, and hence moving an object
> would break the pointers (and thus the entire program). In Java, however,
> objects can be moved in memory without breaking any existing references to
> them, if the references are not raw memory addresses but something smarter
This however is not *quite* true. C# and (I think) Java both store
references as pointers internally, without any additional per-reference
overhead. What lets objects be moved and GCed is the fact that the compiler
has generated information about where the pointers are in each type and that
information is reliable (i.e., no unions of references and integers, for
example). One doesn't need extra indirections. One simply needs to know
whether any given 4-byte field is a pointer or an integer, which can be
deduced from information left for the runtime by the compiler, or basically
what C++ might call RTTI.
The savings that C++ and C save are not storing pointer information for each
class (and in C there really isn't any corresponding bit of info), not
tracking auto pointers to the heap, and so on. I.e., there's less
compile-time space used, and it lets C++ classes with no vtable not need
extra info somewhere saying what bits are pointers.
The original versions of Smalltalk, however, did indeed use the extra
indirection. You could have 2^15 objects at a time, even if you had
megabytes of memory, and there was an extra table somewhere that held the
run-time pointer from the index to the object location. (The odd-numbered
pointers were for small integers, hence 2^15 instead of 2^16.)
--
Darren New, San Diego CA, USA (PST)
Serving Suggestion:
"Don't serve this any more. It's awful."
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
>> Moreover, in C and C++ objects cannot be moved in memory behind the
>> scenes
>> because there may be pointers pointing to them, and hence moving an
>> object
>> would break the pointers (and thus the entire program). In Java, however,
>> objects can be moved in memory without breaking any existing
>> references to
>> them, if the references are not raw memory addresses but something
>> smarter
>
> This however is not *quite* true. C# and (I think) Java both store
> references as pointers internally, without any additional per-reference
> overhead. What lets objects be moved and GCed is the fact that the
> compiler has generated information about where the pointers are in each
> type and that information is reliable (i.e., no unions of references and
> integers, for example). One doesn't need extra indirections. One simply
> needs to know whether any given 4-byte field is a pointer or an integer,
> which can be deduced from information left for the runtime by the
> compiler, or basically what C++ might call RTTI.
>
> The original versions of Smalltalk, however, did indeed use the extra
> indirection. You could have 2^15 objects at a time, even if you had
> megabytes of memory, and there was an extra table somewhere that held
> the run-time pointer from the index to the object location. (The
> odd-numbered pointers were for small integers, hence 2^15 instead of 2^16.)
As I understand it, Smalltalk did it by having an "object table". Each
reference to an object is just an object ID - an index into the object
table. The object table then stores the pointers to the actual objects.
That means you can easily move the objects around so long as you update
the object table. (And you can potentially store additional information
in the object table too, for example to help the GC.) The down-side is
an extra level of indirection.
I think before you can talk about what "Java" does, you'd have to
specify which Java implementation you mean; presumably each JVM can
implement this stuff any way they choose.
As far as I understand it, what GHC does is this: Each object reference
is a machine address. When the GC moves everything, it updates all the
references. (I think that's what Darren is trying to say that Java does.)
The slight disadvantage is that this means that the GC can only be run
at certain moments. (E.g., when the mutator *isn't* in the middle of
doing something with a particular reference.) And since GHC has the
limitation that all Haskell threads must be paused before the GC can
run, that can sometimes mean that one rogue thread can take forever to
reach a "safe point" where it can be paused, causing all CPU cores
except one to sit idle while we wait for a GC cycle.
As you can imagine, this is an efficient way to destroy parallel
performance. The GHC team are working on a concurrent GC engine that
doesn't require the mutator to be paused before it can run, which should
fix this particular issue. (Don't hold your breath though...)
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
Invisible wrote:
> As I understand it, Smalltalk did it by having an "object table". Each
> reference to an object is just an object ID - an index into the object
> table. The object table then stores the pointers to the actual objects.
> That means you can easily move the objects around so long as you update
> the object table. (And you can potentially store additional information
> in the object table too, for example to help the GC.) The down-side is
> an extra level of indirection.
Yes. I'm pretty sure the GC info was in the objects, tho. Having the object
table also made "become:" quite inexpensive. ( "A become: B" would swap
every reference to A to be one to B and vice versa. This was important in a
system where you could change code while things were running and you wanted
the new class declaration to replace the old one. It was also used to resize
arrays, which therefore didn't need the extra overhead of a hidden pointer
pointing to the "real" array.)
> I think before you can talk about what "Java" does, you'd have to
> specify which Java implementation you mean; presumably each JVM can
> implement this stuff any way they choose.
Yes, that's the problem. Altho they can't really add CoW simply because
nobody would take advantage of it by making a bunch of duplicate references
that they didn't expect to share state. Nor to make strings mutable.
> As far as I understand it, what GHC does is this: Each object reference
> is a machine address. When the GC moves everything, it updates all the
> references. (I think that's what Darren is trying to say that Java does.)
Yes. Or at least C#.
--
Darren New, San Diego CA, USA (PST)
Serving Suggestion:
"Don't serve this any more. It's awful."
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
On 18/10/2010 04:29 PM, Darren New wrote:
> Invisible wrote:
>> As I understand it, Smalltalk did it by having an "object table". Each
>> reference to an object is just an object ID - an index into the object
>> table. The object table then stores the pointers to the actual
>> objects. That means you can easily move the objects around so long as
>> you update the object table. (And you can potentially store additional
>> information in the object table too, for example to help the GC.) The
>> down-side is an extra level of indirection.
>
> Yes. I'm pretty sure the GC info was in the objects, tho.
Yeah, you might be right about that. I didn't look too closely. (I did
however spend an evening reading through all the GC documentation to see
how it has different areas for young objects, old objects, large
objects, persistent objects, object objects... It all seemed quite
sophisticated.)
> Having the
> object table also made "become:" quite inexpensive.
The thing that occurs to me is that it could potentially make it easy to
move objects to another address space, hell maybe even a different
physical machine. I've always thought that would be quite cool. (Of
course, as soon as you do this, you need to decide where the "best
place" for each object actually is.)
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|  |
|
 |
Orchid XP v8 wrote:
> Yeah, you might be right about that. I didn't look too closely. (I did
> however spend an evening reading through all the GC documentation to see
> how it has different areas for young objects, old objects, large
> objects, persistent objects, object objects... It all seemed quite
> sophisticated.)
That was C#. I don't think Smalltalk-80 had any of that stuff. GC gets
really complicated when you start throwing in destructors, and even more so
when you start letting the user arbitrarily pin down bits of heap-allocated
memory and passing it to languages that use non-collectable pointers.
> move objects to another address space, hell maybe even a different
> physical machine. I've always thought that would be quite cool.
Yes. There were versions of Smalltalk that used library code to implement
demand-paging of objects, just catching the DoesNotUnderstand message to
page back in the object and re-issue the request. Paging out was creating a
small proxy and then become: on the proxy.
Lots of cool stuff you could do that way. Change the class declaration while
the program is running and suddenly all the instances have the new layout, etc.
--
Darren New, San Diego CA, USA (PST)
Serving Suggestion:
"Don't serve this any more. It's awful."
Post a reply to this message
|
 |
|  |
|  |
|
 |
|
 |
|  |
|
 |