Darren New <dne### [at] san rr com> wrote:
> Warp wrote:
> > I assume that means that Haskell doesn't handle objects (if it even has
> > such a concept) by value, only by reference (which, in fact, isn't very
> > surprising to me).
> Semantically, this is a rather difficult question to answer for a purely
> functional language where the data can't be mutated.
Well, if eg. integers can be handled by value, why not user-defined
objects?
--
- Warp
Post a reply to this message
Warp wrote:
> Darren New <dne### [at] san rr com> wrote:
>> Warp wrote:
>>> I assume that means that Haskell doesn't handle objects (if it even has
>>> such a concept) by value, only by reference (which, in fact, isn't very
>>> surprising to me).
>
>> Semantically, this is a rather difficult question to answer for a purely
>> functional language where the data can't be mutated.
>
> Well, if eg. integers can be handled by value, why not user-defined
> objects?
There's no need to? Whether the language implements user objects as values
or as something pointed to makes no difference to the meaning of the
language. It's of course an efficiency trade-off, which is why I said it's
*semantically* difficult to answer.
Even if Haskell is passing around pointers to user objects, it's still not
pass-by-reference in the usual sense of the word, because the values are
immutable.
My comment might not have been 100% in line with what you were asking about.
I just thought it was an interesting point. In a language where you can't
modify memory once you've allocated it, is there any semantic difference
between allocating it on the heap or allocating it inline somewhere? I don't
think so.
--
Darren New, San Diego CA, USA (PST)
Serving Suggestion:
"Don't serve this any more. It's awful."
> My comment might not have been 100% in line with what you were asking
> about. I just thought it was an interesting point. In a language where
> you can't modify memory once you've allocated it, is there any semantic
> difference between allocating it on the heap or allocating it inline
> somewhere? I don't think so.
The Haskell language specification describes the syntax of the language
and approximately how to execute it. However, it is silent on
implementation details such as calling conventions. (This is not an
oversight.)
The language spec says nothing about what goes on the heap, what goes on
the stack, what goes in registers, etc. In Haskell, I can say "print (x
+ 1)". Now, whether the value of "x" is in a register, on the stack, on
the heap somewhere, and so on makes absolutely no difference to the code
I just wrote there. One Haskell implementation might do it one way, and
another might do it a completely different way, and the program will
still run the same way.
In practice, when people say "Haskell does X", what they really mean is
"the only currently available production-grade Haskell implementation
(i.e., GHC) does X". Which isn't quite the same statement.
The simplest and easiest way to implement Haskell is for *everything* to
be a heap-allocated dynamic object. Obviously this has quite laughable
performance, and so real Haskell implementations go out of their way to
be more efficient where possible. Haskell is particularly helpful in
this regard, since it has no specified calling convention that you have
to adhere to, and everything is immutable, so it doesn't matter where
it's stored or whether you copy it.
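As a concrete illustration of that freedom, here is a GHC-specific sketch (the type names are made up): the same pair of fields can be stored behind pointers or inlined directly into the constructor, with no change to what the program means.

```haskell
-- Two representations of the same data; GHC-specific sketch.
-- Boxed: each field is a pointer to a (possibly unevaluated) Double.
data Boxed = Boxed Double Double

-- Unpacked: GHC stores the raw doubles inline in the constructor.
data Unboxed = Unboxed {-# UNPACK #-} !Double {-# UNPACK #-} !Double

distB :: Boxed -> Double
distB (Boxed x y) = sqrt (x*x + y*y)

distU :: Unboxed -> Double
distU (Unboxed x y) = sqrt (x*x + y*y)

main :: IO ()
main = do
  print (distB (Boxed 3 4))    -- 5.0
  print (distU (Unboxed 3 4))  -- 5.0
```

Same answer either way; the pragma only changes where the bits live, which is exactly the kind of decision the language spec leaves open.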
Darren is quite right: Since [almost] everything is immutable, you can't
distinguish between a thing and a reference to a thing. You can't
distinguish between two pointers to the same thing and two pointers to
things that are identical. (I.e., there is no difference between
value-equal and reference-equal.) You can't distinguish copies from each
other. Heck, you can't even distinguish an evaluated result from an
unevaluated result!
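To make that concrete, a minimal sketch: whether two lists share one heap object or were built separately is unobservable from inside the language.

```haskell
main :: IO ()
main = do
  let a = [1, 2, 3] :: [Int]
      b = a           -- b (probably) shares a's heap object
      c = [1, 2, 3]   -- c is (probably) a separate construction
  -- (==) compares values; there is no pure way to ask
  -- "are these the same object?"
  print (b == c)  -- True either way
```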
This gives people trying to implement Haskell considerable design freedom.
In fact, consider this:
repeat x = x : repeat x
Semantically, this generates an infinite list with all elements equal to
x. But how is it implemented? Does it return a circular list, or does it
actually *execute* an infinite function-call loop?
The answer is... it doesn't matter. From within Haskell, it is
impossible to tell the difference.
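For instance, any finite observation terminates and gives the same answer under either implementation. (A minimal sketch; repeat' is just a local copy of repeat, renamed to avoid clashing with the Prelude.)

```haskell
-- Semantically an infinite list; the implementation may build a
-- circular structure or keep allocating fresh cells. We can't tell.
repeat' :: a -> [a]
repeat' x = x : repeat' x

main :: IO ()
main = putStrLn (take 5 (repeat' 'x'))  -- prints xxxxx
```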
Note that this is different from saying that there *is* no difference.
For example, a circular list clearly takes less storage space (and less
CPU time to manage) than an infinite loop that allocates another new
list node on each cycle. (To say nothing of cache locality and so on.)
In other words, there are a whole raft of design decisions that make
absolutely no difference to Haskell (i.e., your programs will still work
right), but potentially make a very big difference to run-time performance.
Since it does affect performance, GHC (and probably other
implementations) allows you some limited control over these
things using compiler pragmas, command-line switches and so on.
But for the most part, the compiler internally uses sophisticated
algorithms to attempt to determine the best choices automatically on a
case-by-case basis. (And if you doubt this, check out the extensive
published literature output of the GHC team.)
Final note: If I start dealing with mutable data structures, now it
*does* matter whether you're talking about value-equality or
reference-equality. Since, in a multi-threaded scenario, value-equality
could potentially change at any moment, all the mutable containers
implement reference-equality.
Comparing two things for equality is a pure operation, which you can do
anywhere, at any time. If you want to compare two mutable objects for
value-equality, you must do it by hand. This involves manually fetching
the contents of the container - and thereby fixing exactly which moment
in time the comparison occurs at, since fetching the contents of a
mutable structure is an impure operation which must be explicitly
sequenced with respect to other impure operations in the same thread.
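A minimal GHC sketch of both comparisons, using IORef as the mutable container:

```haskell
import Data.IORef

main :: IO ()
main = do
  a <- newIORef (42 :: Int)
  b <- newIORef 42
  -- Reference-equality: IORef's Eq instance compares identity,
  -- which is a pure comparison once you hold the two references.
  print (a == b)        -- False: two distinct containers
  -- Value-equality: fetch the contents by hand, which fixes the
  -- moment in time at which the comparison happens.
  va <- readIORef a
  vb <- readIORef b
  print (va == vb)      -- True at this instant
```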
In summary, once again Haskell's type system forces us to stop, think,
and do the semantically correct thing.
Invisible <voi### [at] dev null> wrote:
> In Java, if you ask for an array of Pixel objects (i.e., "Pixel[]"),
> what you get is an array of *pointers* to Pixel objects.
I think that in Java it's not completely correct to call them "pointers".
A "pointer" is usually understood as being a raw memory address pointing
to the memory location which contains the binary representation of the object
(its "value"). This is what a pointer is in C, C++ and some other languages
which have pointers.
In Java a reference might internally be a true pointer, or it might not.
(Instead, it could be something else, such as an index to a table which
contains information about the object.)
There is a difference. In C/C++ you can compare pointers with an ordering
comparison (less than, etc), which allows you to eg. sort pointers (ie. sort
them by the values of the pointers themselves rather than the objects they
are pointing to). Pointer arithmetic is also a "side-effect" of this: You
can add an offset to a pointer and get another pointer which points that
much further in memory (which is used for raw arrays).
In Java you can't sort references (because references might not have a
well-defined ordering) nor perform "reference arithmetic" (because it would
be nonsensical if the references are something else than raw memory
addresses).
Moreover, in C and C++ objects cannot be moved in memory behind the scenes
because there may be pointers pointing to them, and hence moving an object
would break the pointers (and thus the entire program). In Java, however,
objects can be moved in memory without breaking any existing references to
them, if the references are not raw memory addresses but something smarter
(such as indices to some table, or whatever; of course this introduces an
additional indirection level to accessing the object through the reference,
but it might be worth it in some cases).
--
- Warp
>> In Java, if you ask for an array of Pixel objects (i.e., "Pixel[]"),
>> what you get is an array of *pointers* to Pixel objects.
>
> I think that in Java it's not completely correct to call them "pointers".
OK, fair enough. My point was simply that you don't get an array of
Pixel structures, you get an array of some kind of indirect reference to
Pixel structures. (As you rightly point out, they might be actual
pointers, or object table indices, or something else entirely...) My
point was merely that there's (at least one) level of indirection present.
Warp wrote:
> Invisible <voi### [at] dev null> wrote:
>> In Java, if you ask for an array of Pixel objects (i.e., "Pixel[]"),
>> what you get is an array of *pointers* to Pixel objects.
>
> I think that in Java it's not completely correct to call them "pointers".
What Warp said is indeed the standard terminology in textbooks that talk
about the stuff.
Basically, an "address" is a location indicator the hardware understands.
(Phrased that way to make "disk address" and "virtual address" and "mapped
address" all fall under the name "address".)
A "pointer" is an address with a type. "int*" is a pointer, not an address.
A "reference" is an opaque pointer to an allocated object that may or may
not be an address.
> Moreover, in C and C++ objects cannot be moved in memory behind the scenes
> because there may be pointers pointing to them, and hence moving an object
> would break the pointers (and thus the entire program). In Java, however,
> objects can be moved in memory without breaking any existing references to
> them, if the references are not raw memory addresses but something smarter
This however is not *quite* true. C# and (I think) Java both store
references as pointers internally, without any additional per-reference
overhead. What lets objects be moved and GCed is the fact that the compiler
has generated information about where the pointers are in each type and that
information is reliable (i.e., no unions of references and integers, for
example). One doesn't need extra indirections. One simply needs to know
whether any given 4-byte field is a pointer or an integer, which can be
deduced from information left for the runtime by the compiler, or basically
what C++ might call RTTI.
The savings in C++ and C come from not storing pointer information for
each class (and in C there really isn't any corresponding bit of info),
not tracking auto pointers to the heap, and so on. I.e., there's less
compile-time space used, and C++ classes with no vtable don't need extra
info somewhere saying which bits are pointers.
The original versions of Smalltalk, however, did indeed use the extra
indirection. You could have 2^15 objects at a time, even if you had
megabytes of memory, and there was an extra table somewhere that held the
run-time pointer from the index to the object location. (The odd-numbered
pointers were for small integers, hence 2^15 instead of 2^16.)
--
Darren New, San Diego CA, USA (PST)
Serving Suggestion:
"Don't serve this any more. It's awful."
>> Moreover, in C and C++ objects cannot be moved in memory behind the
>> scenes
>> because there may be pointers pointing to them, and hence moving an
>> object
>> would break the pointers (and thus the entire program). In Java, however,
>> objects can be moved in memory without breaking any existing
>> references to
>> them, if the references are not raw memory addresses but something
>> smarter
>
> This however is not *quite* true. C# and (I think) Java both store
> references as pointers internally, without any additional per-reference
> overhead. What lets objects be moved and GCed is the fact that the
> compiler has generated information about where the pointers are in each
> type and that information is reliable (i.e., no unions of references and
> integers, for example). One doesn't need extra indirections. One simply
> needs to know whether any given 4-byte field is a pointer or an integer,
> which can be deduced from information left for the runtime by the
> compiler, or basically what C++ might call RTTI.
>
> The original versions of Smalltalk, however, did indeed use the extra
> indirection. You could have 2^15 objects at a time, even if you had
> megabytes of memory, and there was an extra table somewhere that held
> the run-time pointer from the index to the object location. (The
> odd-numbered pointers were for small integers, hence 2^15 instead of 2^16.)
As I understand it, Smalltalk did it by having an "object table". Each
reference to an object is just an object ID - an index into the object
table. The object table then stores the pointers to the actual objects.
That means you can easily move the objects around so long as you update
the object table. (And you can potentially store additional information
in the object table too, for example to help the GC.) The down-side is
an extra level of indirection.
I think before you can talk about what "Java" does, you'd have to
specify which Java implementation you mean; presumably each JVM can
implement this stuff any way they choose.
As far as I understand it, what GHC does is this: Each object reference
is a machine address. When the GC moves everything, it updates all the
references. (I think that's what Darren is trying to say that Java does.)
The slight disadvantage is that this means that the GC can only be run
at certain moments. (E.g., when the mutator *isn't* in the middle of
doing something with a particular reference.) And since GHC has the
limitation that all Haskell threads must be paused before the GC can
run, that can sometimes mean that one rogue thread can take forever to
reach a "safe point" where it can be paused, causing all CPU cores
except one to sit idle while we wait for a GC cycle.
As you can imagine, this is an efficient way to destroy parallel
performance. The GHC team are working on a concurrent GC engine that
doesn't require the mutator to be paused before it can run, which should
fix this particular issue. (Don't hold your breath though...)
Invisible wrote:
> As I understand it, Smalltalk did it by having an "object table". Each
> reference to an object is just an object ID - an index into the object
> table. The object table then stores the pointers to the actual objects.
> That means you can easily move the objects around so long as you update
> the object table. (And you can potentially store additional information
> in the object table too, for example to help the GC.) The down-side is
> an extra level of indirection.
Yes. I'm pretty sure the GC info was in the objects, tho. Having the object
table also made "become:" quite inexpensive. ( "A become: B" would swap
every reference to A to be one to B and vice versa. This was important in a
system where you could change code while things were running and you wanted
the new class declaration to replace the old one. It was also used to resize
arrays, which therefore didn't need the extra overhead of a hidden pointer
pointing to the "real" array.)
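A rough model of that scheme (hypothetical names throughout; the real Smalltalk object table lived inside the VM, not in user code): references are table indices, so moving an object or performing become: only touches the table, and every reference held anywhere else stays valid.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical sketch of an object table: references are IDs,
-- the table maps each ID to the object's current contents.
type ObjId    = Int
newtype Obj   = Obj String deriving (Eq, Show)
type ObjTable = Map.Map ObjId Obj

-- "A become: B": swap the two table entries. Every ObjId held
-- anywhere in the system now resolves to the other object.
become :: ObjId -> ObjId -> ObjTable -> ObjTable
become a b tbl =
  case (Map.lookup a tbl, Map.lookup b tbl) of
    (Just oa, Just ob) -> Map.insert a ob (Map.insert b oa tbl)
    _                  -> tbl  -- unknown ID: leave table unchanged
```

Dereferencing an ObjId costs one extra lookup, which is exactly the indirection trade-off described above.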
> I think before you can talk about what "Java" does, you'd have to
> specify which Java implementation you mean; presumably each JVM can
> implement this stuff any way they choose.
Yes, that's the problem. Altho they can't really add CoW, simply because
nobody would take advantage of it by making a bunch of duplicate references
that they didn't expect to share state. Nor can they make strings mutable.
> As far as I understand it, what GHC does is this: Each object reference
> is a machine address. When the GC moves everything, it updates all the
> references. (I think that's what Darren is trying to say that Java does.)
Yes. Or at least C#.
--
Darren New, San Diego CA, USA (PST)
Serving Suggestion:
"Don't serve this any more. It's awful."
On 18/10/2010 04:29 PM, Darren New wrote:
> Invisible wrote:
>> As I understand it, Smalltalk did it by having an "object table". Each
>> reference to an object is just an object ID - an index into the object
>> table. The object table then stores the pointers to the actual
>> objects. That means you can easily move the objects around so long as
>> you update the object table. (And you can potentially store additional
>> information in the object table too, for example to help the GC.) The
>> down-side is an extra level of indirection.
>
> Yes. I'm pretty sure the GC info was in the objects, tho.
Yeah, you might be right about that. I didn't look too closely. (I did
however spend an evening reading through all the GC documentation to see
how it has different areas for young objects, old objects, large
objects, persistent objects, object objects... It all seemed quite
sophisticated.)
> Having the
> object table also made "become:" quite inexpensive.
The thing that occurs to me is that it could potentially make it easy to
move objects to another address space, hell maybe even a different
physical machine. I've always thought that would be quite cool. (Of
course, as soon as you do this, you need to decide where the "best
place" for each object actually is.)
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Orchid XP v8 wrote:
> Yeah, you might be right about that. I didn't look too closely. (I did
> however spend an evening reading through all the GC documentation to see
> how it has different areas for young objects, old objects, large
> objects, persistent objects, object objects... It all seemed quite
> sophisticated.)
That was C#. I don't think Smalltalk-80 had any of that stuff. GC gets
really complicated when you start throwing in destructors, and even more so
when you start letting the user arbitrarily pin down bits of heap-allocated
memory and passing it to languages that use non-collectable pointers.
> move objects to another address space, hell maybe even a different
> physical machine. I've always thought that would be quite cool.
Yes. There were versions of Smalltalk that used library code to implement
demand-paging of objects, just catching the DoesNotUnderstand message to
page back in the object and re-issue the request. Paging out was creating a
small proxy and then become: on the proxy.
Lots of cool stuff you could do that way. Change the class declaration while
the program is running and suddenly all the instances have the new layout, etc.
--
Darren New, San Diego CA, USA (PST)
Serving Suggestion:
"Don't serve this any more. It's awful."