|
|
|
|
|
|
| |
| |
|
|
From: Warp
Subject: Question about garbage collection (not a flame)
Date: 30 Mar 2008 08:51:43
Message: <47ef9aee@news.povray.org>
|
|
|
| |
| |
|
|
Suppose that we have a dynamically allocated object A, which has as
member a reference to a dynamically allocated object B (allocated by A),
which has a reference to a dynamically allocated object C (allocated by B),
and so on. That is, a long chain of objects, each allocating and owning the
next one.
Now the reference to A in the actual program code dies, so A is not owned
by anything anymore. The GC engine kicks in at some point. Does it free all
those objects, or does it free A only (and when it's run again at some point
in the future, it frees B only, etc)?
I assume that it frees all the objects, not just A, but how does it do
it efficiently? (I assume this because else in a circular ownership
situation it would never free any of the objects.)
Assume that this chain of objects is very large, consisting of thousands
of objects.
--
- Warp
Post a reply to this message
|
|
| |
| |
|
|
From: Orchid XP v8
Subject: Re: Question about garbage collection (not a flame)
Date: 30 Mar 2008 10:03:00
Message: <47efaba4$1@news.povray.org>
|
|
|
| |
| |
|
|
Warp wrote:
> I assume that it frees all the objects, not just A, but how does it do
> it efficiently? (I assume this because else in a circular ownership
> situation it would never free any of the objects.)
> Assume that this chain of objects is very large, consisting of thousands
> of objects.
The simplest (non-broken) GC implementation is the mark/sweep type.
That is, you mark all objects as "dead". Then, starting from a set of
"roots" (typically what's referenced from the execution stack, global
variables, etc.), you mark each object you touch as "alive", and then
recurse over all the objects it references. (And if any of those are
already marked as live, you've obviously "been here" before, so ignore
that path this time around.)
When you've followed every reference in the system and marked everything
as live, all objects not marked as live must in fact be dead. These can
now be deleted.
(This is the theory. In practice, it's common for a compacting GC to
actually *move* each object as it finds it live, and then simply
deallocate the space used by the remaining objects. Various other
optimisations are possible too.)
Therefore, if nobody references A, then A will never be marked live, and
none of the objects it links to will be marked live either. So a single
pass of the GC will elide them all.
Trouble with mark/sweep is that you have to stop the program executing
while you do your sweep, or some object X might be unlinked from one
place and then relinked from somewhere else, and the GC might scan past
just as this happens, resulting in an object that's actually live being
marked as dead and deallocated.
Obviously, stopping your entire program (all parallel threads!) so you
can run the GC isn't a fantastic idea. So various more sophisticated
schemes have been invented (e.g., generational GC). In any case, for
most designs of GC, the entire chain of objects will be reclaimed in a
single pass of the GC. (Although it might be the next "major" rather
than "minor" pass or something, depending on the exact design details.)
[Note that I do not consider "reference counting" to be a valid GC
algorithm, since it doesn't actually work properly...]
Post a reply to this message
|
|
| |
| |
|
|
From: John VanSickle
Subject: Re: Question about garbage collection (not a flame)
Date: 30 Mar 2008 11:37:48
Message: <47efc1dc@news.povray.org>
|
|
|
| |
| |
|
|
Orchid XP v8 wrote:
> Obviously, stopping your entire program (all parallel threads!) so you
> can run the GC isn't a fantastic idea. So various more sophisticated
> schemes have been invented (e.g., generational GC). In any case, for
> most designs of GC, the entire chain of objects will be reclaimed in a
> single pass of the GC. (Although it might be the next "major" rather
> than "minor" pass or something, depending on the exact design details.)
I could imagine that there is an optimization available when individual
references go out of scope (although simply deallocating those objects
should suffice), and in a language that checks these things, there may
be opportunities whenever a reference is changed.
Not a full-blown check of every dynamic object (which would hurt
performance), but a limited check. But I'll leave the details to
whoever implements it.
In a language that doesn't have built-in GC, properly encapsulating
references, and properly building destructors, could do most of the GC
work; but then the point of GC is to allow the programmer to forget
these things. But at least I have something to consider if I ever do a
ground-upwards rewrite of my modeler (which is in C++).
Regards,
John
Post a reply to this message
|
|
| |
| |
|
|
From: Darren New
Subject: Re: Question about garbage collection (not a flame)
Date: 30 Mar 2008 14:56:28
Message: <47eff06c$1@news.povray.org>
|
|
|
| |
| |
|
|
Orchid XP v8 wrote:
> Trouble with mark/sweep is that you have to stop the program executing
> while you do your sweep,
The other "problem" is that the process is recursive just as you've run
out of memory. You have to be clever about walking through the data
structures in a way that uses the pointers you're following as stack
space. Which, of course, means it's difficult to run the GC in parallel
with the main processing. :-)
On the other hand, calling malloc() or free() from two separate threads
in the same heap is likely problematic too, unless you're clever about it.
The way the original Smalltalk-80 worked was to have a 5-bit reference
count in each object. Things got collected when the reference count went
to zero. If the reference count went >30, it never got collected until
you did a mark-sweep, which indeed stopped the whole program.
Note too that reference counting can lock up a program just as bad as
mark-sweep does, if done foolishly. Ever have a program where exiting it
made the disk thrash wildly? Probably using a library that freed
everything you allocated before exiting. Think how long you'll thrash
cleaning up (say) a level of a game with reference counting when you
dispose the last pointer to a few hundred meg of AI, textures, geometry,
etc.
> Obviously, stopping your entire program (all parallel threads!) so you
> can run the GC isn't a fantastic idea.
Erlang has one heap per process, where a program not uncommonly has tens
of thousands of "processes", not unlike a thread but much lower
overhead. (Managed co-routines, perhaps?)
> most designs of GC, the entire chain of objects will be reclaimed in a
> single pass of the GC. (Although it might be the next "major" rather
> than "minor" pass or something, depending on the exact design details.)
Yep.
A lot of the newer languages designed to run on massively-parallel
machines (distributed or not) basically have no aliased values, so GC
becomes a lot easier when you can only have one pointer. At that point,
reference counting (or destroying when it goes out of scope) becomes a
lot easier to design in.
--
Darren New / San Diego, CA, USA (PST)
"That's pretty. Where's that?"
"It's the Age of Channelwood."
"We should go there on vacation some time."
Post a reply to this message
|
|
| |
| |
|
|
From: Warp
Subject: Re: Question about garbage collection (not a flame)
Date: 31 Mar 2008 01:20:11
Message: <47f0829a@news.povray.org>
|
|
|
| |
| |
|
|
Orchid XP v8 <voi### [at] devnull> wrote:
> Warp wrote:
> > I assume that it frees all the objects, not just A, but how does it do
> > it efficiently?
> The simplest (non-broken) GC implementation is the mark/sweep type.
That really didn't answer my question. I didn't ask for the simplest
solution.
> [Note that I do not consider "reference counting" to be a valid GC
> algorithm, since it doesn't actually work properly...]
Yet many languages use it, such as for example PHP and Objective C.
It just sounds to me that people object about reference counting because
in theory it's problematic, but some people just go ahead and use it anyways
without problems.
(Yes, I know that in PHP you can actually get circular references which
get never freed, and that's a known problem, but it doesn't seem to cause
too many problems in practice nevertheless, seeing how popular PHP is.
Circular references seem to be a bit like goto: You can come up with all
kinds of situations where it's "necessary", but in practice it happens
rarely if at all.)
--
- Warp
Post a reply to this message
|
|
| |
| |
|
|
From: Darren New
Subject: Re: Question about garbage collection (not a flame)
Date: 31 Mar 2008 11:33:52
Message: <47f11270$1@news.povray.org>
|
|
|
| |
| |
|
|
Warp wrote:
> That really didn't answer my question. I didn't ask for the simplest
> solution.
Well, you're asking about a very complex field that's been evolving for
30 years. And the GC for each language is different.
>> [Note that I do not consider "reference counting" to be a valid GC
>> algorithm, since it doesn't actually work properly...]
>
> Yet many languages use it, such as for example PHP and Objective C.
I'm not sure PHP lets you have a pointer to another object visible at
the user layer. If you do
$x['hello'] = $y
as far as the user is concerned, $x['hello'] and $y are two different
values. So $y['hello'] = $y doesn't form a circular reference, as far as
I can remember. I'll try it out as soon as I boot over to Linux.
> It just sounds to me that people object about reference counting because
> in theory it's problematic, but some people just go ahead and use it anyways
> without problems.
Not without problems. Firefox, for example, is redoing the entire
plug-in memory management thing because it's too hard for extension
writers to get right.
> (Yes, I know that in PHP you can actually get circular references which
> get never freed, and that's a known problem, but it doesn't seem to cause
> too many problems in practice nevertheless, seeing how popular PHP is.
PHP tends to run for one web page, then exit. That's part of the
solution. Indeed, I ran across this, trying to use PHP in a loop to
insert a few million records from a file into a database.
Since PHP doesn't actually have pointers, it's possible the runtime is
doing things to patch up as well.
But saying "it's good enough, who cares if one out of 20 programs leaks
memory" is a very PHP approach, yah.
> Circular references seem to be a bit like goto: You can come up with all
> kinds of situations where it's "necessary", but in practice it happens
> rarely if at all.)
I disagree. Doubly-linked lists, trees with pointers up the nodes,
widgets with child widgets that have parent pointers, etc. Pretty much
everything "OO" is good at winds up generating circular references.
--
Darren New / San Diego, CA, USA (PST)
"That's pretty. Where's that?"
"It's the Age of Channelwood."
"We should go there on vacation some time."
Post a reply to this message
|
|
| |
| |
|
|
From: Nicolas Alvarez
Subject: Re: Question about garbage collection (not a flame)
Date: 31 Mar 2008 11:41:05
Message: <47f11421$1@news.povray.org>
|
|
|
| |
| |
|
|
> I'm not sure PHP lets you have a pointer to another object visible at
> the user layer. If you do
> $x['hello'] = $y
> as far as the user is concerned, $x['hello'] and $y are two different
> values. So $y['hello'] = $y doesn't form a circular reference, as far as
> I can remember. I'll try it out as soon as I boot over to Linux.
PHP has "references":
$a =& $b;
This behaves like Unix hardlinks. "$a and $b are completely equal here,
that's not $a pointing to $b or vice versa, that's $a and $b pointing to
the same place"
Post a reply to this message
|
|
| |
| |
|
|
From: Darren New
Subject: Re: Question about garbage collection (not a flame)
Date: 31 Mar 2008 11:46:51
Message: <47f1157b$1@news.povray.org>
|
|
|
| |
| |
|
|
Nicolas Alvarez wrote:
> This behaves like Unix hardlinks. "$a and $b are completely equal here,
> that's not $a pointing to $b or vice versa, that's $a and $b pointing to
> the same place"
Right. I suspect either they don't get GCed if you wind up with a
reference to $B inside $B, or there's a mark/sweep mechanism that kicks
in. Given my experiences, I'd expect it just leaks. :-)
--
Darren New / San Diego, CA, USA (PST)
"That's pretty. Where's that?"
"It's the Age of Channelwood."
"We should go there on vacation some time."
Post a reply to this message
|
|
| |
| |
|
|
From: Warp
Subject: Re: Question about garbage collection (not a flame)
Date: 31 Mar 2008 12:06:52
Message: <47f11a2b@news.povray.org>
|
|
|
| |
| |
|
|
Darren New <dne### [at] sanrrcom> wrote:
> > Circular references seem to be a bit like goto: You can come up with all
> > kinds of situations where it's "necessary", but in practice it happens
> > rarely if at all.)
> I disagree. Doubly-linked lists, trees with pointers up the nodes,
> widgets with child widgets that have parent pointers, etc. Pretty much
> everything "OO" is good at winds up generating circular references.
Yet, as I said, languages like PHP and Objective C seem to use
reference counting without too many problems.
--
- Warp
Post a reply to this message
|
|
| |
| |
|
|
From: Warp
Subject: Re: Question about garbage collection (not a flame)
Date: 31 Mar 2008 12:08:48
Message: <47f11aa0@news.povray.org>
|
|
|
| |
| |
|
|
Darren New <dne### [at] sanrrcom> wrote:
> Right. I suspect either they don't get GCed if you wind up with a
> reference to $B inside $B, or there's a mark/sweep mechanism that kicks
> in. Given my experiences, I'd expect it just leaks. :-)
http://www.derickrethans.nl/circular_references.php
It seems they are working on something called "cyclic tracing" to
alleviate the problem, whatever that may mean.
--
- Warp
Post a reply to this message
|
|
| |
| |
|
|
|
|
| |
|
|