From: Darren New
Subject: Re: One thing which puzzles me about unix/linux
Date: 7 Apr 2008 17:02:02
Message: <47fa8bca@news.povray.org>
Tom Austin wrote:
> I would check to see if there is a 'file' for the specific process that
> reports the current memory usage.
The problem with that approach is that you're just polling. There's no
good way to find the peak memory usage that way if the peak lasts only
1/100th of a second.
--
Darren New / San Diego, CA, USA (PST)
"That's pretty. Where's that?"
"It's the Age of Channelwood."
"We should go there on vacation some time."
Darren New <dne### [at] san rr com> wrote:
> Tom Austin wrote:
> > I would check to see if there is a 'file' for the specific process that
> > reports the current memory usage.
> The problem with that approach is you're just polling. There's no good
> way to find the max memory usage if you only use that much for 1/100th
> of a second.
This is a real bummer. I would have liked accurate memory usage
statistics for the small project I have been making, but it seems to
be impossible.
If anyone happens to be interested, it's this:
http://warp.povusers.org/FSBAllocator/
From all the tests I have made regarding memory usage, I'm pretty
confident that with the list-of-ints example my allocator uses half the
memory of the default libc allocator, but I can't tell for sure
because of all these unknowns, as well as some erratic behavior I noticed
during my tests. Thus I couldn't add that statistic to the page,
unfortunately.
It's interesting, though. Sometimes C++ *can* actually be slower than
some other languages, and the reason may lie in the memory allocator used
by the compiler (or, more specifically, by the standard C library).
It's not really so much a fault of C++ per se, but more a fault of the
system library tasked with memory management.
Using a more efficient memory allocator can cause amazing speedups, as
I wrote on that page.
--
- Warp
Warp wrote:
> It's not really so much a fault of C++ per se, but more a fault of the
> system library tasked on the memory management.
Yah. It *can* be a fault of the language standard, tho. For
example, doesn't C++ require that "delete" and "free()" basically do
the same things to the memory allocation? In the sense that "new" isn't
allowed to look at the type and (say) over-allocate a block and do its
own sub-block management? I.e., in the sense that the
standard library isn't allowed to do some of the stuff you might do
yourself by overriding "new"?
> Using a more efficient memory allocator can cause amazing speedups, as
> I wrote on that page.
That's one of the ways GC can be faster than manual memory management.
Many, many memory management libraries really suck bigtime, while a
tremendous amount of time is spent making GC fast.
--
Darren New / San Diego, CA, USA (PST)
"That's pretty. Where's that?"
"It's the Age of Channelwood."
"We should go there on vacation some time."
Darren New <dne### [at] san rr com> wrote:
> > Using a more efficient memory allocator can cause amazing speedups, as
> > I wrote on that page.
> That's one of the ways GC can be faster than manual memory management.
> Many, many memory management libraries really suck bigtime, while a
> tremendous amount of time is spent making GC fast.
I don't see how GC would help that at all.
The problem with the default allocator is that it has to be prepared to
allocate blocks of *any* size, and it should do it as memory-efficiently
as possible (i.e. wasting as little memory as possible). This makes it
pretty complicated. It needs quite a lot of bookkeeping data, and keeping
track of free blocks of memory is rather involved.
This is further complicated by the fact that the default memory
allocator must be thread-safe. This means that if two threads try to
allocate memory at the same time, the allocator must use mutual exclusion.
Mutual exclusion locks cause considerable overhead. In single-threaded
programs this overhead is pure waste, but there isn't an easy way around it.
Why would GC help this? Even GC'd memory must be prepared to allocate
blocks of any size, and it must be thread-safe. I don't think GC would
help these problems.
If I'm not completely mistaken, the major reasons my allocator is
so much faster than the default allocator in libc are that it uses a
lot less bookkeeping data (because it only allocates blocks of a fixed
size), that the bookkeeping is much simpler and a lot faster to manage, and
that my allocator is not thread-safe (which can be a big minus in some
cases, of course).
--
- Warp
Warp wrote:
> The problem with the default allocator is that it has to be prepared to
> allocate blocks of *any* size, and it should do it as memory-efficiently
> as possible (ie. wasting as little memory as possible).
I suspect this latter part is what makes it slower than it could be. Not
that it's bad. It's just a trade-off.
> This is even further complicated by the fact that the default memory
> allocator must be thread-safe.
Yeah. If you're willing to over-allocate, you can reduce the contention.
> Mutual exclusion locks cause considerable overhead. In single-threaded
> programs this overhead is useless, but there isn't an easy way around it.
Linking with a different allocator that doesn't try to lock anything?
:-) Of course, you'd have to *tell* it your program is single-threaded.
And of course, threads aren't part of the language (are they?) while
memory allocation is, so you're kind of hosed right there.
In Ada, for example, you could have a different heap for each thread, so
you don't get that kind of lock contention, because Ada knows about
threads. (And IIRC, the language rules forbid returning a pointer
allocated in one thread and using it in another, which I guess is the
bit that actually helps.)
> Why would GC help this? Even GC'd memory must be prepared to allocate
> blocks of any size, and it must be thread-safe. I don't think GC would
> help these problems.
Allocations can be much faster, because you don't go looking for free
space until you run out of space. If you tend to allocate a bunch of
stuff, then free almost all of it, assuming you don't have
finalizers/destructors/whateveryoucallthem to worry about, you can clean
up lots of memory without ever looking at it. (This is not uncommon in
languages where everything is actually a heap-based object, like
Smalltalk. You can measure down in the single-digit-percentages of
objects that last more than a couple of seconds on such systems. Of
course, in a hybrid system like C++ or Ada or something, one probably
makes much less garbage because most of the temporary stuff *is* on the
stack.)
So, no running through free chains to find memory, and no need to touch
free memory when you free it: those are the primary sources of speed. I.e.,
except for the GC itself, it's much simpler code than malloc and free.
Erlang, for example, has a bunch of different algorithms and a bunch of
different allocators it switches between on the fly, depending on the
behavior of your programs. And if I understand correctly, it'll use
different allocators for different threads in the same process. Since
threads don't share heaps (at least, not the fast-changing heaps) the
threading problem isn't there. (They do apparently share heaps of large
binary objects, as well as the "in-memory database" tables, which are
also GCed.) And apparently it's too expensive to actually track which
pieces of executable code are pointed to from the stack, so you can only
have two versions of the same code loaded at the same time, and if you
purge the old version, any process running the old version gets aborted
first. But otherwise you could just use the return addresses to keep the
executable code live, except for the inefficiency of that.
So Erlang winds up with stuff like this,
http://erlang.org/doc/man/erts_alloc.html
because it can tell what type it's allocating for. :-) Which is why I
thought it's a shame the C++ standard says it's essentially not allowed
to take advantage of type information in "new". Of course, Erlang is
instrumented out the wazoo, because it's designed for systems you never
ever turn off, so you really do want to get messages if some program
starts using too much memory.
> If I'm not completely mistaken, the major reasons why my allocator is
> so much faster than the default allocator
Please understand that I wasn't commenting specifically on your
allocator. Earlier you asked some stuff about GC, so I thought I'd just
mention some generic GC wisdom here. Nothing in this thread should
specifically be about any particular implementation of GC or manual
allocators.
--
Darren New / San Diego, CA, USA (PST)
"That's pretty. Where's that?"
"It's the Age of Channelwood."
"We should go there on vacation some time."
Darren New <dne### [at] san rr com> wrote:
> > Why would GC help this? Even GC'd memory must be prepared to allocate
> > blocks of any size, and it must be thread-safe. I don't think GC would
> > help these problems.
> Allocations can be much faster, because you don't go looking for free
> space until you run out of space.
So GC is basically a selfish and greedy memory allocation engine which
cares only about itself, not anything else in the same system which might
have some use for that wasted memory.
(Not that this would be anything new to me.)
Doesn't this also mean that the GC runs a lot more often than would be
strictly necessary? Isn't that kind of counter-productive? Not only is
memory wasted (and thus kept away from other processes), but part of the
speed benefit is lost by having to run GC sweeps more often.
--
- Warp
Warp wrote:
> So GC is basically a selfish and greedy memory allocation engine which
> cares only about itself, not anything else in the same system which might
> have some use for that wasted memory.
You could say it's a time-space trade-off. You know, I almost never
have a situation where I have two programs running and I'm paging
because of it. :-)
> Doesn't this also mean that the GC runs a lot more often than would be
> strictly necessary?
Assuming you mean "allocating everything on the heap" when you say
"this"....
> Isn't that kind of counter-productive?
Only if your language has a stack and isn't functional and is
single-threaded, I suppose. If your language definition is that the
stack frames are stored on the heap (for, say, closure purposes), or you
need to allocate a separate stack of some size for each thread anyway,
or you tend to overwrite the same memory address with lots of different
things at different times (unlike, say, LISP, where you tend to build
new structures reusing parts of old ones instead of overwriting what you
already have), then allocating lots of short-lived stuff on the heap
makes sense.
Of course, systems like Smalltalk *do* really and truly have a stack
(now), exactly for performance reasons. It just doesn't look that way,
because it'll copy stack frames into the heap if you try to access them
as "objects". (Then again, Smalltalk didn't have conditional
statements, either, in the semantics of the language. Just messages sent
to booleans along with a couple of code blocks, and a method dispatch
that evaluated the first block in the True class and the second block in
the False class, which I thought was pretty cool. :-)
> Not only is memory wasted (and thus away from other processes),
Remember that the original GC stuff was done on systems (Lisp,
Smalltalk, etc) where the interpreter *was* the OS, so "taking away from
other processes" doesn't always make sense.
But yes, some memory is "wasted" for some meaning of the word "wasted".
Not unlike stack space is "wasted" if you're not about to overflow the
stack.
It's also "wasted" if you are willing to spend more time on expert
programmers who never make a mistake and hence never corrupt memory,
rather than spending more money to buy some RAM. And indeed, when RAM is
expensive (like in embedded devices), one tends not to see too many
GC'ed systems.
And with some OS help, you can actually get pretty decent performance
during collections. The ability to mark pages "manually" to say which
are more likely to be needed soon, the ability to trap writes to
particular pages so you can mark them as "the GC will need to examine
this on the next sweep", the ability to pre-fetch paged-out pages while
you're processing this one (i.e., read-ahead for page faults), etc.
> but part of the speed
> benefits in doing so are lost by having to run GC sweeps more often.
Running GCs too often is indeed a performance-limiting problem. That's
one of the reasons that people writing collectors spend so much time
trying to make them fast, and why there are so many different types of
collectors (compared to, say, only a couple different kinds of stacks).
--
Darren New / San Diego, CA, USA (PST)
"That's pretty. Where's that?"
"It's the Age of Channelwood."
"We should go there on vacation some time."
Darren New wrote:
>> Isn't that kind of counter-productive?
>
> Only if your language has a stack and isn't functional and is
> single-threaded, I suppose.
And yes, I agree, Java in this respect is kind of suckful,
efficiency-wise. I think that's at least as much because they're aiming
for portability and dynamic loading.
--
Darren New / San Diego, CA, USA (PST)
"That's pretty. Where's that?"
"It's the Age of Channelwood."
"We should go there on vacation some time."