From: Invisible
Subject: "What every programmer should know about RAM"
Date: 21 Apr 2008 11:51:26
Message: <480cb7fe$1@news.povray.org>
http://lwn.net/Articles/250967/
An interesting read, I thought.
[See what I did there?]
Certainly much of this document is pretty much news to me. Last time I
looked into this stuff, the bus from the CPU to RAM ran at the same
speed as the rest of the system, and if you had a 16-bit address space,
you had 16 physical wires.
Also, last time I looked into this, CPUs ran at about 1 MHz. ;-)
I was aware that these days the RAM is about 20x slower than the CPU,
and that's why the CPU has to have a cache the size of a small planet to
achieve any performance whatsoever. But it's certainly news to me that
these days the communication involves a complex *protocol* instead of
just a direct passive electronic connection. Or that the refresh signal
has to come from outside the memory subsystem. Now I see why a "memory
controller" needs to exist.
And now they tell me that the latest designs use a serial link... this
seems entirely counter-intuitive. Sure, a serial link can be driven at a
higher clock rate more easily, but it would have to be *orders of
magnitude* higher to overcome the inherent slowness of a serial link in
the first place! Hmm, well I guess these guys must know what they're
doing, even if it doesn't make any sense. ;-)
I must confess to being confused about "fully associative" and "set
associative".
The info on TLBs was quite interesting too. And all that NUMA stuff too.
But you begin to wonder what us programmers can actually *do* about all
this stuff.
Most especially, when the document talks about laying out data in memory
in a certain way, and performing loops in a certain order to optimise
cache usage, I can't help thinking that although that might be quite
easy in C or C++, it's going to be 100% impossible in any moderately
high-level language. (Can you spell "garbage collection"?)
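[To make that concrete: here's a minimal C sketch of the loop-order
effect the article describes - identical arithmetic, very different
cache behaviour. The matrix size is an arbitrary example:]

/* A matrix stored row-major (as C stores it), summed in two loop
 * orders.  sum_by_rows walks memory sequentially, so each 64-byte
 * cache line is used in full and the prefetcher stays ahead of it;
 * sum_by_cols jumps N*sizeof(double) bytes per step and misses the
 * cache on nearly every access once the matrix outgrows it. */
#include <stdio.h>

#define N 4096
static double m[N][N];               /* 128 MB of data */

static double sum_by_rows(void)      /* cache-friendly order */
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

static double sum_by_cols(void)      /* cache-hostile order */
{
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += m[i][j];
    return s;
}

int main(void)
{
    /* Time each call separately (e.g. with clock()) to see the gap. */
    printf("%f %f\n", sum_by_rows(), sum_by_cols());
    return 0;
}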
Obviously, the idea of high-level languages is that you work at a higher
level of abstraction. And that means, for example, that you might
implement some O(log n) algorithm instead of an O(n^3) algorithm,
[thereby making your code *millions* of times faster], rather than
fussing about cache hits and maybe gaining a few *hundred* percent speed
increase. But even so...
Also, this explains some of the weird benchmark results I've been
seeing. I wrote a merge-sort and made it run in parallel, and was rather
perplexed to see that this is *slower* than a purely sequential
algorithm. Now I begin to see why: sorting integers involves grabbing a
pair of data values, performing a simple numerical comparison, and then
moving them to somewhere else. I would imagine it's bounded by RAM
bandwidth rather than CPU power. Maybe trying to run two sorts in
parallel just saturates the FSB bandwidth twice as fast, so each CPU
spends longer waiting around?
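[One way to test that theory, I suppose: a crude streaming-bandwidth
probe. Run one copy, then two copies side by side; if each copy's
reported bandwidth roughly halves, the sorts really are fighting over
the bus rather than the cores. A sketch, not a calibrated benchmark:]

/* Time one streaming pass over a buffer much larger than any cache.
 * The reported figure approximates sustainable read bandwidth. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BYTES (256UL * 1024 * 1024)   /* 256 MB */

int main(void)
{
    size_t n = BYTES / sizeof(long);
    long *buf = malloc(BYTES);
    if (buf == NULL) return 1;
    for (size_t i = 0; i < n; i++)    /* touch every page first */
        buf[i] = (long)i;

    clock_t t0 = clock();
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    /* print sum so the loop can't be optimised away */
    printf("%.0f MB/s (checksum %ld)\n", BYTES / secs / 1e6, sum);
    free(buf);
    return 0;
}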
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
From: Gail Shaw
Subject: Re: "What every programmer should know about RAM"
Date: 21 Apr 2008 12:14:14
Message: <480cbd56@news.povray.org>
"Invisible" <voi### [at] devnull> wrote in message
news:480cb7fe$1@news.povray.org...
>
> The info on TLBs was quite interesting too. And all that NUMA stuff too.
> But you begin to wonder what us programmers can actually *do* about all
> this stuff.
NUMA's pretty important when writing server-type apps. The main point
about NUMA is that memory local to the node is significantly faster to
access than remote memory.
I don't know anything about NUMA support in Unix or Mac OS. I'm sure
there must be some.
In Windows, the memory allocation APIs must be called differently when
working with NUMA than with symmetric multiprocessor architectures. If
you're interested, I can dig some details out of a book I have. It's at
the office, so I can only check tomorrow.
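From memory, it's roughly along these lines (a sketch only, using the
Vista-era APIs; error handling mostly omitted):

/* NUMA-aware allocation on Windows: VirtualAllocExNuma asks for pages
 * on a preferred node, so threads running there get local rather than
 * remote memory. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    ULONG highest = 0;
    GetNumaHighestNodeNumber(&highest);
    printf("NUMA nodes: %lu\n", highest + 1);

    /* Which node is this thread currently running on? */
    UCHAR node = 0;
    GetNumaProcessorNode((UCHAR)GetCurrentProcessorNumber(), &node);

    /* Allocate 1 MB, preferably on our local node. */
    void *p = VirtualAllocExNuma(GetCurrentProcess(), NULL, 1 << 20,
                                 MEM_RESERVE | MEM_COMMIT,
                                 PAGE_READWRITE, node);
    if (p != NULL) {
        /* ... use it from threads affinitised to this node ... */
        VirtualFree(p, 0, MEM_RELEASE);
    }
    return 0;
}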
From: Gail Shaw
Subject: Re: "What every programmer should know about RAM"
Date: 21 Apr 2008 12:25:17
Message: <480cbfed@news.povray.org>
"Invisible" <voi### [at] devnull> wrote in message
news:480cb7fe$1@news.povray.org...
>
> I was aware that these days the RAM is about 20x slower than the CPU,
> and that's why the CPU has to have a cache the size of a small planet to
> achieve any performance whatsoever.
Cache misses are the bane of modern processors. It's one reason why the
hyperthreaded CPUs performed really well in some scenarios and badly in
others (e.g. SQL Server). The two virtual cores share L1 cache, so if one is
moving lots of data around, the other one suffers repeated cache misses.
Dual cores have separate L1 caches and sometimes separate L2 caches as well.
Sometimes the L2 is shared, but has usage limits for the two cores.
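You can see the effect in miniature with something like this (Linux,
pthreads - a sketch; the right CPU numbers for two hyperthread siblings
are machine-specific, so the pinning below is an assumption):

/* One thread streams through a big buffer, evicting cache lines; the
 * main thread times a loop over a small, cache-resident working set.
 * Pinned to two logical CPUs sharing one physical core (check
 * /proc/cpuinfo for the real numbering), the small loop slows down
 * noticeably once the streamer starts. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static void pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof set, &set);
}

static void *streamer(void *arg)
{
    pin_to_cpu(1);                        /* assumed HT sibling of CPU 0 */
    size_t n = 64UL * 1024 * 1024;
    char *big = malloc(n);
    if (big == NULL) return NULL;
    for (;;)                              /* thrash the shared cache */
        for (size_t i = 0; i < n; i += 64)
            big[i]++;
    return arg;                           /* never reached */
}

int main(void)
{
    pin_to_cpu(0);
    pthread_t t;
    pthread_create(&t, NULL, streamer, NULL);

    enum { SMALL = 16 * 1024 };           /* fits easily in L1 */
    static volatile char small[SMALL];

    struct timespec a, b;
    clock_gettime(CLOCK_MONOTONIC, &a);
    for (int pass = 0; pass < 20000; pass++)
        for (int i = 0; i < SMALL; i++)
            small[i]++;
    clock_gettime(CLOCK_MONOTONIC, &b);

    printf("small-loop time: %.3f s\n",
           (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9);
    return 0;
}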
I can't remember which specific CPUs had which architecture. It's been
several months since I last looked at this.
From: Warp
Subject: Re: "What every programmer should know about RAM"
Date: 21 Apr 2008 13:06:17
Message: <480cc988@news.povray.org>
Invisible <voi### [at] devnull> wrote:
> I was aware that these days the RAM is about 20x slower than the CPU,
> and that's why the CPU has to have a cache the size of a small planet to
> achieve any performance whatsoever. But it's certainly news to me that
> these days the communication involves a complex *protocol* instead of
> just a direct passive electronic connection. Or that the refresh signal
> has to come from outside the memory subsystem. Now I see why a "memory
> controller" needs to exist.
It becomes even more complicated when more than one processor (or core)
needs to access the *same* RAM, which is the whole idea in SMP. Nobody
gives it a second thought, but if you think about it, you'll quickly
notice that it's a very HARD problem. It's made harder still by the
fact that the processors/cores usually each have their own L1 cache,
and those L1 caches must be kept in sync whenever they hold the same
data.
There's at least one instruction, namely cmpxchg, which Intel
guarantees to be atomic when used with the LOCK prefix. That means no
two processors/cores can compare-and-exchange the same memory location
at precisely the same time.
Now, try to figure out *how* they do that, given that the memory location
being compared&exchanged may be in the processor's/core's local L1 cache.
You'll quickly see this requires quite a lot more than simple dumb
passive electronic connections.
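To give an idea of what that looks like from the software side, here's
a sketch of a spinlock built on compare-and-exchange, using GCC's
__sync builtins (which compile down to lock cmpxchg on x86):

/* A minimal spinlock on top of atomic compare-and-exchange.
 * __sync_bool_compare_and_swap(p, old, new) atomically performs
 * "if (*p == old) { *p = new; return 1; } else return 0;".  Roughly
 * speaking, the hardware makes this work by holding the cache line in
 * exclusive state for the duration of the instruction, so no other
 * core can touch it halfway through. */
static volatile int lock_word = 0;       /* 0 = free, 1 = held */

static void spin_lock(volatile int *lock)
{
    while (!__sync_bool_compare_and_swap(lock, 0, 1))
        ;                                /* lost the race; try again */
}

static void spin_unlock(volatile int *lock)
{
    __sync_lock_release(lock);           /* store 0, release semantics */
}

int main(void)
{
    spin_lock(&lock_word);
    /* ... critical section: only one thread at a time gets here ... */
    spin_unlock(&lock_word);
    return 0;
}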
> And now they tell me that the latest designs use a serial link... this
> seems entirely counter-intuitive.
It's a cost-effective necessity. The theoretical optimal situation
would be if each core was directly connected to the memory controller
with 64 (or 128 or whatever) wires. However, that would make the
memory controller *very* complicated and *huge* (just imagine having,
for example 8 cores, each connecting directly to the memory controller
with 64 or 128 wires each), which translates to expensive and, in some
cases, counter-productive.
--
- Warp
From: Orchid XP v8
Subject: Re: "What every programmer should know about RAM"
Date: 21 Apr 2008 14:36:08
Message: <480cde98$1@news.povray.org>
Warp wrote:
> It becomes even more complicated when more than one processor (or core)
> needs to access the *same* RAM, which is the whole idea in SMP.
Oh yeah - that whole cache coherency thing is a pretty big deal. (And
one which the article goes into quite a bit of detail about later on.)
Even more fun is trying to provide "atomic" access operations - which,
as far as I can tell, absolutely *must* be done at the hardware level.
There's just no way you could implement it in software.
> You'll quickly see this requires quite a lot more than simple dumb
> passive electronic connections.
Bus arbitration, if nothing else...
>> And now they tell me that the latest designs use a serial link... this
>> seems entirely counter-intuitive.
>
> It's a cost-effective necessity. The theoretical optimal situation
> would be if each core was directly connected to the memory controller
> with 64 (or 128 or whatever) wires. However, that would make the
> memory controller *very* complicated and *huge* (just imagine having,
> for example 8 cores, each connecting directly to the memory controller
> with 64 or 128 wires each), which translates to expensive and, in some
> cases, counter-productive.
I still kinda wish you could actually build a PC that had several GB of
RAM running at the same speed as the CPU - but I'm guessing it might be,
uh, slightly expensive?
[Presumably the only way it could be even remotely possible is if all
the RAM was on the same die as the CPU. Damn, the die area would have to
be *vast*! I hypothesize it would also eat electricity like candy, and
perhaps get slightly warm too...]
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
From: Orchid XP v8
Subject: Re: "What every programmer should know about RAM"
Date: 21 Apr 2008 14:38:06
Message: <480cdf0e$1@news.povray.org>
>> The info on TLBs was quite interesting too. And all that NUMA stuff too.
>> But you begin to wonder what us programmers can actually *do* about all
>> this stuff.
>
> NUMA's pretty important when writing server-type apps. The main point
> about NUMA is that memory local to the node is significantly faster to
> access than remote memory.
Yeah, NUMA is one area where it's pretty obvious that you should be
explicitly writing your application to take into account the [vast]
differences in access speed.
I was thinking more about all the cache coherency stuff. Unless you're
writing in C or assembly, you really don't have much influence on where
your data gets put and in what arrangement. Taking a language like
Java... damn, the CPU must spend *so much* time chasing all those
pointers that 100% cannot be pre-fetched... sheesh!
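[In C, where you can see the pointers, the difference looks like this:
the array walk streams linearly and the prefetcher stays ahead of it,
while the list walk can't even start loading node n+1 until node n's
"next" pointer has arrived from memory:]

/* Contiguous data vs pointer chasing.  The nodes are deliberately
 * linked in random memory order - roughly what a heap full of
 * scattered objects looks like - so each `next` load must complete
 * before the following one can begin. */
#include <stdio.h>
#include <stdlib.h>

struct node { long value; struct node *next; };

static long sum_array(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];                 /* addresses known in advance */
    return s;
}

static long sum_list(const struct node *p)
{
    long s = 0;
    for (; p != NULL; p = p->next)
        s += p->value;             /* address depends on previous load */
    return s;
}

int main(void)
{
    enum { N = 1000000 };
    long *a = malloc(N * sizeof *a);
    struct node *nodes = malloc(N * sizeof *nodes);
    size_t *perm = malloc(N * sizeof *perm);
    if (!a || !nodes || !perm) return 1;

    for (size_t i = 0; i < N; i++) { a[i] = (long)i; perm[i] = i; }
    for (size_t i = N - 1; i > 0; i--) {      /* Fisher-Yates shuffle */
        size_t j = (size_t)rand() % (i + 1);
        size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    for (size_t i = 0; i < N; i++) {          /* link in shuffled order */
        nodes[perm[i]].value = (long)perm[i];
        nodes[perm[i]].next  = (i + 1 < N) ? &nodes[perm[i + 1]] : NULL;
    }

    /* Time the two calls separately to see the gap. */
    printf("%ld %ld\n", sum_array(a, N), sum_list(&nodes[perm[0]]));
    free(a); free(nodes); free(perm);
    return 0;
}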
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
"Orchid XP v8" <voi### [at] devnull> wrote
> [Presumably the only way it could be even remotely possible is if all
> the RAM was on the same die as the CPU. Damn, the die area would have to
> be *vast*! I hypothesize it would also eat electricity like candy, and
> perhaps get slightly warm too...]
Better yet, move the CPU(s) over to RAM. That's how neural nets work anyway
(including the brain).
From: Orchid XP v8
Subject: Re: "What every programmer should know about RAM"
Date: 21 Apr 2008 15:02:11
Message: <480ce4b3$1@news.povray.org>
>> [Presumably the only way it could be even remotely possible is if all
>> the RAM was on the same die as the CPU. Damn, the die area would have to
>> be *vast*! I hypothesize it would also eat electricity like candy, and
>> perhaps get slightly warm too...]
>
> Better yet, move the CPU(s) over to RAM. That's how neural nets work anyway
> (including the brain).
I've had the self-same thought several times. The problem, I
suspect, would be moving data around fast enough...
(A wise man once said "a super-computer is a device for turning a
compute-bound problem into an I/O-bound problem".)
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
From: Gail Shaw
Subject: Re: "What every programmer should know about RAM"
Date: 21 Apr 2008 15:45:21
Message: <480ceed1@news.povray.org>
"Orchid XP v8" <voi### [at] devnull> wrote in message
news:480cde98$1@news.povray.org...
>
> I still kinda wish you could actually build a PC that had several GB of
> RAM running at the same speed as the CPU - but I'm guessing it might be,
> uh, slightly expensive?
Very slightly...
To give you an idea, I've just bought components for a new PC. CPU is a quad
core running at 2.4 GHz.
I can get 4 GB of 800 MHz DDR2 memory for around R250. I can get 2 GB of
1066 MHz memory (same speed as the FSB) for around R800.
From: Orchid XP v8
Subject: Re: "What every programmer should know about RAM"
Date: 21 Apr 2008 16:03:20
Message: <480cf308$1@news.povray.org>
>> I still kinda wish you could actually build a PC that had several GB of
>> RAM running at the same speed as the CPU - but I'm guessing it might be,
>> uh, slightly expensive?
>
> Very slightly...
> To give you an idea, I've just bought components for a new PC. CPU is a quad
> core running at 2.4 GHz.
> I can get 4 GB of 800 MHz DDR2 memory for around R250. I can get 2 GB of
> 1066 MHz memory (same speed as the FSB) for around R800.
Except that - as the guy pointed out - it's *not* the same speed as the
CPU. It's actually 533 MHz, double-pumped. And the RAM cells themselves
run at a lowly 266 MHz. No wonder it takes > 200 cycles to access main
memory... :-S
Still, if you had 2 GB of true multi-GHz Static RAM, you'd need an
address decoder the size of a small planet. The ripple time would
probably slow it right back down anyway...
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*