Kind of like what Warp talks about w.r.t. cache lines, except with VM pages.
http://queue.acm.org/detail.cfm?id=1814327
--
Darren New, San Diego CA, USA (PST)
Eiffel - The language that lets you specify exactly
that the code does what you think it does, even if
it doesn't do what you wanted.
Darren New wrote:
> Kind of like what Warp talks about w.r.t. cache lines, except with VM
> pages.
It's all just different levels of the memory hierarchy.
It used to be that you had a block of RAM, all accessible in O(1) time.
Now you have virtual memory (which pretends to be RAM, but is orders of
magnitude slower), maybe three levels of cache (which pretend to be
RAM, but are orders of magnitude faster - except if it's another core's
cache, in which case it's *slower* than RAM again due to coherence
issues), and you could regard the CPU registers themselves as the
ultimate level-0 "cache".
The sad fact is, processors are now 1000x faster than they used to be,
but RAM is still roughly the same speed. And that means that if you
don't want your expensive new CPU to just spend 1000x more cycles doing
nothing, you need to have this memory hierarchy. And *that* means that
the time complexity of your algorithm is now *irrelevant*. The *only*
thing that matters now is cache behaviour.
As the paper says, a stupid O(n^2) algorithm could be *faster* than a
sophisticated algorithm with O(log n) or even O(1) time-complexity, all
due to cache effects that shouldn't be there in the first place. Decades
of research and development ruined. Now we have no way to predict
performance, because it depends on cache behaviour, which intimately
depends on the specifics of a particular implementation, not on the
abstract properties of the algorithm used.
Possibly the worst part is that, unless you program in some horribly
low-level language, you have no control over any of this.
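A minimal sketch of the effect being described (C, with arbitrary sizes;
nothing here comes from the paper itself): the same number of memory
reads, but one access pattern streams through RAM while the other defeats
the cache on almost every step.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1L << 23)              /* ~8M elements, far bigger than any cache */

struct node { struct node *next; };

int main(void)
{
    /* 1. Sequential scan of a plain array: the hardware prefetcher streams
     *    cache lines ahead of the loop, so almost every access is a hit.  */
    long *arr = malloc(N * sizeof *arr);
    for (long i = 0; i < N; i++) arr[i] = 1;

    clock_t t0 = clock();
    long sum = 0;
    for (long i = 0; i < N; i++) sum += arr[i];
    printf("sequential scan: %.3fs (sum=%ld)\n",
           (double)(clock() - t0) / CLOCKS_PER_SEC, sum);

    /* 2. Pointer chase through nodes linked in a random order: the same
     *    number of reads, but nearly every hop is a cache (and TLB) miss. */
    struct node *pool = malloc(N * sizeof *pool);
    long *perm = malloc(N * sizeof *perm);
    for (long i = 0; i < N; i++) perm[i] = i;
    srand(1);
    for (long i = N - 1; i > 0; i--) {           /* Fisher-Yates shuffle;   */
        long j = rand() % (i + 1);               /* rand() is crude, but    */
        long t = perm[i]; perm[i] = perm[j];     /* fine for a demo         */
        perm[j] = t;
    }
    for (long i = 0; i + 1 < N; i++) pool[perm[i]].next = &pool[perm[i + 1]];
    pool[perm[N - 1]].next = NULL;

    t0 = clock();
    long hops = 0;
    for (struct node *p = &pool[perm[0]]; p; p = p->next) hops++;
    printf("pointer chase:   %.3fs (hops=%ld)\n",
           (double)(clock() - t0) / CLOCKS_PER_SEC, hops);

    free(arr); free(pool); free(perm);
    return 0;
}

Both loops are O(n), but on typical hardware the pointer chase is many
times slower - which is the whole point.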
> As the paper says, a stupid O(n^2) algorithm could be *faster* than a
> sophisticated algorithm with O(log n) or even O(1) time-complexity, all
> due to cache effects that shouldn't be there in the first place. Decades
> of research and development ruined.
What you are referring to are mathematical papers; they do not consider
actual implementation on a machine, and they are not meant to. Computers
have always had different amounts of memory with different speeds and
access patterns, and there are plenty of papers that take this into
account. You cannot just take a mathematical paper and expect it to work
as described on a computer, you never have been able to.
> Now we have no way to predict performance, because it depends on cache
> behaviour, which intimately depends on the specifics of a particular
> implementation, not on the abstract properties of the algorithm used.
Sure, but it's always been like that and always will, there's nothing you
can do about it.
> Possibly the worst part is that, unless you program in some horribly
> low-level language, you have no control over any of this.
If your language/compiler is clever enough to switch around the order and
storage of your code, then it should be clever enough to optimise for the
memory structure of the machine.
>> As the paper says, a stupid O(n^2) algorithm could be *faster* than a
>> sophisticated algorithm with O(log n) or even O(1) time-complexity,
>> all due to cache effects that shouldn't be there in the first place.
>> Decades of research and development ruined.
>
> You cannot just take a mathematical paper and expect it to
> work as described on a computer, you never have been able to.
Back in the days of the 8-bit home computer, memory access really was a
constant-time operation that always took 1 clock cycle. Unfortunately it
seems that the larger a memory bank is, the slower it is to access. What
this means is that a 2.2 GHz CPU only runs at 2.2 GHz if you use a tiny
amount of data. If you try to random-access a large chunk of data,
suddenly the program slows down to the speed of the RAM bus.
I guess the only consolation is that my CPU's L3 cache is probably
bigger than the entire address space of the 6502...
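Just as a sketch of that slowdown (C again; the sizes and the xorshift
generator are arbitrary choices, not anything from the posts): time random
reads against working sets of increasing size, and watch the cost per
access jump each time the data stops fitting in a cache level.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ACCESSES 10000000L

int main(void)
{
    /* Working sets from 16 KB (fits in L1) up to 256 MB (RAM only). */
    for (size_t bytes = 16 * 1024; bytes <= 256u * 1024 * 1024; bytes *= 4) {
        size_t n = bytes / sizeof(long);
        long *buf = malloc(bytes);
        for (size_t i = 0; i < n; i++) buf[i] = (long)i;

        /* Touch random elements; the index arithmetic is cheap, so the time
         * is dominated by wherever in the hierarchy the data happens to be. */
        unsigned long x = 88172645463325252UL;    /* xorshift PRNG state */
        long sum = 0;
        clock_t t0 = clock();
        for (long i = 0; i < ACCESSES; i++) {
            x ^= x << 13; x ^= x >> 7; x ^= x << 17;
            sum += buf[x % n];
        }
        double ns = (double)(clock() - t0) / CLOCKS_PER_SEC * 1e9 / ACCESSES;
        printf("%9zu KB: %6.1f ns/access (sum=%ld)\n", bytes / 1024, ns, sum);
        free(buf);
    }
    return 0;
}

The absolute numbers depend entirely on the machine, but the shape of the
curve is always the same: flat within a cache level, then a step up.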
> Sure, but it's always been like that and always will, there's nothing
> you can do about it.
It wasn't always like that - but, as you say, there's nothing you can do
about it.
>> Possibly the worst part is that, unless you program in some horribly
>> low-level language, you have no control over any of this.
>
> If your language/compiler is clever enough to switch around the order
> and storage of your code, then it should be clever enough to optimise
> for the memory structure of the machine.
Sure. But if it isn't, you can't do anything about it. You can't rewrite
the compiler, after all...
On 14.06.2010 16:41, Invisible wrote:
>> You cannot just take a mathematical paper and expect it to work as
>> described on a computer, you never have been able to.
>
> Back in the days of the 8-bit home computer, memory access really was a
> constant-time operation that always took 1 clock cycle. Unfortunately it
> seems that the larger a memory bank is, the slower it is to access. What
> this means is that a 2.2 GHz CPU only runs at 2.2 GHz if you use a tiny
> amount of data. If you try to random-access a large chunk of data,
> suddenly the program slows down to the speed of the RAM bus.
(1) Home computers were toys.
(2) Capacity of memory basically has nothing to do with access speed.
The effect you describe for that 2.2 GHz CPU is due to the L1 and L2
caches - which aren't faster because they are smaller, but because (a)
they're closer to the CPU core and (b) they're using faster (but more
space-consuming) technology.
clipka wrote:
> (1) Home computers were toys.
I saw at least one employee using a company-purchased C64 to run
accounting software to do actual, productive work.
They might be considered toys *now*...
> (2) Capacity of memory basically has nothing to do with access speed.
Really? So having more address bits to decode, requiring more
address-decode logic, doesn't affect speed in any way? And neither does a
larger die area necessitating longer traces?
> The effect you describe for that 2.2 GHz CPU is due to the L1 and L2
> caches - which aren't faster because they are smaller, but because (a)
> they're closer to the CPU core and (b) they're using faster (but more
> space-consuming) technology.
I gather that part of the problem is that having traces on the
motherboard operating into the GHz frequency range isn't very feasible
(for reasons such as trace length and electronic interference). But that
would still mean that theoretically you could make a single chip that
contains a couple of CPU cores plus (say) 2GB of RAM. The fact that
nobody has ever done this indicates that it isn't possible for whatever
reason.
>> You cannot just take a mathematical paper and expect it to work as
>> described on a computer, you never have been able to.
>
> Back in the days of the 8-bit home computer, memory access really was a
> constant-time operation that always took 1 clock cycle.
Sure, but just as an example, what's the fastest way to sort a 10 KB array
when you have 16 KB of RAM? The mathematically efficient way won't
necessarily be the quickest if you need to read/write to tape/disc rather
than just using RAM. Also, some operations were free (e.g. shifting when
adding, and other obscure things like that), and a general mathematical
paper won't take that into account when telling you the "best" algorithm
for a task.
>> Sure, but it's always been like that and always will, there's nothing you
>> can do about it.
>
> It wasn't always like that -
So your early 8bit computer could write to the tape at the same speed as
RAM? :-O
>> If your language/compiler is clever enough to switch around the order and
>> storage of your code, then it should be clever enough to optimise for the
>> memory structure of the machine.
>
> Sure. But if it isn't, you can't do anything about it. You can't rewrite
> the compiler, after all...
If your compiler changes the order that data is stored, or the order you
access it, but doesn't take into account how this affects performance, then
IMO it's broken. Use a different compiler that either leaves the order
alone (so you can optimise it yourself) or optimises it for you.
>>> You cannot just take a mathematical paper and expect it to work as
>>> described on a computer, you never have been able to.
>>
>> Back in the days of the 8-bit home computer, memory access really was
>> a constant-time operation that always took 1 clock cycle.
>
> Sure, but just as an example, what's the fastest way to sort a 10 KB
> array when you have 16 KB of RAM? The mathematically efficient way
> won't necessarily be the quickest if you need to read/write to tape/disc
> rather than just using RAM.
Then you use a specialised external-sort. More to the point, you *know*
that you're going to need an external-sort. Your program doesn't just
magically run 500x slower than you'd expected.
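For concreteness, a specialised external sort looks roughly like this (a
heavily simplified C sketch; the in-memory budget, the run-file names and
the linear-scan merge are assumptions made for brevity, and error handling
is mostly omitted):

#include <stdio.h>
#include <stdlib.h>

#define BUDGET  2048        /* pretend RAM only holds this many ints       */
#define MAXRUNS 64          /* enough sorted runs for a small demo input   */

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s in.bin out.bin\n", argv[0]);
        return 1;
    }

    /* Phase 1: read chunks that fit in the budget, sort each chunk in RAM,
     * and write every chunk out as a sorted "run" file on disc.            */
    FILE *in = fopen(argv[1], "rb");
    if (!in) { perror(argv[1]); return 1; }
    int *buf = malloc(BUDGET * sizeof *buf);
    FILE *runs[MAXRUNS];
    int nruns = 0;
    size_t got;
    while (nruns < MAXRUNS &&
           (got = fread(buf, sizeof *buf, BUDGET, in)) > 0) {
        char name[32];
        snprintf(name, sizeof name, "run%03d.tmp", nruns);
        qsort(buf, got, sizeof *buf, cmp_int);
        FILE *r = fopen(name, "wb+");
        fwrite(buf, sizeof *buf, got, r);
        rewind(r);
        runs[nruns++] = r;
    }
    fclose(in);
    free(buf);

    /* Phase 2: k-way merge - repeatedly pick the smallest head element among
     * the runs.  A linear scan is fine here; a real version uses a heap.    */
    FILE *out = fopen(argv[2], "wb");
    int head[MAXRUNS], alive[MAXRUNS];
    for (int i = 0; i < nruns; i++)
        alive[i] = fread(&head[i], sizeof(int), 1, runs[i]) == 1;
    for (;;) {
        int best = -1;
        for (int i = 0; i < nruns; i++)
            if (alive[i] && (best < 0 || head[i] < head[best])) best = i;
        if (best < 0) break;
        fwrite(&head[best], sizeof(int), 1, out);
        alive[best] = fread(&head[best], sizeof(int), 1, runs[best]) == 1;
    }
    for (int i = 0; i < nruns; i++) fclose(runs[i]);
    fclose(out);
    return 0;
}

The point being: the algorithm is chosen *knowing* that most of the data
lives on slow storage, instead of pretending all memory costs the same.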
> Also, some operations were free (e.g. shifting when adding, and other
> obscure things like that), and a general mathematical paper won't take
> that into account when telling you the "best" algorithm for a task.
Sure. But if one algorithm takes 5,000 operations (some of which might
be free) and another takes 72,000,000,000 operations, which one is
faster? On the computers of old, the answer was self-evident. Today, it
might well be that the latter is actually faster.
>>> Sure, but it's always been like that and always will, there's nothing
>>> you can do about it.
>>
>> It wasn't always like that -
>
> So your early 8bit computer could write to the tape at the same speed as
> RAM? :-O
More like in the early days, if it didn't fit in RAM, you just couldn't
do it at all. Today if it doesn't fit in RAM, the OS makes the computer
pretend it has more RAM than it does (with the *minor issue* of a vast
slowdown and halving the life of your hard drive).
>>> If your language/compiler is clever enough to switch around the order
>>> and storage of your code, then it should be clever enough to optimise
>>> for the memory structure of the machine.
>>
>> Sure. But if it isn't, you can't do anything about it. You can't
>> rewrite the compiler, after all...
>
> If your compiler changes the order that data is stored, or the order you
> access it, but doesn't take into account how this affects performance,
> then IMO it's broken. Use a different compiler that either leaves the
> order alone (so you can optimise it yourself) or optimises it for you.
How about if the order just plain isn't defined in the first place? Or
if the runtime system is storing auxiliary data which you don't know about?
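As a small example of the kind of layout detail that a low-level language
at least lets you see (a C sketch with made-up field names, assuming a
typical 64-bit ABI and 64-byte cache lines):

#include <stdio.h>

struct padded {            /* fields in "natural" declaration order       */
    char   tag;            /* 1 byte + 7 bytes of alignment padding       */
    double value;          /* 8 bytes, must be 8-byte aligned             */
    char   flag;           /* 1 byte + 7 bytes of trailing padding        */
};                         /* typically 24 bytes                          */

struct repacked {          /* the same fields, reordered by hand          */
    double value;
    char   tag;
    char   flag;           /* 2 bytes + 6 bytes of trailing padding       */
};                         /* typically 16 bytes                          */

int main(void)
{
    /* Fewer bytes per element means more whole elements per cache line
     * (integer division, so this is a lower bound).                      */
    printf("padded:   %zu bytes -> %zu per 64-byte line\n",
           sizeof(struct padded),   (size_t)64 / sizeof(struct padded));
    printf("repacked: %zu bytes -> %zu per 64-byte line\n",
           sizeof(struct repacked), (size_t)64 / sizeof(struct repacked));
    return 0;
}

In a language where the runtime decides the layout (and may add headers,
boxing or padding of its own), none of this is visible, let alone under
your control.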
Invisible wrote:
> but RAM is still roughly the same speed.
No it's not. RAM used to take several microseconds. Drum memory was on the
order of milliseconds.
> the time complexity of your algorithm is now *irrelevant*. The *only*
> thing that matters now is cache behaviour.
Only irrelevant if all the data you need fits in cache. If you have a 300Gb
data set like the paper says, your L1 cache isn't going to be nearly as
important.
--
Darren New, San Diego CA, USA (PST)
Eiffel - The language that lets you specify exactly
that the code does what you think it does, even if
it doesn't do what you wanted.
clipka wrote:
> (2) Capacity of memory basically has nothing to do with access speed.
Of course it does. You have speed-of-light delay and address-line cascading
to decode. The slowness isn't primarily due to accessing the memory cells;
it's due to the cascade of gates you have to go through to decode the
address and figure out which cell you want. That's why video memory can be
so much faster than CPU memory - there's a predictable access pattern (the
video beam scan) that can be used to speed up address decoding.
--
Darren New, San Diego CA, USA (PST)
Eiffel - The language that lets you specify exactly
that the code does what you think it does, even if
it doesn't do what you wanted.