Interesting performance paper (Messages 1 to 10 of 39)
From: Darren New
Subject: Interesting performance paper
Date: 12 Jun 2010 12:51:49
Message: <4c13bb25$1@news.povray.org>
Kind of like what Warp talks about w.r.t. cache lines, except with VM pages.

http://queue.acm.org/detail.cfm?id=1814327

-- 
Darren New, San Diego CA, USA (PST)
    Eiffel - The language that lets you specify exactly
    that the code does what you think it does, even if
    it doesn't do what you wanted.



From: Invisible
Subject: Re: Interesting performance paper
Date: 14 Jun 2010 05:24:52
Message: <4c15f564$1@news.povray.org>
Darren New wrote:
> Kind of like what Warp talks about w.r.t. cache lines, except with VM 
> pages.

It's all just different levels of the memory hierarchy.

It used to be that you had a block of RAM, all accessible in O(1) time. 
Now you have virtual memory (which pretends to be RAM, but is orders of 
magnitude slower), maybe three levels of cache (which pretend to be 
RAM, but are orders of magnitude faster - except if it's another core's 
cache, in which case it's *slower* than RAM again due to coherence 
issues), and you could regard the CPU registers themselves as the 
ultimate level-0 "cache".

The sad fact is, processors are now 1000x faster than they used to be, 
but RAM is still roughly the same speed. And that means that if you 
don't want your expensive new CPU to just spend 1000x more cycles doing 
nothing, you need to have this memory hierarchy. And *that* means that 
the time complexity of your algorithm is now *irrelevant*. The *only* 
thing that matters now is cache behaviour.
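
To make that concrete, here's a quick C++ sketch (my own, not from the 
paper; the sizes are arbitrary, compile with -O2 and C++17). It does the 
same number of additions twice: once over a contiguous std::vector, and 
once over a std::list where every element is a separately allocated 
node. The operation count is identical; on most machines the running 
times are not, because the list traversal is one dependent pointer load 
per element instead of a streaming pass over consecutive cache lines.

#include <chrono>
#include <iostream>
#include <list>
#include <numeric>
#include <utility>
#include <vector>

int main()
{
    // ~20 MB of ints: comfortably larger than any CPU cache.
    const std::size_t n = 5'000'000;

    std::vector<int> vec(n, 1);                  // contiguous storage
    std::list<int> lst(vec.begin(), vec.end());  // one heap node per element

    // Time the same summation over either container.
    auto time_sum = [](const auto& container) {
        auto t0 = std::chrono::steady_clock::now();
        long long sum = std::accumulate(container.begin(), container.end(), 0LL);
        auto t1 = std::chrono::steady_clock::now();
        return std::make_pair(sum, std::chrono::duration<double>(t1 - t0).count());
    };

    auto [vec_sum, vec_time]   = time_sum(vec);
    auto [list_sum, list_time] = time_sum(lst);

    std::cout << "vector: " << vec_time  << " s (sum " << vec_sum  << ")\n"
              << "list:   " << list_time << " s (sum " << list_sum << ")\n";
}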

As the paper says, a stupid O(n^2) algorithm could be *faster* than a 
sophisticated algorithm with O(log n) or even O(1) time-complexity, all 
due to cache effects that shouldn't be there in the first place. Decades 
of research and development ruined. Now we have no way to predict 
performance, because it depends on cache behaviour, which intimately 
depends on the specifics of a particular implementation, not on the 
abstract properties of the algorithm used.
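
Here's a toy benchmark along those lines (my own sketch, not the paper's 
example; the sizes and lookup counts are arbitrary). It pits an O(n) 
linear scan over a contiguous array against O(log n) lookups in a 
pointer-based std::set. On typical hardware the "stupid" scan wins for 
the smaller sizes, and where the crossover happens depends on the 
machine's caches, not on the big-O.

#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <set>
#include <vector>

int main()
{
    std::mt19937 rng(42);

    for (int n : {16, 64, 256, 1024, 4096}) {
        std::vector<int> vec(n);
        for (int i = 0; i < n; ++i) vec[i] = i;
        std::shuffle(vec.begin(), vec.end(), rng);
        std::set<int> tree(vec.begin(), vec.end());   // pointer-based search tree

        const int lookups = 200000;
        std::uniform_int_distribution<int> pick(0, n - 1);
        long long hits = 0;

        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < lookups; ++i)             // O(n) scan, contiguous memory
            hits += std::find(vec.begin(), vec.end(), pick(rng)) != vec.end();
        auto t1 = std::chrono::steady_clock::now();
        for (int i = 0; i < lookups; ++i)             // O(log n), but chases pointers
            hits += tree.find(pick(rng)) != tree.end();
        auto t2 = std::chrono::steady_clock::now();

        std::chrono::duration<double> scan = t1 - t0, lookup = t2 - t1;
        std::cout << "n = " << n
                  << "  linear scan: " << scan.count() << " s"
                  << "  std::set: " << lookup.count() << " s"
                  << "  (hits " << hits << ")\n";     // use hits so nothing is optimised away
    }
}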

Possibly the worst part is that, unless you program in some horribly 
low-level language, you have no control over any of this.



From: scott
Subject: Re: Interesting performance paper
Date: 14 Jun 2010 10:16:16
Message: <4c1639b0@news.povray.org>
> As the paper says, a stupid O(n^2) algorithm could be *faster* than a 
> sophisticated algorithm with O(log n) or even O(1) time-complexity, all 
> due to cache effects that shouldn't be there in the first place. Decades 
> of research and development ruined.

What you are referring to are mathematical papers; they do not consider 
actual implementation on a machine, and they are not meant to.  Computers have 
always had different amounts of memory with different speeds and access 
patterns, and there are plenty of papers that take this into account.  You 
cannot just take a mathematical paper and expect it to work as described on 
a computer, you never have been able to.

> Now we have no way to predict performance, because it depends on cache 
> behaviour, which intimately depends on the specifics of a particular 
> implementation, not on the abstract properties of the algorithm used.

Sure, but it's always been like that and always will, there's nothing you 
can do about it.

> Possibly the worst part is that, unless you program in some horribly 
> low-level language, you have no control over any of this.

If your language/compiler is clever enough to switch around the order and 
storage of your code, then it should be clever enough to optimise for the 
memory structure of the machine.
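
For what it's worth, you don't even need a clever compiler to see how 
much storage layout alone matters. A sketch (the struct, its fields and 
the sizes are made up for illustration): summing one field of an "array 
of structs" drags every other field through the cache as well, while a 
"struct of arrays" layout only touches the data the loop actually needs.

#include <chrono>
#include <iostream>
#include <vector>

struct ParticleAoS { double x, y, z, mass; };   // "array of structs" element

struct ParticlesSoA {                           // "struct of arrays"
    std::vector<double> x, y, z, mass;
};

int main()
{
    const std::size_t n = 2'000'000;

    std::vector<ParticleAoS> aos(n, ParticleAoS{1.0, 2.0, 3.0, 4.0});

    ParticlesSoA soa;
    soa.x.assign(n, 1.0); soa.y.assign(n, 2.0);
    soa.z.assign(n, 3.0); soa.mass.assign(n, 4.0);

    auto t0 = std::chrono::steady_clock::now();
    double sum_aos = 0.0;
    for (const auto& p : aos) sum_aos += p.x;   // loads y, z, mass into cache too
    auto t1 = std::chrono::steady_clock::now();
    double sum_soa = 0.0;
    for (double v : soa.x) sum_soa += v;        // touches only the x array
    auto t2 = std::chrono::steady_clock::now();

    std::chrono::duration<double> aos_t = t1 - t0, soa_t = t2 - t1;
    std::cout << "AoS: " << aos_t.count() << " s (sum " << sum_aos << ")\n"
              << "SoA: " << soa_t.count() << " s (sum " << sum_soa << ")\n";
}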



From: Invisible
Subject: Re: Interesting performance paper
Date: 14 Jun 2010 10:41:08
Message: <4c163f84$1@news.povray.org>
>> As the paper says, a stupid O(n^2) algorithm could be *faster* than a 
>> sophisticated algorithm with O(log n) or even O(1) time-complexity, 
>> all due to cache effects that shouldn't be there in the first place. 
>> Decades of research and development ruined.
> 
> You cannot just take a mathematical paper and expect it to 
> work as described on a computer, you never have been able to.

Back in the days of the 8-bit home computer, memory access really was a 
constant-time operation that always took 1 clock cycle. Unfortunately it 
seems that the larger a memory bank is, the slower it is to access. What 
this means is that a 2.2 GHz CPU only runs at 2.2 GHz if you use a tiny 
amount of data. If you try to random-access a large chunk of data, 
suddenly the program slows down to the speed of the RAM bus.
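
You can actually watch that happen with a pointer-chasing loop (a rough 
sketch; the sizes are arbitrary and the exact figures depend entirely on 
the machine). The work per access is constant, but the time per access 
jumps every time the working set outgrows another level of cache, until 
it settles at main-memory latency.

#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <vector>

int main()
{
    std::mt19937 rng(1);

    for (std::size_t kb = 16; kb <= 64 * 1024; kb *= 4) {
        const std::size_t n = kb * 1024 / sizeof(std::size_t);

        // Build a random cyclic permutation so that each load depends on
        // the previous one (the classic pointer-chasing latency test).
        std::vector<std::size_t> next(n), order(n);
        for (std::size_t i = 0; i < n; ++i) order[i] = i;
        std::shuffle(order.begin(), order.end(), rng);
        for (std::size_t i = 0; i < n; ++i)
            next[order[i]] = order[(i + 1) % n];

        const std::size_t steps = 20'000'000;
        std::size_t idx = order[0];
        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < steps; ++i) idx = next[idx];
        auto t1 = std::chrono::steady_clock::now();

        std::chrono::duration<double, std::nano> dt = t1 - t0;
        std::cout << kb << " KB working set: "
                  << dt.count() / steps << " ns per access"
                  << " (end " << idx << ")\n";   // print idx so the loop isn't removed
    }
}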

I guess the only consolation is that my CPU's L3 cache is probably 
bigger than the entire address space of the 6502...

> Sure, but it's always been like that and always will, there's nothing 
> you can do about it.

It wasn't always like that - but, as you say, there's nothing you can do 
about it.

>> Possibly the worst part is that, unless you program in some horribly 
>> low-level language, you have no control over any of this.
> 
> If your language/compiler is clever enough to switch around the order 
> and storage of your code, then it should be clever enough to optimise 
> for the memory structure of the machine.

Sure. But if it isn't, you can't do anything about it. You can't rewrite 
the compiler, after all...



From: clipka
Subject: Re: Interesting performance paper
Date: 14 Jun 2010 11:07:58
Message: <4c1645ce$1@news.povray.org>
On 14.06.2010 16:41, Invisible wrote:

>> You cannot just take a mathematical paper and expect it to work as
>> described on a computer, you never have been able to.
>
> Back in the days of the 8-bit home computer, memory access really was a
> constant-time operation that always took 1 clock cycle. Unfortunately it
> seems that the larger a memory bank is, the slower it is to access. What
> this means is that a 2.2 GHz CPU only runs at 2.2 GHz if you use a tiny
> amount of data. If you try to random-access a large chunk of data,
> suddenly the program slows down to the speed of the RAM bus.

(1) Home computers were toys.

(2) Capacity of memory basically has nothing to do with access speed. 
The effect you describe for that 2.2 GHz CPU is due to the L1 
and L2 caches - which aren't faster because they are smaller, but 
because (a) they're closer to the CPU core and (b) they're using faster 
(but more space-consuming) technology.



From: Invisible
Subject: Re: Interesting performance paper
Date: 14 Jun 2010 11:12:55
Message: <4c1646f7@news.povray.org>
clipka wrote:

> (1) Home computers were toys.

I saw at least one employee using a company-purchased C64 to run 
accounting software to do actual, productive work.

They might be considered toys *now*...

> (2) Capacity of memory basically has nothing to do with access speed. 

Really? So having more address bits to decode, requiring more address 
decode logic, doesn't affect speed in any way? And neither does a larger 
die area necessitating longer traces?

> The effect you describe for that 2.2 GHz CPU is due to the L1 
> and L2 caches - which aren't faster because they are smaller, but 
> because (a) they're closer to the CPU core and (b) they're using faster 
> (but more space-consuming) technology.

I gather that part of the problem is that running traces on the 
motherboard at GHz frequencies isn't very feasible (for reasons such as 
trace length and electrical interference). But that would still mean 
that, theoretically, you could make a single chip that contains a couple 
of CPU cores plus (say) 2GB of RAM. The fact that nobody has ever done 
this suggests that it isn't possible for whatever reason.



From: scott
Subject: Re: Interesting performance paper
Date: 14 Jun 2010 11:37:32
Message: <4c164cbc@news.povray.org>
>> You cannot just take a mathematical paper and expect it to work as 
>> described on a computer, you never have been able to.
>
> Back in the days of the 8-bit home computer, memory access really was a 
> constant-time operation that always took 1 clock cycle.

Sure, but just as an example, what's the fastest way to sort a 10 KB array 
when you have 16 KB of RAM?  The mathematically efficient way won't 
necessarily be the quickest if you need to read/write to tape/disc rather 
than just using RAM.  Also, some operations were free (e.g. shifting when 
adding, other obscure things like that); a general mathematical paper won't 
take that into account when telling you the "best" algorithm for a task.

>> Sure, but it's always been like that and always will, there's nothing you 
>> can do about it.
>
> It wasn't always like that -

So your early 8-bit computer could write to the tape at the same speed as 
RAM? :-O

>> If your language/compiler is clever enough to switch around the order and 
>> storage of your code, then it should be clever enough to optimise for the 
>> memory structure of the machine.
>
> Sure. But if it isn't, you can't do anything about it. You can't rewrite 
> the compiler, after all...

If your compiler changes the order that data is stored, or the order you 
access it, but doesn't take into account how this affects performance, then 
IMO it's broken.  Use a different compiler that either leaves the order 
alone (so you can optimise it yourself) or optimises it for you.



From: Invisible
Subject: Re: Interesting performance paper
Date: 14 Jun 2010 11:45:55
Message: <4c164eb3$1@news.povray.org>
>>> You cannot just take a mathematical paper and expect it to work as 
>>> described on a computer, you never have been able to.
>>
>> Back in the days of the 8-bit home computer, memory access really was 
>> a constant-time operation that always took 1 clock cycle.
> 
> Sure, but just as an example, what's the fastest way to sort a 10 KB 
> array when you have 16 KB of RAM?  The mathematically efficient way 
> won't necessarily be the quickest if you need to read/write to tape/disc 
> rather than just using RAM.

Then you use a specialised external-sort. More to the point, you *know* 
that you're going to need an external-sort. Your program doesn't just 
magically run 500x slower than you'd expected.
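
For reference, the shape of a specialised external sort is roughly this 
(a toy C++ sketch of an external merge sort; the function name, file 
names and memory limit are all made up): sort memory-sized chunks into 
temporary "runs", then do a k-way merge of the runs with a small heap.

#include <algorithm>
#include <cstdio>
#include <fstream>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Sort a file of newline-separated integers while keeping at most
// mem_limit values in RAM at any one time.
void external_sort(const std::string& input, const std::string& output,
                   std::size_t mem_limit)
{
    // ---- Phase 1: read chunks that fit in memory, write sorted runs ----
    std::ifstream in(input);
    std::vector<std::string> run_files;
    std::vector<long> chunk;
    long value;

    auto flush_run = [&]() {
        std::sort(chunk.begin(), chunk.end());
        run_files.push_back("run" + std::to_string(run_files.size()) + ".tmp");
        std::ofstream run(run_files.back());
        for (long v : chunk) run << v << '\n';
        chunk.clear();
    };

    while (in >> value) {
        chunk.push_back(value);
        if (chunk.size() == mem_limit) flush_run();
    }
    if (!chunk.empty()) flush_run();

    // ---- Phase 2: k-way merge of the runs with a min-heap ----
    std::vector<std::ifstream> runs;
    for (const auto& name : run_files) runs.emplace_back(name);

    using Entry = std::pair<long, std::size_t>;   // (value, which run it came from)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t i = 0; i < runs.size(); ++i)
        if (runs[i] >> value) heap.emplace(value, i);

    std::ofstream out(output);
    while (!heap.empty()) {
        auto [v, i] = heap.top();
        heap.pop();
        out << v << '\n';
        if (runs[i] >> value) heap.emplace(value, i);   // refill from the same run
    }

    for (const auto& name : run_files) std::remove(name.c_str());
}

int main()
{
    // Toy driver: write a small unsorted input, then sort it with a
    // deliberately tiny in-memory limit so several runs get created.
    std::ofstream in("input.txt");
    for (int i = 0; i < 1000; ++i) in << (i * 7919) % 1000 << '\n';
    in.close();

    external_sort("input.txt", "sorted.txt", 100);
}

(A real implementation would worry about open-file limits, buffering and 
binary I/O, but the two-phase structure is the whole point: every pass 
over the data is sequential, which is exactly what tape, disc and 
virtual memory all want.)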

> Also, some operations were free (e.g. shifting 
> when adding, other obscure things like that); a general mathematical 
> paper won't take that into account when telling you the "best" algorithm 
> for a task.

Sure. But if one algorithm takes 5,000 operations (some of which might 
be free) and another takes 72,000,000,000 operations, which one is 
faster? On the computers of old, the answer was self-evident. Today, it 
might well be that the latter is actually faster.

>>> Sure, but it's always been like that and always will, there's nothing 
>>> you can do about it.
>>
>> It wasn't always like that -
> 
> So your early 8-bit computer could write to the tape at the same speed as 
> RAM? :-O

More like in the early days, if it didn't fit in RAM, you just couldn't 
do it at all. Today if it doesn't fit in RAM, the OS makes the computer 
pretend it has more RAM than it does (with the *minor issue* of a vast 
slowdown and halving the life of your hard drive).

>>> If your language/compiler is clever enough to switch around the order 
>>> and storage of your code, then it should be clever enough to optimise 
>>> for the memory structure of the machine.
>>
>> Sure. But if it isn't, you can't do anything about it. You can't 
>> rewrite the compiler, after all...
> 
> If your compiler changes the order that data is stored, or the order you 
> access it, but doesn't take into account how this affects performance, 
> then IMO it's broken.  Use a different compiler that either leaves the 
> order alone (so you can optimise it yourself) or optimises it for you.

How about if the order just plain isn't defined in the first place? Or 
if the runtime system is storing auxiliary data which you don't know about?



From: Darren New
Subject: Re: Interesting performance paper
Date: 14 Jun 2010 11:56:00
Message: <4c165110$1@news.povray.org>
Invisible wrote:
> but RAM is still roughly the same speed.

No it's not. RAM used to take several microseconds. Drum memory was on the 
order of milliseconds.

> the time complexity of your algorithm is now *irrelevant*. The *only* 
> thing that matters now is cache behaviour.

Only irrelevant if all the data you need fits in cache. If you have a 300Gb 
data set like the paper says, your L1 cache isn't going to be nearly as 
important.

-- 
Darren New, San Diego CA, USA (PST)
    Eiffel - The language that lets you specify exactly
    that the code does what you think it does, even if
    it doesn't do what you wanted.



From: Darren New
Subject: Re: Interesting performance paper
Date: 14 Jun 2010 11:58:31
Message: <4c1651a7$1@news.povray.org>
clipka wrote:
> (2) Capacity of memory basically has nothing to do with access speed. 

Of course it does. You have speed-of-light delay and address line cascading 
to decode. The slowness isn't due primarily to accessing the memory cells. 
It's due to the cascade of gates you have to go through to decode the 
address to figure out which cell. That's why video memory can be so much 
faster than CPU memory - there's an access pattern (the video beam scan) 
that can be used to speed up address decoding.

-- 
Darren New, San Diego CA, USA (PST)
    Eiffel - The language that lets you specify exactly
    that the code does what you think it does, even if
    it doesn't do what you wanted.


