Orchid XP v8 <voi### [at] dev null> wrote:
> Well, no. You can use large arrays and access them in a specific order
> in an attempt to improve cache coherency. But beyond that, in a GC
> language where the runtime randomly allocates and rearranges data in RAM
> from time to time, you really have very little control.
If the GC'd language doesn't reorder elements in memory to be more
cache-friendly, that can have a rather large penalty in speed.
The speed difference between cache-friendly and cache-unfriendly code
can be surprisingly large, even if both do the exact same thing otherwise.
We are talking about something like an order of magnitude slower at worst.
Cache misses are *very* expensive.
--
- Warp
Warp wrote:
> Intel processors have reordered instructions automatically since the
> Pentium Pro (AFAIR).
I think getting the compiler to do this was one of the basic ideas
behind RISC, until they figured out that it meant you had to recompile
your source code for every minor rev of your CPU. :-) I even remember
that the compiler was supposed to know how long instructions took, so
you didn't need any hardware to wait for an instruction to complete
before you started using its results.
A lot of the RISC stuff that sounded real cool actually turned out to be
a bad idea if you wanted to run the same code on multiple compatible CPUs.
> If the GC'd language doesn't reorder elements in memory to be more
> cache-friendly, that can have a rather large penalty in speed.
Hmmmm... I've never heard of that. I wonder how easy it is to figure out
how to arrange things (based on compile-time or run-time information
rather than intuitive understanding of the algorithm) to make cache
misses less common.
--
Darren New / San Diego, CA, USA (PST)
Darren New <dne### [at] san rr com> wrote:
> > If the GC'd language doesn't reorder elements in memory to be more
> > cache-friendly, that can have a rather large penalty in speed.
> Hmmmm... I've never heard of that. I wonder how easy it is to figure out
> how to arrange things (based on compile-time or run-time information
> rather than intuitive understanding of the algorithm) to make cache
> misses less common.
I would say that in the general case it's extremely difficult. It could
be rather easy in simple cases, though.
For example, if you are traversing a list from the beginning to the
end, and doing that a lot of times, then having the list elements in
sequential order in memory (one right after the other) will be much
more cache-friendly than having the elements located randomly in
memory. Theoretically a compiler could perhaps optimize such lists
(even if that means reordering memory blocks to make it possible, as
memory fragmentation can naturally cause list elements to be scattered
randomly in memory).
The absolute worst case scenario would be that the list is so large that
it does not fit entirely in the cache, the elements are scattered randomly
in memory, far apart from each other (they don't even have to be very
far apart; I don't remember how large a typical cache block is, but it
was something like 32 bytes or so), and the list is traversed several
times. In that case each single list element access causes a cache miss,
which makes it really slow.
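Something like this rough, untested sketch (the function names are just
made up) is the difference I mean: summing values stored one right
after the other in an array, versus summing the same values reached by
chasing pointers to nodes scattered around the heap:

#include <stdlib.h>

struct node { int value; struct node *next; };

/* Cache-friendly: consecutive elements share cache lines, and a
   hardware prefetcher can spot the sequential access pattern. */
long sum_array(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Cache-unfriendly: each node was allocated separately, so following
   ->next can jump anywhere in memory and miss the cache every time. */
long sum_list(const struct node *head)
{
    long sum = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        sum += p->value;
    return sum;
}

Both functions compute the same result; only the memory layout of the
data differs.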
Sometimes you could make your program faster by simply doing things a
bit differently. For example, if you are running an algorithm on a list
several times, and the algorithm doesn't actually require you to traverse
the list completely in order to get partial results, you could run the
algorithm on only a part of the list the necessary number of times, after
which you run it on the next part of the list, etc. If the size of the
partial list is small enough, all of its elements will fit in the cache
and the subsequent passes will become a lot faster. This should be so
even if all the list elements are scattered randomly in memory (it causes
the first traversal of the partial list to be slow, but subsequent
traversals should be fast because all the elements will be in the cache).
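In code the idea looks something like this rough, untested sketch (the
function and the per-element work are made up just for illustration):

#include <stddef.h>

#define BLOCK 4096   /* elements per chunk; assumed to fit in the cache */

void process(int *data, size_t n, int passes)
{
    for (size_t start = 0; start < n; start += BLOCK) {
        size_t end = (start + BLOCK < n) ? start + BLOCK : n;

        /* Make all the passes over this chunk while it is still in
           the cache, instead of sweeping the whole list each pass. */
        for (int pass = 0; pass < passes; pass++)
            for (size_t i = start; i < end; i++)
                data[i] += 1;   /* stand-in for the real per-element work */
    }
}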
--
- Warp
Warp wrote:
> The absolute worst case scenario would be that the list is so large that
> it does not fit entirely in the cache, the elements are scattered randomly
> in memory, far apart from each other (they don't even have to be very
> far apart; I don't remember how large a typical cache block is, but it
> was something like 32 bytes or so), and the list is traversed several
> times. In that case each single list element access causes a cache miss,
> which makes it really slow.
I'm told there are special processor instructions for accessing memory
locations where you "know" that the data will only be read once, so
there's no point attempting to cache it. This way you can save the cache
for stuff that *will* be accessed multiple times.
Of course, unless you're programming in assembly, this isn't much help.
> Sometimes you could make your program faster by simply doing things a
> bit differently. For example, if you are running an algorithm on a list
> several times, and the algorithm doesn't actually require you to traverse
> the list completely in order to get partial results, you could run the
> algorithm on only a part of the list the necessary number of times, after
> which you run it on the next part of the list, etc. If the size of the
> partial list is small enough, all of its elements will fit in the cache
> and the subsequent passes will become a lot faster. This should be so
> even if all the list elements are scattered randomly in memory (it causes
> the first traversal of the partial list to be slow, but subsequent
> traversals should be fast because all the elements will be in the cache).
This is the kind of thing that Haskell does to you automatically. Of
course, Haskell "lists" are linked lists, scattered over all of the heap
(and periodically moving around). I don't know much about how CPU caches
work, but I imagine having the data scattered over several memory pages
isn't very cache-friendly.
Invisible wrote:
> Of course, unless you're programming in assembly, this isn't much help.
Or if your compiler is smart enough. For example, C's memset(), which
just assigns a specific value to a range of locations (say, to zero out
an array), can often be coded to use instructions that say "I'm going to
clobber this whole cache line, so don't read it." I guess that counts
as writing in assembly if you consider the authors of the routine, but
you don't need to do that to use it.
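On x86 the closest thing you can reach from C, without writing the
assembly yourself, is probably the SSE2 "non-temporal" store intrinsics,
which write around the cache. A rough, untested sketch of zeroing a
buffer that way (just an illustration of the idea, not how any
particular memset() is actually written):

#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdlib.h>
#include <stddef.h>

/* Assumes buf is 16-byte aligned and bytes is a multiple of 16. */
static void zero_nontemporal(void *buf, size_t bytes)
{
    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)buf;

    for (size_t i = 0; i < bytes / 16; i++)
        _mm_stream_si128(&p[i], zero);   /* movntdq: bypass the cache */

    _mm_sfence();   /* make the streaming stores visible to other cores */
}

int main(void)
{
    size_t n = 1 << 20;                /* 1 MB scratch buffer */
    void *buf = aligned_alloc(16, n);  /* C11; 16-byte alignment for SSE */
    if (buf) {
        zero_nontemporal(buf, n);
        free(buf);
    }
    return 0;
}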
> I don't know much about how CPU caches
> work, but I imagine having the data scattered over several memory pages
> isn't very cache-friendly.
I don't think it's a matter of being spread over several pages, but a
matter of having small pieces (much smaller than a cache line) being
randomly accessed.
I.e., when you read xyz[37] as an integer in C (that's the 38th element
of the array of integers), the cache may pull in
xyz[35],xyz[36],...,xyz[39] in one memory cycle. If you're walking
sequentially through the array, this is helpful. If you're jumping all
over, this isn't helpful (altho maybe not harmful, depending on what
your access just replaces). If you're using linked lists, even going
sequentially will cause the cache accesses to be unhelpful, because
xyz[37] isn't next to xyz[38].
--
Darren New / San Diego, CA, USA (PST)
Darren New wrote:
>> I don't know much about how CPU caches work, but I imagine having the
>> data scattered over several memory pages isn't very cache-friendly.
>
> I don't think it's a matter of being spread over several pages, but a
> matter of having small pieces (much smaller than a cache line) being
> randomly accessed.
How wide is a typical cache line?
IIRC, a Haskell linked list node is 3 pointers wide.
> If you're using linked lists, even going
> sequentially will cause the cache accesses to be unhelpful, because
> xyz[37] isn't next to xyz[38].
Although, if they happen to be allocated sequentially, they might
possibly be...
Invisible wrote:
> How wide is a typical cache line?
Warp said 32 bytes. I thought that was way smaller than I remembered,
but I went and variously tracked it down, and indeed, most of the modern
machines seem to have 32-byte cache lines. I think they used to be
wider, like 128 bytes, but maybe I'm misremembering, or thinking of
mainframes, or something.
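If you want to check on a particular machine rather than take my word
for it, glibc on Linux can report it (this is a glibc extension, not
portable C, and sysconf() may return 0 or -1 if the value is unknown):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (line > 0)
        printf("L1 data cache line size: %ld bytes\n", line);
    else
        printf("Cache line size not reported by sysconf()\n");
    return 0;
}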
--
Darren New / San Diego, CA, USA (PST)
>> How wide is a typical cache line?
>
>
> Warp said 32 bytes. I thought that was way smaller than I remembered,
> but I went and variously tracked it down, and indeed, most of the modern
> machines seem to have 32-byte cache lines. I think they used to be
> wider, like 128 bytes, but maybe I'm misremembering, or thinking of
> mainframes, or something.
Right. So just to make sure I understand this... you're saying the cache
is managed in chunks of 32 bytes at a time? Like, if you fetch 1 byte of
data from memory, 32 bytes will be loaded into the least recently used
cache line? (Or whatever the cache policy is.)
Are the cache lines aligned in any sort of way? (I'm wondering if it
grabs 32 bytes at the nearest 32-byte boundary or not.)
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Orchid XP v8 wrote:
> Right. So just to make sure I understand this... you're saying the cache
> is managed in chunks of 32 bytes at a time? Like, if you fetch 1 byte of
> data from memory, 32 bytes will be loaded into the least recently used
> cache line? (Or whatever the cache policy is.)
That's how I understand it. I believe it's more accurate to say that the
bus between the memory and the registers is actually 32 bytes wide. I
may be mistaken in that, tho.
> Are the cache lines aligned in any sort of way? (I'm wondering if it
> grabs 32 bytes at the nearest 32-byte boundary or not.)
I'm also pretty certain it is on 32-byte boundaries. Otherwise, the
addressing logic winds up being slower than the time you save caching stuff.
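In other words, the line an address falls into is found by just masking
off the low address bits. A tiny, untested sketch (assuming the 32-byte
line size from above):

#include <stdio.h>
#include <stdint.h>

#define LINE_SIZE 32u   /* assumed cache line size */

int main(void)
{
    int xyz[100];
    uintptr_t addr = (uintptr_t)&xyz[37];
    uintptr_t line_base = addr & ~(uintptr_t)(LINE_SIZE - 1);

    /* Everything from line_base up to line_base + 31 is fetched as
       one cache line. */
    printf("&xyz[37] = %p, its cache line starts at %p\n",
           (void *)addr, (void *)line_base);
    return 0;
}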
--
Darren New / San Diego, CA, USA (PST)
Orchid XP v8 <voi### [at] dev null> wrote:
> Right. So just to make sure I understand this... you're saying the cache
> is managed in chunks of 32 bytes at a time? Like, if you fetch 1 byte of
> data from memory, 32 bytes will be loaded into the least recently used
> cache line? (Or whatever the cache policy is.)
> Are the cache lines aligned in any sort of way? (I'm wondering if it
> grabs 32 bytes at the nearest 32-byte boundary or not.)
You should read this article series if you want to know everything there
is to know about memory and cache:
http://lwn.net/Articles/250967/
--
- Warp