Subject: Re: Processor speed
From: Warp
Date: 28 Oct 2008 20:29:08
Message: <4907ae54@news.povray.org>
Darren New <dne### [at] sanrrcom> wrote:
>  >   If the GC'd language doesn't reorder elements in memory to be more
>  > cache-friendly, that can have a rather large penalty in speed.

> Hmmmm... I've never heard of that. I wonder how easy it is to figure out 
> how to arrange things (based on compile-time or run-time information 
> rather than intuitive understanding of the algorithm) to make cache 
> misses less common.

  I would say that in the general case it's extremely difficult. It could
be rather easy in simple cases, though.

  For example, suppose you are traversing a list from beginning to end,
and doing that many times. If the list elements are laid out in
sequential order in memory, one right after the other, the traversal
will be much more cache-friendly than if the elements are scattered
randomly around the heap. Theoretically a compiler or runtime could
perhaps optimize such lists for this (even if it means relocating
memory blocks to make that possible, since memory fragmentation
naturally causes list elements to end up scattered randomly in memory).
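
  As a rough hand-written sketch of that layout idea (nothing a compiler
does for you automatically; the Node struct and the pool are just my own
illustration), you can allocate all the nodes of a C++ list from one
contiguous buffer instead of allocating each node separately:

// Minimal sketch: a singly linked list whose nodes live in one
// contiguous buffer, so a front-to-back traversal walks memory
// sequentially and stays cache-friendly.
#include <cstdio>
#include <vector>

struct Node
{
    int   value;
    Node* next;
};

int main()
{
    const int count = 1000000;

    // All nodes in one contiguous allocation instead of a separate
    // heap allocation per node.
    std::vector<Node> pool(count);
    for(int i = 0; i < count; ++i)
    {
        pool[i].value = i;
        pool[i].next  = (i + 1 < count) ? &pool[i + 1] : nullptr;
    }

    // The traversal still follows the next pointers, but consecutive
    // nodes share cache lines, so far fewer cache misses occur.
    long long sum = 0;
    for(Node* p = &pool[0]; p != nullptr; p = p->next)
        sum += p->value;

    std::printf("%lld\n", sum);
    return 0;
}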

  The absolute worst-case scenario is a list so large that it does not
fit entirely in the cache, with the elements scattered randomly in
memory, far apart from each other (they don't even have to be very far
apart; a typical cache line is only on the order of 32 or 64 bytes),
and the list traversed several times. Then every single list element
access causes a cache miss, which makes the traversal really slow.

  Sometimes you can make your program faster simply by doing things in
a slightly different order. For example, if you are running an
algorithm over a list several times, and the algorithm doesn't actually
require a complete traversal to produce its partial results, you could
run all the required passes over only one part of the list first, then
all the passes over the next part of the list, and so on. If each
partial list is small enough for all of its elements to fit in the
cache, every pass after the first one becomes a lot faster. This holds
even if the list elements are scattered randomly in memory: the first
traversal of a partial list is slow, but the subsequent traversals are
fast because all of its elements are already in the cache.
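
  A rough sketch of that blocking idea in C++ (the std::list contents,
the block size and the toy "algorithm", summing squares, are my own
placeholders): instead of running all the passes as complete
traversals, all the passes are run over one cache-sized block of the
list at a time:

// Minimal sketch of the blocking idea described above.
#include <cstddef>
#include <cstdio>
#include <list>

int main()
{
    std::list<long> data;               // nodes may be scattered in memory
    for(long i = 0; i < 1000000; ++i)
        data.push_back(i);

    const int    passes    = 8;         // how many times the algorithm runs
    const size_t blockSize = 4096;      // chosen so one block fits in cache

    long long total = 0;

    // Instead of 'passes' full traversals, run all the passes over one
    // cache-sized block before moving on to the next block. Only the
    // first pass over each block pays for the cache misses.
    for(auto blockStart = data.begin(); blockStart != data.end(); )
    {
        // Find the end of the current block.
        auto blockEnd = blockStart;
        for(size_t n = 0; n < blockSize && blockEnd != data.end(); ++n)
            ++blockEnd;

        for(int pass = 0; pass < passes; ++pass)
            for(auto it = blockStart; it != blockEnd; ++it)
                total += static_cast<long long>(*it) * *it;

        blockStart = blockEnd;
    }

    std::printf("%lld\n", total);
    return 0;
}

The right block size depends on the cache size and the element size, so
in practice it would have to be tuned by measuring.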

-- 
                                                          - Warp

