POV-Ray: Newsgroups: povray.general: Random thoughts about povray and xml: Re: Random thoughts about povray and xml


From: Fredrik Eriksson
Date: 23 Mar 2004 18:25:30
Message: <opr5b8utlzzjc5hb@news.povray.org>

On Tue, 23 Mar 2004 22:24:46 +0100, Thorsten Froehlich <tho### [at] trfde> 
wrote:
> In article <opr5b0xzcvzjc5hb@news.povray.org> , Fredrik Eriksson
> <noo### [at] nowherecom>  wrote:
>> Integer multiplies are slower than FP multiplies.
>
> No, they aren't in general.  It all depends on the number of bits you
> multiply (well, not really as an multiplication can be fully pipelined so
> "only" latency increases), and thus a 32 bit integer multiplication will 
> be faster than the multiplications needed to handle 64 bit floats.

I'm not so sure.
Without any dependencies (i.e. ignoring latency), IMUL has a throughput
of either 3 or 5 cycles (depending on the instruction form). FMUL has a
throughput of 2 cycles, making it the winner here.
Taking dependencies into account, IMUL has a latency of 14-18 cycles
(again, depending on instruction form). The latency of FMUL is 7 cycles.
Note that this is regardless of operand size. Only one form of IMUL is
listed as having variable latency: 15-18 for the implicit-operand form.

Integer multiplies simply have horrible performance on pre-Prescott P4s.

For those readers who are unfamiliar with the terminology used:
- Throughput is how many cycles it takes before another, similar instruction
can start executing.
- Latency is how many cycles it takes before the result of the operation
becomes available (i.e. the instruction completes).
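
To make the two terms concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: it assumes a single execution unit and nothing else going on in the pipeline, and the function names are made up for illustration. The numbers plugged in are the best-case figures quoted above (IMUL latency 14, throughput 3; FMUL latency 7, throughput 2).

```python
def chain_cycles(n, latency):
    """Cycles for n multiplies where each one needs the previous result.

    Every multiply has to wait for the full latency of the one before it.
    """
    return n * latency


def independent_cycles(n, latency, throughput):
    """Cycles for n multiplies with no dependencies between them.

    A new multiply can start every `throughput` cycles; the last one
    still needs `latency` cycles before its result is available.
    """
    if n == 0:
        return 0
    return (n - 1) * throughput + latency


# Best-case figures quoted in this post for pre-Prescott P4s
# (simplified model, not a measurement).
TIMINGS = {
    "IMUL": {"latency": 14, "throughput": 3},
    "FMUL": {"latency": 7, "throughput": 2},
}

for name, t in TIMINGS.items():
    dependent = chain_cycles(100, t["latency"])
    independent = independent_cycles(100, t["latency"], t["throughput"])
    print(f"{name}: dependent={dependent} independent={independent}")
```

Under this simplified model, FMUL comes out ahead in both the dependent and the independent case, which is the point being made above.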

> Anyway, you are right, we are a bit off-topic :-)

Then let's get back on topic...

How do the changes from Northwood to Prescott affect the performance of
POV-Ray?

First, check out http://www.anandtech.com/cpu/showdoc.html?i=1956&p=22
It shows Northwood beating Prescott when rendering with 3DSMax and
Lightwave, even with a handicap of 0.2 GHz.

Looking through the instruction timings, I see only a few instructions
being "faster" in Prescott: integer multiplies and bitwise shifts/rotates,
none of which (I assume) are significant for POV-Ray's performance.
All other instructions are either the same or slower, in some cases
a lot slower.
FMUL is a little slower (latency 8 cycles instead of 7) with equal throughput.
Double precision FDIV is also slower, both in latency and throughput
(40 cycles, up from 38).
FSQRT has the exact same timings as FDIV.
The corresponding SSE2 instructions have suffered the same fate,
i.e. latencies slightly increased, throughput the same or increased.
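
As a rough illustration of what these cycle counts mean in terms of clock speed, the sketch below computes how much faster a chip has to be clocked to finish an instruction in the same wall-clock time when it needs more cycles. The 3.0 GHz base clock is a hypothetical value picked for the example, not a figure from the article, and the model looks at one instruction in isolation (no cache effects, no overlap with other instructions).

```python
def break_even_clock(base_ghz, old_cycles, new_cycles):
    """Clock (GHz) at which `new_cycles` takes as long as `old_cycles` at `base_ghz`.

    Time per instruction is cycles / frequency, so matching the old time
    requires scaling the clock by new_cycles / old_cycles.
    """
    return base_ghz * new_cycles / old_cycles


BASE_GHZ = 3.0  # hypothetical Northwood clock, chosen only for illustration

# (old cycles, new cycles) as quoted in this post
CHANGES = {
    "FMUL latency": (7, 8),
    "FDIV double precision": (38, 40),
}

for name, (old, new) in CHANGES.items():
    ghz = break_even_clock(BASE_GHZ, old, new)
    print(f"{name}: {ghz:.2f} GHz needed to match {BASE_GHZ:.1f} GHz")
```

For code dominated by dependent FMULs, this simple model says the newer chip needs about a 14% higher clock just to break even.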

This all suggests that POV-Ray "suffers" from the changes in Prescott.
I very much doubt (and the link above strengthens this doubt) that the
increased L2 cache size can make up for slower instructions.

Of course, this will all become moot once Prescott reaches 4 GHz and
above...

---
FE (who promises to try not to drift so off-topic in the future)
