Nicolas Calimet wrote:
> Since there is a single physical CPU, I'd say the 25% speedup
> is actually quite good.
> Now the question is: why does a single instance run slower
> in the first place ? Or, IOW: where do the two instances really gain
> speed (file i/o operations or elsewhere) ?
>
> - NC
The benchmark renders directly into the bitbucket, so after the SDL is
parsed and files loaded, there is almost no I/O at all. The system has
1GB of physical memory and the entire OS and all processes in it never
used more than 300MB of it during these tests, so it should not be
swapping. I think it really is hyperthreading helping out at this point.
The idea behind hyperthreading is that a single instruction stream
rarely keeps all of a chip's execution units busy. The front end that
fetches and decodes instructions feeds the execution units through a
pipeline, but one stream can only occupy some of those units at a
time. For instance, while a program is running a series of floating
point ops, the integer units sit idle.
Hyperthreading addresses this by duplicating the architectural state:
another set of registers and an instruction pointer, so the one
physical chip appears to the OS and programs as two processors. It
really does look like another processor. If one logical processor is
running a series of floating point instructions, a program running on
the other could run a bunch of integer instructions at the same time,
keep the integer units busy, use the chip more efficiently, and in the
ideal case crank out twice as much work in the same amount of time.
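The mixed-workload idea can be probed with a rough timing sketch. This
is a minimal Python illustration, not a precise probe: the workload
sizes are arbitrary, and an interpreted language adds overhead that
blurs the floating-point/integer distinction, so treat any speedup it
shows as suggestive only.

```python
# Run one floating-point-heavy loop and one integer-heavy loop, first
# back to back, then as two concurrent processes. On a hyperthreaded
# CPU the concurrent pair may finish in less than the serial sum; on a
# plain single CPU it should take about the same either way.
import time
from multiprocessing import Process

N = 1_000_000  # arbitrary workload size; tune for your machine

def fp_work(n=N):
    # floating point heavy: a harmonic sum
    x = 0.0
    for i in range(1, n):
        x += 1.0 / i
    return x

def int_work(n=N):
    # integer heavy: a simple rolling hash
    x = 0
    for i in range(1, n):
        x = (x * 31 + i) & 0xFFFFFFFF
    return x

def timed(fn):
    t0 = time.perf_counter()
    fn()
    return time.perf_counter() - t0

if __name__ == "__main__":
    serial = timed(fp_work) + timed(int_work)
    procs = [Process(target=fp_work), Process(target=int_work)]
    t0 = time.perf_counter()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    concurrent = time.perf_counter() - t0
    print(f"serial: {serial:.2f}s  concurrent: {concurrent:.2f}s")
```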
This is of course theoretical, since real programs such as Pov use
both kinds of instructions and keep them fairly evenly mixed. In Pov,
all the ray calculations are done in floating point, but all the
address calculations, array indexing, pointer arithmetic, tree
traversal, etc., use the integer units. If the two instances of Pov
happen to be running opposite kinds of instructions at any moment,
they run very efficiently. If they both happen to be crunching
integers at the same time, one logical processor has to wait for the
integer units while the other is using them, so it pauses until the
other is finished. In that case the two instruction streams run
almost in series, and there is little to no gain in efficiency.
My test suggests that over a long render, Pov is somewhere in between,
but closer to the second case.
It would be interesting to run this same test on a single-processor,
non-hyperthreaded system. If the speed increase really comes from
hyperthreading, and not from working through I/O waits, then on such
a system running two instances simultaneously should take more than
twice as long as a single instance, the extra being scheduling
overhead.
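That experiment can be sketched in a few lines. The command below is a
placeholder (a one-second sleep, which is not CPU-bound) just to show
the timing harness; substitute your actual POV-Ray benchmark
invocation to run the real test.

```python
# Time one instance, then two concurrent instances, of the same
# command, and compare. A ratio well under 2.0 for a CPU-bound command
# suggests hyperthreading (or I/O overlap) is helping.
import subprocess
import time

# Placeholder command; replace with the real render invocation.
CMD = ["sleep", "1"]

def run_instances(count):
    t0 = time.perf_counter()
    procs = [subprocess.Popen(CMD) for _ in range(count)]
    for p in procs:
        p.wait()
    return time.perf_counter() - t0

if __name__ == "__main__":
    one = run_instances(1)
    two = run_instances(2)
    print(f"one: {one:.1f}s  two concurrent: {two:.1f}s  "
          f"ratio: {two / one:.2f}")
```

Note that with the sleep placeholder the ratio comes out near 1.0,
since sleeping processes do not compete for execution units; only a
CPU-bound command makes the comparison meaningful.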