Nicolas Calimet wrote:
> Since there is a single physical CPU, I'd say the 25% speedup
> is actually quite good.
> Now the question is: why does a single instance run slower
> in the first place ? Or, IOW: where do the two instances really gain
> speed (file i/o operations or elsewhere) ?
>
> - NC
The benchmark renders directly into the bitbucket, so after the SDL is
parsed and files loaded, there is almost no I/O at all. The system has
1GB of physical memory and the entire OS and all processes in it never
used more than 300MB of it during these tests, so it should not be
swapping. I think it really is hyperthreading helping out at this point.
The idea behind hyperthreading is that a single instruction stream
rarely keeps all of a chip's execution units busy. The front end that
fetches and decodes instructions feeds the execution units through a
pipeline, but one stream can only occupy some of those units at a
time. For instance, while a program is running a series of floating
point ops, the integer units sit idle.
Hyperthreading addresses this by duplicating the architectural state:
another set of registers and an instruction pointer, so the one
physical chip appears to the OS and programs as two processors. It
really does look like another processor. If one logical processor is
running a series of floating point instructions, a program running on
the other could run a bunch of integer instructions at the same time,
keep the integer units busy, use the chip more efficiently, and in the
ideal case crank out twice as much work in the same amount of time.
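The mixed-workload idea can be probed with a rough timing sketch. This
is a minimal Python illustration, not a precise probe: the workload
sizes are arbitrary, and an interpreted language adds overhead that
blurs the floating-point/integer distinction, so treat any speedup it
shows as suggestive only.

```python
# Run one floating-point-heavy loop and one integer-heavy loop, first
# back to back, then as two concurrent processes. On a hyperthreaded
# CPU the concurrent pair may finish in less than the serial sum; on a
# plain single CPU it should take about the same either way.
import time
from multiprocessing import Process

N = 1_000_000  # arbitrary workload size; tune for your machine

def fp_work(n=N):
    # floating point heavy: a harmonic sum
    x = 0.0
    for i in range(1, n):
        x += 1.0 / i
    return x

def int_work(n=N):
    # integer heavy: a simple rolling hash
    x = 0
    for i in range(1, n):
        x = (x * 31 + i) & 0xFFFFFFFF
    return x

def timed(fn):
    t0 = time.perf_counter()
    fn()
    return time.perf_counter() - t0

if __name__ == "__main__":
    serial = timed(fp_work) + timed(int_work)
    procs = [Process(target=fp_work), Process(target=int_work)]
    t0 = time.perf_counter()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    concurrent = time.perf_counter() - t0
    print(f"serial: {serial:.2f}s  concurrent: {concurrent:.2f}s")
```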
This is of course theoretical, since real programs such as Pov use
both kinds of instructions and keep them fairly evenly mixed. In Pov,
all the ray calculations are done in floating point, but all the
address calculations, array indexing, pointer arithmetic, tree
traversal, etc., use the integer units. If the two instances of Pov
happen to be running opposite kinds of instructions at any moment,
they run very efficiently. If they both happen to be crunching
integers at the same time, one logical processor has to wait for the
integer units while the other is using them, so it pauses until the
other is finished. In that case the two instruction streams run
almost in series, and there is little to no gain in efficiency.
My test suggests that over a long render, Pov is somewhere in between,
but closer to the second case.
It would be interesting to run this same test on a single-processor,
non-hyperthreaded system. If the speed increase really comes from
hyperthreading, and not from working through I/O waits, then on such
a system running two instances simultaneously should take more than
twice as long as a single instance, the extra being scheduling
overhead.
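That experiment can be sketched in a few lines. The command below is a
placeholder (a one-second sleep, which is not CPU-bound) just to show
the timing harness; substitute your actual POV-Ray benchmark
invocation to run the real test.

```python
# Time one instance, then two concurrent instances, of the same
# command, and compare. A ratio well under 2.0 for a CPU-bound command
# suggests hyperthreading (or I/O overlap) is helping.
import subprocess
import time

# Placeholder command; replace with the real render invocation.
CMD = ["sleep", "1"]

def run_instances(count):
    t0 = time.perf_counter()
    procs = [subprocess.Popen(CMD) for _ in range(count)]
    for p in procs:
        p.wait()
    return time.perf_counter() - t0

if __name__ == "__main__":
    one = run_instances(1)
    two = run_instances(2)
    print(f"one: {one:.1f}s  two concurrent: {two:.1f}s  "
          f"ratio: {two / one:.2f}")
```

Note that with the sleep placeholder the ratio comes out near 1.0,
since sleeping processes do not compete for execution units; only a
CPU-bound command makes the comparison meaningful.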