POV-Ray: Newsgroups: povray.general: Hyperthreading benchmark

From: Chris Jeppesen
Subject: Hyperthreading benchmark
Date: 22 Jul 2004 17:12:29
Message: <41002dbd$1@news.povray.org>

Given: One Pentium 4 with hyperthreading (2 virtual processors)
Find:  Does running two instances of POV-Ray make sense?

Experimental setup:

1 Dell Dimension Precision 360
    1 Pentium 4 3.2GHz processor (2 virtual processors)
    1.00GB main memory
Windows XP Service Pack 1
Cygwin execution environment
Megapov 1.01 (Megapov 1.0 with a couple of my own patches added)
   compiled with GCC 3.3.1-cygwin
POV-Ray 3.5 standard benchmark.pov version 1.02,
   and standard benchmark ini file

Procedure:

1. Run the benchmark on a single instance of pov.
A single instance never uses more than 50% cpu time, according to task 
manager
2. Run the benchmark simultaneously on two instances of pov.
The two instances are run in two separate xterm instances, in the same 
directory, with the same command line, manually started less than 1 
second apart. Neither instance uses more than 50%, and the combined 
total is usually 99% or 100%.
3. Note the real time of completion for all runs. POV-Ray reports this 
in its statistics page as total time. This is distinct from the CPU time 
used as reported by task manager.
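
For anyone who wants to repeat the procedure without starting the instances by hand, here is a minimal sketch of a timing harness. It is not what was used for the numbers in this post. The function name `time_simultaneous` and the placeholder workload are assumptions for illustration only; substitute the actual render command line for `command`.

```python
import subprocess
import sys
import time


def time_simultaneous(command, instances):
    """Start `instances` copies of `command` together and return each copy's
    wall-clock completion time in seconds, measured from the common start."""
    start = time.perf_counter()
    procs = [subprocess.Popen(command) for _ in range(instances)]
    finished = [0.0] * instances
    remaining = set(range(instances))
    while remaining:
        for i in list(remaining):
            if procs[i].poll() is not None:  # this copy has exited
                finished[i] = time.perf_counter() - start
                remaining.discard(i)
        time.sleep(0.05)  # polling interval; coarse next to a long render
    return finished


if __name__ == "__main__":
    # Placeholder workload (hypothetical): replace with the real render command.
    command = [sys.executable, "-c", "sum(range(10**6))"]
    print("single:", time_simultaneous(command, 1))
    print("double:", time_simultaneous(command, 2))
```

The harness records real elapsed time rather than CPU time, which matches step 3 above.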

Data:

Single instance: Complete in       37min12sec
Double instance: First complete    58min50sec
                  Second complete   59min17sec

Analysis and discussion:

This is a custom build of megapov, which itself is a distinct version of 
POV-ray. The purpose of this experiment is not to compare this version 
with any other benchmark. The only variable in this experiment is number 
of simultaneously running instances.

To compare the runs most easily, compute pixels per second. On the 
single instance, divide the number of pixels rendered by the number of 
seconds in the render

384*384=   147456 pixels
37:12=       2232 seconds

147456/2232=66.06pixels/second.

On the double instance, the number of pixels is twice as much, since two 
images were rendered, and the render time is the longer of the two runs, 
since they were started simultaneously and what matters to me is the 
amount of real time between start and end of render.

384*384*2= 294912 pixels
59:17=       3557 seconds

294912/3557=82.91pixels/second

Results:

Running two instances results in a speed improvement of 
(82.91/66.06)-1=25.5%. Two identical independent computers can 
reasonably expect a 100% improvement.
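
The arithmetic above can be checked with a short script. This is only a sketch that reuses the figures from the Data section; the helper names `to_seconds` and `pixels_per_second` are made up for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a min:sec render time to seconds."""
    return minutes * 60 + seconds


def pixels_per_second(width, height, images, wall_seconds):
    """Throughput: total pixels rendered divided by wall-clock time."""
    return width * height * images / wall_seconds


# Single instance: one 384x384 image in 37min12sec.
single = pixels_per_second(384, 384, 1, to_seconds(37, 12))

# Double instance: two images; the wall-clock time is the longer of the two runs.
double_time = max(to_seconds(58, 50), to_seconds(59, 17))
double = pixels_per_second(384, 384, 2, double_time)

improvement = double / single - 1
print(f"single: {single:.2f} px/s")        # 66.06
print(f"double: {double:.2f} px/s")        # 82.91
print(f"improvement: {improvement:.1%}")   # 25.5%
```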

Conclusion:

Hyperthreading does result in an improvement of rendering speed, but 
only 25%. This is small, but speed is speed, and any improvement is 
good. It does make sense to run two instances, but do not expect twice 
the rendering speed.

From: Nicolas Calimet
Subject: Re: Hyperthreading benchmark
Date: 22 Jul 2004 17:18:41
Message: <41002f31$1@news.povray.org>

> Hyperthreading does result in an improvement of rendering speed, but 
> only 25%. This is small, but speed is speed, and any improvement is 
> good. It does make sense to run two instances, but do not expect twice 
> the rendering speed.

	Since there is a single physical CPU, I'd say the 25% speedup
is actually quite good.
	Now the question is: why does a single instance run slower
in the first place ?  Or, IOW: where do the two instances really gain
speed (file i/o operations or elsewhere) ?

	- NC

From: Ger
Subject: Re: Hyperthreading benchmark
Date: 22 Jul 2004 18:32:33
Message: <41004081@news.povray.org>

On Thursday 22 July 2004 23:12, Chris Jeppesen wrote :

> Given: One Pentium 4 with hyperthreading (2 virtual processors)
> Find:  Does running two instances of POV-Ray make sense?
> 
> Experimental setup:
> 
> 1 Dell Dimension Precision 360
>     1 Pentium 4 3.2GHz processor (2 virtual processors)
>     1.00GB main memory
> Windows XP Service Pack 1
> Cygwin execution environment
> Megapov 1.01 (Megapov 1.0 with a couple of my own patches added)
>    compiled with GCC 3.3.1-cygwin
> POV-Ray 3.5 standard benchmark.pov version 1.02,
>    and standard benchmark ini file
> 
> Procedure:
> 
> 1. Run the benchmark on a single instance of pov.
> A single instance never uses more than 50% cpu time, according to task
> manager
> 2. Run the benchmark simultaneously on two instances of pov.
> The two instances are run in two separate xterm instances, in the same
> directory, with the same command line, manually started less than 1
> second apart. Neither instance uses more than 50%, and the combined
> total is usually 99% or 100%.
> 3. Note the real time of completion for all runs. POV-Ray reports this
> in its statistics page as total time. This is distinct from the CPU time
> used as reported by task manager.
> 
> Data:
> 
> Single instance: Complete in       37min12sec
> Double instance: First complete    58min50sec
>                   Second complete   59min17sec
> 
> Analysis and discussion:
> 
> This is a custom build of megapov, which itself is a distinct version of
> POV-ray. The purpose of this experiment is not to compare this version
> with any other benchmark. The only variable in this experiment is number
> of simultaneously running instances.
> 
> To compare the runs most easily, compute pixels per second. On the
> single instance, divide the number of pixels rendered by the number of
> seconds in the render
> 
> 384*384=   147456 pixels
> 37:12=       2232 seconds
> 
> 147456/2232=66.06pixels/second.
> 
> On the double instance, the number of pixels is twice as much, since two
> images were rendered, and the render time is the longer of the two runs,
> since they were started simultaneously and what matters to me is the
> amount of real time between start and end of render.
> 
> 384*384*2= 294912 pixels
> 59:17=       3557 seconds

Add to this the second time
58:50 = 3530
makes a total of 7087 seconds
294912 / 7087 = 41.6 pixels per second
which makes 66.06 / 41.6 = 1.59 speed factor

So a single instance is 1.59 times faster.

And the reason for this is that all non-Povray stuff is done by the other
processor so Povray has one processor all to itself.

> 
> 294912/3557=82.91pixels/second
> 
> Results:
> 
> Running two instances results in an improvement of (82.91/66.06)-1=25.5%
> speed improvement. Two identical independent computers can reasonably
> expect a 100% improvement.
> 
> Conclusion:
> 
> Hyperthreading does result in an improvement of rendering speed, but
> only 25%. This is small, but speed is speed, and any improvement is
> good. It does make sense to run two instances, but do not expect twice
> the rendering speed.

Your conclusion is a little off because you never considered the total time
needed to do the two simultaneous renders.
I have run similar tests on real dual-processor computers and they show the
same results (with not such big differences).
-- 
Ger

From: Chris Jeppesen
Subject: Re: Hyperthreading benchmark
Date: 23 Jul 2004 04:18:54
Message: <4100c9ee@news.povray.org>

Ger wrote:

> Add to this the second time
> 58:50 = 3530
> makes a total of 7087 seconds
> 294912 / 7087 = 41.6 pixels per second
> which makes 66.06 / 41.6 = 1.59 speed factor
> 
> So a single instance is 1.59 times faster.
> 
> And the reason for this is that all non-Povray stuff is done by the other
> processor so Povray has one processor all to itself.
> 
I stand by my original conclusion. What matters to me is real live wall 
clock time from beginning of render to end. All 58 minutes of one render 
were within the 59 minutes of the other.

The single test starts (for instance) at 1:00:00 and finishes at 
1:37:12. The double test starts both instances simultaneously at 
2:00:00, and ends at 2:59:17 (when the second render finishes). It takes 
more time, but not twice as much time. When it is done, it has rendered 
twice as many pixels.

I like to do animations, so how I would generalize this is that the 
first render is running all the odd frames, and the second is running 
all the even frames. While it is true that it will take longer to run 
the odd frames simultaneous with the even frames than without, the total 
time for all the frames is less when simultaneous than when not.
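
The odd/even frame argument can be sketched numerically. This is an illustration under a loud assumption: every animation frame is taken to cost exactly what the benchmark image cost in the test above, which a real animation will not match. The function names are hypothetical.

```python
# Measured wall-clock times from the test, in seconds.
ONE_FRAME_ALONE = 37 * 60 + 12        # one instance running by itself
TWO_FRAMES_TOGETHER = 59 * 60 + 17    # two instances, until the slower one ends


def sequential_time(frames):
    """One instance renders every frame, one after another."""
    return frames * ONE_FRAME_ALONE


def paired_time(frames):
    """Two instances render frames in pairs (odd and even side by side).
    A leftover frame, if any, is rendered alone at single-instance speed."""
    pairs, leftover = divmod(frames, 2)
    return pairs * TWO_FRAMES_TOGETHER + leftover * ONE_FRAME_ALONE


if __name__ == "__main__":
    for frames in (2, 10, 100):
        seq = sequential_time(frames) / 3600
        par = paired_time(frames) / 3600
        print(f"{frames:4d} frames: sequential {seq:.1f} h, paired {par:.1f} h")
```

Each pair takes longer than a single frame alone, but less than two frames in sequence, so the total for the whole animation comes out lower.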

From: Chris Jeppesen
Subject: Re: Hyperthreading benchmark
Date: 23 Jul 2004 04:43:35
Message: <4100cfb7@news.povray.org>

Nicolas Calimet wrote:

>     Since there is a single physical CPU, I'd say the 25% speedup
> is actually quite good.
>     Now the question is: why does a single instance run slower
> in the first place ?  Or, IOW: where do the two instances really gain
> speed (file i/o operations or elsewhere) ?
> 
>     - NC

The benchmark renders directly into the bitbucket, so after the SDL is 
parsed and files loaded, there is almost no I/O at all. The system has 
1GB of physical memory and the entire OS and all processes in it never 
used more than 300MB of it during these tests, so it should not be 
swapping. I think it really is hyperthreading helping out at this point.

The theory behind hyperthreading is that decoding instructions is the 
current bottleneck in a chip. The decoder and execution units are in a 
pipeline, but there is only one decoder and it can only keep some of the 
execution units busy. For instance, if the program is running a series 
of floating point ops, the integer units are idle.

Hyperthreading addresses this by adding another decoder, visible to the 
OS and programs as another set of registers and instruction pointer. It 
really does look like another processor. If one decoder is running a 
series of floating point instructions, another program running on the 
other decoder could run a bunch of integer instructions, keep the 
integer units busy, use the chip more efficiently, and crank out twice 
as much work in the same amount of time.

This is of course theoretical, since real programs such as Pov use both 
kinds of instructions, and keep them fairly evenly mixed. In Pov, all 
the ray calculations are done in floating point, but all the address 
calculations, array indexing, pointer arithmetic, tree traversal, etc, 
use the integer units. If two instances of Pov happened to be running 
opposite kinds of instructions, they would run very efficiently. If they 
happened to both be crunching integers at the same time, one of the 
decoders would not be able to use the integer units since the other 
decoder was. It would have to pause until the other was finished. In 
this case the decoders run almost in series, and there is little to no 
gain in efficiency.

My test suggests that over a long render, Pov is somewhere in between, 
but closer to the second case.
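
The two extremes described above can be illustrated with a toy simulation. This is a sketch of the simplified picture in this post only (two instruction streams sharing one integer unit and one floating-point unit) and does not model the real Pentium 4 pipeline. The function `simulate` and its parameters are assumptions for illustration.

```python
import random


def simulate(p_fp_a, p_fp_b, cycles=100_000, seed=0):
    """Toy model: streams A and B each want one operation per cycle, either
    integer or floating point. p_fp_a and p_fp_b are the probabilities that
    each stream's next operation is floating point. There is one unit of each
    kind. Returns operations completed per cycle (a lone stream gets 1.0)."""
    rng = random.Random(seed)
    probs = (p_fp_a, p_fp_b)

    def next_op(stream):
        return "fp" if rng.random() < probs[stream] else "int"

    pending = [next_op(0), next_op(1)]
    completed = 0
    for cycle in range(cycles):
        if pending[0] != pending[1]:
            # The streams want different units: both operations issue.
            completed += 2
            pending = [next_op(0), next_op(1)]
        else:
            # Both want the same unit: one issues, the other stalls.
            winner = cycle % 2  # alternate priority so neither stream starves
            completed += 1
            pending[winner] = next_op(winner)
    return completed / cycles


if __name__ == "__main__":
    print("opposite kinds:", simulate(0.0, 1.0))   # best case
    print("same kind:     ", simulate(0.0, 0.0))   # streams run in series
    print("even mix:      ", simulate(0.5, 0.5))   # somewhere in between
```

In this model the return value is also the speedup over a single stream, so the measured factor of about 1.25 would sit between the "same kind" and "opposite kinds" results, nearer the former.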

It would be interesting to run this same test on a single-processor 
non-hyperthreaded system. I suspect that if it really is hyperthreading, 
and not working through I/O waits, that causes the speed increase, then 
running two instances simultaneously would take more than twice as long 
as a single instance.

From: Warp
Subject: Re: Hyperthreading benchmark
Date: 23 Jul 2004 08:12:57
Message: <410100c9@news.povray.org>

I agree with Chris. There's a flaw in your thinking.

-- 
#macro M(A,N,D,L)plane{-z,-9pigment{mandel L*9translate N color_map{[0rgb x]
[1rgb 9]}scale<D,D*3D>*1e3}rotate y*A*8}#end M(-3<1.206434.28623>70,7)M(
-1<.7438.1795>1,20)M(1<.77595.13699>30,20)M(3<.75923.07145>80,99)// - Warp -

From: Warp
Subject: Re: Hyperthreading benchmark
Date: 23 Jul 2004 08:14:24
Message: <41010120@news.povray.org>

Chris Jeppesen <pov### [at] kwansystemsorg> wrote:
> Hyperthreading addresses this by adding another decoder, visible to the 
> OS and programs as another set of registers and instruction pointer. It 
> really does look like another processor. If one decoder is running a 
> series of floating point instructions, another program running on the 
> other decoder could run a bunch of integer instructions, keep the 
> integer units busy, use the chip more efficiently, and crank out twice 
> as much work in the same amount of time.

  So in theory if a program needs to perform floating point calculations
and (independent) integer calculations, it should run those calculations
in separate threads for maximum speed advantage in a P4?

-- 
plane{-x+y,-1pigment{bozo color_map{[0rgb x][1rgb x+y]}turbulence 1}}
sphere{0,2pigment{rgbt 1}interior{media{emission 1density{spherical
density_map{[0rgb 0][.5rgb<1,.5>][1rgb 1]}turbulence.9}}}scale
<1,1,3>hollow}text{ttf"timrom""Warp".1,0translate<-1,-.1,2>}//  - Warp -

From: Ger
Subject: Re: Hyperthreading benchmark
Date: 23 Jul 2004 08:39:15
Message: <410106f3@news.povray.org>

That would be?
-- 
Ger

From: Warp
Subject: Re: Hyperthreading benchmark
Date: 23 Jul 2004 08:49:23
Message: <41010953@news.povray.org>

Ger <ger### [at] hotmailcom> wrote:
> That would be?

  You are basically saying that rendering two benchmarks using two
threads should be faster than rendering one benchmark using one thread.

  This is, of course, just flawed thinking. There's no way of getting
two benchmarks rendered faster even when using two distinct processors
than one benchmark in one processor.

  What he is saying is that rendering two benchmarks at the same time
is 25% faster than rendering first one benchmark and then the other.
The end result of both tests is identical, but the simultaneous test
produced it 25% faster.

-- 
plane{-x+y,-1pigment{bozo color_map{[0rgb x][1rgb x+y]}turbulence 1}}
sphere{0,2pigment{rgbt 1}interior{media{emission 1density{spherical
density_map{[0rgb 0][.5rgb<1,.5>][1rgb 1]}turbulence.9}}}scale
<1,1,3>hollow}text{ttf"timrom""Warp".1,0translate<-1,-.1,2>}//  - Warp -

From: Tom Austin
Subject: Re: Hyperthreading benchmark
Date: 23 Jul 2004 09:03:59
Message: <41010cbf$1@news.povray.org>

Chris Jeppesen wrote:
> Given: One Pentium 4 with hyperthreading (2 virtual processors)
> Find:  Does running two instances of POV-Ray make sense?

Your testing is very interesting.

How about running the exact same tests with hyperthreading turned OFF?

So you'll have 4 results on the exact same hardware
	single - with hyperthreading
	double - with hyperthreading
	single - no hyperthreading
	double - no hyperthreading

It would be interesting to see the difference between the hyperthreading 
and non-hyperthreading results.

This might help show what you have found through your testing.

Tom
