  Re: WinXP & intel P4 only 50% used  
From: Mienai
Date: 2 Feb 2005 01:55:01
Message: <web.420076a2d47b0c5bb8e850f60@news.povray.org>
Warp <war### [at] tagpovrayorg> wrote:
> Mienai <Mienai> wrote:
> > Actual benchmarks have shown that if you run 2 instances of POVRay on a
> > hyperthreading machine (one instance on each logical processor, each
> > rendering half your image) you will have the completed image in around half
> > the time (assuming the two halfs take about the same time to render).
>
>   I find that hard to believe. Either that is not true, or the claim
> that two processes can't use the FPU at the same time is not true.
>
>   If I'm not mistaken, one POV-Ray thread could perform integer math
> at the same time the other POV-Ray thread is performing FPU math. But
> when the first one needs the FPU it has to wait for the second. Since
> POV-Ray uses the FPU quite heavily, I find it quite hard to believe
> that running it in two threads would drop the rendering time to half
> (unless the P4 *really* can run two FPU threads at the same time).
>   I am ready to believe that the total rendering time drops by
> some percentage (because POV-Ray naturally performs other operations
> than just FPU opcodes, naturally), but I would be surprised if this
> percentage would be anything close to 50%.
>   If it really is close to 50%, then someone has to explain to me how
> that is possible.
>
> --
> #macro M(A,N,D,L)plane{-z,-9pigment{mandel L*9translate N color_map{[0rgb x]
> [1rgb 9]}scale<D,D*3D>*1e3}rotate y*A*8}#end M(-3<1.206434.28623>70,7)M(
> -1<.7438.1795>1,20)M(1<.77595.13699>30,20)M(3<.75923.07145>80,99)// - Warp -

So I ran the POV-Ray benchmark today on my P4 system, and here are the results:
 single thread running the entire benchmark:
  average 71 PPS in 0d 00h 34m 35s

 two simultaneous threads, each running half the benchmark (vertical split):
  thread 1: average 52 PPS in 0d 00h 09m 45s
  thread 2: average 48 PPS in 0d 00h 10m 49s

 two simultaneous threads, each running half the benchmark (vertical split),
 photons precalculated:
  thread 1: average 60 PPS in 0d 00h 08m 27s
  thread 2: average 54 PPS in 0d 00h 09m 30s

 single thread, hyperthreading disabled:
  average 74 PPS in 0d 00h 33m 05s

So looking at those results it would appear that I was wrong, and that it's
actually closer to a third of the time.  Speaking from experience, though, I
generally find that it's closer to 50% on most of the larger files (a 16 hr
render taking closer to 8 hr than to 5).  I took a graduate class on
superscalar architecture last year; we didn't talk about FPUs specifically
very much, but if I had to guess it has to do with the way the FPU is
pipelined, plus, if I remember right, the FPU is clocked twice as fast as
the CPU core (it can do a calculation every half tick).  I don't know how
much you know about the subject, but pipelining increases efficiency: it
sucks for small, isolated operations, but when you're doing a whole series
of operations it kicks ass, because you can start the next one before the
first is complete.  I hope that answers your questions; if you have more,
feel free to ask.
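
In case it's useful, here is roughly how a vertical split can be driven from
two POV-Ray INI files, one per instance; the file names, resolution and
column numbers below are just placeholders for illustration, and
Start_Column/End_Column pick the slice of the image each instance renders:

  ; left.ini - left half of a 768x576 render (columns 1-384)
  Input_File_Name=scene.pov
  Output_File_Name=left.png
  Output_File_Type=N
  Width=768
  Height=576
  Start_Column=1
  End_Column=384

  ; right.ini - right half (columns 385-768)
  Input_File_Name=scene.pov
  Output_File_Name=right.png
  Output_File_Type=N
  Width=768
  Height=576
  Start_Column=385
  End_Column=768

Launch one instance on left.ini and another on right.ini, and each will
render only its half of the image.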

If you do something like this I highly recommend precalculating the photons
and loading the map file, so each instance you run doesn't have to
recalculate it (it cuts the per-instance overhead).
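
For what it's worth, here is a minimal sketch of that in POV-Ray 3.6 photon
syntax (the map file name is just a placeholder): do one pass that shoots and
saves the photon map, then have both split renders load it instead of
shooting again.

  // precalculation pass: shoot the photons once and save the map
  global_settings {
    photons {
      count 20000              // photon count as your scene needs
      save_file "scene.ph"     // placeholder file name
    }
  }

  // in each split render: reuse the saved map, no photon shooting
  global_settings {
    photons {
      load_file "scene.ph"
    }
  }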

