Something *REALLY* weird is going on with beta.30-rad1. Look at these
performance measurements (done with the Linux "time" command - I find the
built-in timing stats a bit off). The scene was the "rad-def-test.pov" sample
scene, modified to use the "2Bounce" settings from "rad_def.inc".
Timings are in seconds; "real" is wall-clock time; "user" and "sys" are CPU
time spent in user and kernel mode, respectively. The version used was not
exactly beta.30-rad1 but a slightly modified one, though I guess we'd see the
same effect:
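(For anyone unfamiliar with the distinction: the same real-vs-CPU split can be
reproduced in a few lines of Python - purely illustrative, nothing POV-specific.
`time.perf_counter()` tracks wall-clock time like "real", while
`time.process_time()` sums the process's CPU time, roughly "user" + "sys".)

```python
import time

def busy(n):
    # burn CPU in user mode with a pointless sum
    s = 0
    for i in range(n):
        s += i * i
    return s

wall_start = time.perf_counter()
cpu_start = time.process_time()
busy(2_000_000)
wall = time.perf_counter() - wall_start   # analogous to "real"
cpu = time.process_time() - cpu_start     # analogous to "user" + "sys"
print(f"real {wall:.2f}  cpu {cpu:.2f}")
```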
beta.29 on 4 cores (just for reference):
real 7.00
user 25.71
sys 0.03
beta.30-rad1 on 4 cores:
real 53.99
user 195.49
sys 0.11
beta.30-rad1 throttled to use 1 core only:
real 292.78
user 292.48
sys 0.03
Uh - so running on 4 cores, beta.30-rad1 is not just faster, but actually
MORE EFFICIENT than running on a single core...?!? I guess that would qualify
for a Nobel prize in informatics (if there were such a thing)...
I actually made the effort of cross-checking against a classic (analog)
wristwatch to make sure the "time" command isn't broken, but I got similar
wall-clock values, and the CPU times look very plausible.
Some more values:
+WT    real    user    sys
  1   292.78  292.48  0.03
  2    93.00  184.70  0.21
  3    82.16  239.41  0.08
  4    53.99  195.49  0.11
  5    57.66  209.04  0.11
  6    54.81  200.58  0.17
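Just to quantify how weird this is: plugging the table above into a quick
script (the speedup/efficiency formulas are the standard textbook ones, nothing
POV-specific), the 4-thread run shows a parallel efficiency well above 1 -
i.e. super-linear speedup - and *less* total CPU time than the 1-thread run:

```python
# measurements from the table above: +WT -> (real, user) in seconds
runs = {1: (292.78, 292.48), 2: (93.00, 184.70), 3: (82.16, 239.41),
        4: (53.99, 195.49), 5: (57.66, 209.04), 6: (54.81, 200.58)}

base_real = runs[1][0]
for wt, (real, user) in sorted(runs.items()):
    speedup = base_real / real
    efficiency = speedup / wt  # classic parallel efficiency; should be <= 1
    print(f"{wt} threads: speedup {speedup:.2f}, "
          f"efficiency {efficiency:.2f}, total CPU {user:.1f}s")
```

For 4 threads this gives a speedup of about 5.4 and an efficiency of about
1.36, which shouldn't be possible.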
Interestingly enough, there seems to be no clear correlation between the number
of threads and efficiency - but for some reason, particularly inefficient
operation seems to correlate with particularly little time spent in kernel mode.
Initially I observed the effect while I had four separate 1-threaded instances
of POV working on different scenes in parallel. However, the effect shows up
independently of total CPU workload.
Something's utterly wrong; I even wondered whether I might have messed up the
parameter order on some function call, causing the thread count (or thread
number) to drive some quality parameter... but then again the stats - number of
rays shot, samples gathered and what-have-you - stay pretty much the same
(except for the execution time, of course), varying by less than a 1% margin.
Only the distribution of radiosity samples per pretrace step varies by any
more, but even there the deviation is less than 10%.
Anyone have *any* idea what might be going wrong here? (Even the weirdest ideas
are welcome, as they might happen to trigger some inspiration.)