Something *REALLY* weird is going on with beta.30-rad1. Look at these
performance measurements (done with the Linux "time" command - I find the
built-in timing stats a bit off). The scene was the "rad-def-test.pov" sample
scene, modified to use the "2Bounce" settings from "rad_def.inc".
Timings are in seconds; "real" is wall-clock time; "user" and "sys" are CPU
time spent in user and kernel mode, respectively. The version used was not
exactly beta.30-rad1 but a slightly modified one, though I guess we'd see the
same effect:
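(For anyone unfamiliar with the distinction: the same real-vs-CPU split can be
reproduced in a few lines of Python - purely illustrative, nothing POV-specific.
`time.perf_counter()` tracks wall-clock time like "real", while
`time.process_time()` sums the process's CPU time, roughly "user" + "sys".)

```python
import time

def busy(n):
    # burn CPU in user mode with a pointless sum
    s = 0
    for i in range(n):
        s += i * i
    return s

wall_start = time.perf_counter()
cpu_start = time.process_time()
busy(2_000_000)
wall = time.perf_counter() - wall_start   # analogous to "real"
cpu = time.process_time() - cpu_start     # analogous to "user" + "sys"
print(f"real {wall:.2f}  cpu {cpu:.2f}")
```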
beta.29 on 4 cores (just for reference):
real 7.00
user 25.71
sys 0.03
beta.30-rad1 on 4 cores:
real 53.99
user 195.49
sys 0.11
beta.30-rad1 throttled to use 1 core only:
real 292.78
user 292.48
sys 0.03
Uh - so running on 4 cores, beta.30-rad1 is not just faster, but actually
MORE EFFICIENT than running on a single core...?!? I guess that would qualify
for a Nobel prize in informatics (if there were such a thing)...
I actually made the effort of cross-checking against a classic (analog)
wristwatch to make sure the "time" command isn't broken, but I got similar
wall-clock values, and the CPU times look very plausible.
Some more values:
+WT    real    user    sys
  1   292.78  292.48  0.03
  2    93.00  184.70  0.21
  3    82.16  239.41  0.08
  4    53.99  195.49  0.11
  5    57.66  209.04  0.11
  6    54.81  200.58  0.17
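Just to quantify how weird this is: plugging the table above into a quick
script (the speedup/efficiency formulas are the standard textbook ones, nothing
POV-specific), the 4-thread run shows a parallel efficiency well above 1 -
i.e. super-linear speedup - and *less* total CPU time than the 1-thread run:

```python
# measurements from the table above: +WT -> (real, user) in seconds
runs = {1: (292.78, 292.48), 2: (93.00, 184.70), 3: (82.16, 239.41),
        4: (53.99, 195.49), 5: (57.66, 209.04), 6: (54.81, 200.58)}

base_real = runs[1][0]
for wt, (real, user) in sorted(runs.items()):
    speedup = base_real / real
    efficiency = speedup / wt  # classic parallel efficiency; should be <= 1
    print(f"{wt} threads: speedup {speedup:.2f}, "
          f"efficiency {efficiency:.2f}, total CPU {user:.1f}s")
```

For 4 threads this gives a speedup of about 5.4 and an efficiency of about
1.36, which shouldn't be possible.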
Interestingly enough, there seems to be no clear correlation between the number
of threads and efficiency - but for some reason, particularly inefficient
operation seems to correlate with particularly little time spent in kernel mode.
Initially I observed the effect while I had four separate 1-threaded instances
of POV working on different scenes in parallel. However, the effect shows up
independently of total CPU workload.
Something's utterly wrong; I even wondered whether I might have messed up the
parameter order on some function call, causing the thread count (or thread
number) to drive some quality parameter... but then again the stats - number of
rays shot, samples gathered and what-have-you - stay pretty much the same
(except for the execution time, of course), varying by less than a 1% margin.
Only the distribution of radiosity samples per pretrace step varies by any
more, but even there the deviation is less than 10%.
Anyone have *any* idea what might be going wrong here? (Even the weirdest ideas
are welcome, as they might happen to trigger some inspiration.)