Warp <war### [at] tag povray org> wrote:
> Actually shared/non-shared caches can have a big effect and make a
> notable difference between multiprocessor and multicore systems.
Sure, there is a performance impact related to caching; however, compared to
the ideal "N cores = N-fold performance" situation, a shared cache is a
non-hindrance at best, just like a non-shared cache is. Seen this way, neither
gives a performance *benefit* - they both add overhead, and how much depends on
how the cache is used.
> With POV-Ray I must assume that it benefits from a shared cache, or
> at worst it is not hindered by it. (Given that most data POV-Ray 3.7
> uses is read-only, it wouldn't make too much of a difference if each
> core had its own independent cache.)
If we're comparing N*X MB shared by all threads against X MB for each of N
threads, then I guess you're right in that the shared N*X MB is of benefit for
POV, because more stuff fits into it. However, when comparing X MB shared by
all threads against X MB for each of N threads, the separate caches are
probably of benefit, because each thread does have its local data structures -
stack, buffers for optimization, and so on - that would reduce the space
available for common data in a shared cache.
> > Look again at the figures above:
>
> > 1 core -> 293 seconds
> > 4 cores -> 54 seconds
>
> > Either my math is rusty, or this is a speed gain by more than the number of
> > cores...
>
> How many times was the test run? Was there a lot of variation?
Variation between different scenes - yes, lots of it. Some took almost
identical CPU time regardless of the number of CPUs.
Variation in the render times themselves - not significant. Something like a
swing of 5%, maybe 10%.
> It would be interesting if the test was made with something which takes
> significantly longer to render (eg. 15 minutes with 1 core or so.).
3 hours 47 minutes enough for your taste?
Compare the stats for rad_def_test.pov using the "IndoorHQ" settings:
****************************************************************************
4 cores:
Render Statistics
Image Resolution 800 x 600
----------------------------------------------------------------------------
Pixels: 550205 Samples: 71514 Smpls/Pxl: 0.13
Rays: 25547811 Saved: 0 Max Level: 800/600
----------------------------------------------------------------------------
Ray->Shape Intersection Tests Succeeded Percentage
----------------------------------------------------------------------------
Box 12875803 9499052 73.77
Cone/Cylinder 13638055 2543768 18.65
CSG Intersection 4454973 3421296 76.80
CSG Union 4454973 4034232 90.56
Plane 25547811 9317330 36.47
Sphere 26254590 25970944 98.92
Torus 4542688 4039987 88.93
Torus Bound 4542688 4265423 93.90
Bounding Box 413047062 60880325 14.74
----------------------------------------------------------------------------
Roots tested: 4265423 eliminated: 3179024
----------------------------------------------------------------------------
Radiosity samples calculated: 86116 (0.63 %)
Radiosity samples reused: 13643598
----------------------------------------------------------------------------
Radiosity (final) calculated: 44237 (0.48 %)
Radiosity (final) reused: 9152963
----------------------------------------------------------------------------
Pass Depth 0 Depth 1 Depth 2 Total
----------------------------------------------------------------------------
1 130 3440 2882 6452
2 475 3815 408 4698
3 1900 4762 247 6909
4 6386 4451 149 10986
5+ 9611 2894 329 12834
Final 35129 484 8624 44237
----------------------------------------------------------------------------
Total 53631 19846 12639 86116
----------------------------------------------------------------------------
----------------------------------------------------------------------------
Render Time:
Photon Time: No photons
Radiosity Time: 0 hours 4 minutes 24 seconds (264.683 seconds)
using 20 thread(s) with 1577.354 CPU-seconds total
Trace Time: 0 hours 36 minutes 40 seconds (2200.203 seconds)
using 4 thread(s) with 7994.706 CPU-seconds total
POV-Ray finished
real 2595.37
user 9559.26
sys 7.86
****************************************************************************
1 core:
Render Statistics
Image Resolution 800 x 600
----------------------------------------------------------------------------
Pixels: 550205 Samples: 70785 Smpls/Pxl: 0.13
Rays: 25425517 Saved: 0 Max Level: 800/600
----------------------------------------------------------------------------
Ray->Shape Intersection Tests Succeeded Percentage
----------------------------------------------------------------------------
Box 12838763 9455865 73.65
Cone/Cylinder 13605760 2543618 18.70
CSG Intersection 4434377 3401653 76.71
CSG Union 4434377 4014406 90.53
Plane 25425517 9258652 36.41
Sphere 26132908 25858301 98.95
Torus 4495444 4001169 89.00
Torus Bound 4495444 4224994 93.98
Bounding Box 411171127 60673334 14.76
----------------------------------------------------------------------------
Roots tested: 4224994 eliminated: 3145389
----------------------------------------------------------------------------
Radiosity samples calculated: 86020 (0.63 %)
Radiosity samples reused: 13542466
----------------------------------------------------------------------------
Radiosity (final) calculated: 43905 (0.48 %)
Radiosity (final) reused: 9055291
----------------------------------------------------------------------------
Pass Depth 0 Depth 1 Depth 2 Total
----------------------------------------------------------------------------
1 130 3398 2844 6372
2 475 3775 387 4637
3 1900 4829 290 7019
4 6372 4463 472 11307
5+ 9590 2896 294 12780
Final 34818 490 8597 43905
----------------------------------------------------------------------------
Total 53285 19851 12884 86020
----------------------------------------------------------------------------
----------------------------------------------------------------------------
Render Time:
Photon Time: No photons
Radiosity Time: 0 hours 39 minutes 13 seconds (2353.549 seconds)
using 5 thread(s) with 3330.858 CPU-seconds total
Trace Time: 3 hours 47 minutes 46 seconds (13666.809 seconds)
using 1 thread(s) with 13666.880 CPU-seconds total
POV-Ray finished
real 16998.36
user 16997.89
sys 0.43
****************************************************************************
That's a factor of >6 in wall-clock time here (16998 s vs. 2595 s), instead of
the expected 4.
I have to note, however, that in this case the results are not 100% comparable:
The multi-core render was run with the fix for the mapped-and-transformed
texture issue, which turned out to have some impact on runtime, while the
single-core render was run before applying the fix, and I haven't bothered to
re-run it yet. It doesn't change the general tendency though.
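
For anyone who wants to check the arithmetic, here is a quick Python sketch that works out the speedup and the per-core efficiency from the wall-clock figures in the two logs above. The function names and the `timings` table are just mine for illustration, and "efficiency" is simply speedup divided by the number of cores (4 here); the numbers themselves are copied straight from the logs.

```python
# Wall-clock seconds copied from the two render logs above.
# Each entry is (1-core run, 4-core run).
timings = {
    "radiosity": (2353.549, 264.683),
    "trace": (13666.809, 2200.203),
    "real": (16998.36, 2595.37),
}

CORES = 4  # the multi-core run used 4 cores


def speedup(t_single, t_multi):
    """How many times faster the multi-core run was (wall-clock)."""
    return t_single / t_multi


def efficiency(t_single, t_multi, cores=CORES):
    """Speedup per core; 1.0 would be the ideal N cores = N-fold case."""
    return speedup(t_single, t_multi) / cores


for name, (t1, t4) in timings.items():
    print(f"{name:10s} speedup {speedup(t1, t4):5.2f}  "
          f"efficiency {efficiency(t1, t4):4.2f}")
```

An efficiency above 1.0 is the "more than N-fold" effect under discussion; keep in mind the caveat about the two runs not being from exactly the same build.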