  Re: Radiosity performance: thread count anomaly  
From: clipka
Date: 19 Jan 2009 20:30:00
Message: <web.497528612965dbd09b482c50@news.povray.org>
Warp <war### [at] tagpovrayorg> wrote:
>   Actually shared/non-shared caches can have a big effect and make a
> notable difference between multiprocessor and multicore systems.

Sure, caching has a performance impact; however, measured against the ideal
"N cores = N-fold performance" situation, a shared cache is a non-hindrance at
best, just as a non-shared cache is. Seen this way, neither gives a performance
*benefit*: both add overhead, and how much depends on how the cache is used.


>   With POV-Ray I must assume that it benefits from a shared cache, or
> at worst it is not hindered by it. (Given that most data POV-Ray 3.7
> uses is read-only, it wouldn't make too much of a difference if each
> core had its own independent cache.)

If we're comparing one shared N*X MB cache for all threads with N separate
caches of X MB each, then I guess you're right that the shared N*X MB benefits
POV-Ray, because more fits into it. However, when comparing X MB shared by all
threads with X MB for each of the N threads, the separate caches are probably
of benefit, because each thread has its own local data structures (stack,
buffers for optimization, and so on) that would reduce the space available for
common data in a shared cache.
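
To put rough numbers on that second comparison, here is a minimal sketch. The
function names, the cache size, and the per-thread footprint are all made up
for illustration (they are not measurements), and real caches do not partition
their space this cleanly.

```python
def common_space_shared(cache_mb, threads, private_mb):
    """Space left for common (read-only) data in one cache shared by all threads."""
    return max(cache_mb - threads * private_mb, 0.0)


def common_space_separate(cache_mb, private_mb):
    """Space left for common data in each thread's own cache."""
    return max(cache_mb - private_mb, 0.0)


# Hypothetical figures: X = 4 MB cache, N = 4 threads,
# 0.5 MB of thread-local data (stack, buffers) per thread.
X_MB, N, PRIVATE_MB = 4.0, 4, 0.5

shared = common_space_shared(X_MB, N, PRIVATE_MB)
separate = common_space_separate(X_MB, PRIVATE_MB)

print(f"X MB shared by all threads: {shared:.1f} MB left for common data")
print(f"X MB per thread:            {separate:.1f} MB left for common data")
```

With these assumed figures the shared cache loses the thread-local data of all
N threads, while each separate cache only loses that of its own thread.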


> > Look again at the figures above:
>
> > 1 core  -> 293 seconds
> > 4 cores ->  54 seconds
>
> > Either my math is rusty, or this is a speed gain by more than the number of
> > cores...
>
>   How many times was the test run? Was there a lot of variation?

Variation between different scenes: yes, lots. Some rendered in almost
identical CPU time regardless of the number of CPUs.

Variation in the render times themselves: not significant. Something like a
swing of 5%, maybe 10%.


>   It would be interesting if the test was made with something which takes
> significantly longer to render (e.g. 15 minutes with 1 core or so).

3 hours 47 minutes enough for your taste?

Compare the stats for rad_def_test.pov using the "IndoorHQ" settings:

****************************************************************************
4 cores:

Render Statistics
Image Resolution 800 x 600
----------------------------------------------------------------------------
Pixels:           550205   Samples:           71514   Smpls/Pxl: 0.13
Rays:           25547811   Saved:                 0   Max Level: 800/600
----------------------------------------------------------------------------
Ray->Shape Intersection          Tests       Succeeded  Percentage
----------------------------------------------------------------------------
Box                           12875803         9499052     73.77
Cone/Cylinder                 13638055         2543768     18.65
CSG Intersection               4454973         3421296     76.80
CSG Union                      4454973         4034232     90.56
Plane                         25547811         9317330     36.47
Sphere                        26254590        25970944     98.92
Torus                          4542688         4039987     88.93
Torus Bound                    4542688         4265423     93.90
Bounding Box                 413047062        60880325     14.74
----------------------------------------------------------------------------
Roots tested:               4265423   eliminated:              3179024
----------------------------------------------------------------------------
Radiosity samples calculated:            86116 (0.63 %)
Radiosity samples reused:             13643598
----------------------------------------------------------------------------
Radiosity (final) calculated:            44237 (0.48 %)
Radiosity (final) reused:              9152963
----------------------------------------------------------------------------
  Pass     Depth 0    Depth 1    Depth 2           Total
----------------------------------------------------------------------------
  1            130       3440       2882            6452
  2            475       3815        408            4698
  3           1900       4762        247            6909
  4           6386       4451        149           10986
  5+          9611       2894        329           12834
  Final      35129        484       8624           44237
----------------------------------------------------------------------------
  Total      53631      19846      12639           86116
----------------------------------------------------------------------------
----------------------------------------------------------------------------
Render Time:
  Photon Time:      No photons
  Radiosity Time:   0 hours  4 minutes 24 seconds (264.683 seconds)
              using 20 thread(s) with 1577.354 CPU-seconds total
  Trace Time:       0 hours 36 minutes 40 seconds (2200.203 seconds)
              using 4 thread(s) with 7994.706 CPU-seconds total
POV-Ray finished

real 2595.37
user 9559.26
sys 7.86

****************************************************************************
1 core:

Render Statistics
Image Resolution 800 x 600
----------------------------------------------------------------------------
Pixels:           550205   Samples:           70785   Smpls/Pxl: 0.13
Rays:           25425517   Saved:                 0   Max Level: 800/600
----------------------------------------------------------------------------
Ray->Shape Intersection          Tests       Succeeded  Percentage
----------------------------------------------------------------------------
Box                           12838763         9455865     73.65
Cone/Cylinder                 13605760         2543618     18.70
CSG Intersection               4434377         3401653     76.71
CSG Union                      4434377         4014406     90.53
Plane                         25425517         9258652     36.41
Sphere                        26132908        25858301     98.95
Torus                          4495444         4001169     89.00
Torus Bound                    4495444         4224994     93.98
Bounding Box                 411171127        60673334     14.76
----------------------------------------------------------------------------
Roots tested:               4224994   eliminated:              3145389
----------------------------------------------------------------------------
Radiosity samples calculated:            86020 (0.63 %)
Radiosity samples reused:             13542466
----------------------------------------------------------------------------
Radiosity (final) calculated:            43905 (0.48 %)
Radiosity (final) reused:              9055291
----------------------------------------------------------------------------
  Pass     Depth 0    Depth 1    Depth 2           Total
----------------------------------------------------------------------------
  1            130       3398       2844            6372
  2            475       3775        387            4637
  3           1900       4829        290            7019
  4           6372       4463        472           11307
  5+          9590       2896        294           12780
  Final      34818        490       8597           43905
----------------------------------------------------------------------------
  Total      53285      19851      12884           86020
----------------------------------------------------------------------------
----------------------------------------------------------------------------
Render Time:
  Photon Time:      No photons
  Radiosity Time:   0 hours 39 minutes 13 seconds (2353.549 seconds)
              using 5 thread(s) with 3330.858 CPU-seconds total
  Trace Time:       3 hours 47 minutes 46 seconds (13666.809 seconds)
              using 1 thread(s) with 13666.880 CPU-seconds total
POV-Ray finished

real 16998.36
user 16997.89
sys 0.43

****************************************************************************

That's a factor of more than 6 here (about 6.5 going by the wall-clock "real"
times), instead of the expected 4.
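
For anyone who wants to check the arithmetic, here is a small sketch that
derives the speedup and per-core efficiency from the timings in the two logs
above. The dictionary keys and the helper name are my own labels, not anything
POV-Ray prints; the numbers are copied from the logs.

```python
# Timings in seconds, copied from the two render logs above.
one_core = {"real": 16998.36, "user": 16997.89,
            "radiosity": 2353.549, "trace": 13666.809}
four_cores = {"real": 2595.37, "user": 9559.26,
              "radiosity": 264.683, "trace": 2200.203}
CORES = 4


def speedup(key):
    """Ratio of the 1-core time to the 4-core time for one measurement."""
    return one_core[key] / four_cores[key]


# Wall-clock speedup overall and per phase; efficiency is speedup per core.
for key in ("real", "radiosity", "trace"):
    s = speedup(key)
    print(f"{key:10s} speedup {s:5.2f}x, efficiency {s / CORES:4.0%}")

# Total CPU time ("user") summed over all threads, 1-core run vs. 4-core run.
cpu_ratio = speedup("user")
print(f"total CPU time ratio (1 core / 4 cores): {cpu_ratio:.2f}")
```

An efficiency above 100% means the speedup is superlinear. The last line shows
that the 4-core run also used less CPU time in total, so the gain cannot be
explained by parallelism alone.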

I have to note, however, that in this case the results are not 100% comparable:
the multi-core render was run with the fix for the mapped-and-transformed
texture issue, which turned out to have some impact on runtime, while the
single-core render was run before applying the fix, and I haven't bothered to
re-run it yet. It doesn't change the general tendency, though.


