POV-Ray : Newsgroups : povray.beta-test : 3.7 SMP documentation : Re: 3.7 SMP documentation
From: clipka
Date: 9 Feb 2009 16:25:00
Message: <web.49909ef24bf053094e63d9990@news.povray.org>
"Electrowolf" <nomail@nomail> wrote:
> I am currently doing some research on multi-cpu/core systems since they are
> becoming more and more a common thing in the modern world. In order to get some
> "real life" results I'm toying around with the benchmarks option from 3.7 beta
> 30. One of the major improvements of 3.7 is the SMP support. Unfortunately I
> can't find any documentation about how this is being realized in 3.7. Is there
> simply no documentation for this or have I been looking in the wrong places?

The POV 3.7 approach to SMP is quite simple:

- Parsing is still done single-threaded for simplicity. The result is static,
shared data (object tree, bounding-box hierarchy, etc.).

- A number of worker threads are started (typically equal to the number of
CPUs), each of which knows about the static shared data and has its own set of
dynamic data (e.g. caches for speedup).

- Each worker thread "pulls" a 32x32 tile of the image, renders it, "pushes" its
results, then pulls the next tile to render. When there are no more tiles to
render, the thread terminates.

- When all worker threads have terminated, statistical data is consolidated.

I'm not familiar with the details of photon shooting, but it seems to me that
each thread simply performs a separate photon pass, shooting fewer photons.

So in general, interaction between threads in POV 3.7 is very limited.


Radiosity is an exception here, because the above approach would cause lots of
duplicate radiosity data to be gathered. Therefore, the radiosity data cache is
shared between all worker threads, and mutexes are used to avoid race conditions
that could mess up the cache.

Other race conditions can cause multiple threads to gather radiosity data for
almost the same spot; but since these merely waste processing power and don't
lead to any inconsistencies, they are accepted, as any counter-measures would
probably not be worth the pain.

Another issue identified with the use of a common radiosity cache is the
reproducibility of image output: As the order of pixels traced varies with task
scheduling, radiosity data may be gathered for different spots and re-used for
others. Future versions of POV-Ray may provide a mode in which more waste of
processing power is accepted to achieve higher reproducibility in image output
(although it seems difficult to achieve full 100% reproducibility). I described
the approach and the first experimental results a few days earlier in this
newsgroup.


> Another minor question: Is there any way to force the benchmark to work on
> a set number of threads? Currently you can run it on one or on all CPUs, but I
> would also like to test it with a set number of threads.

For example, +WT3 gives you three worker threads.

Note that beta.30-rad1 gives misleading thread statistics for radiosity
pretrace, as it starts a new set of worker threads for each pretrace step. As
with the main render, +WT3 will give you three worker threads running
simultaneously at any time during pretrace, but the statistics will report e.g.
12 threads if you had four pretrace steps.

