I am currently doing some research on multi-CPU/multi-core systems, since they
are becoming increasingly common. To get some "real life" results, I'm toying
around with the benchmark option in 3.7 beta 30. One of the major improvements
in 3.7 is SMP support. Unfortunately, I can't find any documentation about how
this is implemented in 3.7. Is there simply no documentation for it, or have I
been looking in the wrong places?

Another minor question: is there any way to force the benchmark to run on a
set number of threads? Currently you can run it on one CPU or on all of them,
but I would also like to test it with a set number of threads.
Thanks in advance,
Electrowolf
Electrowolf <nomail@nomail> wrote:
> Another minor question: is there any way to force the benchmark to run on a
> set number of threads?
+wt or Work_Threads=
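For example, to run with four worker threads (the scene file name here is just
a placeholder):

  povray +Iscene.pov +WT4

or, equivalently, in an INI file:

  Work_Threads=4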
--
- Warp
"Electrowolf" <nomail@nomail> wrote:
> I am currently doing some research on multi-CPU/multi-core systems, since they
> are becoming increasingly common. To get some "real life" results, I'm toying
> around with the benchmark option in 3.7 beta 30. One of the major improvements
> in 3.7 is SMP support. Unfortunately, I can't find any documentation about how
> this is implemented in 3.7. Is there simply no documentation for it, or have I
> been looking in the wrong places?
The POV-Ray 3.7 approach to SMP is quite simple (a minimal sketch follows the
list):
- Parsing is still done single-threaded, for simplicity. The result is some
static, shared data (object tree, bounding-box hierarchy, etc.).
- A number of worker threads is started (typically equal to the number of
CPUs), each of which knows about the static shared data and has its own set of
dynamic data (e.g. caches for speedup).
- Each worker thread "pulls" a 32x32 tile of the image, renders it, "pushes"
its results, then pulls the next tile to render. When there are no more tiles
to render, the thread terminates.
- When all worker threads have terminated, the statistical data is
consolidated.
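In rough C++ terms, the scheme looks something like this (purely illustrative:
the counter-based tile queue, the names and the printf are my own stand-ins,
not POV-Ray's actual code):

#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

int main()
{
    const int imageW = 256, imageH = 256, tile = 32;
    const int tilesX = imageW / tile, tilesY = imageH / tile;
    const int tileCount = tilesX * tilesY;

    std::atomic<int> nextTile(0);            // shared "pull" counter
    unsigned nThreads = std::thread::hardware_concurrency();
    if (nThreads == 0) nThreads = 2;         // fallback if count is unknown

    auto worker = [&](int id) {
        for (;;) {
            int t = nextTile.fetch_add(1);   // pull the next tile
            if (t >= tileCount)
                break;                       // no tiles left: terminate
            int x = (t % tilesX) * tile, y = (t / tilesX) * tile;
            // ... render pixels [x, x+32) x [y, y+32) against the static
            // shared scene data, using this thread's own dynamic caches,
            // then "push" the finished tile to the output ...
            std::printf("thread %d rendered tile at (%d,%d)\n", id, x, y);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < nThreads; ++i)  // typically one per CPU
        pool.emplace_back(worker, (int)i);
    for (auto& th : pool)
        th.join();                           // consolidate statistics here
}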
I'm not familiar with the details of photon shooting, but it seems to me that
each thread will simply do a separate photon pass, just shooting fewer photons.
So in general, interaction between threads in POV 3.7 is very limited.
Radiosity is an exception here, because the above approach would cause lots of
duplicate radiosity data to be gathered. Therefore, the radiosity data cache is
shared between all worker threads, and mutexes are used to avoid race
conditions that could corrupt the cache.
Other race conditions can still cause multiple threads to gather radiosity data
for almost the same spot, but as these merely waste some processing power and
don't lead to any inconsistencies, they are accepted; any countermeasures would
probably not be worth the pain.
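To illustrate the shared-cache part, a minimal sketch (the class, the key
scheme and the sample structure are hypothetical, not POV-Ray's actual data
layout):

#include <map>
#include <mutex>
#include <optional>

struct Sample { double r, g, b; };           // a gathered radiosity sample

class RadiosityCache {
    std::map<long, Sample> samples;          // keyed by some spatial hash
    std::mutex guard;                        // serializes all cache access
public:
    std::optional<Sample> lookup(long key) {
        std::lock_guard<std::mutex> lock(guard);
        auto it = samples.find(key);
        if (it == samples.end())
            return std::nullopt;
        return it->second;
    }
    void insert(long key, const Sample& s) {
        std::lock_guard<std::mutex> lock(guard);
        // Two threads may both miss on lookup() and both insert a sample
        // for almost the same spot; that wastes work but never corrupts
        // the cache.
        samples.emplace(key, s);
    }
};

int main()
{
    RadiosityCache cache;
    long key = 42;                           // spatial hash of a sample point
    if (!cache.lookup(key))                  // miss: gather a new sample
        cache.insert(key, Sample{0.2, 0.3, 0.1});
}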
Another issue identified with the use of a common radiosity cache is the
reproducibility of the image output: as the order in which pixels are traced
varies with task scheduling, radiosity data may be gathered for different spots
and re-used for others. Future versions of POV-Ray may provide a mode in which
more waste of processing power is accepted in exchange for higher
reproducibility of the image output (although achieving full 100%
reproducibility seems difficult). I described the approach and the first
experimental results a few days earlier in this newsgroup.
> Another minor question: is there any way to force the benchmark to run on a
> set number of threads? Currently you can run it on one CPU or on all of them,
> but I would also like to test it with a set number of threads.
For example, +WT3 gives you three worker threads.
Note that beta.30-rad1 gives misleading thread statistics for the radiosity
pretrace, as it starts a new set of worker threads for each pretrace step. As
with the main render, +WT3 will give you three worker threads running
simultaneously at any time during pretrace, but the statistics will report e.g.
12 threads if you had four pretrace steps.
clipka wrote:
> I'm not familiar with the details of photon shooting, but it seems to me that
> each thread will simply do a separate photon pass, just shooting fewer photons.
In the current implementation (afaik!), each thread shoots photons from a
different light source. If you have only one light, you won't take
advantage of multiple cores while photon-shooting.
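Roughly, that would correspond to something like this sketch (hypothetical
structure; not the actual POV-Ray photon code):

#include <cstdio>
#include <functional>
#include <thread>
#include <vector>

struct Light { int id; };

void shootPhotons(const Light& light)
{
    // ... trace this light's photons into a per-thread photon map,
    // to be merged once all threads have finished ...
    std::printf("shooting photons for light %d\n", light.id);
}

int main()
{
    std::vector<Light> lights = { {0}, {1}, {2} };

    // One thread per light source: with a single light, only one thread
    // does any work, so photon shooting gains nothing from extra cores.
    std::vector<std::thread> pool;
    for (const Light& l : lights)
        pool.emplace_back(shootPhotons, std::cref(l));
    for (auto& t : pool)
        t.join();
}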
"clipka" wrote:
> [radiosity]
> ... (although it seems difficult to achieve full 100% reproducibility) ...
Possibly the only truly reproducible (and thread-safe) way is to iteratively
process the whole scene, with all threads working on the same iteration and
waiting for all threads to stop before moving on to the next iteration. One
might need to back-propagate results (iteration 1: depth 1; iteration 2:
depths 2, 1; iteration 3: depths 3, 2, 1; etc.).
I realise this largely prevents optimised and dynamic sampling, but it's safe.
I've not done any analysis of how efficient it would be. On the other hand,
maybe it's possible to add a little dynamic radiosity data: when there's an
integrity clash, add the data and then 'rewind' the older thread to use the
new data. This only works with small additions at a time, to avoid circular
triggering of the 'rewind'.
Hoping the above might provide some ideas, assuming I've not completely
misunderstood the problem.
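A minimal sketch of the lock-step idea (my own illustration, with placeholder
work standing in for the actual radiosity gathering): each pass reads only the
previous pass's data and writes to a separate buffer, and joining the threads
acts as the barrier between iterations.

#include <thread>
#include <vector>

int main()
{
    const int nThreads = 4, nSamples = 1024, nIterations = 3;
    std::vector<double> current(nSamples, 0.0); // read-only during a pass
    std::vector<double> next(nSamples, 0.0);    // write-only during a pass

    for (int iter = 0; iter < nIterations; ++iter) {
        std::vector<std::thread> pool;
        for (int t = 0; t < nThreads; ++t) {
            pool.emplace_back([&, t] {
                // Disjoint index ranges: no two threads write the same slot.
                for (int i = t; i < nSamples; i += nThreads)
                    // ... gather this iteration's radiosity for sample i,
                    // reading only 'current' (the previous iteration) ...
                    next[i] = current[i] + 1.0;  // placeholder work
            });
        }
        for (auto& th : pool)
            th.join();              // barrier: the whole pass must finish
        current.swap(next);         // results become the next pass's input
    }
}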
"MessyBlob" <nomail@nomail> wrote:
> Possibly the only truly reproducible (and thread-safe) way is to iteratively
> process the whole scene ...
The essential point being that this way, you're not trying to arbitrate many
threads reading and writing the same structure: instead, you're logically
reading from one structure and writing to another, which is thread-safe and can
run in parallel.
"MessyBlob" <nomail@nomail> wrote:
> "clipka" wrote:
> > [radiosity]
> > ... (although it seems difficult to achieve full 100% reproducibility) ...
>
> Possibly the only truly reproducible (and thread-safe) way is to iteratively
> process the whole scene, with all threads working on the same iteration and
> waiting for all threads to stop before moving on to the next iteration. One
> might need to back-propagate results (iteration 1: depth 1; iteration 2:
> depths 2, 1; iteration 3: depths 3, 2, 1; etc.).
>
> I realise this largely prevents optimised and dynamic sampling, but it's safe.
> I've not done any analysis of how efficient it would be. On the other hand,
> maybe it's possible to add a little dynamic radiosity data: when there's an
> integrity clash, add the data and then 'rewind' the older thread to use the
> new data. This only works with small additions at a time, to avoid circular
> triggering of the 'rewind'.
>
> Hoping the above might provide some ideas, assuming I've not completely
> misunderstood the problem.
The algorithm I devised *should* do the trick as well: it's 100% deterministic
regardless of thread count or scheduling "jitter". Theoretically. In practice,
I guess there are still a few variables that are not properly reset when a
thread starts working on the next SMP cell; possibly some caching mechanisms.
But those are beyond the scope of radiosity.