POV-Ray : Newsgroups : povray.beta-test : Radiosity and SMP
  Radiosity and SMP (Message 1 to 6 of 6)  
From: MessyBlob
Subject: Radiosity and SMP
Date: 6 Apr 2009 16:25:00
Message: <web.49da63c3b2c7b079addfbead0@news.povray.org>
Observations about Beta 32 on quad-core Q6600:

1. Radiosity pre-trace is much slower than Beta 31. Often only 25% to 30% total
processor time is being used by POV-Ray (spread over all four cores), as if
there's a race condition or integrity hold-up, such that only one thread can
work at a time. NB: I checked that nothing else is eating CPU; the system-idle
thread gets the rest.

2. Crackle pattern with radiosity renders about 15x slower than Beta 31 (guess -
I've not benchmarked it). Perhaps the caching mentioned in the notes needs some
tuning, or point 1 (above) is coming into play. This note raised alarm bells
with me ("cache per thread") - see my paragraph A (below) to understand why.

3. The (radiosity?) memory management errors of Beta 29 and 30 (not in 31) have
returned: Where a significant amount of memory is used, starting the render
again produces an 'internal error' (yellow message), and aborts gracefully.
Restarting POV (with significant delay on exit while memory is freed) solves
the problem.

A. I have found the previous Betas (31 and earlier) achieve inconsistent
radiosity samples for each SMP block, such that hard lines are visible at the
block edges - a deal-breaker in most cases. This occurs on pattern-based
normals, and implicit functions with high gradient. Increasing the 'radiosity
end pretrace' reduces this problem a bit, but I can't get away from the fact
that each SMP block uses different lighting data. I'm testing this
(incidentally) on a scene at the moment, so will be able to report more soon.

I'll be able to report with more authority on the above with more testing, but
thought to post as-is now, in case it was of use, or already known.

-- JV.



From: clipka
Subject: Re: Radiosity and SMP
Date: 6 Apr 2009 17:55:00
Message: <web.49da798f70f61477c28d169c0@news.povray.org>
"MessyBlob" <nomail@nomail> wrote:
> Observations about Beta 32 on quad-core Q6600:
>
> 1. Radiosity pre-trace is much slower than Beta 31. Often only 25% to 30% total
> processor time is being used by POV-Ray (spread over all four cores), as if
> there's a race condition or integrity hold-up, such that only one thread can
> work at a time. NB: I checked that nothing else is eating CPU; the system-idle
> thread gets the rest.

I find this hard to believe, as radiosity code has not changed from beta 31 to
32, except for some performance counters that are kept local to each thread.

Are you sure you're not comparing to an earlier version's performance?

(In case your scene should happen to contain only a single, small,
"radiosity-intensive" detail, such effects might occur when POV is waiting for
the last cell to complete at the end of each pretrace step.)

Furthermore, I have never seen such an effect in the radiosity code benchmarks I
run on a regular basis - although admittedly those run on Linux, not Windows, so
there *may* be a Windows specific issue; still, there's no reason to expect
different behavior between beta.31 and beta.32.

Are you sure this is not some other issue, which just happens to be "multiplied"
by radiosity? Is it just one particular scene, or a kind of invariant? Can you
boil it down to a "minimal" scene showing the slowdown?


> 2. Crackle pattern with radiosity renders about 15x slower than Beta 31 (guess -
> I've not benchmarked it). Perhaps the caching mentioned in the notes needs some
> tuning, or point 1 (above) is coming into play. This note raised alarm bells
> with me ("cache per thread") - see my paragraph A (below) to understand why.
>
> 3. The (radiosity?) memory management errors of Beta 29 and 30 (not in 31) have
> returned: Where a significant amount of memory is used, starting the render
> again produces an 'internal error' (yellow message), and aborts gracefully.
> Restarting POV (with significant delay on exit while memory is freed) solves
> the problem.

Both of these are very unlikely to be radiosity-related, as no changes have been
made to radiosity except for those performance counters already mentioned. So
the best bet (for both, actually) would be problems with the crackle cache.


> A. I have found the previous Betas (31 and earlier) achieve inconsistent
> radiosity samples for each SMP block, such that hard lines are visible at the
> block edges - a deal-breaker in most cases. This occurs on pattern-based
> normals, and implicit functions with high gradient. Increasing the 'radiosity
> end pretrace' reduces this problem a bit, but I can't get away from the fact
> that each SMP block uses different lighting data. I'm testing this
> (incidentally) on a scene at the moment, so will be able to report more soon.

The effect you see is simply an inevitability if your pretrace is poorly tuned
(or you have "always_sample" set to "on", which unfortunately is the default);
every(!) sample taken during final render *will* produce an artifact; the pixel
rendering order merely determines how it manifests:
Horizontally "smeared" blotches are telltale signs of a
horizontal-line-oriented approach (like in POV 3.6, and within each SMP box in
POV 3.7), while prominent brightness differences between SMP blocks are -
obviously - the result of a block-oriented approach.

The basic problem occurs when during the final render a pixel *needs* an
additional sample, while pixels in the vicinity *could* make use of that
additional sample but by themselves wouldn't call for it. Unfortunately, on
average half of these will have been rendered already by the time the new
sample is found to be inevitable and is therefore computed. The pixels not yet
rendered will pick up that sample, while those already rendered will not be
able to do so.

As you can see, there is no SMP involved so far. It's just the block-oriented
pixel rendering order you see, nothing more.
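
The order dependence described above can be sketched with a toy model. This is
only an illustration in Python, not POV-Ray's radiosity code: the 1D "image",
the sample positions, the reuse radius and the function name render are all
assumptions made up for the example. Pretrace leaves samples at 0, 6 and 12,
every pixel can be shaded from those, and only pixel 8 insists on a new sample.

```python
def render(order, pretrace=(0, 6, 12), radius=3, demanding=8):
    """Toy 1D model: return, per pixel, the cached samples it could use.

    Pixels are visited in `order`. Only the `demanding` pixel takes a new
    sample during the final render; every other pixel just reuses whatever
    samples lie within `radius` at the moment it is rendered.
    """
    cache = list(pretrace)
    used = {}
    for x in order:
        if x == demanding and x not in cache:
            cache.append(x)          # new sample taken mid-render
        used[x] = sorted(s for s in cache if abs(s - x) <= radius)
    return used

width = 16
left_to_right = render(range(width))
right_to_left = render(reversed(range(width)))

# Pixels 7 and 9 are both neighbours of pixel 8, but only the one rendered
# after pixel 8 gets to use the extra sample.
for x in (7, 9):
    print(x, left_to_right[x], right_to_left[x])
# 7 [6] [6, 8]
# 9 [6, 8, 12] [6, 12]
```

Only one thread is involved; which neighbours pick up the extra sample depends
purely on the order in which the pixels are visited.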

A good rule of thumb for pretrace tuning is that at least half the samples
should be gathered during pretrace already; unless you're using exotic
settings, this is usually sufficient to keep the artifacts below visibility.
Provided you switch "always_sample" off, that is.
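
As a small worked example of that rule of thumb, the check can be written as
below. This is a hedged sketch: the helper names are hypothetical, the sample
counts are invented for illustration, and how you obtain the two counts for
your own render is left open.

```python
def pretrace_share(pretrace_samples, final_samples):
    """Fraction of all radiosity samples that were gathered during pretrace."""
    total = pretrace_samples + final_samples
    return pretrace_samples / total if total else 0.0

def pretrace_well_tuned(pretrace_samples, final_samples):
    """True if the rule of thumb (at least half taken in pretrace) is met."""
    return pretrace_share(pretrace_samples, final_samples) >= 0.5

# Illustrative numbers only, not measurements from any real scene.
print(pretrace_well_tuned(1200, 300))   # True
print(pretrace_well_tuned(200, 1800))   # False
```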

Make sure to use "low_error_factor" and "nearest_count" - they're the chief
weapons to make sure pretrace gives a good sample coverage for the main render.



From: MessyBlob
Subject: Re: Radiosity and SMP
Date: 6 Apr 2009 23:10:01
Message: <web.49dac10b70f61477addfbead0@news.povray.org>
Summary of below: my reported problems are nothing new to worry about in Beta
32. I'll report back if I find anything.

"clipka" <nomail@nomail> wrote:
> "MessyBlob" <nomail@nomail> wrote:
> I find this hard to believe, as radiosity code has not changed from
> beta 31 to 32, except for some performance counters that are kept local
> to each thread. Are you sure you're not comparing to an earlier
> version's performance?

OK - will check - my observation is based on having a render running quickly in
Beta 31, then a few minutes later running very slowly after an update to Beta
32.

> (In case your scene should happen to contain only a single, small,
> "radiosity-intensive" detail, such effects might occur when POV is waiting
> for the last cell to complete at the end of each pretrace step.)

Wasn't that; it was throughout the passes (3 radiosity pretrace passes in
total). See my last comment (below).

> As you can see, there is no SMP involved so far. It's just the block-
> oriented pixel rendering order you see, nothing more.

Yes, I see. I understood the 3.6 horizontal artefact, and I'll admit that block
artefacts are perceptually more acceptable.

> A good rule of thumb for pretrace tuning is that at least half the samples
> should be gathered during pretrace already; unless you're using exotic
> settings, this is usually sufficient to keep the artifacts below visibility.
> Provided you switch "always_sample" off, that is.

always_sample had slipped through the net. D'oh! :o)

> Make sure to use "low_error_factor" and "nearest_count" - they're the
> chief weapons to make sure pretrace gives a good sample coverage for the
> main render.

Good advice, which I was 'bending' to make the render fit in memory: I think I'm
making radiosity work too hard, with 'normal on', and using a crackle pattern as
modifier to an implicit surface.



From: clipka
Subject: Re: Radiosity and SMP
Date: 7 Apr 2009 00:40:00
Message: <web.49dad7f470f61477c28d169c0@news.povray.org>
"MessyBlob" <nomail@nomail> wrote:
> Good advice, which I was 'bending' to make the render fit in memory: I think I'm
> making radiosity work too hard, with 'normal on', and using a crackle pattern as
> modifier to an implicit surface.

How's performance when replacing that crackle with some other pattern, e.g. bozo?

I'd really expect the trouble to be there:

- The crackle cache has been changed from beta.31 to beta.32

- The crackle cache is global to all threads IIUC, so heavy crackle use may
cause stalls as threads compete for access

- Radiosity may contribute to heavy crackle use due to the sheer number of rays
shot; with "normal on" and crackle used for normals, it may even "focus" the
workload on the parts using the crackle due to a higher number of radiosity
samples required there

- The crackle cache uses a sophisticated algorithm for memory allocation, so
any bug in there may well cause memory leaks, or crashes in an attempt to free
non-allocated memory.



From: Chris Cason
Subject: Re: Radiosity and SMP
Date: 8 Apr 2009 01:14:55
Message: <49dc32cf@news.povray.org>
clipka wrote:
> - The crackle cache is global to all threads IIUC, so heavy crackle use may
> cause stalls as threads compete for access

At the moment it's per-thread. Making it global to all threads is something I
have stated I want to do if it's possible without too much performance impact on
updates, since getting good results from it requires a fair chunk of RAM
per-thread - thus for high thread counts will chew a lot of main memory (as well
as reduce CPU cache effectiveness since the data is basically duplicated).
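
The trade-off between the two designs can be sketched as follows. This is a
generic Python illustration, not the POV-Ray crackle cache: the class names
SharedCache and PerThreadCache, the run helper and the squared-number
"computation" are all assumptions made up for the example. It only counts
stored entries; it makes no claim about timing.

```python
import threading

class SharedCache:
    """One cache for all threads; every access takes the same lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get(self, key, compute):
        with self._lock:             # threads queue up here under heavy use
            if key not in self._data:
                self._data[key] = compute(key)
            return self._data[key]

    def entries(self):
        return len(self._data)

class PerThreadCache:
    """One private cache per thread; no lock on lookups, data is duplicated."""
    def __init__(self):
        self._local = threading.local()
        self._all = []               # kept only so the sketch can count entries
        self._all_lock = threading.Lock()

    def _data(self):
        if not hasattr(self._local, "data"):
            self._local.data = {}
            with self._all_lock:
                self._all.append(self._local.data)
        return self._local.data

    def get(self, key, compute):
        data = self._data()
        if key not in data:
            data[key] = compute(key)
        return data[key]

    def entries(self):
        return sum(len(d) for d in self._all)

def run(cache, threads=4, keys=range(100)):
    """Let every thread look up the same keys; return total stored entries."""
    def worker():
        for k in keys:
            cache.get(k, lambda k: k * k)
    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return cache.entries()

print(run(SharedCache()))      # 100
print(run(PerThreadCache()))   # 400
```

With four threads the per-thread variant stores every entry four times, which
is the memory cost described above; the shared variant stores each entry once
but serialises all access through a single lock.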

-- Chris



From: MessyBlob
Subject: Re: Radiosity and SMP
Date: 8 Apr 2009 20:40:01
Message: <web.49dd43b970f61477addfbead0@news.povray.org>
Just changing the pattern from crackle to bumps gave a significant speed
improvement (4 mins instead of about 3 hours to render).


