So here is what came to my mind some minutes ago:
We have several "generations" of samples in the tree, depending on the radiosity
"bounce" depth.
(For those who are not too deep into radiosity's inner workings: The "bounce"
depth is what you control with the "recursion limit" parameter; if you set this
to 1, radiosity-based illumination of other objects is ignored when taking a
radiosity sample. If you set it to 2, there will be a "1st generation" of
samples taken ignoring the radiosity-based illumination of other objects, and a
"2nd generation" of samples using "1st generation" samples to estimate other
objects' radiosity-based illumination.)
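Just to make that relationship a bit more concrete, here is a minimal sketch
(in C++, with made-up names - this is not actual POV-Ray code) of how the
generations fall out of the recursion limit:

  struct Color { double r, g, b; };

  // Placeholder stand-ins for the real tracing machinery.
  Color TraceIgnoringRadiosity()         { return {0.0, 0.0, 0.0}; }
  Color GatherUsingGeneration(int /*g*/) { return {0.0, 0.0, 0.0}; }

  // A sample of generation g estimates other objects' radiosity-based
  // illumination from samples of generation g-1; generation 1 ignores it
  // entirely. With recursion_limit = N, generations 1..N exist.
  Color TakeSample(int generation)
  {
      if (generation == 1)
          return TraceIgnoringRadiosity();
      return GatherUsingGeneration(generation - 1);
  }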
As it seems, these generations are independent from the pretrace passes: During
any pretrace pass, samples of all generations may be created. These generations
are just a way to take into account how radiosity-based illumination of objects
affects the radiosity-based illumination of others, without running into a
"hen-and-egg problem".
Now let's look at the problems we have with radiosity in an SMP environment:
The core issue is that when radiosity is on, we cannot just split up the scene
into many subsections and render each one separately - because radiosity is
more about modeling the interactions between subsets of the scene than it is
about anything else.
More precisely, the main problem is in how we decide where to take a sample and
where to interpolate existing ones; this currently depends on the order in
which samples are taken, which gets messed up with multithreading.
But wait a minute - can we get that in more detail? We just mentioned those
generations. How does this affect our problem?
Let's look at the "1st generation" samples first:
When we have decided to take some "1st generation" sample, there is actually no
problem: To compute it, we don't take into account any radiosity-based
lighting at all, so it will never require us to take any other samples.
This means that it's totally irrelevant which task actually *takes* a "1st
generation" sample, provided that we have decided that we *will* take it.
Deciding *if* to take a "1st-generation" sample *is* problematic though, because
when two tasks encounter almost the same point during their jobs, task
scheduling may affect which of them is encountered first.
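To show where the order dependency creeps in, here is a rough sketch of that
kind of reuse test, loosely in the style of classic irradiance caching; the
details are made up and are not POV-Ray's actual criteria:

  #include <cmath>
  #include <vector>

  struct Vec3   { double x, y, z; };
  struct Sample { Vec3 pos; double radius; /* plus color, gradients, ... */ };

  static double Dist(const Vec3& a, const Vec3& b)
  {
      return std::sqrt((a.x - b.x) * (a.x - b.x) +
                       (a.y - b.y) * (a.y - b.y) +
                       (a.z - b.z) * (a.z - b.z));
  }

  // Two tasks calling this for almost the same point may race: whichever one
  // inserts its sample first suppresses the other's - so the resulting sample
  // set depends on thread scheduling.
  bool NeedNewSample(const std::vector<Sample>& tree, const Vec3& p)
  {
      for (const Sample& s : tree)
          if (Dist(s.pos, p) < s.radius)
              return false;   // close enough: interpolate existing samples
      return true;            // nothing reusable nearby: take a new sample here
  }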
How about "later-generation" samples then?
Deciding *if* to take a "later-generation" sample is just as problematic as with
"1st-generation" samples for just the same reason.
*Taking* a "later-generation" sample, however, is problematic as well: We want
to take
into account "previous-generation" samples, and may have to decide whether to
take more of those.
If we look closely, we see that the problem in *taking* a sample is actually the
problem of deciding *if* to take previous-generation samples. If we could
de-couple this - if we could safely have multiple tasks take samples in
parallel - then our issues would be solved.
We *could* de-couple this if we didn't need the new sample *instantly*: We could
"order" the sample, see what other "orders" come in from other tasks, and use
some sly algorithm to eliminate (near-) duplicates, using some rules that don't
depend on when or by which task the "order" was placed.
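Such a "sly algorithm" could, for instance, be as simple as snapping the
requested locations to a spatial grid and keeping one representative per cell,
chosen by a rule that looks only at the coordinates. The sketch below is purely
my own illustration, not existing code:

  #include <cmath>
  #include <map>
  #include <tuple>
  #include <vector>

  struct Vec3 { double x, y, z; };

  using Cell = std::tuple<long, long, long>;

  // Snap a location to a coarse grid cell; the cell size plays the role of
  // the minimum sample spacing we are willing to accept.
  static Cell CellOf(const Vec3& p, double cellSize)
  {
      return { (long)std::floor(p.x / cellSize),
               (long)std::floor(p.y / cellSize),
               (long)std::floor(p.z / cellSize) };
  }

  // Merge the per-task "order" lists into one near-duplicate-free list. The
  // tie-break looks only at the coordinates, so the result is independent of
  // which task placed an order and when. (Deliberately crude: points in
  // adjacent cells can still be close; a real version would be smarter.)
  std::vector<Vec3> ConsolidateOrders(
      const std::vector<std::vector<Vec3>>& perTaskOrders, double minSpacing)
  {
      std::map<Cell, Vec3> kept;
      for (const auto& orders : perTaskOrders)
          for (const Vec3& p : orders)
          {
              Cell c = CellOf(p, minSpacing);
              auto it = kept.find(c);
              if (it == kept.end() ||
                  std::tie(p.x, p.y, p.z) <
                  std::tie(it->second.x, it->second.y, it->second.z))
                  kept[c] = p;
          }

      std::vector<Vec3> result;
      for (const auto& kv : kept)
          result.push_back(kv.second);
      return result;
  }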
Unfortunately we need the samples to build the next-generation samples... or do
we?
Well, the *last*-generation samples are actually not needed before the final
trace. So they don't need to be accurate until the end of pretrace.
If the last-generation samples only need to be accurate by the end of pretrace,
then the sample generation they are based on only needs to be accurate before
the final pretrace pass. And so on.
So if we have N generations of samples, then the 1st generation samples need not
be accurate before the Nth-to-last pretrace step. And for 1st generation, being
accurate means exactly the same as being sampled.
Unfortunately, this doesn't mean that we can just start with the 1st generation
samples, then sample the next generation, and so on. We first need to choose
which 1st-generation samples to take. Which means we need to know which ones we
will
need. Which means that we will have to take the 2nd generation samples first...
well, not actually take them, but make an attempt to do so. Which in turn means
that we have to make an attempt at the 3rd generation samples before that. And
so on...
However, this doesn't render the approach infeasible, because now we do not have
to actually take samples during the first radiosity passes, but just make an
attempt to do so. And do the actual sampling later.
So here's how it can actually be done (let's assume we have 3 generations of
samples):
- For the first pretrace step, the picture is divided into multiple blocks, each
being treated as a separate "job". Each block is traced (probably at low
resolution) to find out which 3rd-generation samples *may* be needed. Within
each "job", this information is alread consolidated while tracing, sieving out
any near-duplicates we really don't want (or not now at least); when all jobs
have been finished, the data from all jobs is consolidated into a single tree,
again sieving out near-dupes.
- The result is a list (well, a tree actually) of 3rd-generation samples we
*really* want to take. Again we group multiple samples into "jobs" so we can
feed them to different tasks. We cannot take them right now because we don't
have any 2nd-generation samples we could use as a basis, but we'll trace them
nonetheless to find out what 2nd-generation samples we really want. Again, we
consolidate on a per-job basis first, and finally merge the results from all
the jobs.
- Now we have a list of 2nd-generation samples we really want to take. Same
procedure here: Chop it up into jobs, trace them to see which 1st-generation
samples would be of interest, consolidate on job-level while tracing, and do a
final consolidation of all data.
- At last, we have a list of 1st-generation samples that we will want. Now we
can do some real work: Again multiple jobs, but this time we will just trace,
and actually use the results to build our 1st-generation samples. No need for
any consolidation of data, because there will be no new desired sample
locations.
- We can now re-visit our 2nd-generation list, trace them again, and actually
use the results because we will have all the 1st generation samples we
initially needed.
- Now we can re-visit our 3rd-generation list, and actually sample them.
- Finally, we *could* re-trace the pretrace image, but we don't. Who cares about
pretrace results? Instead, we do another pretrace with a higher resolution.
Again this will give us some additional 3rd-generation sample locations we
*may* want, and ultimately we will consolidate it into a list of samples we
*do* want.
- We repeat the above process, but this time we already have some samples -
quite a lot actually. Suppose we had a 100x100 first pretrace and a "count"
value of 100: we would now have up to 10,000 3rd-generation samples and up to
1,000,000 2nd-generation ones, if we didn't filter out close ones. And chances
are that we already have almost all 1st-generation samples we'll ever need.
- We repeat this over and over again until we have reached the pretrace_end.
- When we do the final trace, chances are that we also have enough 3rd
generation samples. If we don't, however, it is probably cheaper to have every
"job" compute any missing samples himself right away (storing them in a
job-local cache). The pretrace didn't collect these samples, so chances are
that other jobs won't bother about them either. If this behaviour would be
undesired, the user could simply go for a lower pretrace_end.
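Putting the steps above together, the whole thing could look roughly like this
(placeholder C++ with made-up function names, just to show the structure):

  #include <vector>

  struct Vec3 { double x, y, z; };
  using Locations = std::vector<Vec3>;

  // Placeholders for the machinery described in the steps above.
  Locations TracePretraceImage(double /*blockSize*/)          { return {}; }
  Locations TraceSampleLocations(const Locations& /*wanted*/) { return {}; }
  void      SampleLocations(const Locations& /*wanted*/, int /*generation*/) {}

  void PretraceAndSample(double pretrace_start, double pretrace_end)
  {
      // Each pass halves the pretrace block size, i.e. doubles the resolution.
      for (double blockSize = pretrace_start; blockSize >= pretrace_end;
           blockSize /= 2.0)
      {
          // Step 1: trace the low-resolution image in parallel jobs and
          // consolidate the 3rd-generation locations we *really* want.
          // (In later passes, locations already in the tree are of course
          // taken into account during consolidation.)
          Locations wanted3rd = TracePretraceImage(blockSize);

          // Steps 2 and 3: tracing the wanted locations of one generation
          // tells us which locations of the previous generation we want.
          Locations wanted2nd = TraceSampleLocations(wanted3rd);
          Locations wanted1st = TraceSampleLocations(wanted2nd);

          // Steps 4 to 6: walk back up; each generation can now actually be
          // sampled, because the generation below it is complete.
          SampleLocations(wanted1st, 1);
          SampleLocations(wanted2nd, 2);
          SampleLocations(wanted3rd, 3);
      }
  }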
There are a few things to consider with this approach:
When deciding on the initial set of samples to take, we will not have any
information yet on the distance to other objects (because this info is only
collected by the actual sampling), so the first set of samples needs to be
taken somewhat "blind" in this respect. However:
- The number of initial samples is comparatively small, so chances are they will
not turn out too close. In addition, an adaptive minimum_reuse could be used.
- If the initial set of samples were subdivided into "jobs" by geometry (the
octree structure comes to mind here), further sieving could be done in the
context of a "job" as it collects samples. This would leave an increased "risk"
of near-dupes only at the "job borders" (see the sketch after this list).
- A further postprocessing step could be added to again "sieve" out near-dupes
(although I have no idea whatsoever why anyone would want to discard samples
once they have been taken).
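Here is a rough sketch of what that geometry-based subdivision might look like,
using a coarse octree-level grid as job boundaries (again, names and details
are made up for illustration):

  #include <cmath>
  #include <map>
  #include <tuple>
  #include <vector>

  struct Vec3 { double x, y, z; };

  using JobId = std::tuple<long, long, long>;

  // Assign each requested sample location to a "job" according to the coarse
  // octree-level cell it falls into. Sieving of near-dupes can then happen
  // inside each job; only locations close to a job border still risk slipping
  // through until the final consolidation.
  std::map<JobId, std::vector<Vec3>> SplitIntoJobs(
      const std::vector<Vec3>& locations, double jobCellSize)
  {
      std::map<JobId, std::vector<Vec3>> jobs;
      for (const Vec3& p : locations)
      {
          JobId id { (long)std::floor(p.x / jobCellSize),
                     (long)std::floor(p.y / jobCellSize),
                     (long)std::floor(p.z / jobCellSize) };
          jobs[id].push_back(p);
      }
      return jobs;
  }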
The approach outlined above can be further improved.
Obviously, it involves a comparatively high number of traces done "for nothing",
except finding where to take more samples.
These traces, however, *can* be used for good after all:
For starters, when tracing for the first 3rd-generation samples (to find out
which 2nd-generation samples we actually need), we don't have any
2nd-generation samples at all, so the trace results seem to be invalid.
However, what we are actually doing here is collecting samples without any
information
about other objects' radiosity-based illumination - so we can actually use the
samples as 1st-generation.
When tracing for the first 2nd-generation samples to find out which
1st-generation samples we actually need, again we have hardly any
1st-generation samples yet. If we deliberately ignore the few we do have, we
can likewise re-use these trace results as additional 1st-generation samples.
I think we can actually do this throughout all pretrace steps, until we're
quite confident that we have 1st generation samples everywhere we need them.
Then, after all pretraces have been done, we could just go through the
designated 2nd generation sample locations all over again (straightforward
multithreaded, as we already know where we want them), sample them for 2nd
generation, then move on to do the same for the 3rd generation. We'd already
know that we have all previous-generation samples for this.
So what we'd actually do then during pretrace would be to set up a network of
3rd, 2nd and 1st generation sample *locations*, but actually just sample them
as 1st generation for now, and later "update" the chosen ones to 2nd and
ultimately 3rd generation.
Note that to collect the necessary information to decide whether additional
samples are needed for a generation, it is (as far as I know) irrelevant
whether they are sampled for 1st, 2nd or 3rd generation: The geometry remains
the same.
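In code terms, each entry in the sample tree would then carry both its
designated generation and the generation it has actually been sampled as so
far; after pretrace we simply walk through the tree generation by generation
and upgrade the designated entries (rough sketch with made-up names):

  #include <cstddef>
  #include <vector>

  struct Vec3  { double x, y, z; };
  struct Color { double r, g, b; };

  struct Sample
  {
      Vec3  pos;
      int   designatedGeneration; // the generation this location was "ordered" for
      int   sampledGeneration;    // the generation it has actually been sampled as
      Color value;
  };

  // Placeholder: gather at 'pos' using samples of 'generation - 1' only
  // (or ignoring radiosity entirely for generation 1).
  Color ResampleAt(const Vec3& /*pos*/, int /*generation*/) { return {}; }

  // After the last pretrace pass every location holds a 1st-generation sample;
  // now the designated locations are upgraded to 2nd and then 3rd generation.
  // Each pass computes first and commits afterwards, so all gathers within one
  // pass see consistent previous-generation values; the passes themselves can
  // be chopped into jobs and run multithreaded, since the locations are fixed.
  void UpgradeGenerations(std::vector<Sample>& tree, int maxGeneration)
  {
      for (int gen = 2; gen <= maxGeneration; ++gen)
      {
          std::vector<Color> updated(tree.size());
          for (std::size_t i = 0; i < tree.size(); ++i)
              if (tree[i].designatedGeneration >= gen)
                  updated[i] = ResampleAt(tree[i].pos, gen);

          for (std::size_t i = 0; i < tree.size(); ++i)
              if (tree[i].designatedGeneration >= gen)
              {
                  tree[i].value = updated[i];
                  tree[i].sampledGeneration = gen;
              }
      }
  }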
By the way, this approach might also lend itself to implementing an adaptive
bounce
depth:
All 3rd generation samples are taken at the same location as 2nd generation
samples, which are in turn taken at the same location as 1st generation
samples.
If after taking the 2nd generation sample we find that brightness (or color) has
changed less than a certain threshold, we could presume that the 3rd generation
will not give any significant difference either.
This mechanism could - in addition to the recursion_limit - be governed by some
"minimum_recursion" and a kind of "recursion_bailout".
Yes, I think this approach is both feasible and good. Comments, anyone?