On 6/11/24 03:43, Thorsten wrote:
> Hi Bill,
>
> the other issue to consider is that while there is no user interface for
> it, in theory multiple renders of the same scene can run in parallel.
> The actual solution to the whole problem is to keep the data needed not
> only thread-local but look carefully at what is actually cached and then
> ideally have it block local (also meaning, as with thread-local storage,
> that the pattern changes with render block size) or even better pixel
> local (no change with block size). To avoid the access to thread-local
> storage, the whole rendering actually could be overhauled (which would
> be good anyway) to move from a recursive to a stack based approach. That
> way the needed local data could be (more easily) passed as argument down
> to patterns ... but expect half a year full time to implement something
> like this.
>
> Thorsten
Hi Thorsten,
Thank you for your thoughts about the situation.
One thing I've not done is review all the pattern / perturbation /
shape thread caching for overlapping use of in-thread storage. In other
words, what other problems like this might be sitting in the code
today...
On the blocking, you got me thinking that one nearer-term option with
crackle and facets might be to track the pattern / perturbation pointers
themselves alongside the usual cube centers. In cases where we get a
hit, but the pointers themselves don't match, we'd act like we missed
and create a new cache entry. Rather than stick that 'overlapping hit'
entry in the cache, we'd do the distance measures locally and discard
the entry. Not optimal, more storage for the cache, but it would be
better than just turning the cache off.
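
A minimal sketch of that "track the owner pointer" idea, assuming a
per-thread cache keyed by integer cube coordinates. All names here
(CubeKey, CacheEntry, CubeCache, lookup) are hypothetical and not taken
from the POV-Ray source; the single stored point stands in for whatever
the real patterns cache per cube.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// All names here are hypothetical; none are taken from the POV-Ray source.
struct CubeKey {
    int x, y, z;   // integer coordinates of the unit cube
    bool operator==(const CubeKey& o) const {
        return x == o.x && y == o.y && z == o.z;
    }
};

struct CubeKeyHash {
    std::size_t operator()(const CubeKey& k) const {
        std::uint64_t h = 1469598103934665603ull;   // FNV-1a style mixing
        const int parts[3] = {k.x, k.y, k.z};
        for (int v : parts) {
            h ^= static_cast<std::uint32_t>(v);
            h *= 1099511628211ull;
        }
        return static_cast<std::size_t>(h);
    }
};

struct CacheEntry {
    const void* owner;             // pattern / perturbation that filled the entry
    std::array<double, 3> point;   // stand-in for the cached per-cube data
};

// One instance per thread is assumed, so there is no locking.
class CubeCache {
public:
    std::size_t hits = 0, misses = 0, overlaps = 0;

    // 'fill' is a caller-supplied callable that computes the entry data.
    template <typename Fill>
    CacheEntry lookup(const void* owner, const CubeKey& key, Fill fill) {
        assert(owner != nullptr);
        auto it = entries_.find(key);
        if (it != entries_.end() && it->second.owner == owner) {
            ++hits;                      // genuine hit: same cube, same pattern
            return it->second;
        }
        CacheEntry fresh{owner, {}};
        fill(key, fresh);                // recompute the per-cube data
        if (it == entries_.end()) {
            ++misses;                    // true miss: store for later reuse
            entries_.emplace(key, fresh);
        } else {
            ++overlaps;                  // 'overlapping hit': use locally, discard
        }
        return fresh;
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::unordered_map<CubeKey, CacheEntry, CubeKeyHash> entries_;
};
```

In this sketch the first pattern to touch a cube keeps the cache slot, so
a second pattern sharing the same cubes always pays the recompute cost.
That matches the "not optimal, but better than no cache" trade-off.
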
On block- or pixel-local storage rather than thread-local storage:
better, I'd say, but so long as the patterns might share the storage, I
think it still leaves us exposed given how the crackle / facets patterns
work today.
Overhauling the rendering approach. Yeah, likely due and good, but not
at all trivial as you say. I'm not myself sure how such a restructuring
should look in total.
With the solver work I did, now 5-6 years ago, I came to the conclusion
that a fused shape/solver approach would be far better given we are
ray-tracing. See:
https://news.povray.org/povray.programming/thread/%3C5d0f64ff%241%40news.povray.org%3E/
When I think about really implementing that approach, I also start to
think about how a different approach to parallelism than our block based
approach could be good. One where we spin up the combined
shape/solver(s) as processes to which we'd send batches of rays at a
time and get back batches of intersections... Yeah, I'm practically
dreaming, but pretty sure that sort of setup would be best for the
merged uni-variate, polynomial solver/shape approach. How well that
structure would work overall - I'm not at all sure. :-)
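
A rough sketch of the "batch of rays in, batch of intersections out"
interface, assuming a sphere as the simplest possible fused
shape/solver. The names (Ray, Hit, Sphere, intersectBatch) are
hypothetical. The function runs in-process here; the separate-process
plumbing and the higher-order polynomial solving are left out entirely.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };

inline double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
inline Vec3 sub(const Vec3& a, const Vec3& b) {
    return Vec3{a.x - b.x, a.y - b.y, a.z - b.z};
}

struct Ray    { Vec3 origin; Vec3 dir; };        // dir assumed normalised
struct Hit    { bool found; double t; };         // t is the ray parameter
struct Sphere { Vec3 centre; double radius; };

// Hypothetical fused shape/solver: the shape owns the code that solves
// its own ray equation, and it works on a whole batch per call.
std::vector<Hit> intersectBatch(const Sphere& s, const std::vector<Ray>& rays) {
    std::vector<Hit> hits;
    hits.reserve(rays.size());
    for (const Ray& r : rays) {
        // Solve t^2 + 2*b*t + c = 0 for the nearest positive root.
        const Vec3 oc = sub(r.origin, s.centre);
        const double b = dot(oc, r.dir);
        const double c = dot(oc, oc) - s.radius * s.radius;
        const double disc = b * b - c;
        if (disc < 0.0) {
            hits.push_back(Hit{false, 0.0});     // ray misses the sphere
            continue;
        }
        const double root = std::sqrt(disc);
        const double tNear = -b - root;
        const double tFar = -b + root;
        if (tNear > 0.0)      hits.push_back(Hit{true, tNear});
        else if (tFar > 0.0)  hits.push_back(Hit{true, tFar});   // origin inside
        else                  hits.push_back(Hit{false, 0.0});   // sphere behind ray
    }
    assert(hits.size() == rays.size());          // one result per input ray
    return hits;
}
```

Keeping results in the same order as the input batch means the caller
can match hits back to rays by index without extra bookkeeping.
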
As a practical near term solution, one thing I want to try is similar to
what I did with the four ripple/wave value-pattern/normal-perturbation
re-writes. I replaced the stored, pre-calculated locations with ones
always calculated on the fly. At the default source location count of
10, the performance hit for not storing the locations was 20%, give or
take, IIRC.
If I can figure out a way to re-write the crackle and facets at that
sort of performance hit, I'll probably just dump all the caching /
thread local storage in total for local stack based storage.
Whether I can accomplish such a re-write - at a performance hit that's
not too bad - is an open question at the moment. Not least because it's
a chunk of work which might well not work out as a solution in the
end - so I'm procrastinating.
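
To make the "no cache, local stack storage" variant concrete, here is a
minimal sketch assuming one repeatable pseudo-random point per unit
cube. The hash and all names (hash01, cubePoint, nearestDistance) are
hypothetical stand-ins, not the actual crackle / facets code. The 3x3x3
neighbourhood is chosen for brevity; it is not guaranteed to contain the
true nearest point, so a real re-write would likely need a wider search.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

struct Vec3 { double x, y, z; };

// Hypothetical integer hash mapped to [0,1); stands in for whatever
// repeatable per-cube random source the real pattern would use.
inline double hash01(std::int32_t x, std::int32_t y, std::int32_t z,
                     std::uint32_t salt) {
    std::uint32_t h = static_cast<std::uint32_t>(x) * 73856093u
                    ^ static_cast<std::uint32_t>(y) * 19349663u
                    ^ static_cast<std::uint32_t>(z) * 83492791u
                    ^ salt * 2654435761u;
    h ^= h >> 16; h *= 0x7feb352du;
    h ^= h >> 15; h *= 0x846ca68bu;
    h ^= h >> 16;
    return (h >> 8) * (1.0 / 16777216.0);   // top 24 bits scaled to [0,1)
}

// Feature point of one unit cube, recomputed every time it is needed.
inline Vec3 cubePoint(int cx, int cy, int cz) {
    return Vec3{cx + hash01(cx, cy, cz, 1u),
                cy + hash01(cx, cy, cz, 2u),
                cz + hash01(cx, cy, cz, 3u)};
}

// Distance from p to the nearest feature point in the 27 cubes around p.
inline double nearestDistance(const Vec3& p) {
    const int bx = static_cast<int>(std::floor(p.x));
    const int by = static_cast<int>(std::floor(p.y));
    const int bz = static_cast<int>(std::floor(p.z));

    std::array<Vec3, 27> pts;   // local stack storage, nothing shared
    std::size_t n = 0;
    for (int dz = -1; dz <= 1; ++dz)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx)
                pts[n++] = cubePoint(bx + dx, by + dy, bz + dz);
    assert(n == pts.size());

    double best2 = std::numeric_limits<double>::infinity();
    for (const Vec3& q : pts) {
        const double ex = q.x - p.x, ey = q.y - p.y, ez = q.z - p.z;
        const double d2 = ex * ex + ey * ey + ez * ez;
        if (d2 < best2) best2 = d2;
    }
    return std::sqrt(best2);
}
```

Because every call rebuilds its points from the cube coordinates alone,
the result cannot depend on which thread, block, or other pattern ran
before it. The cost is the repeated hashing, which is the performance
hit in question.
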
Bill P.