From: Thorsten
Subject: Re: v3.8+ crackle instability (facets?) with >1 uses per thread.
Date: 11 Jun 2024 03:43:56
Message: <6668003c$1@news.povray.org>
On 10.06.2024 13:51, William F Pokorny wrote:
> On 6/9/24 00:21, William F Pokorny wrote:
>> Ah, and what about facets.
>
> FWIW. The caching mechanism is simpler (older) for facets. As with
> crackle I experimented some with forcing 100% misses. The slow down in
> the heavy AA case is +195% as opposed to the +335% seen with crackle.
> The difference likely comes down to the overhead for the simpler facets
> cache being smaller. The facets cache comes close to what I wanted to
> try with the crackle cache.
>
> Going to let ideas rattle around in my head for a while as to what to
> do. ( 1. Limit use to one crackle and one facets use in any given scene.
> 2. A cache per crackle/facets use / per thread. 3. ...)
Hi Bill,
the other issue to consider is that while there is no user interface for
it, in theory multiple renders of the same scene can run in parallel.
The actual solution to the whole problem is to keep the data needed not
only thread-local but look carefully at what is actually cached and then
ideally have it block local (also meaning, as with thread-local storage,
that the pattern changes with render block size) or even better pixel
local (no change with block size). To avoid the access to thread-local
storage, the whole rendering actually could be overhauled (which would
be good anyway) to move from a recursive to a stack based approach. That
way the needed local data could be (more easily) passed as argument down
to patterns ... but expect half a year full time to implement something
like this.
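A minimal sketch of the stack based idea, assuming a toy one-dimensional "pattern" and a fixed weight falloff per bounce. PixelContext, RayTask, evaluatePattern and tracePixel are hypothetical names for illustration only and are not POV-Ray types or functions. The point is just that the pixel-local data travels as an argument rather than through thread-local storage.

```cpp
#include <cmath>
#include <vector>

// Hypothetical pixel-local scratch data; not a POV-Ray type.
struct PixelContext {
    int patternEvaluations = 0;
};

// One pending unit of work on the explicit stack.
struct RayTask {
    int depth;      // how many bounces deep this ray is
    double weight;  // contribution of this ray to the pixel
};

// Stand-in pattern: the context arrives as an argument,
// not via thread-local storage.
double evaluatePattern(double x, PixelContext& ctx) {
    ++ctx.patternEvaluations;
    return x - std::floor(x);
}

// Iterative replacement for a recursive trace; expects maxDepth >= 1.
double tracePixel(double x, int maxDepth, PixelContext& ctx) {
    std::vector<RayTask> stack;
    stack.push_back({0, 1.0});
    double result = 0.0;
    while (!stack.empty()) {
        const RayTask task = stack.back();
        stack.pop_back();
        result += task.weight * evaluatePattern(x + task.depth, ctx);
        if (task.depth + 1 < maxDepth) {
            // Assumed falloff: each deeper ray counts half as much.
            stack.push_back({task.depth + 1, task.weight * 0.5});
        }
    }
    return result;
}
```

Because the caller owns the PixelContext, its lifetime can be one pixel or one block, which is the choice discussed above.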
Thorsten
From: William F Pokorny
Subject: Re: v3.8+ crackle instability (facets?) with >1 uses per thread.
Date: 11 Jun 2024 11:39:12
Message: <66686fa0$1@news.povray.org>
On 6/11/24 03:43, Thorsten wrote:
> Hi Bill,
>
> the other issue to consider is that while there is no user interface for
> it, in theory multiple renders of the same scene can run in parallel.
> The actual solution to the whole problem is to keep the data needed not
> only thread-local but look carefully at what is actually cached and then
> ideally have it block local (also meaning, as with thread-local storage,
> that the pattern changes with render block size) or even better pixel
> local (no change with block size). To avoid the access to thread-local
> storage, the whole rendering actually could be overhauled (which would
> be good anyway) to move from a recursive to a stack based approach. That
> way the needed local data could be (more easily) passed as argument down
> to patterns ... but expect half a year full time to implement something
> like this.
>
> Thorsten
Hi Thorsten,
Thank you for your thoughts about the situation.
One thing I've not done is think through all the pattern / perturbation /
shape thread caching with respect to overlapping in-thread storage use.
In other words, what other problems like this might be sitting in the
code today...
On the blocking, you got me thinking one nearer term option with crackle
and facets might be to track the pattern / perturbation pointers
themselves alongside the usual cube centers. In cases where we get a
hit, but the pointers themselves don't match, we'd act like we missed
and create a new cache entry. Rather than stick that 'overlapping hit'
entry in the cache, we'd do the distance measures locally and discard
the entry. Not optimal, more storage for the cache, but it would be
better than just turning the cache off.
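A rough sketch of that "overlapping hit" handling, assuming the cache is keyed by integer cube coordinates and each entry remembers which pattern created it. CubeKey, CacheEntry and OwnerTaggedCache are hypothetical names, and the unordered_map is a stand-in for whatever container the real cache uses.

```cpp
#include <cstddef>
#include <unordered_map>

// Integer cube coordinates used as the cache key (assumed layout).
struct CubeKey {
    int x, y, z;
    bool operator==(const CubeKey& o) const {
        return x == o.x && y == o.y && z == o.z;
    }
};

struct CubeKeyHash {
    std::size_t operator()(const CubeKey& k) const {
        std::size_t h = static_cast<std::size_t>(k.x);
        h = h * 1000003u + static_cast<std::size_t>(k.y);
        h = h * 1000003u + static_cast<std::size_t>(k.z);
        return h;
    }
};

// The entry records the pattern / perturbation pointer that created it.
struct CacheEntry {
    const void* owner;
    double value;
};

class OwnerTaggedCache {
public:
    // compute() produces the value when the cache cannot supply it.
    template <typename Compute>
    double lookup(const CubeKey& key, const void* owner, Compute compute) {
        auto it = entries_.find(key);
        if (it == entries_.end()) {
            // Plain miss: compute and store.
            const double v = compute();
            entries_.emplace(key, CacheEntry{owner, v});
            return v;
        }
        if (it->second.owner != owner) {
            // Overlapping hit: same cube, different pattern.
            // Act like a miss, but do not store the result.
            ++overlappingHits_;
            return compute();
        }
        return it->second.value;
    }

    std::size_t overlappingHits() const { return overlappingHits_; }

private:
    std::unordered_map<CubeKey, CacheEntry, CubeKeyHash> entries_;
    std::size_t overlappingHits_ = 0;
};
```

The extra storage mentioned above is the owner pointer carried by every entry.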
On storage block or pixel local/thread storage. Better I'd say, but so
long as the patterns might share the storage, I think it still leaves us
exposed given how the crackle / facets patterns work today.
Overhauling the rendering approach. Yeah, likely due and good, but not
at all trivial as you say. I'm not myself sure how such a restructuring
should look in total.
With the solver work I did, now 5-6 years ago, I came to the conclusion a
fused shape/solver approach would be far better given we are
ray-tracing. See:
https://news.povray.org/povray.programming/thread/%3C5d0f64ff%241%40news.povray.org%3E/
When I think about really implementing that approach, I also start to
think about how a different approach to parallelism than our block based
approach could be good. One where we spin up the combined
shape/solver(s) as processes to which we'd send batches of rays at a
time and get back batches of intersections... Yeah, I'm practically
dreaming, but pretty sure that sort of set up would be best for the
merged uni-variate, polynomial solver/shape approach. How well that
structure would work overall - I'm not at all sure. :-)
As a practical near term solution, one thing I want to try is similar to
what I did with the four ripple/wave value-pattern/normal-perturbation
re-writes. I dumped already calculated locations for ones always
calculated on the fly. At the default source location count of 10, the
hit for not storing the locations was 20% give or take - IIRC.
If I can figure out a way to re-write the crackle and facets at that
sort of performance hit, I'll probably just dump all the caching /
thread local storage in total for local stack based storage.
Whether I can accomplish such a re-write - at a performance hit not too
bad - is an open question at the moment. Not the least for the reason
it's a chunk of work which well might not work out as a solution in the
end - so I'm procrastinating.
Bill P.
From: Thorsten
Subject: Re: v3.8+ crackle instability (facets?) with >1 uses per thread.
Date: 11 Jun 2024 14:17:07
Message: <666894a3$1@news.povray.org>
On 11.06.2024 17:39, William F Pokorny wrote:
> When I think about really implementing that approach, I also start to
> think about how a different approach to parallelism than our block based
> approach could be good. One where we spin up the combined
> shape/solver(s) as processes to which we'd send batches of rays at a
> time and get back batches of intersections... Yeah, I'm practically
> dreaming, but pretty sure that sort of set up would be best for the
> merged uni-variate, polynomial solver/shape approach. How well that
> structure would work overall - I'm not at all sure. 😄
Well, yes, an intersection based approach would probably offer the most
potential performance on a shared memory system. You would also end up
with a stack-based approach automatically that way. However, you hit
sort of a wall once you get to the really big multi-die systems like
Epyc and newer Xeons because they only share the last level cache with
all cores. So in the end the best performance probably hides somewhere
in a hybrid of the two with blocks still offering some benefit for large
multi-core systems. That is, of course, assuming the ray order doesn't
disrupt first and second level caches too much. It is impossible to
predict the complexity with modern CPUs, I think.
The benefit would be that the "texturing" would become a completely
separate task, and could actually be done (sans reflection and
refraction) after tracing, which, if nothing else, would lead to a cool
looking render preview. The other effect would be that at least bounding
optimisations and mesh intersection testing could be done on a GPU.
Yet another benefit you get from separating the tracing and the
texturing is that you end up with a sort of frame buffer that contains
object data. An idea I never pursued to the end 20 or so years ago was
that this gives rise to the ability to edit a ray-traced scene on the
fly because you have access to the objects making up an individual pixel
and can separate objects in and out of the scene as long as the camera
doesn't move.
Thorsten
From: William F Pokorny
Subject: Re: v3.8+ crackle instability (facets?) with >1 uses per thread.
Date: 13 Jun 2024 19:26:02
Message: <666b800a$1@news.povray.org>
On 6/11/24 14:17, Thorsten wrote:
> It is impossible to predict the complexity with modern CPUs, I think.
I agree. Today's hardware optimizations make performance tuning a tough
trick - and make questionable a number of "rules of thumb" about which
algorithms perform best.
>
> The benefit would be that the "texturing" would become a completely
> separate task, and could actually be done (sans reflection and
> refraction) after tracing, which, if nothing else, would lead to a cool
> looking render preview. The other effect would be that at least bounding
> optimisations and mesh intersection testing could be done on a GPU.
>
> Yet another benefit you get from separating the tracing and the
> texturing is that you end up with a sort of frame buffer that contains
> object data. An idea I never pursued to the end 20 or so years ago was
> that this gives rise to the ability to edit a ray-traced scene on the
> fly because you have access to the objects making up an individual pixel
> and can separate objects in and out of the scene as long as the camera
> doesn't move.
Cool ideas. :-) I can see how some parts might work, but far from all of
it.
Our ray tracing and texturing are today tangled in places (adc bailout,
filtering/transparency, media, object modifiers). There is also the
question of how to handle anti-aliasing (AA) / camera focal blur.
Though our 'AA' approach today is expensive (a), it's a strength with
respect to 'true result' that each sample ray considers the scene -
including texturing - alongside all the ray tracing / branching in total.
Bill P.
(a) - With respect to performance, on my 'try it someday' list are
cheaper AA / focal blur modes where the rays beyond some 'sampling
depth/count' would terminate at a much shallower max_trace_level/sample
count(*). Or maybe we gradually reduce the trace depth in opposition to
the AA/blur sampling 'depth'. Results would be less true, but I
'suspect' they'd often look good as a rule. (There is a tradeoff buried
in the idea as the less accurate results due to shallower ray trace depth
would sometimes itself trigger additional sampling - and sometimes not
where we would otherwise have shot more rays.)
(*) - Yes! I made trying the idea harder by implementing the forced min
sampling AA in yuqk.
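One way the "gradually reduce the trace depth" variant from (a) could look, as a sketch only. traceLevelForSample and all four parameters are hypothetical, and the one-level-per-sample-depth falloff is an assumption made to keep the example short. Nothing like this exists in POV-Ray or yuqk as far as this thread says.

```cpp
#include <algorithm>

// Returns the trace depth to use for an AA / focal blur sample.
//   sampleDepth      - how deep into the AA / blur sampling this ray is
//   fullDepthSamples - sampling depth up to which rays trace at full depth
//   maxTraceLevel    - the scene's normal max_trace_level
//   minTraceLevel    - floor below which the depth is never reduced
int traceLevelForSample(int sampleDepth, int fullDepthSamples,
                        int maxTraceLevel, int minTraceLevel) {
    if (sampleDepth <= fullDepthSamples) {
        return maxTraceLevel;
    }
    // Assumed falloff: lose one trace level per extra sampling depth.
    const int reduced = maxTraceLevel - (sampleDepth - fullDepthSamples);
    return std::max(minTraceLevel, reduced);
}
```

With fullDepthSamples = 2, maxTraceLevel = 5 and minTraceLevel = 1, sampling depths 1 and 2 trace at level 5, depth 3 at level 4, and anything from depth 6 onward at level 1.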
From: William F Pokorny
Subject: Re: v3.8+ crackle instability (facets?) with >1 uses per thread.
Date: 14 Jun 2024 08:27:14
Message: <666c3722$1@news.povray.org>
On 6/8/24 15:12, William F Pokorny wrote:
> The crackle pattern and facets perturbation maintain thread local
> storage so information can be cached in a thread safe way.
Note too:
https://stackoverflow.com/questions/35985960/c-why-is-boosthash-combine-the-best-way-to-combine-hash-values
---
In working to clean up and commit my last updates, I ran across a TODO
comment I'd added to friend std::size_t hash_value() in cracklecache.h
about the initial seed value of 0 - which bothers me. I did a quick
search this morning to look for rumblings about boost::hash_combine().
Other issues aside. It might be our crackle caching mechanism is less
effective than it could be.
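For readers who have not seen it, here is a sketch of the widely quoted hash_combine formula that the linked Stack Overflow question discusses, applied to three integer cube coordinates with a seed of 0. CubeCentre, hashCombine and hashCube are hypothetical names. The actual hash_value() in cracklecache.h is not shown in this thread, so this is not a copy of it.

```cpp
#include <cstddef>
#include <functional>

// Classic boost-style combine step: mixes one value into a running seed.
inline void hashCombine(std::size_t& seed, int value) {
    seed ^= std::hash<int>{}(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hypothetical stand-in for the crackle cache key.
struct CubeCentre {
    int x, y, z;
};

// Hash of a cube centre; the seed defaults to 0 as in the TODO comment.
std::size_t hashCube(const CubeCentre& c, std::size_t seed = 0) {
    hashCombine(seed, c.x);
    hashCombine(seed, c.y);
    hashCombine(seed, c.z);
    return seed;
}
```

With a seed of 0 the shift terms vanish on the first step, so the first combine reduces to the hash of x plus the constant.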
Bill P.
Minimally hijacking this thread to just post an FYI which may be helpful in some
of your source-code optimizing work.
https://iquilezles.org/articles/noacos/
I haven't looked under the hood in a while to see how we're handling stuff like
this, but it seems like it could provide some performance increases, and for the
basis for some macros / include files.
- BW
From: William F Pokorny
Subject: Re: v3.8+ crackle instability (facets?) with >1 uses per thread.
Date: 15 Jun 2024 08:43:08
Message: <666d8c5c$1@news.povray.org>
On 6/14/24 09:49, Bald Eagle wrote:
> Minimally hijacking this thread to just post an FYI which may be helpful in some
> of your source-code optimizing work.
>
> https://iquilezles.org/articles/noacos/
Thanks. Been some years, but I read that article at some point in the
past! Good to be reminded of it.
Bill P.
From: William F Pokorny
Subject: Re: v3.8+ crackle instability (facets?) with >1 uses per thread.
Date: 30 Oct 2024 02:19:17
Message: <6721cfe5$1@news.povray.org>
On 6/8/24 15:12, William F Pokorny wrote:
> I've been playing with 'facets' and 'crackle' of late. I've turned up a
> bug (or two) (*).
>
> Documenting now - partly so I can think through what I'm seeing as I write.
>
> The crackle pattern and facets perturbation maintain thread local
> storage so information can be cached in a thread safe way.
>
> The issue, I think, is that that storage is set up to work with one
> crackle and/or facets use per thread and no more.
>
> Once we run >1 of either in the same thread they share the thread local
> storage. This >1 usage per thread happens, for example, when we layer
> textures both based upon crackle
An update for those who might follow at some later time...
I think I've finished the re-write of the crackle pattern with a simpler
fixed size, per thread cache (for other than ip_solid on) which tracks
the pattern along with the center cube location. To be released in
yuqk(R16).
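The yuqk code itself is not posted here, so the following is only a guess at the general shape of such a cache: a fixed size, direct-mapped table whose key includes the pattern pointer as well as the center cube location. CrackleKey, Slot and FixedCache are hypothetical names, and the single double payload is a placeholder for whatever the real cache stores.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Key = which pattern asked + which center cube (assumed layout).
struct CrackleKey {
    const void* pattern;
    int cx, cy, cz;
    bool operator==(const CrackleKey& o) const {
        return pattern == o.pattern && cx == o.cx && cy == o.cy && cz == o.cz;
    }
};

struct Slot {
    bool used = false;
    CrackleKey key{};
    double value = 0.0;  // placeholder payload
};

// Direct-mapped cache: each key maps to exactly one slot and a new
// entry simply overwrites whatever was there, so the size never grows.
template <std::size_t N>
class FixedCache {
public:
    // Returns a pointer to the cached value, or nullptr on a miss.
    const double* find(const CrackleKey& k) const {
        const Slot& s = slots_[index(k)];
        return (s.used && s.key == k) ? &s.value : nullptr;
    }

    void store(const CrackleKey& k, double v) {
        Slot& s = slots_[index(k)];
        s.used = true;
        s.key = k;
        s.value = v;
    }

private:
    static std::size_t index(const CrackleKey& k) {
        std::uintptr_t h = reinterpret_cast<std::uintptr_t>(k.pattern);
        h = h * 31u + static_cast<std::uintptr_t>(k.cx);
        h = h * 31u + static_cast<std::uintptr_t>(k.cy);
        h = h * 31u + static_cast<std::uintptr_t>(k.cz);
        return static_cast<std::size_t>(h % N);
    }

    std::array<Slot, N> slots_{};
};
```

One instance per thread could be declared as `thread_local FixedCache<64> cache;`. Because the pattern pointer is part of the key, two crackle patterns asking about the same cube can never read each other's entry. At worst they evict each other.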
In testing the new code I discovered the 'repeat' feature of v3.8 beta 2
has a self cache collision issue at the origin in addition to the cache
collision issues of multiple crackle patterns within a thread.
For the attached images the scene set up is a large disc with a hole.
Within the hole there is a second smaller disc which doesn't quite fill
the hole. The rose color is the background seen through a gap. The outer
disc crackle pattern is scaled very small, but is otherwise the default
crackle.
The repeat, self, cache collision bug of v3.8 beta 2 is shown in the
upper left. I didn't chase a fix.
The image in the upper right is the version of yuqk I released in July
(R15) where, by hack, I disabled the crackle caching. The crackle
implementation is still what is in V3.8 beta 2. The repeat works as I
think clipka intended!
The lower left shows my development yuqk re-write's crackle repeat
feature result. Yes, it's different than v3.8. I didn't like the
complexity and cost of the v3.8 implementation and went with something
simpler (I avoided the self cache repeat bug by chance...). With yuqk
(R16) the pattern flipping would be done with warp { repeat }(s).
The lower right is there just to fill out the 2x2! It shows the use of
a, new to yuqk, crackle ip_seed feature to change the inner disc's
ip_solid result. I added ip_seed to make it easier to get different
crackle looks on different shapes otherwise using the same crackle
pattern specification.
Bill P.
Aside 1: What is the repeated pattern on the outer disc seen in the v3.8
top row? It's a side effect of the more limited accuracy of the hashing
mechanism used to create the per cube point offsets in the v3.8 code. I
believe I changed things so this type of artifact is much less likely
with yuqk.
Aside 2 (*): Why is yuqk's lower left image a little brighter than
v3.8's? One of the aspects of the traditional POV-Ray crackle cube point
offsets is that they work from a starting corner. This produces a result
which, to my eye, creates too many pinched regions in the pattern's
result. With the yuqk re-write the offsets are done from the cube center
in a +-(0.0 to 0.49) way. Less pinching, more white area, brighter image...
(*) - This yuqk change unexpectedly created a sampling issue where, when
scaling the crackle pattern very small, the crackle's inner cube nature
becomes more quickly apparent when anti-aliasing is off. It's a kind of
harmonic of rays with the underlying, less pinched crackle pattern. It's
not an issue when AA is used.
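The two offset schemes from Aside 2, sketched along a single axis. Both functions are hypothetical, and u stands for an assumed per-cube pseudo-random value in [0, 1). How v3.8 and yuqk actually derive that value is not shown here.

```cpp
#include <cmath>

// Traditional style: offset measured from the cube's starting corner,
// so the point can land anywhere in [cube, cube + 1).
double cornerBasedPoint(int cube, double u) {
    return cube + u;
}

// Re-write style as described: offset measured from the cube center,
// with magnitude 0.0 to 0.49 in either direction.
double centreBasedPoint(int cube, double u) {
    const double signedOffset = (2.0 * u - 1.0) * 0.49;  // in [-0.49, 0.49)
    return cube + 0.5 + signedOffset;
}
```

In the center-based form a point always stays at least 0.01 away from its cube's faces, so points in neighbouring cubes cannot sit arbitrarily close together across a face. That may be part of why the result shows less pinching.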
Attachments:
Download 'v38b2_repeat_at_origin_bug.pov.txt' (1 KB)
Download 'v38_repeat_at_origin_bug.png' (358 KB)
From: William F Pokorny
Subject: Re: v3.8+ crackle instability (facets?) with >1 uses per thread.
Date: 31 Oct 2024 19:11:14
Message: <67240e92$1@news.povray.org>
On 10/30/24 02:19, William F Pokorny wrote:
> The lower left shows my development yuqk re-write's crackle repeat
> feature result. Yes, it's different than v3.8. I didn't like the
> complexity and cost of the v3.8 implementation and went with something
> simpler (I avoided the self cache repeat bug by chance...). With yuqk
> (R16) the pattern flipping would be done with warp { repeat }(s).
In the end, after I played a bit, I didn't much like my alternate repeat
implementation in yuqk...
Took me a while, but I think I've gotten to what clipka was aiming for
in his v3.8 crackle repeat feature. A <3,3,0> repeat is shown in the
center of the attached image.
Bill P.
Attachments:
Download 'yuqk_crackle_repeat.jpg' (122 KB)
William F Pokorny <ano### [at] anonymousorg> wrote:
> Took me a while, but I think I've gotten to what clipka was aiming for
> in his v3.8 crackle repeat feature. A <3,3,0> repeat is shown in the
> center of the attached image.
I would greatly appreciate some advice on how to properly accomplish this, as my
experiments in the past were fraught with unwanted artefacts:
https://news.povray.org/web.6394be0a7dc652cc1f9dae3025979125%40news.povray.org
I'm sure that one of your long "ramblings on" would be an insightful read, and
probably spur on other tangential projects - as they always do.
Hopefully we'll both have a few round-tuits at some point to compare notes.
Good work, and much appreciated!
- BW