POV-Ray: Newsgroups: povray.beta-test: v3.8+ crackle instability (facets?) with >1 uses per thread.

From: Thorsten
Subject: Re: v3.8+ crackle instability (facets?) with >1 uses per thread.
Date: 11 Jun 2024 03:43:56
Message: <6668003c$1@news.povray.org>

On 10.06.2024 13:51, William F Pokorny wrote:
> On 6/9/24 00:21, William F Pokorny wrote:
>> Ah, and what about facets.
> 
> FWIW. The caching mechanism is simpler (older) for facets. As with 
> crackle I experimented some with forcing 100% misses. The slow down in 
> the heavy AA case is +195% as opposed to the +335% seen with crackle. 
> The difference likely comes down to the overhead for the simpler facets 
> cache being smaller. The facets cache comes close to what I wanted to 
> try with the crackle cache.
> 
> Going to let ideas to rattle around in my head for a while as to what to 
> do. ( 1. Limit use to one crackle and one facets use in any given scene. 
> 2. A cache per crackle/facets use / per thread. 3. ...)

Hi Bill,

the other issue to consider is that while there is no user interface for 
it, in theory multiple renders of the same scene can run in parallel. 
The actual solution to the whole problem is to keep the data needed not 
only thread-local but look carefully at what is actually cached and then 
ideally have it block local (also meaning, as with thread-local storage, 
that the pattern changes with render block size) or even better pixel 
local (no change with block size). To avoid the access to thread-local 
storage, the whole rendering actually could be overhauled (which would 
be good anyway) to move from a recursive to a stack based approach. That 
way the needed local data could be (more easily) passed as argument down 
to patterns ... but expect half a year full time to implement something 
like this.
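
A minimal C++ sketch of the "pass the local data down as an argument" idea, assuming a hypothetical EvalContext owned by the caller (for example created per pixel on the stack) that holds one cache slot per pattern use. None of these names come from the POV-Ray sources, and cubeWork() is only a placeholder for the real per-cube setup.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical cached data for one crackle/facets use.
struct CubeCache {
    bool valid = false;
    long cx = 0, cy = 0, cz = 0;  // cube the entry was computed for
    double value = 0.0;           // placeholder for the cached feature points
};

// Hypothetical per-pixel (or per-block) scratch data. The caller owns it and
// passes it down, so no thread-local lookup is needed and two renders running
// in parallel never touch the same storage.
struct EvalContext {
    explicit EvalContext(std::size_t patternUses) : slots(patternUses) {}
    std::vector<CubeCache> slots;  // one slot per pattern use in the scene
};

// Placeholder for the expensive per-cube work a crackle-like pattern does.
double cubeWork(long cx, long cy, long cz, double seed) {
    return std::sin(cx * 12.9898 + cy * 78.233 + cz * 37.719 + seed);
}

// Each pattern use carries its own slot index (assigned at parse time in
// this sketch), so two uses evaluated with the same context cannot
// overwrite each other's entry.
double evalPattern(EvalContext& ctx, std::size_t slot, double seed,
                   double x, double y, double z) {
    assert(slot < ctx.slots.size());
    const long cx = static_cast<long>(std::floor(x));
    const long cy = static_cast<long>(std::floor(y));
    const long cz = static_cast<long>(std::floor(z));
    CubeCache& c = ctx.slots[slot];
    if (!(c.valid && c.cx == cx && c.cy == cy && c.cz == cz)) {
        // Miss: recompute and remember the result for this use only.
        c.valid = true;
        c.cx = cx;
        c.cy = cy;
        c.cz = cz;
        c.value = cubeWork(cx, cy, cz, seed);
    }
    return c.value;
}
```

The cost of this design is that the context has to be threaded through every call path that can reach a pattern, which is where the large restructuring effort comes from.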

Thorsten

From: William F Pokorny
Subject: Re: v3.8+ crackle instability (facets?) with >1 uses per thread.
Date: 11 Jun 2024 11:39:12
Message: <66686fa0$1@news.povray.org>

On 6/11/24 03:43, Thorsten wrote:
> Hi Bill,
> 
> the other issue to consider is that while there is no user interface for 
> it, in theory multiple renders of the same scene can run in parallel. 
> The actual solution to the whole problem is to keep the data needed not 
> only thread-local but look carefully at what is actually cached and then 
> ideally have it block local (also meaning, as with thread-local storage, 
> that the pattern changes with render block size) or even better pixel 
> local (no change with block size). To avoid the access to thread-local 
> storage, the whole rendering actually could be overhauled (which would 
> be good anyway) to move from a recursive to a stack based approach. That 
> way the needed local data could be (more easily) passed as argument down 
> to patterns ... but expect half a year full time to implement something 
> like this.
> 
> Thorsten

Hi Thorsten,

Thank you for your thoughts about the situation.

One thing I've not done is think through all the pattern / perturbation / 
shape thread caching with respect to overlapping in-thread storage use. 
In other words, what other problems like this might be sitting in the 
code today...

On blocking: you got me thinking that one nearer-term option with crackle 
and facets might be to track the pattern / perturbation pointers 
themselves alongside the usual cube centers. In cases where we get a 
hit, but the pointers themselves don't match, we'd act like we missed 
and create a new cache entry. Rather than stick that 'overlapping hit' 
entry in the cache, we'd do the distance measures locally and discard 
the entry. Not optimal, more storage for the cache, but it would be 
better than just turning the cache off.
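
A minimal sketch of that "overlapping hit" handling, assuming a hypothetical cache keyed by integer cube coordinates whose entries also record the owning pattern's address. This is not the cracklecache.h code; CubeKey, Entry and cubeWork() are illustrative stand-ins.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <unordered_map>

// Hypothetical integer cube key and hash.
struct CubeKey {
    long x, y, z;
    bool operator==(const CubeKey& o) const {
        return x == o.x && y == o.y && z == o.z;
    }
};
struct CubeKeyHash {
    std::size_t operator()(const CubeKey& k) const {
        return (static_cast<std::size_t>(k.x) * 73856093u) ^
               (static_cast<std::size_t>(k.y) * 19349663u) ^
               (static_cast<std::size_t>(k.z) * 83492791u);
    }
};

// Each entry remembers which pattern use (by pointer) filled it.
struct Entry {
    const void* owner;
    double value;  // placeholder for the cached cube data
};

// Placeholder for the real per-cube work.
double cubeWork(const CubeKey& k, double seed) {
    return std::sin(k.x * 12.9898 + k.y * 78.233 + k.z * 37.719 + seed);
}

struct CrackleCache {
    std::unordered_map<CubeKey, Entry, CubeKeyHash> entries;
    int overlapHits = 0;  // hits on an entry owned by another pattern use

    double lookup(const void* owner, double seed, const CubeKey& k) {
        assert(owner != nullptr);
        auto it = entries.find(k);
        if (it == entries.end()) {  // ordinary miss: compute and cache
            const double v = cubeWork(k, seed);
            entries.emplace(k, Entry{owner, v});
            return v;
        }
        if (it->second.owner == owner) {  // genuine hit
            return it->second.value;
        }
        // Overlapping hit: same cube, different pattern use. Act as if we
        // missed, compute locally, and discard the result rather than
        // replacing the other pattern's entry.
        ++overlapHits;
        return cubeWork(k, seed);
    }
};
```

Two layered textures that both use crackle would pass different owner pointers, so neither can pick up the other's cached data; the price is an extra pointer per entry and repeated work on overlapping hits.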

On block-local or pixel-local storage in place of thread-local storage: 
better, I'd say, but so long as the patterns might share that storage, I 
think it still leaves us exposed given how the crackle / facets patterns 
work today.

Overhauling the rendering approach. Yeah, likely due and good, but not 
at all trivial as you say. I'm not myself sure how such a restructuring 
should look in total.

With the solver work I did some 5-6 years ago now, I came to the conclusion 
that a fused shape/solver approach would be far better given we are 
ray-tracing. See:

https://news.povray.org/povray.programming/thread/%3C5d0f64ff%241%40news.povray.org%3E/

When I think about really implementing that approach, I also start to 
think about how a different approach to parallelism than our block based 
approach could be good. One where we spin up the combined 
shape/solver(s) as processes to which we'd send batches of rays at a 
time and get back batches of intersections... Yeah, I'm practically 
dreaming, but pretty sure that sort of set up would be best for the 
merged uni-variate, polynomial solver/shape approach. How well that 
structure would work overall - I'm not at all sure. :-)

As a practical near-term solution, one thing I want to try is similar to 
what I did with the four ripple/wave value-pattern/normal-perturbation 
re-writes: I replaced stored, already calculated locations with ones 
always calculated on the fly. At the default source location count of 10, 
the performance hit for not storing the locations was about 20%, IIRC.

If I can figure out a way to re-write crackle and facets with that sort 
of performance hit, I'll probably just replace all the caching / thread 
local storage with local, stack-based storage.
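
For comparison, a generic "compute on the fly" cellular pattern needs no stored state at all: each query re-hashes the surrounding cubes to regenerate their feature points. A minimal sketch, assuming one feature point per unit cube and a splitmix64-style integer hash; this is a textbook Worley-style nearest-point (F1) search, not POV-Ray's or yuqk's crackle code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// splitmix64 finalizer: a well-known 64-bit integer mixer.
std::uint64_t mix64(std::uint64_t z) {
    z += 0x9e3779b97f4a7c15ULL;
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

// Repeatable pseudo-random value in [0, 1) for one axis of one cube.
double cubeRandom(long cx, long cy, long cz, std::uint64_t axis) {
    std::uint64_t h = mix64(static_cast<std::uint64_t>(cx));
    h = mix64(h ^ static_cast<std::uint64_t>(cy));
    h = mix64(h ^ static_cast<std::uint64_t>(cz));
    h = mix64(h ^ axis);
    return static_cast<double>(h >> 11) / 9007199254740992.0;  // 53 bits / 2^53
}

// Distance from (x, y, z) to the nearest per-cube feature point, rebuilt on
// every call. With one point anywhere in its cube, scanning two cubes in
// each direction (5x5x5) is enough to be exact; many implementations scan
// only 3x3x3 as an approximation.
double nearestFeatureDistance(double x, double y, double z) {
    const long kReach = 2;
    const long bx = static_cast<long>(std::floor(x));
    const long by = static_cast<long>(std::floor(y));
    const long bz = static_cast<long>(std::floor(z));
    double best = std::numeric_limits<double>::max();
    for (long dx = -kReach; dx <= kReach; ++dx)
        for (long dy = -kReach; dy <= kReach; ++dy)
            for (long dz = -kReach; dz <= kReach; ++dz) {
                const long cx = bx + dx, cy = by + dy, cz = bz + dz;
                const double px = cx + cubeRandom(cx, cy, cz, 0) - x;
                const double py = cy + cubeRandom(cx, cy, cz, 1) - y;
                const double pz = cz + cubeRandom(cx, cy, cz, 2) - z;
                const double d2 = px * px + py * py + pz * pz;
                if (d2 < best) best = d2;
            }
    return std::sqrt(best);
}
```

Because nothing is cached, nothing can be shared, or collide, between pattern uses or threads; the trade is the repeated hashing on every evaluation.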

Whether I can accomplish such a re-write, at a performance hit that is not 
too bad, is an open question at the moment. Not least because it's a chunk 
of work which might well not work out as a solution in the end - so I'm 
procrastinating.

Bill P.

From: Thorsten
Subject: Re: v3.8+ crackle instability (facets?) with >1 uses per thread.
Date: 11 Jun 2024 14:17:07
Message: <666894a3$1@news.povray.org>

On 11.06.2024 17:39, William F Pokorny wrote:
> When I think about really implementing that approach, I also start to 
> think about how a different approach to parallelism than our block based 
> approach could be good. One where we spin up the combined 
> shape/solver(s) as processes to which we'd send batches of rays at a 
> time and get back batches of intersections... Yeah, I'm practically 
> dreaming, but pretty sure that sort of set up would be best for the 
> merged uni-variate, polynomial solver/shape approach. How it well that 
> structure would work overall - I'm not at all sure. 😄

Well, yes, an intersection based approach would probably offer the most 
potential performance on a shared memory system. You would also end up 
with a stack-based approach automatically that way. However, you hit 
something of a wall once you get to the really big multi-die systems like 
Epyc and newer Xeons, because on those only the last-level cache is shared 
by all cores. So in the end the best performance probably hides somewhere 
in a hybrid of the two, with blocks still offering some benefit on large 
multi-core systems. That is, of course, assuming the ray order doesn't 
disrupt the first- and second-level caches too much. It is impossible to 
predict the complexity with modern CPUs, I think.

The benefit would be that the "texturing" would become a completely 
separate task, and could actually be done (sans reflection and 
refraction) after tracing, which, if nothing else, would lead to a cool 
looking render preview. The other effect would be that at least bounding 
optimisations and mesh intersection testing could be done on a GPU.

Yet another benefit you get from separating the tracing and the 
texturing is that you end up with a sort of frame buffer that contains 
object data. An idea I never pursued to the end 20 or so years ago was 
that this gives rise to the ability to edit a ray-traced scene on the 
fly because you have access to the objects making up an individual pixel 
and can separate objects in and out of the scene as long as the camera 
doesn't move.

Thorsten

From: William F Pokorny
Subject: Re: v3.8+ crackle instability (facets?) with >1 uses per thread.
Date: 13 Jun 2024 19:26:02
Message: <666b800a$1@news.povray.org>

On 6/11/24 14:17, Thorsten wrote:
> It is impossible to predict the complexity with modern CPU, I think.

I agree. Today's hardware optimizations make performance tuning a tough 
trick - and make questionable a number of "rules of thumb" about which 
algorithms perform best.

> 
> The benefit would be that the "texturing" would become a completely 
> separate task, and could actually be done (sans reflection and 
> refraction) after tracing, which, if nothing else, would lead to a cool 
> looking render preview. The other effect would be that at least bounding 
> optimisations and mesh intersection testing could be done on a GPU.
> 
> Yet another benefit you get from separating the tracing and the 
> texturing is that you end up with a sort of frame buffer that contains 
> object data. An idea I never pursued to the end 20 or so years ago was 
> that this gives rise to the ability to edit a ray-traced scene on the 
> fly because you have access to the objects making up an individual pixel 
> and can separate objects in and out of the scene as long as the camera 
> doesn't move.

Cool ideas. :-) I can see how some parts might work, but far from all of 
it.

Today our ray tracing and texturing are tangled together in places 
(adc_bailout, filtering/transparency, media, object modifiers). There is 
also the question of how to handle anti-aliasing (AA) / camera focal blur.

Though our 'AA' approach today is expensive (a), it's a strength with 
respect to a 'true result' that each sample ray considers the whole scene, 
including texturing, alongside all the ray tracing / branching.

Bill P.

(a) - With respect to performance, on my 'try it someday' list are 
cheaper AA / focal blur modes where the rays beyond some 'sampling 
depth/count' would terminate at a much shallower max_trace_level/sample 
count(*). Or maybe we gradually reduce the trace depth in opposition to 
the AA/blur sampling 'depth'. Results would be less true, but I 
'suspect' they'd often look good as a rule. (There is a tradeoff buried 
in the idea, as the less accurate results due to a shallower ray trace 
depth would sometimes themselves trigger additional sampling - and 
sometimes not where we would otherwise have shot more rays.)

(*) - Yes! I made trying the idea harder by implementing the forced min 
sampling AA in yuqk.
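
A minimal sketch of the "reduce the trace depth in opposition to the AA/blur sampling depth" idea, assuming a hypothetical linear policy with a floor. traceLevelForSample() and its parameters are illustrative only; nothing in this thread says v3.8 or yuqk has such a function.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical policy: primary samples (aaDepth == 0) keep the scene's full
// max_trace_level; each deeper AA/focal-blur sampling level gives up 'step'
// trace levels, but never drops below 'floorLevel' (itself capped at the
// scene's max_trace_level).
int traceLevelForSample(int aaDepth, int maxTraceLevel, int step,
                        int floorLevel) {
    assert(aaDepth >= 0 && step >= 0);
    const int floorClamped = std::min(floorLevel, maxTraceLevel);
    return std::max(floorClamped, maxTraceLevel - aaDepth * step);
}
```

A linear ramp is just the simplest choice; a policy keyed on sample count rather than sampling depth would fit the same slot.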

From: William F Pokorny
Subject: Re: v3.8+ crackle instability (facets?) with >1 uses per thread.
Date: 14 Jun 2024 08:27:14
Message: <666c3722$1@news.povray.org>

On 6/8/24 15:12, William F Pokorny wrote:
> The crackle pattern and facets perturbation maintain thread local 
> storage so information can be cached in a thread safe way.

Note too:

https://stackoverflow.com/questions/35985960/c-why-is-boosthash-combine-the-best-way-to-combine-hash-values

---
In working to clean up and commit my last updates, I ran across a TODO 
comment I'd added to friend std::size_t hash_value() in cracklecache.h 
about the initial seed value of 0, which bothers me. I did a quick 
search this morning to look for rumblings about boost::hash_combine().

Other issues aside, it might be that our crackle caching mechanism is 
less effective than it could be.
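
For reference, the long-standing boost::hash_combine mixing step (the formula discussed at that Stack Overflow link; recent Boost releases may use a different mix) can be written out by hand to make the seed handling visible. hashCube() is a hypothetical cube-key hash for illustration, not the actual hash_value() in cracklecache.h, and hashing each coordinate as its own value is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>

// The classic boost::hash_combine step, with the value's hash passed in.
std::size_t hashCombine(std::size_t seed, std::size_t v) {
    seed ^= v + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    return seed;
}

// Hypothetical cube-key hash with the initial seed made explicit.
std::size_t hashCube(long x, long y, long z, std::size_t seed = 0) {
    seed = hashCombine(seed, static_cast<std::size_t>(x));
    seed = hashCombine(seed, static_cast<std::size_t>(y));
    seed = hashCombine(seed, static_cast<std::size_t>(z));
    return seed;
}
```

One thing that follows directly from the formula: with an initial seed of 0 the first step reduces to v + 0x9e3779b9, so the seed contributes nothing until the second coordinate is mixed in.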

Bill P.

From: Bald Eagle
Subject: Re: v3.8+ crackle instability (facets?) with >1 uses per thread.
Date: 14 Jun 2024 09:50:00
Message: <web.666c4a5b18b34e675a6710c25979125@news.povray.org>

Minimally hijacking this thread to just post an FYI which may be helpful in some
of your source-code optimizing work.

https://iquilezles.org/articles/noacos/

I haven't looked under the hood in a while to see how we're handling stuff like
this, but it seems like it could provide some performance increases, and form the
basis for some macros / include files.
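
The core trick in that article, as I read it, is building the rotation that takes one unit vector onto another directly from their cross and dot products: Rodrigues' formula with (1 - cos)/sin^2 rewritten as 1/(1 + cos), so no acos/sin/cos calls are needed. A minimal C++ sketch; the Vec3/Mat3 types are assumptions of the sketch, not POV-Ray's vector classes.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;  // row-major: m[row][col]

double dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}

// Rotation taking unit vector z onto unit vector d without acos/sin/cos.
// v = z x d has length sin(angle) and c = z . d is cos(angle), so Rodrigues'
// factor (1 - c) / sin^2 simplifies to 1 / (1 + c). Undefined for d == -z.
Mat3 rotationAlign(const Vec3& z, const Vec3& d) {
    const Vec3 v = cross(z, d);
    const double c = dot(z, d);
    assert(c > -1.0);
    const double k = 1.0 / (1.0 + c);
    return {{{v[0] * v[0] * k + c,    v[0] * v[1] * k - v[2], v[0] * v[2] * k + v[1]},
             {v[1] * v[0] * k + v[2], v[1] * v[1] * k + c,    v[1] * v[2] * k - v[0]},
             {v[2] * v[0] * k - v[1], v[2] * v[1] * k + v[0], v[2] * v[2] * k + c}}};
}

Vec3 apply(const Mat3& m, const Vec3& p) {
    return {dot(m[0], p), dot(m[1], p), dot(m[2], p)};
}
```

The anti-parallel case (c == -1) needs separate handling, for example picking any axis perpendicular to z and rotating by 180 degrees.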

- BW

From: William F Pokorny
Subject: Re: v3.8+ crackle instability (facets?) with >1 uses per thread.
Date: 15 Jun 2024 08:43:08
Message: <666d8c5c$1@news.povray.org>

On 6/14/24 09:49, Bald Eagle wrote:
> Minimally hijacking this thread to just post an FYI which may be helpful in some
> of your source-code optimizing work.
> 
> https://iquilezles.org/articles/noacos/

Thanks. Been some years, but I read that article at some point in the 
past! Good to be reminded of it.

Bill P.

From: William F Pokorny
Subject: Re: v3.8+ crackle instability (facets?) with >1 uses per thread.
Date: 30 Oct 2024 02:19:17
Message: <6721cfe5$1@news.povray.org>

On 6/8/24 15:12, William F Pokorny wrote:
> I've been playing with 'facets' and 'crackle' of late. I've turned up a 
> bug (or two) (*).
> 
> Documenting now - partly so I can think through what I'm seeing as I write.
> 
> The crackle pattern and facets perturbation maintain thread local 
> storage so information can be cached in a thread safe way.
> 
> The issue, I think, is that that storage is set up to work with one 
> crackle and/or facets use per thread and no more.
> 
> Once we run >1 of either in the same thread they share the thread local 
> storage. This >1 usage per thread happens, for example, when we layer 
> textures both based upon crackle

An update for those who might follow at some later time...

I think I've finished the re-write of the crackle pattern with a simpler, 
fixed-size, per-thread cache (used except when ip_solid is on) which 
tracks the pattern along with the center cube location. To be released in 
yuqk (R16).

In testing the new code I discovered the 'repeat' feature of v3.8 beta 2 
has a self cache collision issue at the origin in addition to the cache 
collision issues of multiple crackle patterns within a thread.

For the attached images, the scene is set up as a large disc with a hole. 
Within the hole there is a second, smaller disc which doesn't quite fill 
the hole. The rose color is the background seen through the gap. The 
outer disc's crackle pattern is scaled very small, but is otherwise the 
default crackle.

The repeat feature's self cache collision bug in v3.8 beta 2 is shown in 
the upper left. I didn't chase a fix.

The image in the upper right is from the version of yuqk I released in 
July (R15) where, by hack, I disabled the crackle caching. The crackle 
implementation is otherwise still what is in v3.8 beta 2. The repeat works 
as I think clipka intended!

The lower left shows the crackle repeat feature result with my 
development yuqk re-write. Yes, it's different than v3.8. I didn't like 
the complexity and cost of the v3.8 implementation and went with 
something simpler (I avoided the self cache repeat bug by chance...). With 
yuqk (R16) the pattern flipping would be done with warp { repeat }(s).

The lower right is there just to fill out the 2x2! It shows the use of a 
crackle ip_seed feature, new to yuqk, to change the inner disc's ip_solid 
result. I added ip_seed to make it easier to get different crackle looks 
on different shapes otherwise using the same crackle pattern 
specification.

Bill P.

Aside 1: What is the repeated pattern on the outer disc seen in the v3.8 
top row? It's a side effect of the more limited accuracy of the hashing 
mechanism used to create the per cube point offsets in the v3.8 code. I 
believe I changed things so this type of artifact is much less likely 
with yuqk.

Aside 2 (*): Why is yuqk's lower left image a little brighter than 
v3.8's? One of the aspects of the traditional POV-Ray crackle cube point 
offsets is that they work from a starting corner. This produces a result 
which, to my eye, creates too many pinched regions in the pattern's 
result. With the yuqk re-write the offsets are done from the cube center 
in a +-(0.0 to 0.49) way. Less pinching, more white area, brighter image...

(*) - This yuqk change unexpectedly created a sampling issue where, when 
scaling the crackle pattern very small, the crackle's inner cube nature 
becomes more quickly apparent when anti-aliasing is off. It's a kind of 
harmonic of rays with the underlying, less pinched crackle pattern. It's 
not an issue when AA is used.
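
A one-axis sketch of the difference described in Aside 2, assuming a per-cube random value r in [0, 1) from whatever hash is in use: the traditional offset measured from the cube's corner versus an offset of up to +-0.49 around the cube center. This is my reading of the description, not yuqk's code. One consequence is easy to check: corner-style points in adjacent cubes can come arbitrarily close across the shared face, while center-style points always stay more than 0.02 apart there.

```cpp
#include <cassert>
#include <cmath>

// Traditional style as described: offset measured from the cube's min
// corner, so the point can sit anywhere in [cube, cube + 1).
double cornerStylePoint(long cube, double r) {
    assert(r >= 0.0 && r < 1.0);
    return static_cast<double>(cube) + r;
}

// Style described for the yuqk re-write: +-(0.0 to 0.49) around the cube
// center, so the point stays in [cube + 0.01, cube + 0.99).
double centerStylePoint(long cube, double r) {
    assert(r >= 0.0 && r < 1.0);
    return static_cast<double>(cube) + 0.5 + (2.0 * r - 1.0) * 0.49;
}
```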

Attachments:
'v38b2_repeat_at_origin_bug.pov.txt' (1 KB), 'v38_repeat_at_origin_bug.png' (358 KB)

From: William F Pokorny
Subject: Re: v3.8+ crackle instability (facets?) with >1 uses per thread.
Date: 31 Oct 2024 19:11:14
Message: <67240e92$1@news.povray.org>

On 10/30/24 02:19, William F Pokorny wrote:
> In the lower left showing my development yuqk re-write, crackle repeat 
> feature result. Yes, its different than v3.8. I didn't like the 
> complexity and cost of the v3.8 implementation and went with something 
> simpler (I avoided the self cache repeat bug by chance...). With yuqk 
> (R16) the pattern flipping would be done with warp { repeat }(s).

In the end, after I played a bit, I didn't much like my alternate repeat 
implementation in yuqk...

Took me a while, but I think I've gotten to what clipka was aiming for 
in his v3.8 crackle repeat feature. A <3,3,0> repeat is shown in the 
center of the attached image.

Bill P.

Attachments:
'yuqk_crackle_repeat.jpg' (122 KB)

From: Bald Eagle
Subject: Re: v3.8+ crackle instability (facets?) with >1 uses per thread.
Date: 1 Nov 2024 09:40:00
Message: <web.6724d96618b34e67a2cb7a7025979125@news.povray.org>

William F Pokorny <ano### [at] anonymousorg> wrote:

> Took me a while, but I think I've gotten to what clipka was aiming for
> in his v3.8 crackle repeat feature. A <3,3,0> repeat is shown in the
> center of the attached image.

I would greatly appreciate some advice on how to properly accomplish this, as my
experiments in the past were fraught with unwanted artefacts:
https://news.povray.org/web.6394be0a7dc652cc1f9dae3025979125%40news.povray.org

I'm sure that one of your long "ramblings on" would be an insightful read, and
probably spur on other tangential projects - as they always do.

Hopefully we'll both have a few round-tuits at some point to compare notes.

Good work, and much appreciated!

- BW
