On 6/8/24 15:12, William F Pokorny wrote:
> The issue, I think, is that that storage is set up to work with one
> crackle and/or facets use per thread and no more.
OK. I ran an experiment where I forced 100% cache misses in yuqk. I then
re-ran a collection of scenes running multiple crackle patterns per
thread. Everything looks OK.
This method of disabling the cache is not optimal as the cache set up
mechanism is forced to run all the time, but the cached data is never
used. Still, I ran some timings using the crackle2_v38.pov scene, both with
no AA and with heavy AA forced (+a0.0).
p380b2 -> yuqk (R15). Cache active. No AA. Shows yuqk 62% faster(a).
p380b2 -> yuqk (R15). Cache active. With AA. Shows yuqk 34% faster(a).
yuqk (with cache) -> yuqk (all misses). No AA. yuqk is 240% slower.
yuqk (with cache) -> yuqk (all misses). With AA. yuqk is 335% slower.
So... Forcing cache misses and getting no cache benefit is very costly.
Of course, the results are correct, which matters more.
I suppose I need to attempt thread local storage, completely replacing
the current cache mechanism, to see where that performance comes
in. :-(
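For anyone following along, here is a rough sketch of the general idea, not yuqk or POV-Ray code. It assumes a per-thread cache keyed by both a pattern instance id and the integer cell coordinates, so several crackle patterns evaluated on one thread do not overwrite each other's entries. Every name in it (CellData, CracklePattern, Key, threadCache, lookup) is hypothetical, and the "computation" on a miss is only a placeholder.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical stand-in for whatever a crackle-like pattern caches per cell.
struct CellData {
    std::array<double, 3> nearestPoint{};
};

// Hypothetical pattern object; `id` distinguishes pattern instances.
struct CracklePattern {
    std::uint64_t id;
};

// Cache key: pattern instance plus integer cell coordinates.
struct Key {
    std::uint64_t patternId;
    int x, y, z;
    bool operator==(const Key& o) const {
        return patternId == o.patternId && x == o.x && y == o.y && z == o.z;
    }
};

struct KeyHash {
    std::size_t operator()(const Key& k) const noexcept {
        std::size_t h = std::hash<std::uint64_t>{}(k.patternId);
        // Simple hash combine over the three cell coordinates.
        for (int v : {k.x, k.y, k.z}) {
            h ^= std::hash<int>{}(v) + 0x9e3779b9u + (h << 6) + (h >> 2);
        }
        return h;
    }
};

// One cache per thread, shared by all patterns evaluated on that thread.
// No locking is needed because no other thread can see this map.
inline std::unordered_map<Key, CellData, KeyHash>& threadCache() {
    thread_local std::unordered_map<Key, CellData, KeyHash> cache;
    return cache;
}

// Returns the cached cell data, computing it on a miss.
// `misses` is incremented on each miss, purely for illustration.
const CellData& lookup(const CracklePattern& p, int x, int y, int z, int& misses) {
    auto& cache = threadCache();
    const Key k{p.id, x, y, z};
    auto it = cache.find(k);
    if (it == cache.end()) {
        ++misses;
        CellData d;
        // Placeholder for the real (expensive) per-cell computation.
        d.nearestPoint = {x + 0.5 * static_cast<double>(p.id), y + 0.25, z + 0.125};
        it = cache.emplace(k, d).first;
    }
    return it->second;
}
```

Keying on the pattern id is the design choice that matters here: two patterns asking for the same cell get separate entries, while a repeat request from the same pattern is a hit. Whether this is fast enough compared with the existing cache is exactly what would need measuring.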
Unsure if I'll do that work for R15 though. I might just force the cache
misses for now. It would leave me a release where crackle is working and
I've not further twiddled with how the code works.
Ah, and what about facets?
Bill P.
(a) - I believe the current yuqk speed up over p380b2 is mostly due to:
https://news.povray.org/povray.beta-test/thread/%3C663eff9d%241%40news.povray.org%3E/
I'm unsure what else it might be if not.