  Re: isosurface and max_gradient  
From: William F Pokorny
Date: 7 May 2024 07:18:31
Message: <663a0e07$1@news.povray.org>
On 5/6/24 15:35, William F Pokorny wrote:
> I've not tried a clang compile yet. Off to do that - in part because 
> I've not done a clang compile of yuqk in long while...

The clang results for max gradients with min() are also all <<20.

---

That said, I found the clang++ compile broken! :-( While coding up a few
of the VM inbuilt functions I used code like:

    constexpr double Cn = 4.0*std::exp(-0.5)/std::sqrt(2.0); // **

which works with GNU compilers from C++11 onward, but does not compile
with today's mainstream C++ versions of clang++. I knew there were
differences in which standard functions can be used in constant
expressions, but I somehow fooled myself into thinking clang++ was OK
with those two (they are officially constexpr in later C++ standards).

The fix is to hard-code the resulting values of exp() and sqrt(); the
fixes will be in the next yuqk release (R15).
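
For illustration, here's a minimal sketch of the kind of change involved
(Cn is just the example constant above; the digits shown are truncated,
so treat them as illustrative rather than the actual yuqk values):

    // Fails with clang++ because std::exp() / std::sqrt() are not usable
    // in constant expressions there (g++ accepts them as an extension):
    //constexpr double Cn = 4.0*std::exp(-0.5)/std::sqrt(2.0);

    // Workaround: paste the precomputed result as a literal. Generate
    // the full double-precision digits offline before pasting.
    constexpr double Cn = 1.7155277699214; // ~= 4*exp(-0.5)/sqrt(2)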

(**) - The why of this approach is, in part, related to the fact that
POV-Ray today defines many macro constants with MORE digits than double
precision can hold, alongside others at double precision. I suspect this
mixed-precision constant habit is sometimes not numerically optimal, but
I've never found the time to prove / disprove the thought.
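
To illustrate what I mean, something along these lines (hypothetical
names and values, not the actual POV-Ray definitions):

    // Hypothetical examples of the mixed-precision habit:
    #define SOME_CONST_A 3.14159265358979323846  // more digits than a double can hold
    #define SOME_CONST_B 0.7071067811865476      // roughly double precision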

---

Once the clang compile issue was fixed, I happened to notice that, so
long as the clang++ configuration was run with the flag
'-fgnuc-version=5' (which sets clang++'s internal __GNUC__ value), I got
Intel's hand-coded noise optimization for my i3. I didn't expect this to
happen, though maybe it is the right result.
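
I haven't chased down the exact guard, but the behavior is consistent
with a compile-time check along these lines (a sketch of my assumption,
not the actual POV-Ray source):

    // Sketch only: assume the hand-optimized noise path is compiled in
    // when a sufficiently new GNU-compatible compiler is reported.
    #if defined(__GNUC__) && (__GNUC__ >= 5)
        // build the AVX2/FMA3 hand-optimized noise implementation
    #else
        // fall back to the generic (portable) noise generator only
    #endif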

The (perhaps bogus) story I've been carrying around in my head is that
the GNU g++ compile used to do the same dynamic optimization, but for at
least a couple of years now it has not... It's one of the issues on my
infinite to-do list.

So, the dynamic optimization situation on my i3 today is confusing /
worrying. The clang++ compile has the dynamic optimization:

  Dynamic optimizations:
   CPU detected: Intel,SSE2,AVX,AVX2,FMA3
   Noise generator: avx2fma3-intel (hand-optimized by Intel)

While the g++ compile has the dynamic optimization:

  Dynamic optimizations:
   CPU detected: Intel,FMA3,FMA4
   Noise generator: generic (portable)

The latter is safe, but likely not optimal. The clang++ compile is 
finding the correct features for my i3.

Add to this that, in my testing back when clipka first ported the AMD
and Intel hand optimizations to the Linux / Unix builds, the optimized
code was only sometimes faster on my i3 (the benchmark scene being one
such case). It sometimes tested slower than the generic code (built with
-march=native) too.

A quick test on the biscuit.pov scene today: clang's dynamic
optimization is 2.3% faster than a -march=native compile.

Aside: clang native is currently 4.2% faster than g++ native with
relatively basic optimization flags, but learning why is always a
time-consuming exercise, and which compiler is faster tends to change
over time.

Lastly, since I rendered different versions of the output PNG files
while looking at performance, I did image comparisons. The clang++
native and avx2fma3-intel results match exactly - as they should. The
interesting bit is that g++ native vs clang++ native mismatched by 1/255
on two pixels. :-)

Anyhow...

Bill P.

