On 5/6/24 15:35, William F Pokorny wrote:
> I've not tried a clang compile yet. Off to do that - in part because
> I've not done a clang compile of yuqk in long while...
The clang results for max gradients with min() are also all <<20.
---
That said, I found the clang++ compile broken! :-( While coding up a few
of the VM inbuilt functions I used code like:
constexpr double Cn = 4.0*std::exp(-0.5)/std::sqrt(2.0); // **
which works with GNU compilers from C++11 onward, but not with clang++
under today's mainstream C++ standards. I knew there were differences in
which basic functions could be used in constant expressions, but I
somehow fooled myself into thinking clang++ was OK with those two (they
only officially become usable there in later C++ standards).
The fix is to hard-code the resulting values of exp() and sqrt(); the
fixes will be in the next yuqk release (R15).
(**) - Part of the reason for this approach is that POV-Ray today
defines many macro constants with MORE digits than double precision can
hold, alongside others at double precision. I suspect this
mixed-precision constant habit is sometimes not numerically optimal, but
I've never found the time to prove / disprove the thought.
---
Once the clang compile issue was fixed, I happened to notice that, so
long as the clang++ build was configured with the flag
'-fgnuc-version=5' (which sets the __GNUC__ value clang++ reports
internally), I got Intel's hand-coded noise optimization for my i3. I
didn't expect this to happen, though maybe it is the right result.
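One way to see what is going on (assuming a clang++ on the PATH): clang
masquerades as GCC via __GNUC__ and friends, and -fgnuc-version controls
which GCC version it claims to be. Dumping the predefined macros shows
the value the POV-Ray configure/feature checks would see:

```shell
# __GNUC__ as reported with the flag set:
clang++ -fgnuc-version=5 -dM -E -x c++ /dev/null | grep '__GNUC__'
# vs. the default clang++ would otherwise report:
clang++ -dM -E -x c++ /dev/null | grep '__GNUC__'
```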
The (perhaps bogus) story I've been carrying around in my head is that
the gnu g++ compile used to do the same dynamic optimization, but for at
least a couple years now, it has not... It's one of the issues on my
infinite to-do list.
So, the dynamic optimization situation on my i3 today is confusing /
a little worrying. The clang++ compile reports the dynamic optimization:
  Dynamic optimizations:
    CPU detected: Intel,SSE2,AVX,AVX2,FMA3
    Noise generator: avx2fma3-intel (hand-optimized by Intel)
While the g++ compile reports the dynamic optimization:
  Dynamic optimizations:
    CPU detected: Intel,FMA3,FMA4
    Noise generator: generic (portable)
The latter is safe, but likely not optimal. The clang++ compile is
finding the correct features for my i3.
Add to this that, in my testing back when clipka first ported the AMD
and Intel hand optimizations to the Linux / Unix builds, the optimized
code was only sometimes faster on my i3 (the benchmark scene being one
such case). It sometimes tested slower than the generic code (compiled
with -march=native) too.
A quick test on the biscuit.pov scene today shows clang's dynamic
optimization running 2.3% faster than a -march=native compile.
Aside: clang native is currently 4.2% faster than g++ native with
relatively basic optimization flags, but learning why is always a
time-consuming trick, and which compiler is faster tends to change over
time.
Lastly, since I rendered different versions of output png files while
looking at performance, I did image comparisons. The clang++ native and
avx2fma3-intel results match exactly - as they should. The interesting
bit is that the g++ native and clang++ native outputs mismatched by
1/255 on two pixels. :-)
Anyhow...
Bill P.