POV-Ray: Newsgroups: povray.beta-test: Technical verification build "v3.8.0-beta.668"

From: clipka
Subject: Technical verification build "v3.8.0-beta.668" - Unix package!
Date: 23 Jun 2021 19:28:39
Message: <60d3c3a7@news.povray.org>

Hi folks,

This one is specifically for the Unix jockeys among you:

https://github.com/c-lipka/povray/releases/tag/v3.8.0-pre-beta.668


I'd like to draw your attention specifically to the 
`povunix-v3.8.0-beta.668.tar.gz` tarball, which contains a dedicated 
Unix source package with the following (alleged) features:

- a much leaner package (25 MB instead of 63 MB)
- no need to run that pesky `prebuild.sh`
- all the files in places where Linux gurus expect them

Please give it a spin to see if it does what it's meant to do.


There's also auto-generated source documentation in both HTML and PDF 
format (povray-*-sourcedoc-html.7z and povray-*-sourcedoc.pdf, 
respectively), in case you're curious.

(And of course a Windows installer, but there shouldn't be anything new 
to that.)


Have fun testing!
Christoph

From: Thomas Debe
Subject: Re: Technical verification build "v3.8.0-beta.668" - Unix package!
Date: 23 Jun 2021 23:20:37
Message: <60d3fa05$1@news.povray.org>

On 24.06.21 at 01:28, clipka wrote:
Hello Christoph!

1.) Problem:
Build povunix-v3.8.9-beta.668.tar.gz Compiler clang-x:
Optimized Noise-Functions not compiled !
Clang defines __clang__ Not __GNUC__

Solution :
File :  unix/povconfig/syspovconfig.h

Line 169ff:  #if defined (__clang__)
                #define HAVE_ASM_AVX
                #define HAVE_ASM_AVX2
                #define HAVE_ASM_FMA3
                #define HAVE_ASM_FMA4
             #endif

Line 179: // most notably platform-specific optimized implementations.
          #if defined (__GNUC__) || defined(__clang__)

2.) Problem Optimized Noise-Function Compiler: gcc-10
Povray-Output :
                ....
		
Dynamic optimizations:
   CPU detected: AMD,SSE2,AVX,AVX2,FMA3
   Noise generator: avx-generic (compiler-optimized)

CPU: AMD Ryzen 2700
cat /proc/cpuinfo | grep avx :  se4_1 sse4_2 movbe popcnt aes xsave avx ..
  avx2
  fma

So it should work with Intel's implementation, but there is a vendor 
check in:

platform/x86/cpuid.cpp

Solution:

bool CPUInfo::IsIntel()
{
  return gpData->cpuidInfo.vendorId == kCPUVendor_Intel|| kCPUVendor_AMD;
}

I activated  CHECK_FUNCTIONAL in the optimized functions. No exception
arises. So I think it's ok.


Greetings
Thomas Debe

From: clipka
Subject: Re: Technical verification build "v3.8.0-beta.668" - Unix package!
Date: 24 Jun 2021 07:19:37
Message: <60d46a49$1@news.povray.org>

On 24.06.2021 at 05:20, Thomas Debe wrote:
> On 24.06.21 at 01:28, clipka wrote:
> Hello Christoph!
> 
> 1.) Problem:
> Build povunix-v3.8.9-beta.668.tar.gz Compiler clang-x:
> Optimized Noise-Functions not compiled !
> Clang defines __clang__ Not __GNUC__

(I presume that's a typo and you mean povunix-v3.8.0-beta.668.tar.gz.)

Can you please double-check whether these problems are specific to the 
Unix source package (povunix-v3.8.0-beta.668.tar.gz), or whether they 
also occur when building from the "raw" repository source 
(https://github.com/c-lipka/povray/archive/refs/tags/v3.8.0-pre-beta.668.tar.gz)?

> 
> Solution :
> File :  unix/povconfig/syspovconfig.h
> 
> Line 169ff:  #if defined (__clang__)
>                 #define HAVE_ASM_AVX
>                 #define HAVE_ASM_AVX2
>                 #define HAVE_ASM_FMA3
>                 #define HAVE_ASM_FMA4
>              #endif
> 
> Line 179: // most notably platform-specific optimized implementations.
>           #if defined (__GNUC__) || defined(__clang__)
> 
> 2.) Problem Optimized Noise-Function Compiler: gcc-10
> Povray-Output :
>                 ....
> 
> Dynamic optimizations:
>    CPU detected: AMD,SSE2,AVX,AVX2,FMA3
>    Noise generator: avx-generic (compiler-optimized)
> 
> CPU: AMD Ryzen 2700
> cat /proc/cpuinfo | grep avx :  se4_1 sse4_2 movbe popcnt aes xsave avx ..
>   avx2
>   fma
> 
> So it should work with Intel's implementation, but there is a vendor 
> check in:
> 
> platform/x86/cpuid.cpp

Yes - and that is very much deliberate. When AMD provided us with the 
AVX/FMA4 optimized code back in mid-2017, they also did some thorough 
performance testing on a very diverse farm of ~20 AMD and ~25 Intel 
machines, and ended up strongly recommending to specifically *NOT* use 
the AVX2/FMA3 optimized code (which had been provided by Intel years 
earlier) on AMD processors, but rather give preference to the portable 
code, in a variant compiler-optimized for AVX.

That was the recommendation for the Windows builds, anyway, but unless I 
see numbers from extensive and thorough analysis of Linux builds, I 
presume Linux compilers are on a similar level when it comes to 
automatic optimization.

There was even some suspicion that Intel might, back in the day, have 
custom-tailored their optimized code specifically to work poorly on AMD 
machines.


If you're seeing performance improvements with Intel's AVX2/FMA3 
hand-optimized code, then by all means use it; but I would recommend 
that you double-check whether it even does any good at all.


> Solution:
> 
> bool CPUInfo::IsIntel()
> {
>   return gpData->cpuidInfo.vendorId == kCPUVendor_Intel|| kCPUVendor_AMD;
> }

Um... no, that would be broken on multiple levels. For starters, it 
fails to do what you probably intend it to do (it actually makes the 
function always return `true`, even if the vendor is neither Intel nor 
AMD). And even if it worked as you intend, it would break the whole 
purpose of `CPUInfo::IsIntel()` - namely to detect whether the vendor 
*is*, as a matter of fact, *genuine* Intel.

If it should indeed be the case that modern AMD processors also prefer 
Intel's AVX2/FMA3 hand-optimized code, then what we'd really want to 
change is just the matrix in `platform/x86/optimizednoise.cpp`, which 
tells POV-Ray - based on the CPU features (and vendor!) we detect - what 
optimized noise implementation to use.
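
As a rough illustration of what such a selection matrix could look like, here is a minimal sketch. It is not the actual contents of `platform/x86/optimizednoise.cpp`: the `CpuTraits` struct, the `NoiseCandidate` table and all generator names other than `avx-generic` (which appears in the output quoted earlier in this thread) are hypothetical. The point is only that the vendor preference lives in the table rows, so `IsIntel()` can keep meaning "genuine Intel".

```cpp
#include <string>

// Hypothetical description of the detected CPU; not POV-Ray's real data structure.
struct CpuTraits {
    bool isIntel;
    bool isAMD;
    bool avx;
    bool avx2;
    bool fma3;
    bool fma4;
};

// One row of the (hypothetical) selection table: a name plus a predicate
// that says whether this implementation should be used on the given CPU.
struct NoiseCandidate {
    const char* name;
    bool (*supported)(const CpuTraits&);
};

// Rows are tried top to bottom; the first matching row wins.
static const NoiseCandidate kCandidates[] = {
    { "avx2fma3-intel", [](const CpuTraits& c) { return c.isIntel && c.avx2 && c.fma3; } },
    { "avxfma4-amd",    [](const CpuTraits& c) { return c.isAMD && c.avx && c.fma4; } },
    { "avx-generic",    [](const CpuTraits& c) { return c.avx; } },
    { "generic",        [](const CpuTraits&)   { return true; } },
};

// Returns the name of the first implementation whose predicate accepts the CPU.
std::string selectNoiseGenerator(const CpuTraits& cpu)
{
    for (const NoiseCandidate& candidate : kCandidates)
        if (candidate.supported(cpu))
            return candidate.name;
    return "generic";
}
```

With a table like this, letting AMD CPUs use the AVX2/FMA3 code would mean relaxing the vendor condition in the first row, not touching the vendor check itself.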

From: William F Pokorny
Subject: Re: Technical verification build "v3.8.0-beta.668" - Unix package!
Date: 24 Jun 2021 07:53:31
Message: <60d4723b$1@news.povray.org>

On 6/23/21 7:28 PM, clipka wrote:
> Hi folks,
> 
> This one is specifically for the Unix jockeys among you:
> 
> https://github.com/c-lipka/povray/releases/tag/v3.8.0-pre-beta.668
> 
> 
...

Nice Christoph! Done only spot testing thus far, but looking good.

  g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0  (old Intel i3)

I'll try and squeeze in more testing based off the v3.8.0 unix tar ball 
over the coming weeks.

Bill P.

From: William F Pokorny
Subject: Re: Technical verification build "v3.8.0-beta.668" - Unix package!
Date: 24 Jun 2021 09:21:54
Message: <60d486f2$1@news.povray.org>

On 6/23/21 11:20 PM, Thomas Debe wrote:
> 1.) Problem:
> Build povunix-v3.8.9-beta.668.tar.gz Compiler clang-x:
> Optimized Noise-Functions not compiled !
> Clang defines __clang__ Not __GNUC__
> 
> Solution :
> File :  unix/povconfig/syspovconfig.h
> 
> Line 169ff:  #if defined (__clang__)
>                 #define HAVE_ASM_AVX
>                 #define HAVE_ASM_AVX2
>                 #define HAVE_ASM_FMA3
>                 #define HAVE_ASM_FMA4
>              #endif
> 
> Line 179: // most notably platform-specific optimized implementations.
>           #if defined (__GNUC__) || defined(__clang__)

With respect to this, remember that POV-Ray on unix/linux is a build 
targeted at GNU tooling. If using clang, it would be normal to use the 
clang/clang++ flag '-fgnuc-version=5', say, if your clang supports all 
the variations. Though using my povr branch, I do get the Intel hand 
optimizations on my i3 when doing clang compiles.
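
To check what a given compiler and flag combination actually reports, a small probe along the following lines may help. This is only a sketch: the function names are made up for illustration, and the exact condition used in `unix/povconfig/syspovconfig.h` is not quoted in this thread, so the guard below simply mirrors the one proposed above.

```cpp
#include <string>

// Reports which compiler-identification macros are visible in this build.
std::string compilerMacros()
{
    std::string result;
#if defined(__clang__)
    result += "__clang__ ";
#endif
#if defined(__GNUC__)
    result += "__GNUC__ ";
#endif
    if (result.empty())
        result = "(neither)";
    return result;
}

// Mirrors the proposed guard: true if either macro is defined.
bool passesProposedGuard()
{
#if defined(__GNUC__) || defined(__clang__)
    return true;
#else
    return false;
#endif
}
```

Building this once with and once without '-fgnuc-version=5' and comparing the output of `compilerMacros()` shows directly whether the flag changes what the preprocessor sees.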

---
Aside: It has long been on my list to look at some of the "disable" build 
configurations Christoph added when enabling the noise optimizations for 
unix/linux. This is a new-to-v3.8 feature for unix/linux users!

It all works as far as I can test without an AMD CPU. For example, to 
turn off all the AMD/Intel hand optimizations you build with the 
compiler flags:

-DDISABLE_OPTIMIZED_NOISE_AVXFMA4 -DDISABLE_OPTIMIZED_NOISE_AVX2FMA3 and 
-DDISABLE_OPTIMIZED_NOISE_AVX

On my i3 which has avx I then get a portable avx instruction optimized 
noise(1). If I want that gone too - so I only get whatever optimizations 
the compiler would give me I add:

-DDISABLE_OPTIMIZED_NOISE_AVX_PORTABLE

Pretty cool. I've never set it up, but these configurations make it 
relatively easy to create a set of load modules with the differing 
optimizations. We could, for example, look for more of the optimized 
noise performance outliers - scenes where the hand optimized code looked 
to be somewhat slower.
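
For readers unfamiliar with this kind of switch, here is a minimal sketch of how such `-D` flags typically gate code at compile time. The macro names are the ones listed above; the function and the variant labels are hypothetical and only illustrate the mechanism, not POV-Ray's actual implementation.

```cpp
#include <string>

// Lists the (hypothetically named) noise variants left enabled in this build.
// Each block is compiled in unless the matching DISABLE_... macro was defined
// on the compiler command line.
std::string enabledNoiseVariants()
{
    std::string variants;
#ifndef DISABLE_OPTIMIZED_NOISE_AVXFMA4
    variants += "avxfma4 ";
#endif
#ifndef DISABLE_OPTIMIZED_NOISE_AVX2FMA3
    variants += "avx2fma3 ";
#endif
#ifndef DISABLE_OPTIMIZED_NOISE_AVX
    variants += "avx ";
#endif
#ifndef DISABLE_OPTIMIZED_NOISE_AVX_PORTABLE
    variants += "avx-portable ";
#endif
    variants += "generic";  // the plain compiler-generated code is always there
    return variants;
}
```

Compiling this with, say, `-DDISABLE_OPTIMIZED_NOISE_AVX2FMA3` drops just that entry from the returned list, which is the same idea as building one load module per configuration.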

Bill P.

(1) - For those with avx2, wonder what OPTIMIZED_NOISE_AVX2_PORTABLE 
might offer...

From: Thomas Debe
Subject: Re: Technical verification build "v3.8.0-beta.668" - Unix package!
Date: 27 Jun 2021 08:33:04
Message: <60d87000$1@news.povray.org>

Hello !
Sorry for the delay !

On 24.06.21 at 13:19, clipka wrote:
..
> Can you please double-check whether these problems are specific to the 
> Unix source package (povunix-v3.8.0-beta.668.tar.gz), or whether they 
> also occur when building from the "raw" repository source 
> (https://github.com/c-lipka/povray/archive/refs/tags/v3.8.0-pre-beta.668.tar.gz)? 
> 
Yes I can confirm the behavior for clang-10,clang-11 and clang-12.
..
> Yes - and that is very much deliberate. When AMD provided us with the 
> AVX/FMA4 optimized code back in mid-2017, they also did some thorough 
> performance testing on a very diverse farm of ~20 AMD and ~25 Intel 
> machines, and ended up strongly recommending to specifically *NOT* use 
> the AVX2/FMA3 optimized code (which had been provided by Intel years 
> earlier) on AMD processors, but rather give preference to the portable 
> code, in a variant compiler-optimized for AVX.

The real background was probably a bug in the FMA3 implementation of the 
first Ryzen [1].
The Ryzen was the first processor from AMD to support FMA3. And FMA4 is 
not officially supported via the CPU flags.
In German:
[1] 
https://www.heise.de/newsticker/meldung/AMD-bestaetigt-FMA3-Bug-bei-Ryzen-3658407.html

>> Solution:
>>
>> bool CPUInfo::IsIntel()
>> {
>>   return gpData->cpuidInfo.vendorId == kCPUVendor_Intel|| kCPUVendor_AMD;
>> }
> 
> Um... no, that would be broken on multiple levels. For starters, it 
> fails to do what you probably intend it to do (it actually makes the 
> function always return `true`, even if the vendor is neither Intel nor 
> AMD).

Like :
bool CPUInfo::IsIntel()
  {
    return true;
  }
????

regards
Thomas Debe

From: clipka
Subject: Re: Technical verification build "v3.8.0-beta.668" - Unix package!
Date: 27 Jun 2021 19:20:22
Message: <60d907b6$1@news.povray.org>

On 27.06.2021 at 14:33, Thomas Debe wrote:
> Hello !
> Sorry for the delay !
> 
> On 24.06.21 at 13:19, clipka wrote:
> ..
>> Can you please double-check whether these problems are specific to the 
>> Unix source package (povunix-v3.8.0-beta.668.tar.gz), or whether they 
>> also occur when building from the "raw" repository source 
>> (https://github.com/c-lipka/povray/archive/refs/tags/v3.8.0-pre-beta.668.tar.gz)? 
>>
> Yes I can confirm the behavior for clang-10,clang-11 and clang-12.

"Yes" as in "yes, that's specific to the Unix source package", or as in 
"yes, they also occur when building from the 'raw' repository source"?

It can't be both, and to investigate the matter it would help to know 
which of the two is the case.

> ..
>> Yes - and that is very much deliberate. When AMD provided us with the 
>> AVX/FMA4 optimized code back in mid-2017, they also did some thorough 
>> performance testing on a very diverse farm of ~20 AMD and ~25 Intel 
>> machines, and ended up strongly recommending to specifically *NOT* use 
>> the AVX2/FMA3 optimized code (which had been provided by Intel years 
>> earlier) on AMD processors, but rather give preference to the portable 
>> code, in a variant compiler-optimized for AVX.
> 
> The real background was probably a bug in the FMA3 implementation of the 
> first Ryzen [1].

No, it was a general lack of performance they also observed for older 
FMA3-enabled CPUs. Unless AMD provided us with falsified data, the 
numbers were quite clear.

> The Ryzen was the first processor from AMD to support FMA3. And FMA4 is 
> not officially supported via the CPU flags.

Ryzen was the first to support FMA4. Which did indeed not survive for 
long, at least not officially.

AMD have had FMA3-capable CPUs in their portfolio since 2012, 6 years 
earlier.

> In German:
> [1] 
> https://www.heise.de/newsticker/meldung/AMD-bestaetigt-FMA3-Bug-bei-Ryzen-3658407.html


That bug did not manifest in performance issues (as far as I know, 
anyway), but in total CPU lockups. We have no indication that this bug 
was ever triggered by POV-Ray's Intel-optimized noise generator.

Also, I'm rather sure the people we had been dealing with at AMD 
expected the Ryzen to support FMA4, so even if the FMA3 bug had been a 
known issue back then and they therefore wanted to avoid running into it 
in POV-Ray, recommending their AVX/FMA4 optimization would have appeared 
to do the job. There would not have been any need to also discourage 
AVX2/FMA3 on AMD CPUs in general.


>>> Solution:
>>>
>>> bool CPUInfo::IsIntel()
>>> {
>>>   return gpData->cpuidInfo.vendorId == kCPUVendor_Intel|| kCPUVendor_AMD;
>>> }
>>
>> Um... no, that would be broken on multiple levels. For starters, it 
>> fails to do what you probably intend it to do (it actually makes the 
>> function always return `true`, even if the vendor is neither Intel nor 
>> AMD).
> 
> Like :
> bool CPUInfo::IsIntel()
>   {
>     return true;
>   }
> ????

Yes, that's what the above code boils down to.

The `==` equality test operator binds stronger than the `||` boolean OR 
operator, and constitutes a boolean expression sometimes evaluating as 
true and sometimes as false. The expression to the right of the `||` is 
just an enum constant though, which is automatically promoted to its 
corresponding int value (which is non-zero). Due to its C heritage, C++ 
allows that int value to be used as a boolean, in which case any value 
other than zero is interpreted as "true".

So effectively you have

   return
     (gpData->cpuidInfo.vendorId == kCPUVendor_Intel) ||
     (kCPUVendor_AMD != 0);


Even if you were to put parentheses around the kCPUVendor* codes, it 
would not do what you'd expect:

   return
     gpData->cpuidInfo.vendorId ==
       (kCPUVendor_Intel || kCPUVendor_AMD);

is _not_ asking whether vendorId is any one of these values, but rather 
whether it is equal to the integer representation of the boolean OR of 
the boolean interpretation of the two enum constants' integer IDs:

   return
     gpData->cpuidInfo.vendorId == (
       ( (kCPUVendor_Intel != 0) || (kCPUVendor_AMD != 0) ) ? 1 : 0
     );

So if at least one of the enum constants of kCPUVendor_Intel or 
kCPUVendor_AMD happens to have a non-zero associated int value (which is 
indeed the case), then the function returns true if the vendor ID is 1. 
If, on the other hand, both enum constants had an associated int 
value of 0, the function would return true if the vendor ID was 0.


In C++, what you presumably mean would have to be written as:

   return
     ( gpData->cpuidInfo.vendorId == kCPUVendor_Intel ) ||
     ( gpData->cpuidInfo.vendorId == kCPUVendor_AMD );
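
The three variants discussed above can be compared side by side in a small stand-alone sketch. The enum below is a hypothetical stand-in with made-up values (0, 1, 2); POV-Ray's real `kCPUVendor_*` constants may have different values, and the function names are for illustration only.

```cpp
// Hypothetical stand-in for the vendor codes; the real values may differ.
enum CPUVendor {
    kCPUVendor_Unrecognized = 0,
    kCPUVendor_Intel        = 1,
    kCPUVendor_AMD          = 2
};

// As originally proposed: parsed as (id == Intel) || (AMD != 0),
// so it returns true for every vendor.
bool isIntelOrAmdAsPosted(CPUVendor id)
{
    return id == kCPUVendor_Intel || kCPUVendor_AMD;
}

// With parentheses around the constants: compares id against the value of
// (Intel || AMD), which is 1 here. With these made-up values that happens to
// coincide with Intel's code, but AMD is still rejected.
bool isIntelOrAmdParenthesized(CPUVendor id)
{
    return id == (kCPUVendor_Intel || kCPUVendor_AMD);
}

// What was presumably intended: two separate comparisons.
bool isIntelOrAmd(CPUVendor id)
{
    return (id == kCPUVendor_Intel) || (id == kCPUVendor_AMD);
}
```

Compilers may emit warnings for the first two functions (for example about a constant used in a boolean context), which is a useful hint that something is off.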

From: Thomas Debe
Subject: Re: Technical verification build "v3.8.0-beta.668" - Unix package!
Date: 29 Jun 2021 04:15:40
Message: <60dad6ac$1@news.povray.org>

Hello !

>> Yes I can confirm the behavior for clang-10,clang-11 and clang-12.
> 
> "Yes" as in "yes, that's specific to the Unix source package", or as in 
> "yes, they also occur when building from the 'raw' repository source"?
> 
> It can't be both, and to investigate the matter it would help to know 
> which of the two is the case.

I am sorry, but both packages are affected.

MD5SUM:
8e2b067662f6543885e65b71e4ef1377  povunix-v3.8.0-beta.668.tar.gz
8509e003a48ee55d0068c17064186269  povray-3.8.0-pre-beta.668.tar.gz

povunix:
Compilation settings:
   Build architecture:  x86_64-pc-linux-gnu
   Built/Optimized for: x86_64-pc-linux-gnu
   Compiler vendor:     gnu
   Compiler version:    clang++-10 10.0.1
   Compiler flags:      -pipe -Wno-multichar -Wno-write-strings 
-march=znver2 -mtune=znver2 -O2 -mfma -mavx -mavx2 -pthread
   Libraries:           -lSDL -L/usr/lib64 -lSDL -lpthread -lXpm  -lSM 
-lICE -lX11  -lIlmImf -lIlmImf-2_5 -lImath-2_5 -lHalf-2_5 -lIex-2_5 
-lIexMath-2_5 -lIlmThread-2_5 -pthread  -lIlmThread -ltiff -ljpeg -lpng 
-lz -lrt -lm -lboost_thread -lboost_system -lamdlibm -lm -pthread

No output for optimized noise.

Now with additional CXX-Flag -fgnuc-version=5 as Bill P. suggested.

Compilation settings:
   Build architecture:  x86_64-pc-linux-gnu
   Built/Optimized for: x86_64-pc-linux-gnu
   Compiler vendor:     gnu
   Compiler version:    clang++-10 10.0.1
   Compiler flags:      -pipe -Wno-multichar -Wno-write-strings 
-march=znver2 -mtune=znver2 -O2 -fgnuc-version=5 -mfma -mavx -mavx2 -pthread
   Libraries:           -lSDL -L/usr/lib64 -lSDL -lpthread -lXpm  -lSM 
-lICE -lX11  -lIlmImf -lIlmImf-2_5 -lImath-2_5 -lHalf-2_5 -lIex-2_5 
-lIexMath-2_5 -lIlmThread-2_5 -pthread  -lIlmThread -ltiff -ljpeg -lpng 
-lz -lrt -lm -lboost_thread -lboost_system -lamdlibm -lm -pthread

Output:


Dynamic optimizations:
   CPU detected: AMD,SSE2,AVX,AVX2,FMA3
   Noise generator: avx-generic (compiler-optimized)

This -fgnuc-version=5 causes __GNUC__ to be defined by the clang 
compiler. The compiler check is done in 
unix/povconfig/syspovconfig.h.

 From the clang documentation:

-fgnuc-version=

     This flag controls the value of __GNUC__ and related macros. This 
flag does not enable or disable any GCC extensions implemented in Clang. 
Setting the version to zero causes Clang to leave __GNUC__ and other 
GNU-namespaced macros, such as __GXX_WEAK__, undefined.

With the background of

http://news.povray.org/povray.unix/thread/%3C5b92aa0d%40news.povray.org%3E/

building POV-Ray with clang instead of gcc

I would recommend including clang in unix/povconfig/syspovconfig.h. 
Clang has had the ability to handle this since version 3.1.

I don't only compile but also test the runtime behavior of povray. In 
this context I noticed that, with clang builds of povray, the compiler 
flag -ffast-math degrades runtime performance (more than four times 
longer runtime). This is not caused by the noise code, and I can't 
explain it yet; -Ofast shows the same behavior.

regards
Thomas Debe

From: William F Pokorny
Subject: Re: Technical verification build "v3.8.0-beta.668" - Unix package!
Date: 1 Jul 2021 07:42:01
Message: <60ddaa09@news.povray.org>

On 6/29/21 4:15 AM, Thomas Debe wrote:
> I don't only compile but also test the runtime behavior of povray. In 
> this context I noticed that, with clang builds of povray, the compiler 
> flag -ffast-math degrades runtime performance (more than four times 
> longer runtime). This is not caused by the noise code, and I can't 
> explain it yet; -Ofast shows the same behavior.

What scene(s) were you running when you saw the performance degrade? 
I've not seen this behavior - but I compile with clang more than I run 
with clang compiled code.

FWIW, with respect to -Ofast and g++, I've seen inconsistent results. 
Builds are more often than not slower with the flag than without. 
Never swings in performance as large as 4x though.

As for the 4x difference, a wild guess is that clang is handling some 
'-ffast-math' allowed, undefined behavior differently than gcc.

Bill P.

From: Thomas Debe
Subject: Re: Technical verification build "v3.8.0-beta.668" - Unix package!
Date: 2 Jul 2021 09:01:35
Message: <60df0e2f$1@news.povray.org>

Hello !
After a
make && make check
I run
unix/povray --benchmark
The scene not only renders slowly (in terms of rendered pixels) but also stalls.
I also just wanted to point this out because clang is available for Linux 
and Windows, and is the system compiler on FreeBSD and Mac OS X.

regards
Thomas Debe
