From: clipka
Subject: Technical verification build "v3.8.0-beta.668" - Unix package!
Date: 23 Jun 2021 19:28:39
Message: <60d3c3a7@news.povray.org>
Hi folks,
This one is specifically for the Unix jockeys among you:
https://github.com/c-lipka/povray/releases/tag/v3.8.0-pre-beta.668
I'd like to point your attention specifically to the
`povunix-v3.8.0-beta.668.tar.gz` tarball, which contains a dedicated
Unix source package with the following (alleged) features:
- a much leaner package (25 MB instead of 63 MB)
- no need to run that pesky `prebuild.sh`
- all the files in places where Linux gurus expect them
Please give it a spin to see if it does what it's meant to do.
There's also auto-generated source documentation in both HTML and PDF
format (povray-*-sourcedoc-html.7z and povray-*-sourcedoc.pdf,
respectively), in case you're curious.
(And of course a Windows installer, but there shouldn't be anything new
to that.)
Have fun testing!
Christoph
From: Thomas Debe
Subject: Re: Technical verification build "v3.8.0-beta.668" - Unix package!
Date: 23 Jun 2021 23:20:37
Message: <60d3fa05$1@news.povray.org>
On 24.06.21 at 01:28, clipka wrote:
Hello Christoph!
1.) Problem:
Building povunix-v3.8.9-beta.668.tar.gz with compiler clang-x:
Optimized noise functions are not compiled!
Clang defines __clang__, not __GNUC__.
Solution:
File: unix/povconfig/syspovconfig.h
Line 169 ff.: #if defined (__clang__)
#define HAVE_ASM_AVX
#define HAVE_ASM_AVX2
#define HAVE_ASM_FMA3
#define HAVE_ASM_FMA4
#endif
Line 179: // most notably platform-specific optimized implementations.
#if defined (__GNUC__) || defined(__clang__)
2.) Problem: optimized noise function, compiler gcc-10
POV-Ray output:
....
Dynamic optimizations:
CPU detected: AMD,SSE2,AVX,AVX2,FMA3
Noise generator: avx-generic (compiler-optimized)
CPU: AMD Ryzen 2700
cat /proc/cpuinfo | grep avx : se4_1 sse4_2 movbe popcnt aes xsave avx ..
avx2
fma
So it should work with Intel's implementation, but there is a vendor
check in:
platform/x86/cpuid.cpp
Solution:
bool CPUInfo::IsIntel()
{
return gpData->cpuidInfo.vendorId == kCPUVendor_Intel|| kCPUVendor_AMD;
}
I activated CHECK_FUNCTIONAL in the optimized functions. No exception
was raised, so I think it's OK.
Greetings
Thomas Debe
From: clipka
Subject: Re: Technical verification build "v3.8.0-beta.668" - Unix package!
Date: 24 Jun 2021 07:19:37
Message: <60d46a49$1@news.povray.org>
On 24.06.2021 at 05:20, Thomas Debe wrote:
> On 24.06.21 at 01:28, clipka wrote:
> Hello Christoph!
>
> 1.) Problem:
> Build povunix-v3.8.9-beta.668.tar.gz Compiler clang-x:
> Optimized Noise-Functions not compiled !
> Clang defines __clang__ Not __GNUC__
(I presume that's a typo and you mean povunix-v3.8.0-beta.668.tar.gz.)
Can you please double-check whether these problems are specific to the
Unix source package (povunix-v3.8.0-beta.668.tar.gz), or whether they
also occur when building from the "raw" repository source
(https://github.com/c-lipka/povray/archive/refs/tags/v3.8.0-pre-beta.668.tar.gz)?
>
> Solution :
> File : unix/povconfig/syspovconfig.h
>
> Line 169 ff.: #if defined (__clang__)
> #define HAVE_ASM_AVX
> #define HAVE_ASM_AVX2
> #define HAVE_ASM_FMA3
> #define HAVE_ASM_FMA4
> #endif
>
> Line 179: // most notably platform-specific optimized implementations.
> #if defined (__GNUC__) || defined(__clang__)
>
> 2.) Problem Optimized Noise-Function Compiler: gcc-10
> Povray-Output :
> ....
>
> Dynamic optimizations:
> CPU detected: AMD,SSE2,AVX,AVX2,FMA3
> Noise generator: avx-generic (compiler-optimized)
>
> CPU: AMD Ryzen 2700
> cat /proc/cpuinfo | grep avx : se4_1 sse4_2 movbe popcnt aes xsave avx ..
> avx2
> fma
>
> So it should work with Intels implementation, but there is an vendor
> check in :
>
> platform/x86/cpuid.cpp
Yes - and that is very much deliberate. When AMD provided us with the
AVX/FMA4 optimized code back in mid-2017, they also did some thorough
performance testing on a very diverse farm of ~20 AMD and ~25 Intel
machines, and ended up strongly recommending to specifically *NOT* use
the AVX2/FMA3 optimized code (which had been provided by Intel years
earlier) on AMD processors, but rather give preference to the portable
code, in a variant compiler-optimized for AVX.
That was the recommendation for the Windows builds, anyway, but unless I
see numbers from extensive and thorough analysis of Linux builds, I
presume Linux compilers are on a similar level when it comes to
automatic optimization.
There was even some suspicion that Intel might, back in the day, have
custom-tailored their optimized code specifically to work poorly on AMD
machines.
If you're seeing performance improvements with Intel's AVX2/FMA3
hand-optimized code, then by all means use it; but I would recommend
that you double-check whether it even does any good at all.
> Solution:
>
> bool CPUInfo::IsIntel()
> {
> return gpData->cpuidInfo.vendorId == kCPUVendor_Intel|| kCPUVendor_AMD;
> }
Um... no, that would be broken on multiple levels. For starters, it
fails to do what you probably intend it to do (it actually makes the
function always return `true`, even if the vendor is neither Intel nor
AMD). And even if it worked as you intend, it would break the whole
purpose of `CPUInfo::IsIntel()` - namely to detect whether the vendor
*is*, as a matter of fact, *genuine* Intel.
If it should indeed be the case that modern AMD processors also prefer
Intel's AVX2/FMA3 hand-optimized code, then what we'd really want to
change is just the matrix in `platform/x86/optimizednoise.cpp`, which
tells POV-Ray - based on the CPU features (and vendor!) we detect - what
optimized noise implementation to use.
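To illustrate the idea of such a selection matrix, here is a minimal sketch of a priority-ordered table keyed on CPU features and vendor. It is not the actual content of `platform/x86/optimizednoise.cpp`: the types `Vendor`, `CpuFeatures` and `NoiseImpl`, the functions `noiseTable()` and `selectNoise()`, and all implementation names other than `avx-generic` are hypothetical and chosen only for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical CPU description; POV-Ray's real detection lives in
// platform/x86/cpuid.cpp and is not reproduced here.
enum class Vendor { Intel, AMD, Other };

struct CpuFeatures {
    Vendor vendor;
    bool avx;
    bool avx2;
    bool fma3;
    bool fma4;
};

// One row of a hypothetical selection table: an implementation name plus
// the predicate deciding whether it may be used on a given CPU.
struct NoiseImpl {
    std::string name;
    bool (*supported)(const CpuFeatures&);
};

// Rows are ordered from most to least preferred; the first match wins.
inline const std::vector<NoiseImpl>& noiseTable() {
    static const std::vector<NoiseImpl> table = {
        {"avxfma4-amd", [](const CpuFeatures& c) { return c.avx && c.fma4; }},
        {"avx2fma3-intel", [](const CpuFeatures& c) {
             // Vendor check: only genuine Intel gets this variant.
             return c.vendor == Vendor::Intel && c.avx2 && c.fma3; }},
        {"avx-generic", [](const CpuFeatures& c) { return c.avx; }},
        {"generic", [](const CpuFeatures&) { return true; }},
    };
    return table;
}

// Walks the table and returns the name of the first usable implementation.
inline std::string selectNoise(const CpuFeatures& cpu) {
    for (const NoiseImpl& impl : noiseTable())
        if (impl.supported(cpu))
            return impl.name;
    return "generic";  // not reached: the last row always matches
}
```

With a table like this, changing the preference for AMD CPUs would mean editing one predicate in the table, while a vendor query such as `CPUInfo::IsIntel()` stays truthful.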
From: William F Pokorny
Subject: Re: Technical verification build "v3.8.0-beta.668" - Unix package!
Date: 24 Jun 2021 07:53:31
Message: <60d4723b$1@news.povray.org>
On 6/23/21 7:28 PM, clipka wrote:
> Hi folks,
>
> This one is specifically for the Unix jockeys among you:
>
> https://github.com/c-lipka/povray/releases/tag/v3.8.0-pre-beta.668
>
>
...
Nice Christoph! Done only spot testing thus far, but looking good.
g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 (old Intel i3)
I'll try and squeeze in more testing based off the v3.8.0 Unix tarball
over the coming weeks.
Bill P.
From: William F Pokorny
Subject: Re: Technical verification build "v3.8.0-beta.668" - Unix package!
Date: 24 Jun 2021 09:21:54
Message: <60d486f2$1@news.povray.org>
On 6/23/21 11:20 PM, Thomas Debe wrote:
> 1.) Problem:
> Build povunix-v3.8.9-beta.668.tar.gz Compiler clang-x:
> Optimized Noise-Functions not compiled !
> Clang defines __clang__ Not __GNUC__
>
> Solution :
> File : unix/povconfig/syspovconfig.h
>
> Line 169 ff.: #if defined (__clang__)
> #define HAVE_ASM_AVX
> #define HAVE_ASM_AVX2
> #define HAVE_ASM_FMA3
> #define HAVE_ASM_FMA4
> #endif
>
> Line 179: // most notably platform-specific optimized implementations.
> #if defined (__GNUC__) || defined(__clang__)
With respect to this, remember that POV-Ray on Unix/Linux is a build
targeted at GNU tooling. When using clang, it would be normal to pass the
clang/clang++ flag '-fgnuc-version=5', say, if your clang supports all
the variations. That said, using my povr branch, I do get the Intel hand
optimizations on my i3 when compiling with clang.
---
Aside: It has long been on my list to look at some of the 'disable' build
configurations Christoph added when enabling the noise optimizations for
Unix/Linux. This is a new-to-v3.8 feature for Unix/Linux users!
It all works as far as I can test without an AMD CPU. For example, to
turn off all the AMD/Intel hand optimizations, you build with the
compiler flags:
-DDISABLE_OPTIMIZED_NOISE_AVXFMA4 -DDISABLE_OPTIMIZED_NOISE_AVX2FMA3 and
-DDISABLE_OPTIMIZED_NOISE_AVX
On my i3 which has avx I then get a portable avx instruction optimized
noise(1). If I want that gone too - so I only get whatever optimizations
the compiler would give me I add:
-DDISABLE_OPTIMIZED_NOISE_AVX_PORTABLE
Pretty cool. I've never set it up, but these configurations make it
relatively easy to create a set of load modules with the differing
optimizations. We could, for example, look for more of the optimized
noise performance outliers: scenes where the hand-optimized code looked
to be somewhat slower.
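As a rough illustration of how such -D flags can gate what gets built, here is a minimal sketch. The DISABLE_* macro names are the ones listed above; how POV-Ray actually consumes them internally is an assumption here, and `compiledNoiseVariants()` and the variant names it returns are hypothetical.

```cpp
#include <string>
#include <vector>

// Returns the names of the noise variants compiled into this binary.
// Each variant is included unless the corresponding DISABLE_* macro was
// passed on the compiler command line (e.g. -DDISABLE_OPTIMIZED_NOISE_AVX).
inline std::vector<std::string> compiledNoiseVariants() {
    std::vector<std::string> variants;
#ifndef DISABLE_OPTIMIZED_NOISE_AVXFMA4
    variants.push_back("avxfma4");
#endif
#ifndef DISABLE_OPTIMIZED_NOISE_AVX2FMA3
    variants.push_back("avx2fma3");
#endif
#ifndef DISABLE_OPTIMIZED_NOISE_AVX
    variants.push_back("avx");
#endif
#ifndef DISABLE_OPTIMIZED_NOISE_AVX_PORTABLE
    variants.push_back("avx-portable");
#endif
    variants.push_back("generic");  // plain compiler output, always present
    return variants;
}
```

Building the same source several times with different combinations of these flags would give the set of differently optimized binaries described above, which could then be timed against each other on the same scenes.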
Bill P.
(1) - For those with avx2, wonder what OPTIMIZED_NOISE_AVX2_PORTABLE
might offer...
From: Thomas Debe
Subject: Re: Technical verification build "v3.8.0-beta.668" - Unix package!
Date: 27 Jun 2021 08:33:04
Message: <60d87000$1@news.povray.org>
Hello !
Sorry for the delay !
On 24.06.21 at 13:19, clipka wrote:
..
> Can you please double-check whether these problems are specific to the
> Unix source package (povunix-v3.8.0-beta.668.tar.gz), or whether they
> also occur when building from the "raw" repository source
> (https://github.com/c-lipka/povray/archive/refs/tags/v3.8.0-pre-beta.668.tar.gz)?
>
Yes I can confirm the behavior for clang-10,clang-11 and clang-12.
..
> Yes - and that is very much deliberate. When AMD provided us with the
> AVX/FMA4 optimized code back in mid-2017, they also did some thorough
> performance testing on a very diverse farm of ~20 AMD and ~25 Intel
> machines, and ended up strongly recommending to specifically *NOT* use
> the AVX2/FMA3 optimized code (which had been provided by Intel years
> earlier) on AMD processors, but rather give preference to the portable
> code, in a variant compiler-optimized for AVX.
The real background was probably a bug in the FMA3 implementation of the
first Ryzen [1].
The Ryzen was the first processor from AMD to support FMA3. And FMA4 is
not officially supported via the CPU flags.
In German:
[1]
https://www.heise.de/newsticker/meldung/AMD-bestaetigt-FMA3-Bug-bei-Ryzen-3658407.html
>> Solution:
>>
>> bool CPUInfo::IsIntel()
>> {
>> return gpData->cpuidInfo.vendorId == kCPUVendor_Intel|| kCPUVendor_AMD;
>> }
>
> Um... no, that would be broken on multiple levels. For starters, it
> fails to do what you probably intend it to do (it actually makes the
> function always return `true`, even if the vendor is neither Intel nor
> AMD).
Like :
bool CPUInfo::IsIntel()
{
return true;
}
????
regards
Thomas Debe
From: clipka
Subject: Re: Technical verification build "v3.8.0-beta.668" - Unix package!
Date: 27 Jun 2021 19:20:22
Message: <60d907b6$1@news.povray.org>
On 27.06.2021 at 14:33, Thomas Debe wrote:
> Hello !
> Sorry for the delay !
>
> On 24.06.21 at 13:19, clipka wrote:
> ..
>> Can you please double-check whether these problems are specific to the
>> Unix source package (povunix-v3.8.0-beta.668.tar.gz), or whether they
>> also occur when building from the "raw" repository source
>> (https://github.com/c-lipka/povray/archive/refs/tags/v3.8.0-pre-beta.668.tar.gz)?
>>
> Yes I can confirm the behavior for clang-10,clang-11 and clang-12.
"Yes" as in "yes, that's specific to the Unix source package", or as in
"yes, they also occur when building from the 'raw' repository source"?
It can't be both, and to investigate the matter it would help to know
which of the two is the case.
> ..
>> Yes - and that is very much deliberate. When AMD provided us with the
>> AVX/FMA4 optimized code back in mid-2017, they also did some thorough
>> performance testing on a very diverse farm of ~20 AMD and ~25 Intel
>> machines, and ended up strongly recommending to specifically *NOT* use
>> the AVX2/FMA3 optimized code (which had been provided by Intel years
>> earlier) on AMD processors, but rather give preference to the portable
>> code, in a variant compiler-optimized for AVX.
>
> The real background was probably a bug in the FMA3 implementation of the
> first Ryzen [1].
No, it was a general lack of performance they also observed for older
FMA3-enabled CPUs. Unless AMD provided us with falsified data, the
numbers were quite clear.
> The Ryzen was the first processor from AMD to support FMA3. And FMA4 is
> not officially supported via the CPU flags.
Ryzen was the first to support FMA4. Which did indeed not survive for
long, at least not officially.
AMD have had FMA3-capable CPUs in their portfolio since 2012, 6 years
earlier.
> In German:
> [1]
> https://www.heise.de/newsticker/meldung/AMD-bestaetigt-FMA3-Bug-bei-Ryzen-3658407.html
That bug did not manifest in performance issues (as far as I know,
anyway), but in total CPU lockups. We have no indication that this bug
was ever triggered by POV-Ray's Intel-optimized noise generator.
Also, I'm rather sure the people we had been dealing with at AMD
expected the Ryzen to support FMA4, so even if the FMA3 bug had been a
known issue back then and they therefore wanted to avoid running into it
in POV-Ray, recommending their AVX/FMA4 optimization would have appeared
to do the job. There would not have been any need to also discourage
AVX2/FMA3 on AMD CPUs in general.
>>> Solution:
>>>
>>> bool CPUInfo::IsIntel()
>>> {
>>> return gpData->cpuidInfo.vendorId == kCPUVendor_Intel||
>>> kCPUVendor_AMD;
>>> }
>>
>> Um... no, that would be broken on multiple levels. For starters, it
>> fails to do what you probably intend it to do (it actually makes the
>> function always return `true`, even if the vendor is neither Intel nor
>> AMD).
>
> Like :
> bool CPUInfo::IsIntel()
> {
> return true;
> }
> ????
Yes, that's what the above code boils down to.
The `==` equality test operator binds stronger than the `||` boolean OR
operator, and constitutes a boolean expression sometimes evaluating as
true and sometimes as false. The expression to the right of the `||` is
just an enum constant though, which is automatically promoted to its
corresponding int value (which is non-zero). Due to its C heritage, C++
allows that int value to be used as a boolean, in which case any value
other than zero is interpreted as "true".
So effectively you have
return
(gpData->cpuidInfo.vendorId == kCPUVendor_Intel) ||
(kCPUVendor_AMD != 0);
Even if you were to put parentheses around the kCPUVendor* codes, it
would not do what you'd expect:
return
gpData->cpuidInfo.vendorId ==
(kCPUVendor_Intel || kCPUVendor_AMD);
is _not_ asking whether vendorId is any one of these values, but rather
whether it is equal to the integer representation of the boolean OR of
the boolean interpretation of the two enum constants' integer IDs:
return
gpData->cpuidInfo.vendorId == (
( (kCPUVendor_Intel != 0) || (kCPUVendor_AMD != 0) ) ? 1 : 0
);
So if at least one of the enum constants kCPUVendor_Intel or
kCPUVendor_AMD happens to have a non-zero associated int value (which is
indeed the case), then the function returns true if the vendor ID is 1.
If, on the other hand, both enum constants had an associated int
value of 0, the function would return true if the vendor ID was 0.
In C++, what you presumably mean would have to be written as:
return
( gpData->cpuidInfo.vendorId == kCPUVendor_Intel ) ||
( gpData->cpuidInfo.vendorId == kCPUVendor_AMD );
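The three variants discussed above can be put side by side in a small self-contained sketch. The enum values are hypothetical stand-ins (the thread only states that the real ones are non-zero), and the function names are made up for this example. Compilers may warn about the first two functions, which is rather the point.

```cpp
// Hypothetical stand-ins for POV-Ray's vendor IDs.
enum CPUVendor {
    kCPUVendor_Unrecognized = 0,
    kCPUVendor_Intel = 1,
    kCPUVendor_AMD = 2,
    kCPUVendor_VIA = 3
};

// The proposed code: parsed as (id == Intel) || (AMD != 0),
// so it returns true for every vendor.
inline bool isIntelOrAmdBroken(CPUVendor id) {
    return id == kCPUVendor_Intel || kCPUVendor_AMD;
}

// Parenthesised variant: (Intel || AMD) is the bool value true, which
// converts to 1, so this only tests whether id equals 1.
inline bool isIntelOrAmdParenthesised(CPUVendor id) {
    return id == (kCPUVendor_Intel || kCPUVendor_AMD);
}

// Correct: one explicit comparison per vendor.
inline bool isIntelOrAmd(CPUVendor id) {
    return id == kCPUVendor_Intel || id == kCPUVendor_AMD;
}
```

With these stand-in values, `isIntelOrAmdBroken(kCPUVendor_VIA)` is true, `isIntelOrAmdParenthesised(kCPUVendor_AMD)` is false, and only `isIntelOrAmd` gives the intended answer for all four vendors.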
From: Thomas Debe
Subject: Re: Technical verification build "v3.8.0-beta.668" - Unix package!
Date: 29 Jun 2021 04:15:40
Message: <60dad6ac$1@news.povray.org>
Hello!
>> Yes I can confirm the behavior for clang-10,clang-11 and clang-12.
>
> "Yes" as in "yes, that's specific to the Unix source package", or as in
> "yes, they also occur when building from the 'raw' repository source"?
>
> It can't be both, and to investigate the matter it would help to know
> which of the two is the case.
I am sorry, but both packages are affected.
MD5SUM:
8e2b067662f6543885e65b71e4ef1377 povunix-v3.8.0-beta.668.tar.gz
8509e003a48ee55d0068c17064186269 povray-3.8.0-pre-beta.668.tar.gz
povunix:
Compilation settings:
Build architecture: x86_64-pc-linux-gnu
Built/Optimized for: x86_64-pc-linux-gnu
Compiler vendor: gnu
Compiler version: clang++-10 10.0.1
Compiler flags: -pipe -Wno-multichar -Wno-write-strings
-march=znver2 -mtune=znver2 -O2 -mfma -mavx -mavx2 -pthread
Libraries: -lSDL -L/usr/lib64 -lSDL -lpthread -lXpm -lSM
-lICE -lX11 -lIlmImf -lIlmImf-2_5 -lImath-2_5 -lHalf-2_5 -lIex-2_5
-lIexMath-2_5 -lIlmThread-2_5 -pthread -lIlmThread -ltiff -ljpeg -lpng
-lz -lrt -lm -lboost_thread -lboost_system -lamdlibm -lm -pthread
No output for optimized noise.
Now with additional CXX-Flag -fgnuc-version=5 as Bill P. suggested.
Compilation settings:
Build architecture: x86_64-pc-linux-gnu
Built/Optimized for: x86_64-pc-linux-gnu
Compiler vendor: gnu
Compiler version: clang++-10 10.0.1
Compiler flags: -pipe -Wno-multichar -Wno-write-strings
-march=znver2 -mtune=znver2 -O2 -fgnuc-version=5 -mfma -mavx -mavx2 -pthread
Libraries: -lSDL -L/usr/lib64 -lSDL -lpthread -lXpm -lSM
-lICE -lX11 -lIlmImf -lIlmImf-2_5 -lImath-2_5 -lHalf-2_5 -lIex-2_5
-lIexMath-2_5 -lIlmThread-2_5 -pthread -lIlmThread -ltiff -ljpeg -lpng
-lz -lrt -lm -lboost_thread -lboost_system -lamdlibm -lm -pthread
Output:
Dynamic optimizations:
CPU detected: AMD,SSE2,AVX,AVX2,FMA3
Noise generator: avx-generic (compiler-optimized)
This -fgnuc-version=5 flag causes the clang compiler to define __GNUC__.
The compiler check is done in unix/povconfig/syspovconfig.h.
From the clang documentation:
-fgnuc-version=
This flag controls the value of __GNUC__ and related macros. This
flag does not enable or disable any GCC extensions implemented in Clang.
Setting the version to zero causes Clang to leave __GNUC__ and other
GNU-namespaced macros, such as __GXX_WEAK__, undefined.
Given the earlier discussion about building POV-Ray with clang instead of gcc
(http://news.povray.org/povray.unix/thread/%3C5b92aa0d%40news.povray.org%3E/),
I would recommend including clang in the check in unix/povconfig/syspovconfig.h.
Clang has been able to handle this since version 3.1.
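As a small sketch of what such a check looks like in isolation, the following uses only the predefined macros `__clang__` and `__GNUC__`. The function names `compilerFamily()` and `gnuCompatibleCompiler()` are hypothetical and not taken from the POV-Ray sources.

```cpp
#include <string>

// Reports which compiler family the predefined macros identify.
// __clang__ is tested first, because Clang may also define __GNUC__
// (depending on -fgnuc-version).
inline std::string compilerFamily() {
#if defined(__clang__)
    return "clang";
#elif defined(__GNUC__)
    return "gcc";
#else
    return "other";
#endif
}

// True when either macro is defined, i.e. the combined condition
// "defined (__GNUC__) || defined(__clang__)".
inline bool gnuCompatibleCompiler() {
#if defined(__GNUC__) || defined(__clang__)
    return true;
#else
    return false;
#endif
}
```

Testing both macros makes the result independent of whether `-fgnuc-version` was passed to clang.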
I don't only compile but also test the runtime behavior of POV-Ray. In
this context I noticed with clang builds of POV-Ray that the compiler
flag -ffast-math degrades runtime performance (more than four times
longer runtime). This is not caused by the noise code and I can't
explain it yet; -Ofast shows the same behavior.
regards
Thomas Debe
From: William F Pokorny
Subject: Re: Technical verification build "v3.8.0-beta.668" - Unix package!
Date: 1 Jul 2021 07:42:01
Message: <60ddaa09@news.povray.org>
On 6/29/21 4:15 AM, Thomas Debe wrote:
> I don't only compile but also test the runtime behavior of POV-Ray. In
> this context I noticed with clang builds of POV-Ray that the compiler
> flag -ffast-math degrades runtime performance (more than four times
> longer runtime). This is not caused by the noise code and I can't
> explain it yet; -Ofast shows the same behavior.
What scene(s) were you running when you saw the performance degrade?
I've not seen this behavior - but I compile with clang more than I run
with clang compiled code.
FWIW, with respect to -Ofast and g++, I've seen inconsistent results.
The resulting builds are more often than not slower with the flag than without.
Never swings in performance as large as 4x, though.
As for the 4x difference, a wild guess is that clang is handling some
'-ffast-math' allowed undefined behavior differently than gcc.
Bill P.
From: Thomas Debe
Subject: Re: Technical verification build "v3.8.0-beta.668" - Unix package!
Date: 2 Jul 2021 09:01:35
Message: <60df0e2f$1@news.povray.org>
Hello !
After
make && make check
I run
unix/povray --benchmark
The scene not only renders slowly (in terms of rendered pixels) but also stalls.
I also just wanted to point this out because clang is available for Linux
and Windows, and is the system compiler on FreeBSD and MacOS X.
regards
Thomas Debe