For more than a month now I've been reworking the Linux/Unix build system
for the povr branch, and I thought I'd pass along some performance results
I found interesting. Everything was built with GNU tooling on Ubuntu 20.04.
./configure -q COMPILED_BY="wfp_povr2" \
  CXXFLAGS="-std=c++17 -O3 -ffast-math -march=native" \
  LDFLAGS="-s" \
  --with-static-link="-static -Wl,--whole-archive -lpthread -Wl,--no-whole-archive" \
  --with-x="no" --with-libtiff="no" --with-openexr="no"
All the data below was collected without X11, SDL, SDL2, TIFF or OpenEXR
support due to static-link limitations. The main limitation up front is
that development packages often don't ship static (.a) archives. A lot of
this looks to be license driven: libzstd, for example, is now a dependency
of libtiff. It has a BSD license, under which any binary statically linking
it must also output the extra license text. I think Ubuntu doesn't provide
the .a archive as part of the package install due to this sort of license.
There may be other reasons; anyway, more often than not there are no
archives to be found unless you build the library yourself.
Interesting to me: -march=native and static linking matter less these
days, but they matter more with link-time optimization (LTO). I'm
developing the feeling that, as more optimization is pushed into the link
phase, perhaps some work is now only being done with -flto.
If you want or need the preview window, LTO alone (-flto) is the way to
go; otherwise use static linking plus LTO, unless you need TIFF or OpenEXR
support.
The profile-guided compiles are impressively faster, and static linking
matters a good deal more with these. I've only played with one of the two
profile-guided approaches, the instrumented one. With it, you first compile
an instrumented POV-Ray. You then run a representative render for about 10
minutes (the instrumented binary renders 10x or so slower than normal).
This generates a bunch of profile data, one file for each object file.
Finally, you do a second compile that uses the profile data to help the
compiler optimize.
With something like POV-Ray, running enough scenes to get representative
profiling data for all the options would require real investment. I also
wonder whether, in achieving that coverage, some of the gains for any
particular feature set or render would be forfeited. For those doing
animations, I think profile-guided compiles are already a pretty good fit,
since the feature set is likely to stay more or less the same across
frames.
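Anyway, the data is below. The numbers like 3900832 are the executable
sizes in bytes, and the user/system/elapsed times are GNU time output. Most
percentages are the change in user CPU time relative to the v3.8-like build
(no LTO, not static); the parenthesized "->" figures compare two builds
directly.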
Build                                Size (bytes)    user  system  elapsed  Change
(no lto, static, (no)-march=native)       3917216  332.14    1.96  1:25.29  +1.71%
no lto or static (v3.8-like build)        3900832  326.57    1.88  1:23.88  baseline
lto only                                  3479880  312.12    2.08  1:20.29  -4.42%
static only                               7260344  325.62    1.90  1:23.64  -0.29%
lto and static                            6868024  309.31    1.61  1:19.49  -5.29% (lto only -> both -0.90%)
lto, static, Ofast                        6863928  311.54    1.51  1:20.05  +0.72 ...
static _w profguided                      6244440  277.65    1.58  1:11.89  -14.98% (static only -> -14.73%)
lto, static _w profguided                 5884984  262.42    2.18  1:07.92  -19.64%
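As a cross-check on the percentages above, this small sketch recomputes several of them from the user times with awk. The helper name pct_change is hypothetical; it simply prints (NEW - BASE) / BASE as a signed percentage.

```shell
#!/bin/sh
# Sketch: recompute the relative change in user CPU time between two builds.
# pct_change is a hypothetical helper: prints (NEW - BASE) / BASE * 100 with a sign.
pct_change() {
    awk -v n="$1" -v b="$2" 'BEGIN { printf "%+.2f%%\n", (n - b) / b * 100 }'
}

base=326.57                  # user seconds, v3.8-like build (no lto or static)

pct_change 332.14 "$base"    # (no lto, static, (no)-march=native): prints +1.71%
pct_change 312.12 "$base"    # lto only: prints -4.42%
pct_change 309.31 312.12     # lto only -> lto and static: prints -0.90%
pct_change 277.65 325.62     # static only -> static _w profguided: prints -14.73%
pct_change 262.42 "$base"    # lto, static _w profguided: prints -19.64%
```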