  Performance data. Using povr branch updated build system.  
From: William F Pokorny
Date: 1 Feb 2021 09:46:36
Message: <6018144c@news.povray.org>
For the past month plus I've been re-working the linux/unix build system 
for the povr branch. Thought I'd pass along some performance results I 
found interesting. All built with GNU tooling on Ubuntu 20.04.

./configure -q COMPILED_BY="wfp_povr2" \
  CXXFLAGS="-std=c++17 -O3 -ffast-math -march=native" \
  LDFLAGS="-s" \
  --with-static-link="-static -Wl,--whole-archive -lpthread -Wl,--no-whole-archive" \
  --with-libsdl="no" --with-libsdl2="no" \
  --with-x="no" --with-libtiff="no" --with-openexr="no" \
  NON_REDISTRIBUTABLE_BUILD=yes --enable-lto

All the data below was run without x11, sdl, sdl2, tiff or openexr 
because of static link limitations. The main limitation is that the 
development library packages often ship no .a archives. A lot of it 
looks to be license driven. For example, libzstd is now part of tiff. 
It's got a BSD license, which means it cannot be statically linked 
unless the resulting binary outputs the extra license text. I think 
this sort of license is why Ubuntu doesn't provide the .a archive as 
part of the package install. Maybe there are other reasons. Anyway, 
more often than not there are no archives to be found unless you build 
the library yourself.

Interesting to me: -march=native and static linking matter less these 
days on their own, but matter more with link time optimization (lto). 
I'm developing the feeling that, in pushing more optimization into the 
link phase, perhaps some work is only being done now with -flto?

If one wants/needs the preview window, using just lto is the way to go; 
otherwise static and lto, unless you need tiff or openexr support.

The profile guided compiles are impressively faster - and static links 
matter a good deal more with these. I've only played with one of the two 
main methods.

With it, you do a compile to instrument POV-Ray. You run a 
representative render over about 10 minutes (the render runs 10x or so 
slower than normal). This generates a bunch of profile data - a file for 
each raw object file. You then do a second compile using the profile 
data which helps the compiler optimize.
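
The two-pass flow described above can be sketched as a short command 
plan. This is only an illustration under assumptions, not the povr build 
system itself: it assumes GCC's -fprofile-generate=DIR and 
-fprofile-use=DIR options are the instrumenting method in question, and 
the profile directory, the unix/povray binary path, the scene name and 
the COMPILED_BY value are all hypothetical placeholders.

```python
import shlex

# Base optimization flags taken from the configure line earlier in the post.
BASE_FLAGS = "-std=c++17 -O3 -ffast-math -march=native"


def pgo_plan(profile_dir="/tmp/povr-pgo", scene="representative.pov"):
    """Return the commands for a two-pass profile guided build.

    Assumption: GCC's -fprofile-generate / -fprofile-use pair is used.
    profile_dir, scene and the binary path are hypothetical placeholders.
    """
    gen = f"-fprofile-generate={profile_dir}"
    use = f"-fprofile-use={profile_dir}"
    return [
        # Pass 1: build an instrumented binary. The link step needs the
        # flag too, so it is passed in LDFLAGS as well.
        ["./configure", "-q", "COMPILED_BY=your_name",
         f"CXXFLAGS={BASE_FLAGS} {gen}", f"LDFLAGS={gen}"],
        ["make"],
        # Run a representative render; this writes the profile data
        # (one file per object file) under profile_dir.
        ["unix/povray", scene],
        # Pass 2: rebuild everything using the collected profile data.
        ["make", "clean"],
        ["./configure", "-q", "COMPILED_BY=your_name",
         f"CXXFLAGS={BASE_FLAGS} {use}"],
        ["make"],
    ]


if __name__ == "__main__":
    # Print the plan as copy-and-paste shell commands; nothing is executed.
    for cmd in pgo_plan():
        print(shlex.join(cmd))
```

Keeping the profile data in its own directory means the "make clean" 
between the passes does not delete it. The static link and lto options 
from the configure line earlier would be added to both passes in the 
same way.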

With something like POV-Ray, running enough stuff to get representative 
profiling data for all the options would require investment. I wonder 
too if in achieving coverage, some of the gains are forfeited for any 
particular feature-set/render? I think for those doing animations, the 
profile guided compiles are already a pretty good fit as the feature-set 
is likely to be more or less the same.

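Anyway, data below. Oh, the numbers like 3900832 are the executable 
sizes in bytes. The percentages are the change in user time relative to 
the v3.8 like build, unless another reference is noted.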

Bill P.



Data
----------------------------------------------------

(no lto, static, (no)-march=native)  3917216  ( +1.71% )
---
332.14user 1.96system 1:25.29elapsed


(no lto or static) 3900832   <--- v3.8 like build.
---
326.57user 1.88system 1:23.88elapsed


lto only           3479880   ( -4.42% )
---
312.12user 2.08system 1:20.29elapsed


static only        7260344   ( -0.29% )
---
325.62user 1.90system 1:23.64elapsed


lto and static     6868024   ( -5.29% )  ( lto only -> both -0.90% )
---
309.31user 1.61system 1:19.49elapsed


lto, static, Ofast 6863928               ( +0.72 ... )
---
311.54user 1.51system 1:20.05elapsed


static _w profguided  6244440  ( -14.98% )  ( static only -> -14.73%)
---
277.65user 1.58system 1:11.89elapsed


lto, static _w profguided 5884984  ( -19.64% )
---
262.42user 2.18system 1:07.92elapsed
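
The percentages above are consistent with the change in user CPU time 
relative to the v3.8 like build (326.57 user seconds). A minimal sketch 
that reproduces them from the timings, assuming that is how they were 
derived (the dictionary keys are just shortened labels for the builds):

```python
# User CPU seconds copied from the timing lines above.
USER_SECONDS = {
    "no lto, static, (no)-march=native": 332.14,
    "no lto or static": 326.57,  # v3.8 like build, used as baseline
    "lto only": 312.12,
    "static only": 325.62,
    "lto and static": 309.31,
    "lto, static, Ofast": 311.54,
    "static, profguided": 277.65,
    "lto, static, profguided": 262.42,
}


def pct_change(new, ref):
    """Percent change of `new` relative to `ref` (negative means faster)."""
    return (new / ref - 1.0) * 100.0


if __name__ == "__main__":
    baseline = USER_SECONDS["no lto or static"]
    for label, secs in USER_SECONDS.items():
        print(f"{label:36s} {pct_change(secs, baseline):+7.2f}%")

    # Pairwise comparisons that use a reference other than the baseline.
    pairs = [
        ("lto only", "lto and static"),
        ("static only", "static, profguided"),
        # The +0.72 figure appears to use "lto and static" as reference.
        ("lto and static", "lto, static, Ofast"),
    ]
    for ref, new in pairs:
        delta = pct_change(USER_SECONDS[new], USER_SECONDS[ref])
        print(f"{ref} -> {new}: {delta:+.2f}%")
```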


