Newsgroups: povray.beta-test
Subject: v3.8 Clean up TODOs. f_superellipsoid() / shadow cache.
From: William F Pokorny
Date: 15 Apr 2020 12:55:44
Message: <5e973c90$1@news.povray.org>
In my povr branch I'm flipping the shape polarity of f_superellipsoid 
from positive to negative, so that, like most isosurface functions, it 
returns negative values inside the shape.
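
Roughly, the flip just moves the usual wrapper negation into the 
inbuilt itself. A quick sketch of the difference, reusing the calls 
from examples (a) and (c) in the reference code below (the declare 
names are only illustrative, each line only parses in its matching 
build, and stock POV-Ray wants functions.inc for the f_ inbuilts):

//---
#include "functions.inc"

// v3.8 master polarity (positive): the usual inside-negative
// convention for an isosurface function needs an explicit negation.
#declare Fn_master = function { -f_superellipsoid(x,y,z,0.5,0.5) }

// povr branch polarity (negative): the call drops in directly.
#declare Fn_povr = function { f_superellipsoid(x,y,z,0,0.5,0.5,0,0) }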

On seeing that many of the values in the code could be constants, I 
asked myself how fast we could go if the C++ were compiled for one 
particular superellipsoid. The answer: a lot faster! See result (i) 
below. Not that realistic as a general approach, though; it looks like 
it's basically inlining enough of the function call to enable some 
compiler optimizations.

I then didn't leave well enough alone and proceeded down the rabbit 
hole. The summary of that journey, with respect to v3.8, is basically:

(1) Adopt the shadow cache fixes from my solver branch, discussed in 
GitHub pull request #358. Those fixes speed things up especially for 
single, expensive shapes like this one. See (a -> c) and (b -> h). 20+% 
speed up over v3.8 master.

(2) For reasons I don't understand, coding the superellipsoid as a 
user-declared SDL function (which a macro could generate; see the 
sketch after this list) looks a lot faster than any inbuilt method. See 
(a -> b) and (c -> h). It works, so do it in v3.8, at least as an 
option alongside the normal inbuilt calls. 24.5% speed up. (Anyone have 
a thought as to why? We are passing fewer parameters, but that seems a 
stretch to explain the bulk of the difference. I've not dug.)

(3) Sort of a general educational / learning thing, and perhaps a 
place where v3.8 could be better. I often code expressions directly in 
the function-call parameter positions. Where these can instead be SDL 
#declare/#local constants, do the latter, as it's a lot faster. See 
(d -> c): a 12% speed up. With enough smarts the parser could fold 
those constant expressions itself, but it doesn't, so the function / VM 
appears to re-evaluate them on every call from the isosurface.
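
One possible shape for the macro mentioned in (2), just as a sketch 
(the name and parameterization are only for illustration): it folds the 
exponent expressions into #local constants at parse time, per (3), and 
its body is the same Fn00 equation used in the reference code below.

//---
#macro Superellipsoid_Fn(EW, NS)
   // Evaluate the exponents once, at parse time, rather than leaving
   // the expressions in function-call parameter positions.
   #local P2 = (2.0/EW);
   #local P3 = EW*(1.0/NS);
   #local P4 = 2*(1/NS);
   #local P5 = (NS*0.5);
   // The macro's last item is the function block, so the macro can be
   // used like a function returning it.
   function {
      -1+pow((pow((pow(abs(x),P2)
                  +pow(abs(y),P2)),P3)
                  +pow(abs(z),P4)),P5)
   }
#end

// Usage - equivalent to Fn00 below:
#declare Fn01 = Superellipsoid_Fn(0.5, 0.5)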

Some reference code / comments below.

Bill P.

//---
#include "colors.inc"     // for Green
#include "functions.inc"  // needed when using the inbuilt f_superellipsoid variants
#declare EW = 0.5;
#declare NS = 0.5;
#declare P2 = (2.0/EW);
#declare P3 = EW*(1.0/NS);
#declare P4 = 2*(1/NS);
#declare P5 = (NS*0.5);
#declare Fn00 = function {
     -1+pow((pow((pow(abs(x),P2)
                 +pow(abs(y),P2)),P3)
                 +pow(abs(z),P4)),P5)
}
#declare Iso99 = isosurface {

//----- v38 master
//a) function { -f_superellipsoid(x,y,z,0.5,0.5) }
     //a) 268.56s, 274.586s  Extra negation? No.
     //a) Primarily one of my shadow cache fixes.
     //a) (double root evaluations)

//b) function { Fn00(x,y,z) }
     //b) 206.44s, 210.211s  // In v38 master too, a macro would be better.

//------ povr
//c) function { f_superellipsoid(x,y,z,0,0.5,0.5,0,0) }
     //c) 207.772s  // See above. Better: -22% from master.

//d) function { f_superellipsoid(x,y,z,1,(2.0/EW),
//             EW*(1.0/NS),2*(1/NS),(NS*0.5)) }
     //d) 237.771s  // Calcs in args. +13.70%

//e) function { f_superellipsoid(x,y,z,1,P2,P3,P4,P5) }
     //e) 209.125s  // With conditional still slower. +0.65%

//f) function { f_superellipsoid(x,y,z,1,P2,P3,P4,P5) }
     //f) 207.478s  // hard code conditional. -0.79%

//g) function { f_superellipsoid(x,y,z,1,P2,P3,P4,P5) }
     //g) 206.670s  // Allocate new DBL vars. -0.53%

     function { Fn00(x,y,z) }  //h)
     //h) 156.363s  // Interesting. A macro would be better.

//i) function { f_superellipsoid(x,y,z,1,P2,P3,P4,P5) }
//i)  65.552s  // Compile with constants in place. 2-3x faster.

     contained_by { box { -2.0,2.0 } }
     threshold 0
     accuracy 0.0005
     max_gradient 5.1
     pigment { color Green }
}

