POV-Ray: Newsgroups: povray.beta-test: v3.8 Clean up TODOs. f

POV-Ray : Newsgroups : povray.beta-test : v3.8 Clean up TODOs. f_superellipsoid() / shadow cache.		Server Time 2 Jul 2025 17:54:32 EDT (-0400)

<<< Previous 3 Messages

Goto Initial 10 Messages

From: jr
Subject: Re: v3.8 Clean up TODOs. f_superellipsoid() / shadow cache.
Date: 17 Apr 2020 10:40:01
Message: <web.5e99be1d1e176347827e2b3e0@news.povray.org>

hi,

William F Pokorny <ano### [at] anonymousorg> wrote:
> ...
> Yep, I couldn't let these weird results go...
> ...
> As for why (2) where the raw SDL encoding is winning over an inbuilt
> compiled result. Looked at it with the linux perf profiling tools and it
> looks like when the pow() requests (done internally as exp()s and
> log()s) come at the hardware too fast, some are getting delayed. I see a
> big jump in pow() hardware cycle counts and under them irq and timer
> routines which are not there in the SDL hard coded case. Not sure if
> this 'hold up a minute' by the cpu is to control power or it's the way
> the hardware (an i3) handles too many overlapping pow() requests.
>
> Some better timing data below.
>
> I think going forward in povr ...

if it's any help, I'd be happy to compile 'povr'[*] and run your test scene(s)
on an i5, for comparison; also, can capture session(s) and send transcripts.
(assuming that if I configure + build under /tmp, povr will use installed v3.8
povray.{conf,ini} files)

[*] different configurations, if wanted.


regards, jr.

Post a reply to this message

From: William F Pokorny
Subject: Re: v3.8 Clean up TODOs. f_superellipsoid() / shadow cache.
Date: 18 Apr 2020 07:26:01
Message: <5e9ae3c9@news.povray.org>

On 4/17/20 10:35 AM, jr wrote:
> hi,
> 
> William F Pokorny <ano### [at] anonymousorg> wrote:
...
> 
> if it's any help, I'd be happy to compile 'povr'[*] and run your test scene(s)
> on an i5, for comparison; also, can capture session(s) and send transcripts.
> (assuming that if I configure + build under /tmp, povr will use installed v3.8
> povray.{conf,ini} files)
> 
> [*] different configurations, if wanted.
> 

Of interest I think and easier for now would be if you (or others) could 
run the attached v3.8 scene. You don't need povr to see or test for the 
pow() pileup the issue.

I'm thinking anyone on a system where the simd instructions are <=256 
bits wide will probably see <= SDL speed for the inbuilt command though
it should be faster. Those with avx512 instructions set cpu 'might' see 
'really' fast results for both as IIRC with that set we get a hardware 
exp() instruction.

Bill P.

//-------------------------------------------------
#version 3.8;
// Using recent v3.8, set +r<n> to get run times 60s+ maybe.
// Prefix the command with the system - not shell - time command.
//
// /usr/bin/time povray f_supreTest.pov +a0.0 +am1 +r2
//      or
// \time povray f_supreTest.pov +a0.0 +am1 +r2
//
// Results for my Ubuntu 18.04 i3 system running the default 4
// threads below. v38 master at commit 74b3ebe, but any should do.
//
// The inbuilt result should be faster, but it's
// almost 24% slower for my system. User time.
//

global_settings { assumed_gamma 1 }
#declare Grey50 = srgb <0.5,0.5,0.5>;
background { color Grey50 }
#declare Camera00 = camera {
     perspective
     location <3,3,-3.001>
     sky y
     angle 35
     right x*(image_width/image_height)
     look_at <0,0,0>
}
#declare White = srgb <1,1,1>;
#declare Light00 = light_source { <50,150,-250>, White }
#declare Red = srgb <1,0,0>;
#declare CylinderX = cylinder { -1*x, 1*x, 0.01 pigment { Red } }
#declare Green = srgb <0,1,0>;
#declare CylinderY = cylinder { -1*y, 1*y, 0.01 pigment { Green } }
#declare Blue = srgb <0,0,1>;
#declare CylinderZ = cylinder { -1*z, 1*z, 0.01 pigment { Blue } }

#include "functions.inc"

// SDL coded version.
#declare EW = 1/3;
#declare NS = 1/4;
#declare P2 = (2.0/EW);
#declare P3 = EW*(1.0/NS);
#declare P4 = 2*(1/NS);
#declare P5 = (NS*0.5);
#declare Fn00 = function {
     -1+pow((pow((pow(abs(x),P2)
                 +pow(abs(y),P2)),P3)
                 +pow(abs(z),P4)),P5)
}

#declare Iso99 = isosurface {
// function { Fn00(x,y,z) }                       // 154.544s
    function { -f_superellipsoid(x,y,z,1/3,1/4) }  // 191.359s +23.82%
     contained_by { box { -2.0,2.0 } }
     threshold 0
     accuracy 0.0005
     max_gradient 5.1
     pigment { color Green }
}

//--- scene ---
     camera { Camera00 }
     light_source { Light00 }
     object { CylinderX }
     object { CylinderY }
     object { CylinderZ }
     object { Iso99 }

Post a reply to this message

From: jr
Subject: Re: v3.8 Clean up TODOs. f_superellipsoid() / shadow cache.
Date: 18 Apr 2020 12:25:01
Message: <web.5e9b29671e176347827e2b3e0@news.povray.org>

hi,

William F Pokorny <ano### [at] anonymousorg> wrote:
> Of interest I think and easier for now would be if you (or others) could
> run the attached v3.8 scene.

see p.b.misc, same subject.


regards, jr.

Post a reply to this message

From: William F Pokorny
Subject: Re: v3.8 Clean up TODOs. f_superellipsoid() / shadow cache.
Date: 18 Apr 2020 13:35:45
Message: <5e9b3a71$1@news.povray.org>

On 4/18/20 12:23 PM, jr wrote:
> hi,
> 
> William F Pokorny <ano### [at] anonymousorg> wrote:
>> Of interest I think and easier for now would be if you (or others) could
>> run the attached v3.8 scene.
> 
> see p.b.misc, same subject
> 

Thank you. Interesting.

My 4th gen i3 at 22nm relative results a lot like your earlier 32nm 
generation i3 results. My i3 the same generation as your i5, but the 
relative differences are larger. Oh! your i5-4570 looks to be limited to 
one thread per core, so yeah, that looks not too different than my 2 
core results.

Looks to me, more or less, lines up performance difference with what I 
see too - the SDL coded version is faster... Didn't say it outright, but 
my guess is in the SDL method the pow()s are tossed at the processor 
somewhat slower and so 'fewer/(none?)' are asked to wait for some set 
time period.

Bill P.

Post a reply to this message

From: jr
Subject: Re: v3.8 Clean up TODOs. f_superellipsoid() / shadow cache.
Date: 18 Apr 2020 14:15:00
Message: <web.5e9b42f21e176347827e2b3e0@news.povray.org>

hi,

William F Pokorny <ano### [at] anonymousorg> wrote:
> On 4/18/20 12:23 PM, jr wrote:
> > ...
> Thank you. Interesting.
>
> My 4th gen i3 at 22nm relative results a lot like your earlier 32nm
> generation i3 results. My i3 the same generation as your i5, but the
> relative differences are larger. Oh! your i5-4570 looks to be limited to
> one thread per core, so yeah, that looks not too different than my 2
> core results.

yes.  forgot to say, the 'povray.ini's on all machines set 'work_threads'.  one
per core, except the goose which is set to '2', hence override.


> Looks to me, more or less, lines up performance difference with what I
> see too - the SDL coded version is faster... Didn't say it outright, but
> my guess is in the SDL method the pow()s are tossed at the processor
> somewhat slower and so 'fewer/(none?)' are asked to wait for some set
> time period.

would "spacing" with 'nanosleep(2)' help?


regards, jr.

Post a reply to this message

From: William F Pokorny
Subject: Re: v3.8 Clean up TODOs. f_superellipsoid() / shadow cache.
Date: 18 Apr 2020 15:02:07
Message: <5e9b4eaf$1@news.povray.org>

On 4/18/20 2:12 PM, jr wrote:
> hi,
...
> 
> would "spacing" with 'nanosleep(2)' help?
...
> 

Thought about such things and, yes, expect something like that might help.

The solution I settled upon was to add a field to f_superellipsoid() 
which lets me switch to a single float version of the code. The 
hardware/alg/SIMD? lanes are wide enough singles run fast like we'd 
expect from an inbuilt. Except at the parameter edges (near zero, larger 
value differences) of the EW,NS, it's working well enough the difference 
is impossible to spot unless you run value or image compares of some 
kind. Single nearly 2x faster than the SDL version and even faster than 
the inbuilt at double float given the pow() bottleneck.

Trick helps enough, I wonder if some other inbuilts could benefit from a 
float over double option too. But, I'm deleting many of the more obscure 
built in functions(1). We have functions for shapes and 'things' that 
are interesting to run - once - but not generally useful otherwise. Plus 
the values and polarities are all over the place with them. Leaves not 
many functions where the trick might apply.

Bill P.

(1) - Maybe at some point down the road I'll create a f_museum() 
function and roll all of the obscure stuff into that one function by index.

Post a reply to this message

From: jr
Subject: Re: v3.8 Clean up TODOs. f_superellipsoid() / shadow cache.
Date: 18 Apr 2020 16:15:01
Message: <web.5e9b5f361e176347827e2b3e0@news.povray.org>

hi,

William F Pokorny <ano### [at] anonymousorg> wrote:
> ...
> Trick helps enough, I wonder if some other inbuilts could benefit from a
> float over double option too.

does the .. cost of extra speed, in context, matter so much?  asking because
(and perhaps I'm completely off-track) only today was a post (by user 'guarnio')
where the problem is/was the range of float not being enough.

> But, I'm deleting many of the more obscure
> built in functions(1). We have functions for shapes and 'things' that
> are interesting to run - once - but not generally useful otherwise. Plus
> the values and polarities are all over the place with them. Leaves not
> many functions where the trick might apply.
>
> Bill P.
>
> (1) - Maybe at some point down the road I'll create a f_museum()
> function and roll all of the obscure stuff into that one function by index.

I think that if 'f_museum' is created first, and then various functions
"retired" there, they'll remain available at all times.  ("v good" at voicing my
opinions :-))

regards, jr.

Post a reply to this message

From: William F Pokorny
Subject: Re: v3.8 Clean up TODOs. f_superellipsoid() / shadow cache.
Date: 19 Apr 2020 18:13:06
Message: <5e9cccf2$1@news.povray.org>

On 4/18/20 4:12 PM, jr wrote:
> hi,
> 
> William F Pokorny <ano### [at] anonymousorg> wrote:
>> ...
>> Trick helps enough, I wonder if some other inbuilts could benefit from a
>> float over double option too.
> 
> does the .. cost of extra speed, in context, matter so much?  asking because
> (and perhaps I'm completely off-track) only today was a post (by user 'guarnio')
> where the problem is/was the range of float not being enough.
> 

Not trying to be flippant, but I think it does when it does, and doesn't 
when it doesn't. It's a judgement.

The scale and range of a scene with respect to accuracy as an issue is 
always there relative to the accuracy you have available.

With functions and isosurfaces, the speed of even very fast inbuilt 
functions matters because you mostly want to combine them with other 
functions to create whatever. The performance of all those functions 
mixed together mathematically is what can quickly get out of hand to the 
point of being practically unusable performance wise.

With functions and isosurfaces, we already have an object with user 
variable accuracy via the accuracy value passed which is often << 7/8 
digits (I typically use 0.0005). I've done some limited testing and the 
isosurface solver and - partly due the types of functional input - it 
cannot deliver more than 6-7 digits of accuracy max as a rule sometimes 
less. With other object types and solvers you can get up in the 11/12 
digit ranges though often less. All at doubles.

Relatedly, I believe in going after better performance continually in 
software tools - otherwise you're on the slippery slope to poky. :-)

Bill P.

Post a reply to this message

From: jr
Subject: Re: v3.8 Clean up TODOs. f_superellipsoid() / shadow cache.
Date: 21 Apr 2020 03:55:00
Message: <web.5e9ea6551e176347827e2b3e0@news.povray.org>

hi,

William F Pokorny <ano### [at] anonymousorg> wrote:
> On 4/18/20 4:12 PM, jr wrote:
> > ... the problem is/was the range of float not being enough.
>
> Not trying to be flippant, but I think it does when it does, and doesn't
> when it doesn't. It's a judgement.
>
> The scale and range of a scene with respect to accuracy as an issue is
> always there relative to the accuracy you have available.

naively, I'd assumed some kind of upgrade/development "policy" that sees all
floats replaced with doubles, in time.

> ...
> Relatedly, I believe in going after better performance continually in
> software tools - otherwise you're on the slippery slope to poky. :-)

hmm, I probably "sit on the fence" on that.  eg agree with you when it's a
compiler or other s/ware which has to take h/ware developments into account, but
kind of disagree for, say, programs not tied to h/ware, like 'sed'.

regards, jr.

Post a reply to this message

From: William F Pokorny
Subject: Re: v3.8 Clean up TODOs. f_superellipsoid() / shadow cache.
Date: 22 Apr 2020 08:30:34
Message: <5ea038ea@news.povray.org>

On 4/21/20 3:52 AM, jr wrote:
> hi,
> 
...
>>
>> The scale and range of a scene with respect to accuracy as an issue is
>> always there relative to the accuracy you have available.
> 
> naively, I'd assumed some kind of upgrade/development "policy" that sees all
> floats replaced with doubles, in time.
> Maybe. I'm not aware of any such policy, but I'm not a core developer.

The code base is internally mostly at double floats. There are a few 
places like bounding and color management where single floats get used. 
Done to save storage in the former I think or where the additional 
accuracy is of no practical value (to color results at least) in the 
later. On 'my' list to look at moving these to doubles.

For povr in the continuous pattern wave modification code I recently 
moved a few pattern stored values from singles to doubles. Partly to 
avoid the type conversions, but mostly because my grand plan is to flush 
out the function/pattern code so the interplay between functions and 
patterns is as seamless as it can be. I didn't want functions modified 
by a wave modifier to be getting single float parameters - in a way not 
visible to the user - when the reasonable assumption is everything is at 
double floats.

>> ...
>> Relatedly, I believe in going after better performance continually in
>> software tools - otherwise you're on the slippery slope to poky. :-)
> 
> hmm, I probably "sit on the fence" on that.  eg agree with you when it's a
> compiler or other s/ware which has to take h/ware developments into account, but
> kind of disagree for, say, programs not tied to h/ware, like 'sed'.
> 

I'm with you I think. I failed to be clear (I 'was' too flippant :-)). I 
am pushing for continual performance testing and especially an 
unwillingness to take much slowdown due changes over time without 
compensating improvements somewhere.

What has happened intentionally - or not - with POV-Ray moving v37 to 
v38 and the generic architecture compile shipped with linux 
distributions is a 30-40% slow down with certain common types of scenes.

https://github.com/POV-Ray/povray/issues/363

This after running down a lot of stuff like dynamic casts in the ray 
tracing code to recover performance seen in the benchmark scene.

In part the benchmark scene doesn't cover but a small slice of 
functionality in POV-Ray and mostly this was all that was getting run 
for performance testing.

I believe too, too many times we said this change is only a 1 or 2% slow 
down... Do enough of those in a year and you are well on to pokey at 
year end. The 1 or 2% slowdown at year end is relative to current 
performance. Many later changes, if looked at January 1st, might have 
been rejected out of hand as being too much of a slow down.

Aside: The GNU build methodology supports a code marking method for 
hardware optimized versions of functions that get picked/set at 'load 
time' depending upon your particular hardware or certain hardware 
capabilities. Both compiler and hand optimized code can be implemented 
in this way. Yes, this a reason my personal povr version is headed to a 
GNU only build(1) process. I want to play with this capability in povr 
proper.

Bill P.

(1) - Our current vector template class looks to be somewhat in the way 
of best 'compiler' hardware optimization...

Post a reply to this message

<<< Previous 3 Messages

Goto Initial 10 Messages