POV-Ray: Newsgroups: povray.general: Tracking down slow renders

From: jr
Subject: Re: Tracking down slow renders
Date: 11 May 2023 04:00:00
Message: <web.645ca0395485baa958c093306cde94f1@news.povray.org>

hi,

William F Pokorny <ano### [at] anonymousorg> wrote:
> ... [quality levels] ...
> For my povr fork play (as an idea for v4.0) I'm thinking for a start
> I'll change the bucketing to:
>
>      explicit QualityFlags(int level) :
>          ambientOnly (level <= 1),
>          quickColour (level <= 5),
>          shadows     (level >= 4),
>          areaLights  (level >= 5),
>          refractions (level >= 6),
>          reflections (level >= 7),
>          normals     (level >= 8),
>          media       (level >= 10),
>          radiosity   (level >= 11),
>          photons     (level >= 9),
>          subsurface  (level >= 12)
>      {}
>
> and make the default quality level 12 instead of 9.
>
> It seems to me in lumping so many of the most expensive features
> together we lose the debugging capability we can get from the quality
> level feature.
>
> For 'quality' you do want to run media, radiosity, photons and
> subsurface together. They are tangled and affect each other.

(just thinking aloud "gut reaction")  wondering, naively, if "adding in" each
feature/level separately would not be a better strategy then.  that is, some
ini/command-line format to eg say (the equivalent of) "+media -photons ..."


regards, jr.

From: William F Pokorny
Subject: Re: Tracking down slow renders
Date: 11 May 2023 08:02:07
Message: <645cd93f$1@news.povray.org>

On 5/10/23 17:27, Alain Martel wrote:
>  From my experience, when using +q0 to +q3, transparent pigments do show 
> as transparent. Both filter and transmit.
> Then, from +q4 to +q7, anything transparent shows as black, for any 
> pigment and any amount of filter or transmit.

Thanks Alain. This must mean non-refracting transparency is handled
separately from refraction; perhaps we should add a 'level' for those
sorts of rays too.

It also means 'full ambient' and 'ambient' in the original
documentation both meant the same thing: "full ambient".

---

Thanks jr. Your idea is better. It would add a somewhat large set of 
flags/ini options, but offer much more flexibility during debugging.

It's more doable after Christoph's post-v3.7 cleanup of the quality
code mechanisms. The individual controls are now booleans set ahead of
the tracing conditionals, which means I should be able to override the
booleans set by the quality level if I get the order of the flag/ini
evaluations right. Probably safer, though uglier to code, would be a
method that remembers all the individual feature control settings and
uses them later during setup to override the quality-determined
boolean settings.

Bill P.

From: Chris R
Subject: Re: Tracking down slow renders
Date: 29 Jun 2023 09:35:00
Message: <web.649d87a05485baa964326bc25cc1b6e@news.povray.org>

William F Pokorny <ano### [at] anonymousorg> wrote:
....
> 1) If you have any transparency - and IIRC - since v3.7 rays transit
> through transparent surfaces, with an ior of 1.0, without increasing the
> max_trace count. This was done to avoid part of the old problem of black
> pixels on hitting max_trace_level.
>
> What I've had happen, very occasionally since, is some smallish number
> of rays skimming a 'numerically bumpy' surface. The rays end up
> transiting in and out of the shape a large number of times. Alain taught
> me the trick of changing the ior to something like 1.0005 so those
> transitions through transparency count again toward the max_trace_level.
>
>
> 2) With isosurfaces, using 'all_intersections' (really a max of 10) when
> you don't need them 'all' can be expensive. I usually start with
> max_trace at 2. Sometimes I cheat down to 1, if the object is simple,
> opaque and I don't see artefacts.
>
> There is too the quality level. If of the radiosity, subsurface and
> media features all you use is radiosity, you can drop from 9 (=10 & =11)
> to 8 and check performance without radiosity. Dropping to 7 cuts out
> reflection, refraction and transparency(a). Plus, you can use start/end
> row start/end column settings to render only in regions running slow for
> performance testing.
>
> Bill P.
>

I got access to a nice, big Linux box with 64 cores and tried rendering the
scene again, using the smaller block sizes and reduced max_trace_level.  It
chugged along pretty steadily for about 48 hours, and then with 128 pixels left
(out of a 1920x1080 image), it's been stuck for an additional 72 hours with no
sign of progress.

The camera in the scene is at a very low angle above a wooden table with a bumpy
surface.  The table top is encased in a varnish object that is transparent and
has its own bumpy quality as well.  The two blocks that are stuck appear to be
on this surface and may be where some of the lights in the room are causing
specular highlights.

I'm using v3.8, and I have gotten into the habit of always using material{}
instead of texture{} for my objects so I don't forget to assign an ior when
using fresnel at the finish level and for reflection.  I'll have to look more
closely at the code to make sure I didn't miss any and leave them with the
default 1.0 ior, but I know for sure the varnish has an ior assigned from a
table of ior values.

I may end up shifting the camera location slightly to see if that eliminates
the problem spots, but I'd be interested in trying to figure out whether I'm
hitting some threshold edge condition where an algorithm never terminates.
Could I be hitting a loop where the threshold condition cannot be met due to
floating point precision problems?

-- Chris R.

From: Alain Martel
Subject: Re: Tracking down slow renders
Date: 29 Jun 2023 12:04:51
Message: <649daba3$1@news.povray.org>

On 2023-06-29 at 09:31, Chris R wrote:

> closely at the code to make sure I didn't miss any and leave them with the
> default 1.0 ior, but I know for sure the varnish has an ior assigned from a
> table of ior values.
> 
Leaving the ior at 1.0 is very probably not your problem.
An ior of 1 next to «air» effectively kills any fresnel reflection and
should do the same for highlights based on the fresnel model.

If your varnish had an ior of 1, it wouldn't contribute to anything
with «fresnel» in its definition.

From: Bald Eagle
Subject: Re: Tracking down slow renders
Date: 29 Jun 2023 13:20:00
Message: <web.649dbc815485baa91f9dae3025979125@news.povray.org>

"Chris R" <car### [at] comcastnet> wrote:

> I got access to a nice, big Linux box with 64 cores and tried rendering the
> scene again, using the smaller block sizes and reduced max_trace_level.  It
> chugged along pretty steadily for about 48 hours, and then with 128 pixels left
> (out of a 1920x1080 image), its been stuck for an additional 72 hours with no
> sign of progress.

I know that I've run into that sort of thing with heightfields with a reflective
finish. That was some sort of "known" bug, and I have no idea whether it ever got
investigated or fixed.

Perhaps you can try to reduce the reflectivity of the surface - Only the very
upper/outermost surface should need a reflective value.

You can also try to render _just_ those blocks with the appropriate command line
switches, so you don't have to wait 48 hours before testing that region of the render.

- BW

From: William F Pokorny
Subject: Re: Tracking down slow renders
Date: 29 Jun 2023 19:38:58
Message: <649e1612$1@news.povray.org>

On 6/29/23 09:31, Chris R wrote:
> I may end up shifting the camera location slightly and see if that eliminates
> the problem spots, but I'd be interest in trying to figure out whether I'm just
> hitting some threshold edge condition where an algorithm is just never
> terminating.  Could I be hitting a loop where the threshold condition cannot be
> met due to floating point precision problems?

(64 cores... What a dream! :-) )

Yes, it's always possible some unknown problem is causing the
extreme render times.

Given you have the problem isolated to two blocks, why not try what Bald
Eagle suggested? There are the (start row / end row) and (start column /
end column) controls. I used these often while running down various
solver issues (image artefacts) in 2017/2018 (a).

So. You could start by making a pretty good guess at the image rows
covering those two hung blocks. Then try row-by-row renders using:

povray ... +sr600 +er600 ...

povray ... +sr601 +er601 ...

When you find a row which hangs you can then start to use the column 
options:

povray ... +sr602 +er602 +sc500 +ec1500 ...

povray ... +sr602 +er602 +sc700 +ec900 ...

You might need to kill the hung processes (or cancel with Ctrl-C, if that
works) while you isolate the pixels. You can use the 'kill' command for
this, or kill them via Linux process monitoring commands like top/htop.

You should be able to leave all other options the same and slowly
isolate the problem pixels. If it's like my old hanging transparent
blobs, it might end up being just a pixel position or two (b).

If you can get to a scene of reasonable size and complexity, cut down
to a small number of problem pixels, I'll take a look at it on my
little two-core i3 while doing some extra performance/debugging stuff.

Note though. Real life has me tied up until late July or more likely 
early August so I won't be able to look at anything immediately.

Aside: I think all Linux distributions come with the 'top' or 'htop'
commands. It might be interesting to run one of them in a terminal
while povray is struggling, to see whether it's CPU bound (povray
processes at near 100%) or perhaps consuming large amounts of memory,
in which case the 'hung' threads might be paging even on your huge
machine (b). Large memory use would hint at runaway recursion internal
to the code (runaway transparency being one way this might happen).

Bill P.

(a) Somewhere, I have a script framework which isolates to particular 
pixels automatically based on various triggers - but that framework had 
nothing which isolated on 'long render times...'

(b) Or there is a very remote chance the block is hung on exit. We have
seen 'render block' hangs with the Simple DirectMedia Layer (SDL 1.2)
preview code, which the official v3.8 code still uses, though a newer
and less buggy 2.0 package has long been available. Using something
like top or htop, you might see the povray process(es) using very
little CPU or memory in this sort of case.

From: Chris R
Subject: Re: Tracking down slow renders
Date: 30 Jun 2023 08:35:00
Message: <web.649ecbcc5485baa98a3c0aee5cc1b6e@news.povray.org>

William F Pokorny <ano### [at] anonymousorg> wrote:
> On 6/29/23 09:31, Chris R wrote:
> > I may end up shifting the camera location slightly and see if that eliminates
> > the problem spots, but I'd be interest in trying to figure out whether I'm just
> > hitting some threshold edge condition where an algorithm is just never
> > terminating.  Could I be hitting a loop where the threshold condition cannot be
> > met due to floating point precision problems?
>
> ...
>
> Aside: I think all linux distribution offerings come with 'top' or
> 'htop' commands. It might be interesting to kick off one of these
> commands in a terminal while povray is struggling to look at whether cpu
> bound (povray processes at near 100%) or perhaps consuming large amounts
> of memory - and so perhaps the 'hung' threads are paging even on your
> huge machine (b). Large memory use would hint at runaway recursion
> internal to the code (runaway transparency being one way this might
> happen).
> ...

Thanks for all of the debugging hints.

Top is reporting that 2 cores are running at 100%, so one per 64-pixel block.
Memory has held constant, and this machine has 80GB so I don't think it's paging
very often.  Not much else is running on the machine at this point.

I'm going to leave it running over the 4-day holiday weekend and if nothing has
changed I'll kill it and try to isolate it.  Or if I'm feeling ambitious I'll
see if I can attach gdb to the running process and at least look at the stack
trace.

-- Chris R.
