William F Pokorny <ano### [at] anonymousorg> wrote:
DETAILS:
Description of the SPEC test case:
POV-Ray is a free and open source ray-tracing application. The CPU2017
version is based on POV-Ray version 3.7.
The benchmark renders a 2560 x 2048 pixel image of a chess board, with the
pieces placed on the board in the starting position. The rendered scene
image is saved as a Targa (.tga) file.
This output image is compared with the reference output image using the
SPEC utility imagevalidate, which calculates the structural
similarity (SSIM) index between corresponding 8x8 groups of pixels in each
image. The SSIM index has a range of -1 (maximal difference) to 1
(identical). The benchmark requires that all computed SSIM indices be
greater than 0.996 in order for the run to be considered successful.
A log of the execution is also generated, but its contents are not used to
validate correct operation of the benchmark.
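The pass criterion described above can be sketched as follows. This is only an illustration and not the SPEC imagevalidate source: the function name all_blocks_pass and the idea of receiving the per-block SSIM indices as a vector are my assumptions. Only the 0.996 threshold and the -1 to 1 range come from the description.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not part of SPEC imagevalidate): returns true only
// if every per-block SSIM index is strictly greater than the threshold.
// An empty input passes vacuously.
bool all_blocks_pass(const std::vector<double>& ssim, double threshold = 0.996)
{
    // SSIM indices lie in [-1, 1], so a threshold outside that range is a bug.
    assert(threshold >= -1.0 && threshold <= 1.0);
    return std::all_of(ssim.begin(), ssim.end(),
                       [threshold](double s) { return s > threshold; });
}
```

With this sketch, a single 8x8 block at or below 0.996 is enough to fail the whole run, no matter how many other blocks are identical.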
POV-Ray (which became 511.povray_r) was contributed to CPU2017 by one of
the developers, Thorsten Froehlich, under the GNU Affero License v3.
Performance of SPEC2017 REF sized test case:
Sample performance difference using the Intel compiler on a Skylake system
with the SPEC2017 REF size test case. Most compilers on most platforms,
including x86 and ARM, can show similarly large performance differences
from just this one slight numerical difference in J.
If J turns out to be exactly zero, or is reset to zero: 218 seconds.
Without resetting J to zero when it is close to zero: 349 seconds (60% slower).
Both output images validated successfully using the SPEC imagevalidate method,
but the one output image took much longer to generate.
Input .ini file used for the SPEC2017 REF sized test case:
Width=2560
Height=2048
All_Console=On
All_File=SPEC-benchmark.log
Antialias_Depth=3
Antialias=On
Antialias_Threshold=0.3
Bits_Per_Color=8
Bounding_Threshold=1
Bounding=On
Buffer_Output=Off
Buffer_Size=0
Clock=0
Continue_Trace=Off
Create_Histogram=Off
Cyclic_Animation=Off
Debug_Console=Off
Display=Off
Display_Gamma=1.0
Draw_Vistas=Off
Fatal_Console=On
Fatal_Error_Command=
Fatal_Error_Return=I
Field_Render=Off
Final_Clock=1
Final_Frame=1
Histogram_Name=
Histogram_Grid_Size=0.0
Initial_Clock=0
Initial_Frame=1
Include_Header=
Input_File_Name=SPEC-benchmark-ref.pov
Jitter_Amount=0.30
Jitter=Off
Light_Buffer=On
Odd_Field=Off
Output_Alpha=Off
Output_File_Name=SPEC-benchmark.tga
Output_File_Type=t
Palette=3
Pause_When_Done=Off
Post_Frame_Command=
Post_Frame_Return=I
Post_Scene_Command=
Post_Scene_Return=I
Preview_End_Size=1
Preview_Start_Size=1
Pre_Frame_Command=
Pre_Frame_Return=I
Pre_Scene_command=
Pre_Scene_Return=I
Quality=9
Remove_Bounds=Off
Render_Console=On
Sampling_Method=1
Split_Unions=Off
Statistic_Console=On
Subset_End_Frame=1
Subset_Start_Frame=1
Test_Abort_Count=0
Test_Abort=Off
User_Abort_Command=
User_Abort_Return=I
Verbose=Off
Version=3.5
Video_Mode=0
Vista_Buffer=On
Warning_Console=Off
Description of input .pov:
// Persistence Of Vision raytracer sample file.
// POV-Ray scene description for chess board.
// By Ville Saari
// Copyright (c) 1991 Ferry Island Pixelboys
//
// This scene has 430 primitives in objects and 41 in bounding shapes and
// it takes over 40 hours to render by standard amiga.
//
// If you do some nice modifications or additions to this file, please send
// me a copy. My Internet address is: vsa### [at] niksulahutfi
//
// -w320 -h240
// -w800 -h600 +a0.3
// Note : CHESS2.POV was created from Ville Saari's chess.pov
// -- Dan Farmer 1996
// - Changed textures
// - Added camera blur and changed focal length
// - Use sky sphere
// - Modularized the code
// - Added felt pads to bottom of pieces
// remaining manual bounding commented out by Bob Hughes, August 31, 2001
Execution Trace:
With my own little debug trace through the Compute_Quadric_BBox() routine of
file quadrics.cpp using the SPEC test case, I see 19 calls to the
Compute_Quadric_BBox() routine.
All calls to Compute_Quadric_BBox() execute one of the many
"Check for xxxxxxx (?-axis)" code blocks, EXCEPT the one case where
"J is recalculated, ends up being close to zero, but not quite zero,
and is not reset". In this one case, none of the "Check ..." code blocks are
executed, the "Add translation" and "Beware of bounding boxes to large"
blocks are executed, and large values for BBox.Lengths and BBox.Lower_Left
are used.
Executed code when J is "recalculated" and ends up being exactly equal to
0.0 (or is reset to zero if close) for the 1 of 19 cases that differed:
J was recalculated to J=4.440892e-16,
and then reset to 0.0 with modified code I added.
Check for cone (y-axis).
Add translation.
Quadric->BBox.Lengths 0 to 2: 4.333333 3.500000 4.333333
Quadric->BBox.Lower_Left 0 to 2: -2.166667 8.000000 -2.166667
Executed code when J is "recalculated" and ends up NOT quite being exactly
equal to 0.0:
Add translation.
Beware of bounding boxes to large.
Quadric->BBox.Lengths 0 to 2: 20000000000.000000 20000000000.000000 20000000000.000000
Quadric->BBox.Lower_Left 0 to 2: -10000000000.000000 -10000000000.000000 -10000000000.000000
For the faster executing run, note that the "Beware of bounding boxes to
large" code block is still executed for 8 of the 19 calls to
Compute_Quadric_BBox(), and even when "Beware of bounding boxes to large"
is NOT called, the large BBox values may still be set or inherited.
For this faster test case, 12 of the 19 calls to
Compute_Quadric_BBox() ended up with large BBox values, so having the
large BBox values is not unusual, but for the one call where J was not
reset when perhaps it could/should have been reset, the large BBox values
were one of the differences encountered between the slower and faster
runs.
Portion of log file results (not used for SPEC verification) for the
fast and slow execution.
Fast execution with recalculated J reset to zero when it is close to zero:
218 seconds.
Image Resolution 2560 x 2048
----------------------------------------------------------------------------
Pixels: 5242880   Samples: 56705552   Smpls/Pxl: 10.82
Rays: 114705211   Saved: 1162288   Max Level: 5/5
----------------------------------------------------------------------------
Ray->Shape Intersection        Tests     Succeeded  Percentage
----------------------------------------------------------------------------
Box                         16019108      16019107      100.00
Cone/Cylinder               12818313       3292990       25.69
CSG Intersection          1100588845     111370367       10.12
CSG Union                   88374684      76631294       86.71
Plane                     4514922317    2468835793       54.68
Quadric                    330552192     175746735       53.17
Sphere                     289188952      70116199       24.25
Bounding Box              2485773879     745702120       30.00
Light Buffer              1823322702    1064180254       58.36
----------------------------------------------------------------------------
Slow execution with recalculated J NOT reset to zero when it was close
to zero: 349 seconds.
Note the large increase in "Tests" for many of the
"Ray->Shape Intersection" rows, such as CSG Intersection, Plane,
Quadric, and Sphere (flagged with **), which presumably
accounts for the large increase in execution time.
Image Resolution 2560 x 2048
----------------------------------------------------------------------------
Pixels: 5242880   Samples: 56705552   Smpls/Pxl: 10.82
Rays: 114705211   Saved: 1162288   Max Level: 5/5
----------------------------------------------------------------------------
Ray->Shape Intersection        Tests     Succeeded  Percentage
----------------------------------------------------------------------------
Box                         16019108      16019107      100.00
Cone/Cylinder               12818313       3292990       25.69
CSG Intersection          2573044515**   113104449        4.40
CSG Union                   88374684      76631294       86.71
Plane                     5987377987**  3804982796       63.55
Quadric                   1066780027**   285487345       26.76
Sphere                    9123922972**    73475346        0.81
Bounding Box              2368729644     732816314       30.94
Light Buffer              2057730695    1350487574       65.63
----------------------------------------------------------------------------
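To put numbers on the increase, the slow-to-fast ratio of the "Tests" column can be computed per shape type. The sketch below is just arithmetic on the counts copied from the two log excerpts. The helper name test_ratio and the constant names are mine, not anything from POV-Ray.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: how many times more intersection tests the slow run
// performed than the fast run for one shape type.
double test_ratio(std::uint64_t slow_tests, std::uint64_t fast_tests)
{
    assert(fast_tests != 0);
    return static_cast<double>(slow_tests) / static_cast<double>(fast_tests);
}

// "Tests" counts copied from the fast and slow log excerpts.
const std::uint64_t kCsgIntersectionFast = 1100588845ULL;
const std::uint64_t kCsgIntersectionSlow = 2573044515ULL;
const std::uint64_t kPlaneFast           = 4514922317ULL;
const std::uint64_t kPlaneSlow           = 5987377987ULL;
const std::uint64_t kQuadricFast         = 330552192ULL;
const std::uint64_t kQuadricSlow         = 1066780027ULL;
const std::uint64_t kSphereFast          = 289188952ULL;
const std::uint64_t kSphereSlow          = 9123922972ULL;
```

Calling test_ratio(kSphereSlow, kSphereFast) gives roughly 31.5, against roughly 3.2 for Quadric, 2.3 for CSG Intersection and 1.3 for Plane. The Sphere tests grow by far the most, while the wall-clock ratio is only 349/218, or about 1.6.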
> My quick take is those initial EPSILONs (1e-10 in POV-Ray) should be
> more like what I'm using in my personal povr branch (gkDBL_epsilon at
> ~ 4.4e-16)
Based on my observations, 4.4e-16 might be a little too restrictive to catch
many of the obvious "close to zero" values. An EPSILON of 1e-10 may
or may not be too lax, but I think 4.4e-16 would be too restrictive.
In this test case, the other recalculated values of J, which did not need
to be reset, were either -1.0 or already exactly 0.0.
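As a rough illustration of the concern, here is a minimal sketch of a "reset to zero if close" check applied to the J value from my trace. It is not the actual quadrics.cpp or povr code. The helper name reset_if_near_zero, the strict less-than comparison, and the use of the rounded printed values (4.440892e-16 for J, 4.4e-16 for the tighter epsilon) are all assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not the actual POV-Ray code: returns 0.0 when
// |value| is strictly below eps, otherwise returns value unchanged.
double reset_if_near_zero(double value, double eps)
{
    assert(eps >= 0.0);
    return (std::fabs(value) < eps) ? 0.0 : value;
}

// Values as printed or quoted in this thread (rounded, so only approximate).
const double kObservedJ = 4.440892e-16; // recalculated J from the debug trace
const double kLaxEps    = 1e-10;        // EPSILON in POV-Ray, per the quoted reply
const double kTightEps  = 4.4e-16;      // approximate value quoted for gkDBL_epsilon
```

Under these assumptions, reset_if_near_zero(kObservedJ, kLaxEps) returns 0.0, while reset_if_near_zero(kObservedJ, kTightEps) leaves J unchanged, because 4.440892e-16 is not below 4.4e-16. The exact epsilon value and the comparison actually used could change that outcome, since the two numbers are so close.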