  Re: build of povray 3.6.1 on Solaris 8 with Sun ONE Studio 11 fails  
From: Dennis Clarke
Date: 9 Jun 2006 14:15:00
Message: <web.4489b8c5d12e44c53bce8e910@news.povray.org>
Warp <war### [at] tagpovrayorg> wrote:
> Dennis Clarke <dcl### [at] blastwaveorg> wrote:
> > I have a strict belief that code like POV-Ray could perform the exact same
> > sort of test if we exclude all noise components and random signal
> > components. I was thinking ( brace yourself ) of a sphere hanging over a
> > checkerboard with no anti-aliasing and nothing that would create expected
> > differences in output data.
>
>   It's not even a question of randomness or noise. (In fact, if POV-Ray
> uses its own randomness algorithm, its result will be the same in all
> systems.)

  That stands to reason.

>   The problem with floating point numbers is that it depends a lot on how
> they are calculated whether two different systems will give identical
> results.

  I agree with the "how" but not with the "different systems".

  Let's look further down in your message here :

>   For instance, Intel CPUs (and compatibles) use actually 10 bytes long
> floating point numbers internally even though the program stores them
> as 8 bytes long in memory. Some operations might, just might give a
> different result due to the increased accuracy. Some other processors
> might use 8 bytes long floating point numbers internally, others might
> use even 16 bytes long ones.

  This is an issue of the implementation of a given floating point
representation and is no longer a concern in post-1992 computer work.
I would be so bold as to go back even further, to the IEEE 754 standard of
1985 and its follow-up in 1987.  Prior to that we needed to create our own
implementations, and having done so with portable code ( or extensive reams
of machine code ) we were able to perform calculations on any architecture
with a precise and exactly identical result.

  This was because we had total control over "how" the computation was
performed on any given architecture, and we also had total control over the
internal representation of the actual floating point data, right down to its
storage in memory.  Those days were long and difficult, but the consistency
and quality of the code was very, very high.  Computations executed in a
precise manner and with a predictable result even when the results were not
precise.  That's a floating point joke.  I hope you caught it. :-)

  The arrival of technology that implemented the floating point
infrastructure for us in both hardware and software was both a boon and a
bane.  As you clearly pointed out above we see that different architectures
implement vastly different methods of storing floating point data, and thus
the calculations differ wildly, and not just in the least significant
bits.  Is this a feature of the actual internal register structure of these
chips?  No, sir, it is not.  The ability to move binary data around in the
6502, the 6809, the early Intel 8088 or the Zilog Z80 was essentially the
same.  We loaded bits, we shifted bits, and we ported floating point
implementations to those processors with precisely the same results in
every case.  It was the arrival of well documented standards such as IEEE
754, and their implementations in the early Weitek chips and the Intel
8087 FPU, that allowed us to move away from mainframes and workstations
to lower cost hardware.  Lower cost software too, and significantly different
results at times.  I am sure we all remember the fiasco of the Intel Pentium
P90 chip, and I have a dual processor system here today that has the flawed
processor.

But I digress.

The real issue is that we seem to have lost fine control over "how" a
calculation is performed, even though we have access to standard
implementations of floating point data as well as a multitude of
calculations in hardware.  Even transcendental functions and logarithms
are implemented in hardware for us today.  It is my hope that log(x) will
always report precisely the same result on ALL architectures thanks to these
standard implementations.

>   Some numbers (such as 0.1) cannot be represented accurately with
> (binary) floating point and the error grows if such inaccurate numbers

<snip>

Sorry, I guess I just wanted to say that I have had to write low level
floating point code and that I have a passing familiarity with the issues.
:-)
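
To make that concrete, a tiny throwaway C sketch ( my own, nothing to do
with the POV-Ray sources ) shows both the value that actually gets stored
for 0.1 and how the error grows when it is accumulated:

    /* 0.1 has no exact binary representation, so printing more digits
       than a double really carries exposes the value actually stored,
       and summing it repeatedly lets the rounding error accumulate. */
    #include <stdio.h>

    int main(void)
    {
        double tenth = 0.1;
        double sum = 0.0;
        int i;

        printf("0.1 is stored as    %.20f\n", tenth);
        for (i = 0; i < 10; ++i)
            sum += tenth;                  /* ten inexact additions */
        printf("ten additions give  %.20f\n", sum);
        printf("exactly 1.0 is      %.20f\n", 1.0);
        return 0;
    }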

>   Also compiler optimizations may affect the accuracy of the computations.
> For example, a compiler might take this code:
>
> r1 = a/x; r2 = b/x; r3 = c/x; r4 = d/x;
>
> and optimize it (at least if allowed) into an equivalent of this:
>
> tmp = 1/x; r1 = a*tmp; r2 = b*tmp; r3 = c*tmp; r4 = d*tmp;

YES!  This is the loss of fine control that I refer to.

However, one would assume that the exact same code built with the exact same
vendor's compiler would produce the exact same results on
differing hardware.  I can report that the floating point calculations I
perform on either Solaris x86 or SPARC produce precisely the same results
given the same compiler switches.  I can say that I have similar
experiences with IBM 3090 mainframes and the AS/400 and even with some embedded
technology.  Essentially the bits will always be the same on any architecture
if we have fine control.  The various manipulations of compiler
optimizations will change all of that.  Quickly.  ( another pun .. sorry )
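
One way to make that kind of comparison concrete is to look at the raw
bytes of a result rather than the printed decimals.  A small sketch of the
idea, again just throwaway C and not anything from the POV-Ray tree:

    /* Dump the raw IEEE 754 bit pattern of a double so that results
       from two machines or two builds can be compared exactly. */
    #include <stdio.h>
    #include <string.h>

    static void print_bits(const char *label, double d)
    {
        unsigned char b[sizeof d];
        size_t i;

        memcpy(b, &d, sizeof d);           /* view the bytes of the double */
        printf("%s = %.17g  bytes =", label, d);
        for (i = 0; i < sizeof d; ++i)
            printf(" %02x", b[i]);         /* byte order is endian-dependent */
        printf("\n");
    }

    int main(void)
    {
        print_bits("355/113", 355.0 / 113.0);   /* stand-in for any result */
        return 0;
    }

If the byte dumps match on both machines ( allowing for endianness ) then
the results really are bit for bit identical.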

I was once told by a real honest to goodness computer scientist that
compiler optimizations are a great way to arrive at the wrong answer very
quickly.

There is a man named Goldberg who has written some excellent material on
these issues.  I'll have to look him up.

>   Now, "a/x" often gives a different result than "a*(1/x)" due to how
> floating point calculations work. (Technically speaking, the "1/x"
> calculation loses information which is not lost when the final result is
> calculated with a direct "a/x".)
>   There are many other places where the compiler might use shortcuts in
> order to create faster code.

  Indeed.  I agree entirely.
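
  To see that point in isolation, here is a small C sketch ( a made-up
example, not POV-Ray code ) of the rewrite you describe.  On ordinary IEEE
double arithmetic 3/10 comes out differently in the two forms:

    /* Rewriting a/x as a*(1/x) adds a second rounding step, so the two
       "equivalent" forms can land on different doubles.  3/10 is one
       case where they disagree in the last bit. */
    #include <stdio.h>

    int main(void)
    {
        /* volatile discourages the compiler from folding everything
           into one compile-time constant and hiding the effect */
        volatile double a = 3.0, x = 10.0;
        volatile double inv = 1.0 / x;     /* rounds once here ...      */
        double direct    = a / x;          /* one correctly rounded div */
        double rewritten = a * inv;        /* ... and rounds again here */

        printf("a/x     = %.20f\n", direct);
        printf("a*(1/x) = %.20f\n", rewritten);
        printf("equal?    %s\n", direct == rewritten ? "yes" : "no");
        return 0;
    }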

  However ... you probably saw this coming.

  The IEEE 754 floating point standard defines a handful of data types, and
for every calculation we are given the possibility of an "inexact" flag
being returned to us.  There are very well defined accuracy requirements on
floating-point operations: the basic operations must return a correctly
rounded result in the destination format, and the double extended type
carries a 64-bit significand in a format at least 79 bits wide.  Either the
result is exactly representable, or we get the "inexact" flag.  That allows
us to perform the calculation in some other manner.
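
For anyone who wants to poke at that flag, the C99 <fenv.h> interface
exposes it.  Just a sketch, and how faithfully it behaves will depend on
the compiler and its floating point switches ( FENV_ACCESS, strict FP ):

    /* A quick look at the IEEE 754 "inexact" flag through C99 <fenv.h>. */
    #include <fenv.h>
    #include <stdio.h>

    int main(void)
    {
        volatile double three = 3.0, four = 4.0;  /* keep divisions at run time */
        double exact, rounded;

        feclearexcept(FE_ALL_EXCEPT);
        exact = 1.0 / four;                       /* 0.25 is representable      */
        printf("1/4 = %.17g  inexact: %s\n", exact,
               fetestexcept(FE_INEXACT) ? "set" : "clear");

        feclearexcept(FE_ALL_EXCEPT);
        rounded = 1.0 / three;                    /* 1/3 has to be rounded      */
        printf("1/3 = %.17g  inexact: %s\n", rounded,
               fetestexcept(FE_INEXACT) ? "set" : "clear");

        return 0;
    }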

>   All these differences are in the very least significant bits of the
> result. Whether this minuscule difference will end up as a differing
> pixel in the resulting image depends a lot on what is done.

Well, with POV-Ray I am willing to try a few experiments on a few
architectures just to see if it's possible to get exactly the same output
data.  For fun if nothing else.
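
The comparison half of that experiment is trivial; cmp(1) would do, or a
few lines of throwaway C such as this sketch ( the file names are just
placeholders for whatever the two renders get saved as ):

    /* Report the first byte at which two rendered output files differ. */
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        FILE *f1, *f2;
        long offset = 0;
        int c1, c2;

        if (argc != 3) {
            fprintf(stderr, "usage: %s render1 render2\n", argv[0]);
            return 2;
        }
        f1 = fopen(argv[1], "rb");
        f2 = fopen(argv[2], "rb");
        if (f1 == NULL || f2 == NULL) {
            fprintf(stderr, "could not open one of the files\n");
            return 2;
        }
        for (;;) {
            c1 = getc(f1);
            c2 = getc(f2);
            if (c1 != c2) {                /* includes one file ending early */
                printf("files differ at byte offset %ld\n", offset);
                return 1;
            }
            if (c1 == EOF) {               /* both ended together: identical */
                printf("files are byte for byte identical\n");
                return 0;
            }
            ++offset;
        }
    }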

> Some operations
> done by a raytracer like POV-Ray may be quite prone to even minuscule
> changes in the very least-significant bits of some calculations. For
> example a change of 0.00001% in the direction of a ray may in some scenes
> cause a big enough difference to produce a pixel which differs in the
> least significant bit (and sometimes even more).

The butterfly effect, and I agree with that entirely.  I would expect that
the calculation of a surface normal will result in the exact same vector
every time, but the compiler optimizations may modify this in the least bits,
as you say.  I think that I saw a post ( years and years ago ) in which
someone repeatedly performed a POV-Ray trace of a sphere, with each iteration
shrinking in scale by a factor of ten.  Eventually the microscopic sphere
was no longer a sphere anymore.  We can only be so precise with computers
and a given floating point implementation.


Dennis

