POV-Ray: Newsgroups: povray.beta-test: Radiosity Status: Giving Up...


From: Thorsten Froehlich
Subject: Re: Radiosity Status: Giving Up...
Date: 1 Jan 2009 03:07:21
Message: <495c79b9$1@news.povray.org>

Warp wrote:
> Thorsten Froehlich <tho### [at] trfde> wrote:
>> In essence x86 is the only architecture where more than sqrt is still 
>> supported in microcode hardware. Doing this in software is much more 
>> desirable and efficient.
> 
>   So they really are making the FPU deliberately less efficient than
> it could be?
> 
>   How do you calculate the sine and cosine of a 64-bit floating point
> value in 17-137 clock cycles in software?

Look in the gcc or icc standard library implementations how they do it. Or 
compile and disassemble the compiled code.

	Thorsten
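
For readers wondering what "doing it in software" can look like: a common approach is to reduce the argument to a small interval and then evaluate a polynomial. The following is only a minimal sketch of that idea, not the gcc or icc library code. The name sketch_sin is hypothetical, and a plain Taylor polynomial stands in for the tuned coefficients and more careful range reduction a real math library would use, so accuracy degrades for large arguments.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical illustration only: sine via range reduction plus a polynomial.
// Not taken from any real libm.
double sketch_sin(double x)
{
    assert(std::isfinite(x));
    const double pi = 3.14159265358979323846;

    // Range reduction: bring x into [-pi, pi].
    double r = std::remainder(x, 2.0 * pi);

    // Fold into [-pi/2, pi/2] using sin(pi - r) == sin(r).
    if (r > pi / 2)
        r = pi - r;
    else if (r < -pi / 2)
        r = -pi - r;

    // Odd Taylor polynomial through r^15, evaluated in Horner form in r^2.
    const double r2 = r * r;
    double p = -1.0 / 1307674368000.0;   // -1/15!
    p = p * r2 + 1.0 / 6227020800.0;     // +1/13!
    p = p * r2 - 1.0 / 39916800.0;       // -1/11!
    p = p * r2 + 1.0 / 362880.0;         // +1/9!
    p = p * r2 - 1.0 / 5040.0;           // -1/7!
    p = p * r2 + 1.0 / 120.0;            // +1/5!
    p = p * r2 - 1.0 / 6.0;              // -1/3!
    p = p * r2 + 1.0;
    return r * p;
}
```

The whole evaluation is a handful of multiplies and adds, which is the kind of code that pipelines well on SSE2 hardware; whether it beats fsin on a given CPU is something to measure rather than assume.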


From: Thorsten Froehlich
Subject: Re: Radiosity Status: Giving Up...
Date: 1 Jan 2009 03:13:10
Message: <495c7b16@news.povray.org>

clipka wrote:
> Thorsten Froehlich <tho### [at] trfde> wrote:
>> I have no intention to continue such an argument on semantics, this leads
>> nowhere and does not change the fact that x87 FPU usage is deprecated.
> 
> .... which in turn does not change the fact either that modern CPUs (still) have
> a dedicated command to compute a square root (and not only that, but also for
> trigonometrics and the like), which was a question previously raised, and
> answered by me by referring to the x87 FPU instruction set, while at the same
> time maintaining that they're not a "naive" hardware implementation.

Yes, but what use are instructions you won't be able to use in the future 
and that you are already advised not to use now?

> I still have doubts whether using SSE2 roots/trigonometrics etc. is really
> faster than using the FPU, or whether the compiler really does not use them -

Well, doubts don't change code efficiency ;-)

> at least on 32-bit systems. Note that the deprecation statement relates to
> AMD64, not x86 in general as far as I can see.

It relates to Windows, Mac OS X and Linux running in 64-bit mode. The 
deprecation is a joint undertaking by major OS vendors, Intel and AMD.

	Thorsten


From: Thorsten Froehlich
Subject: Re: Radiosity Status: Giving Up...
Date: 1 Jan 2009 03:17:39
Message: <495c7c23$1@news.povray.org>

Warp wrote:
>   Btw, another advantage of using the FPU rather than calculating in
> software is that you could, at least in theory, have the FPU calculating
> your operation while the CPU does other (non-FPU) operations at the same
> time. I don't know if any compiler is able to optimize like this, though.

For two decades now the CPU and FPU have been the same thing on x86. It is 
not like they are two different processors. They are *one* processor. The 
terminology is just a leftover from times when the logic we nowadays call 
FPU did not fit on the same die as the integer unit called CPU back then.

	Thorsten


From: Thorsten Froehlich
Subject: Re: Radiosity Status: Giving Up...
Date: 1 Jan 2009 03:21:25
Message: <495c7d05@news.povray.org>

Warp wrote:
> Thorsten Froehlich <tho### [at] trfde> wrote:
>> Clearly you do not know much about floating-point units in modern processors 
>> then. You actually want to do it in software because that is more efficient 
>> (see my other post). x87 is pretty much the last architecture to still have 
>> microcode ops for more than sqrt.
> 
>   You are telling me that calculating trigonometric functions on 64-bit
> floating point values in software is faster than using the FPU?

Well, it is also what the AMD-link I supplied explains.

>   I'm not sure that would make too much sense.

Well, maybe not to you, but to Intel, AMD, me and the rest of the world ;-)

> It would mean that they
> would *deliberately* make the FPU calculate those functions in a less
> efficient way than you could do with the CPU.

No.

But anyway, you can read all this in the Intel and AMD documentation.

	Thorsten


From: clipka
Subject: Re: Radiosity Status: Giving Up...
Date: 1 Jan 2009 03:30:01
Message: <web.495c7e61cd9d1e7530acaf600@news.povray.org>

> Look in the gcc or icc standard library implementations how they do it. Or
> compile and disassemble the compiled code.

Looks sufficiently ugly for my taste :)


From: clipka
Subject: Re: Radiosity Status: Giving Up...
Date: 1 Jan 2009 03:35:00
Message: <web.495c7fb6cd9d1e7530acaf600@news.povray.org>

Thorsten Froehlich <tho### [at] trfde> wrote:
> For two decades now the CPU and FPU have been the same thing on x86. It is
> not like they are two different processors. They are *one* processor. The
> terminology is just a leftover from times when the logic we nowadays call
> FPU did not fit on the same die as the integer unit called CPU back then.

.... and yet, all these times, the FPU has been doing its business in parallel to
the CPU, like in the very first days.

On the other hand, given how much stuff is happening in parallel in a CPU
nowadays, this special status may not be really special anymore.


From: Warp
Subject: Re: Radiosity Status: Giving Up...
Date: 1 Jan 2009 05:53:07
Message: <495ca093@news.povray.org>

Thorsten Froehlich <tho### [at] trfde> wrote:
> For two decades now the CPU and FPU have been the same thing on x86. It is 
> not like they are two different processors. They are *one* processor. The 
> terminology is just a leftover from times when the logic we nowadays call 
> FPU did not fit on the same die as the integer unit called CPU back then.

  That doesn't matter. The CPU part does not stop if the FPU is doing
something (the only situation where the CPU will wait for the FPU is
when it tries to retrieve some value from it).

  This means that if the program executes an FPU opcode which takes dozens
of clock cycles for the FPU to perform, the CPU part will continue executing
CPU opcodes until a new FPU opcode (eg. fst) is encountered.

  The original Quake engine was rather famous for using this to its
advantage in the 486 and Pentium processors: While the FPU was calculating
a heavy division (heavier in those days than today), the CPU was
interpolating and drawing textures linearly the next 15 pixels. This
made the division operation almost free (at the cost of the perspective
correctness of the texture not being completely perfect).

-- 
                                                          - Warp
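
The trick described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Quake source: the function name span_u_approx and its parameters are hypothetical, and plain C++ cannot express the actual overlap of the FPU division with integer work, so the sketch only shows the arithmetic. A true perspective division is done once per run of `step` pixels and the texture coordinate is interpolated linearly in between.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch: u/z and 1/z vary linearly across a span in screen
// space, so the exact texture coordinate is (u/z) / (1/z) at each pixel.
// Here that division is only done at run boundaries; inside a run the
// coordinate is interpolated linearly (slightly wrong, but much cheaper).
std::vector<double> span_u_approx(double uoz0, double uoz1,   // u/z at span ends
                                  double ooz0, double ooz1,   // 1/z at span ends
                                  int width, int step)
{
    assert(width > 0 && step > 0);
    std::vector<double> u(width);

    // Exact perspective-correct u at pixel i: costs one division.
    auto exact = [&](int i) {
        const double t = static_cast<double>(i) / width;
        return (uoz0 + t * (uoz1 - uoz0)) / (ooz0 + t * (ooz1 - ooz0));
    };

    for (int start = 0; start < width; start += step) {
        const int end = std::min(start + step, width);
        const double ua = exact(start);
        const double ub = exact(end);              // the "heavy" division
        const double du = (ub - ua) / (end - start);
        for (int i = start; i < end; ++i)          // cheap linear fill
            u[i] = ua + du * (i - start);
    }
    return u;
}
```

In the scenario the post describes, the division for the next run boundary would be issued first and the linear fill of the current run would execute while it completes, which is what made the division nearly free.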


From: Warp
Subject: Re: Radiosity Status: Giving Up...
Date: 1 Jan 2009 05:57:54
Message: <495ca1b2@news.povray.org>

Thorsten Froehlich <tho### [at] trfde> wrote:
> Yes, but what use are instructions you won't be able to use in the future 
> and that you are already advised not to use now?

  As long as the hardware supports x87, I see absolutely no rational reason
why an OS would drop support for 99% of programs just because it doesn't
want the FPU to be used.

  The OS would, in fact, have to go to great lengths in order to detect
that a program is using the FPU and deliberately stop it (rather than
allow it to simply malfunction, which would be stupid).

  "Sorry, your program uses the FPU, and while this computer does have
an FPU, and it could run your program just perfectly, I'm not going to
allow it. Tough luck."

-- 
                                                          - Warp


From: Warp
Subject: Re: Radiosity Status: Giving Up...
Date: 1 Jan 2009 06:00:01
Message: <495ca231@news.povray.org>

clipka <nomail@nomail> wrote:
> Warp <war### [at] tagpovrayorg> wrote:
> > > Yes, by not saving and restoring the x87 "register" stack when switching
> > > threads or making operating system calls. You need OS support for that.
> >
> >   That would be a rather broken OS.

> Not if this was part of the OS specification.

  So the hardware would be perfectly able to run the software, but the
OS deliberately stops the software from being run if it uses the FPU.
And this makes sense?

-- 
                                                          - Warp


From: Warp
Subject: Re: Radiosity Status: Giving Up...
Date: 1 Jan 2009 06:38:35
Message: <495cab3b@news.povray.org>

Thorsten Froehlich <tho### [at] trfde> wrote:
> Look in the gcc or icc standard library implementations how they do it. Or 
> compile and disassemble the compiled code.

  Maybe it's different in an x86-64 architecture, but at least in my P4,
according to my tests, if the compiler uses the FPU opcodes it will be
faster than anything else.

  I made a little function to try to test the speed of trigonometric
functions with different compiler options:

#include <cmath>

// Sums sin + cos over [0, d] in steps of 0.0001 to stress the trig routines.
double foo(double d)
{
    double retval = 0;
    for(double angle = 0; angle <= d; angle += 0.0001)
        retval += std::sin(angle) + std::cos(angle);
    return retval;
}

  Then I call it with "foo(10000);"

  It performs quite a few other operations as well, so the trigonometric
functions get a bit buried among the others. However, it still produces
measurable differences in execution speed with different options.

  I use the optimization options "-O3 -march=native" for all tests. For
some reason if I use the option "-ffast-math", gcc will produce a direct
fsincos opcode, but if I don't specify it, it will produce a library call
instead. I don't really understand why, but that suits me just fine for
this test. Here are some results (average of 4 runs, rounded to 1 decimal):

-O3 -march=native : 8.2 seconds
-O3 -march=native -ffast-math : 7.1 seconds
-O3 -march=native -mfpmath=sse : 8.1 seconds

  It could be possible to run the test with pure software FP calculations
by using the -msoft-float option, but apparently my gcc (or, more precisely,
libgcc) has not been compiled with the support for it, so I can't test it.
It's a pity. It would have been interesting to see how much slower it would
have been.

  I haven't really looked at what the gcc sincos library call is doing,
but it might well be that it just executes an fsincos opcode, and that
the time difference is coming from the overhead of the function call.
Anyways, whatever the reason, at least on 32-bit x86 it just seems to
be faster to execute an fsincos directly.

-- 
                                                          - Warp
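
For anyone who wants to repeat the measurement, here is one possible way to time the function from inside the program. It is a minimal sketch: foo is copied from the post above, while time_foo and its out-parameter are hypothetical additions, and the post does not say how the original timings were taken (an external timer around the whole program would do equally well).

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Benchmark body as posted: sums sin + cos over [0, d] in steps of 0.0001.
double foo(double d)
{
    double retval = 0;
    for (double angle = 0; angle <= d; angle += 0.0001)
        retval += std::sin(angle) + std::cos(angle);
    return retval;
}

// Hypothetical helper: wall-clock seconds for one call of foo(d).
// The computed sum is written to `result` so the compiler cannot
// discard the loop as dead code.
double time_foo(double d, double& result)
{
    assert(std::isfinite(d));
    const auto t0 = std::chrono::steady_clock::now();
    result = foo(d);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Calling time_foo(10000, r) corresponds to the workload in the post. Building the same file once with each set of options listed above and averaging several runs gives numbers comparable in kind to those reported, though the absolute values depend on the CPU, compiler version and math library.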

