From: Thorsten Froehlich
Subject: Re: Radiosity Status: Giving Up...
Date: 1 Jan 2009 07:22:51
Message: <495cb59b@news.povray.org>
Warp wrote:
> Thorsten Froehlich <tho### [at] trf de> wrote:
>> For two decades now the CPU and FPU have been the same thing on x86. It is
>> not like they are two different processors. They are *one* processor. The
>> terminology is just a leftover from times when the logic we nowadays call
>> FPU did not fit on the same die as the integer unit called CPU back then.
>
> That doesn't matter.
When did I say it does? You asserted it would:
> The CPU part does not stop if the FPU is doing
> something (the only situation where the CPU will wait for the FPU is
> when it tries to retrieve some value from it).
"at least in theory, have the FPU calculating your operation while the CPU
does other (non-FPU) operations at the same time. I don't know if any
compiler is able to optimize like this, though."
I am asserting that (both of) your statements are incorrect because you
continue to view the FPU as a separate entity from the CPU in your
statements. What you refer to as CPU is the combination of ALU and LSU. That
is not the whole CPU. The FPU is just one other component of the CPU, it is
in no way distinct, especially not in x86 processors.
In fact, Intel's "Core" architecture fuses the ALU and FPU like no other CPU
design currently around. Go look for a Core i7 (Nehalem) block diagram in
the IDF videos on the Intel site, or look at, e.g., this redrawn one:
<http://pc.watch.impress.co.jp/docs/2008/0403/kaigai_nehalem.pdf>
Notice something? - Where is your FPU, where is your "CPU"? There are six
separate execution units, each with some unique and some common features...
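To make the point about shared execution units concrete, here is a minimal C sketch. The function name `mixed_sums` and its parameters are made up for illustration. The integer and floating-point accumulations in the loop do not depend on each other, so an out-of-order core is free to issue them to different execution units in the same cycles. Whether, and how much, they actually overlap depends on the specific microarchitecture and on the code the compiler emits.

```c
#include <stddef.h>

/* Hypothetical example: two independent dependency chains in one loop.
   Neither accumulator reads the other, so the hardware scheduler may
   overlap the integer adds with the floating-point multiply/adds. */
static void mixed_sums(const int *iv, const double *dv, size_t n,
                       long *isum_out, double *dsum_out)
{
    long isum = 0;
    double dsum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        isum += iv[i];        /* integer add */
        dsum += dv[i] * 0.5;  /* FP multiply and add, independent of isum */
    }
    *isum_out = isum;
    *dsum_out = dsum;
}
```

No special compiler support is needed for this kind of overlap; it comes from the processor scheduling independent instructions, not from the source code naming a "CPU" part and an "FPU" part.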
Thorsten
Warp wrote:
> Thorsten Froehlich <tho### [at] trf de> wrote:
>> Yes, but what use are instructions you won't be able to use in the future
>> and that you are already advised not to use now?
>
> As long as the hardware supports x87, I see absolutely no rational reason
> why an OS would drop support for 99% of programs just because it doesn't
> want the FPU to be used.
Tell that to Microsoft, Apple and the Linux community.
Thorsten
Warp wrote:
> Thorsten Froehlich <tho### [at] trf de> wrote:
>> Look in the gcc or icc standard library implementations how they do it. Or
>> compile and disassemble the compiled code.
<snip>
> I haven't really looked at what the gcc sincos library call is doing,
> but it might well be that it just executes an fsincos opcode, and that
> the time difference is coming from the overhead of the function call.
You asked what a fast SSE trigonometry implementation would look like, not
what code your compiler generates when targeting a P4. So clearly you should
not be looking at the x87 implementation using the fsincos opcode when you
want to know what the SSE code would look like!?!
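As a rough illustration of the general shape of a software sine that needs no x87 opcode, here is a minimal C sketch. It is not taken from the gcc or icc libraries: the names `soft_sin` and `soft_cos` are hypothetical, and it uses a plain Taylor polynomial with a naive range reduction, where production libraries typically use minimax coefficients and much more careful reduction. The point is only that the whole computation consists of multiplies and adds, which a compiler can map onto scalar SSE2 instructions.

```c
/* Hypothetical sketch: sine via range reduction plus an odd polynomial.
   Only suitable for moderate |x|; the reduction loses accuracy for large x. */
static double soft_sin(double x)
{
    const double pi = 3.14159265358979323846;
    /* Taylor coefficients of sin(r)/r - 1 in powers of r*r,
       highest order first: -1/15!, 1/13!, ..., -1/3!. */
    static const double c[] = {
        -1.0 / 1307674368000.0, 1.0 / 6227020800.0, -1.0 / 39916800.0,
        1.0 / 362880.0, -1.0 / 5040.0, 1.0 / 120.0, -1.0 / 6.0
    };
    /* Write x = k*pi + r with r in roughly [-pi/2, pi/2]. */
    long k = (long)(x / pi + (x >= 0.0 ? 0.5 : -0.5));
    double r = x - (double)k * pi;
    double r2 = r * r;
    /* Horner evaluation: multiplies and adds only. */
    double p = c[0];
    for (int i = 1; i < 7; ++i)
        p = p * r2 + c[i];
    p = r * (1.0 + r2 * p);
    /* sin(k*pi + r) = (-1)^k * sin(r) */
    return (k % 2 != 0) ? -p : p;
}

/* cos(x) = sin(x + pi/2) */
static double soft_cos(double x)
{
    return soft_sin(x + 3.14159265358979323846 / 2.0);
}
```

With the polynomial cut off after the r^15 term, the truncation error on the reduced interval is on the order of 1e-11, so this sketch trades accuracy for simplicity. Whether such code beats fsincos on a given machine has to be measured.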
Thorsten
Thorsten Froehlich <tho### [at] trf de> wrote:
> Warp wrote:
> > Thorsten Froehlich <tho### [at] trf de> wrote:
> >> For two decades now the CPU and FPU have been the same thing on x86. It is
> >> not like they are two different processors. They are *one* processor. The
> >> terminology is just a leftover from times when the logic we nowadays call
> >> FPU did not fit on the same die as the integer unit called CPU back then.
> >
> > That doesn't matter.
> When did I say it does? You asserted it would:
> > The CPU part does not stop if the FPU is doing
> > something (the only situation where the CPU will wait for the FPU is
> > when it tries to retrieve some value from it).
> "at least in theory, have the FPU calculating your operation while the CPU
> does other (non-FPU) operations at the same time. I don't know if any
> compiler is able to optimize like this, though."
> I am asserting that (both of) your statements are incorrect because you
> continue to view the FPU as a separate entity from the CPU in your
> statements. What you refer to as CPU is the combination of ALU and LSU. That
> is not the whole CPU. The FPU is just one other component of the CPU, it is
> in no way distinct, especially not in x86 processors.
> In fact, Intel's "Core" architecture fuses the ALU and FPU like no other CPU
> design currently around. Go look for a Core i7 (Nehalem) block diagram in
> the IDF videos on the Intel site, or look at, e.g., this redrawn one:
> <http://pc.watch.impress.co.jp/docs/2008/0403/kaigai_nehalem.pdf>
> Notice something? - Where is your FPU, where is your "CPU"? There are six
> separate execution units, each with some unique and some common features...
OMG. You blame me for nitpicking about semantics, and now you are doing
that exact same thing yourself.
I never said the "FPU" would be a separate piece of circuitry from the CPU.
When I say "FPU" I mean, rather obviously, "the part of the processor which
performs the floating point calculations". It doesn't matter how it's
physically distributed inside the processor, I was talking about its
behavior.
--
- Warp
Thorsten Froehlich <tho### [at] trf de> wrote:
> Warp wrote:
> > Thorsten Froehlich <tho### [at] trf de> wrote:
> >> Yes, but what use are instructions you won't be able to use in the future
> >> and that you are already advised not to use now?
> >
> > As long as the hardware supports x87, I see absolutely no rational reason
> > why an OS would drop support for 99% of programs just because it doesn't
> > want the FPU to be used.
> Tell that to Microsoft, Apple and the Linux community.
Windows, MacOS X and Linux all fully support programs which use the FPU.
If they didn't, at least 99% of programs would stop working.
I still see no rational reason to deliberately and on purpose break
99% of programs. What would be the point? Task switching takes a negligible
amount of time, so skipping storing and loading the FPU registers would be
a rather useless micro-optimization.
What other benefit could there be, from the point of view of an OS?
(Sure, they might say "please use SSE rather than the FPU from now on",
but that's a completely different thing from actually going and actively
making most programs out there stop working, for no good reason. The
hardware is there, so why not use it? It doesn't make any sense.)
--
- Warp
Thorsten Froehlich <tho### [at] trf de> wrote:
> You asked what a fast SSE trigonometry implementation would look like, not
> what code your compiler generates when targeting a P4. So clearly you should
> not be looking at the x87 implementation using the fsincos opcode when you
> want to know what the SSE code would look like!?!
It's obviously telling me that whatever the SSE implementation might be,
it's *not* faster (nor even equally fast) than the fsincos opcode on my
computer, which contradicts your claim that it could be done in software
more efficiently. If it could be done more efficiently, wouldn't gcc do just
that when I instruct it to use SSE?
Is SSE different in x86_64 than it is in x86_32?
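On the x86_64 question: the SSE instructions themselves are the same in both modes, but 64-bit mode provides sixteen XMM registers instead of eight, and SSE2 is part of the x86-64 baseline. GCC therefore uses SSE for scalar floating point by default on x86-64, while 32-bit builds default to x87 unless told otherwise (for example with `-mfpmath=sse -msse2`). A minimal sketch for checking which path a given build uses follows; the function name `fp_math_path` and its return codes are made up for illustration, and it relies on the GCC-style predefined macros `__SSE2_MATH__` and `__SSE_MATH__` plus the C99 macro `FLT_EVAL_METHOD`.

```c
#include <float.h>

/* Hypothetical helper: report which floating-point path this build targets.
   Returns 2 for SSE2 scalar math, 1 for SSE (single precision only),
   0 for x87-style extended-precision evaluation, -1 if it cannot tell. */
static int fp_math_path(void)
{
#if defined(__SSE2_MATH__)
    return 2;   /* float and double arithmetic done in SSE2 registers */
#elif defined(__SSE_MATH__)
    return 1;   /* only float arithmetic done in SSE registers */
#elif defined(FLT_EVAL_METHOD) && FLT_EVAL_METHOD == 2
    return 0;   /* expressions evaluated in long double, as on x87 */
#else
    return -1;  /* not an x86 build, or the compiler lacks these macros */
#endif
}
```

This only describes the arithmetic the compiler emits inline. What a library call such as sincos does internally is decided by the C library, not by these flags, so it needs to be checked separately, for instance by disassembling the call.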
--
- Warp
Warp wrote:
> Thorsten Froehlich <tho### [at] trf de> wrote:
>> Warp wrote:
>>> Thorsten Froehlich <tho### [at] trf de> wrote:
>>>> Yes, but what use are instructions you won't be able to use in the future
>>>> and that you are already advised not to use now?
>>> As long as the hardware supports x87, I see absolutely no rational reason
>>> why an OS would drop support for 99% of programs just because it doesn't
>>> want the FPU to be used.
>
>> Tell that to Microsoft, Apple and the Linux community.
>
> Windows, MacOS X and Linux all fully support programs which use the FPU.
> If they didn't, at least 99% of programs would stop working.
>
> I still see no rational reason to deliberately and on purpose break
> 99% of programs.
DO NOT ASK ME! They are going to do it, period! Is it really that difficult
to understand? Why the heck do you argue with me about the rationale behind
*their* decisions?
Thorsten
Warp wrote:
> Thorsten Froehlich <tho### [at] trf de> wrote:
>> You asked what a fast SSE trigonometry implementation would look like, not
>> what code your compiler generates when targeting a P4. So clearly you should
>> not be looking at the x87 implementation using the fsincos opcode when you
>> want to know what the SSE code would look like!?!
>
> It's obviously telling me that whatever the SSE implementation might be,
> it's *not* faster (nor even equally fast) than the fsincos opcode on my
> computer, which contradicts your claim that it could be done in software
> more efficiently. If it could be done more efficiently, wouldn't gcc do just
> that when I instruct it to use SSE?
Why do you argue with me about what Microsoft, Apple, Intel and AMD say? I
have no intention to discuss this any further, sorry. This is ridiculous! If
you don't know how to get the performance out of your compiled program that
Microsoft, Apple, Intel and AMD say is possible, then that is not my
problem. If you seriously believe Microsoft, Apple, Intel and AMD would make
suggestions that make software run slower on the latest x86 processors, then
believe it, I cannot change what you want to believe.
Thorsten
Thorsten Froehlich <tho### [at] trf de> wrote:
> DO NOT ASK ME! They are going to do it, period! Is it really that difficult
> to understand? Why the heck do you argue with me about the rationale behind
> *their* decisions?
What decisions? Do you have any concrete reference, e.g. to some online
Linux community resource where they are saying that support for programs
using the FPU will be dropped? Because I would certainly like to know how
it makes any sense.
--
- Warp
Thorsten Froehlich <tho### [at] trf de> wrote:
> Why do you argue with me about what Microsoft, Apple, Intel and AMD say? I
> have no intention to discuss this any further, sorry. This is ridiculous! If
> you don't know how to get the performance out of your compiled program that
> Microsoft, Apple, Intel and AMD say is possible, then that is not my
> problem. If you seriously believe Microsoft, Apple, Intel and AMD would make
> suggestions that make software run slower on the latest x86 processors, then
> believe it, I cannot change what you want to believe.
Then I can only conclude that Microsoft, Apple, Intel and AMD are lying,
because I can't see any actual proof of that on my computer.
--
- Warp