Orchid XP v8 <voi### [at] dev null> wrote:
> Interesting. I have always heard that floating-point arithmetic is much
> slower than integer arithmetic.
An FPU addition has taken 1 clock cycle since probably the 486 (the
version with the integrated FPU), or at the latest the Pentium.
The first Intel processor to have a 1-clock-cycle FPU multiplication
was, if I remember correctly, the Pentium. (As you might imagine, this
requires a rather large fraction of the chip's total area.)
Of course the Pentium was not the first processor to have a 1-clock-cycle
floating point multiplication. Many RISC processors had that probably 10
years earlier. It's old technology.
Division is much more complicated than multiplication, which is why
it has always taken (and will most probably always take) quite a few clock
cycles to compute.
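A practical consequence of division being slow: when you have to divide many values by the same divisor, you can pay for the division once and multiply by the reciprocal afterwards. A small sketch in C (the function name is made up for illustration):

```c
#include <stddef.h>

/* Divide every element of an array by the same divisor. One (slow)
   division computes the reciprocal; the loop then does one (fast)
   multiplication per element instead of one division per element. */
void scale_by_divisor(double *data, size_t n, double divisor)
{
    double inv = 1.0 / divisor;   /* the single slow division */
    for (size_t i = 0; i < n; i++)
        data[i] *= inv;           /* many cheap multiplications */
}
```

(Note that the result is not always bit-identical to dividing each element directly, which is one reason compilers won't do this for you unless you allow relaxed FP math.)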
Of course the actual throughput of the FPU in most programs is lower
than that because of all the data that has to be transferred between
memory and the FPU. You can write specialized code (usually in ASM) which
takes full advantage of the FPU by calculating as much as possible without
constantly loading and storing values from/to memory, but even today
compilers are not very good at this kind of optimization. If you
examine the assembler output of a compiler, you will usually notice that
it loads and stores FPU registers *a lot*, often more than would be
necessary.
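The same effect is visible at the C level. A rough sketch of the difference (both function names are made up): accumulating into a local variable lets the compiler keep the running sum in an FPU register, while accumulating through a pointer can force a load and store on every iteration, because the compiler cannot always prove the pointer doesn't alias the input array.

```c
#include <stddef.h>

/* The sum stays in a register for the whole loop and is written to
   memory only once, when the function returns. */
double sum_in_register(const double *data, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += data[i];
    return acc;
}

/* The pattern described above: *result may be loaded and stored on
   every iteration, since `result` could alias `data`. */
void sum_through_memory(const double *data, size_t n, double *result)
{
    *result = 0.0;
    for (size_t i = 0; i < n; i++)
        *result += data[i];
}
```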
This may be one of the reasons why MMX, SSE and other SIMD extensions
have been developed, i.e. so that with a new instruction set it would be
possible to write code which utilizes the hardware better.
> So you're saying they're actually roughly the same speed now?
Counted in clock cycles they have been about the same speed since
the 486. The FPU is just somewhat hindered by the more complicated data
transfer between memory and the FPU.
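You can get a rough feel for this yourself with a crude timing sketch (not a rigorous benchmark: it ignores warm-up, frequency scaling, and whatever the compiler does to the loops; both function names are made up). Each function does n additions, one with integers, one with floating point; `volatile` keeps the accumulator from being optimized away.

```c
#include <time.h>

/* Time n integer additions; returns elapsed CPU seconds. */
double time_int_adds(long n)
{
    volatile long sum = 0;
    clock_t t0 = clock();
    for (long i = 0; i < n; i++)
        sum += i;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

/* Time n floating-point additions; returns elapsed CPU seconds. */
double time_fp_adds(long n)
{
    volatile double sum = 0.0;
    clock_t t0 = clock();
    for (long i = 0; i < n; i++)
        sum += 1.0;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

On most modern x86 machines the two times come out in the same ballpark, which matches the point above.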
--
- Warp