Is integer addition faster or slower than floating-point addition? How
about multiplication? How do trigonometric functions compare? Is single
precision any faster than double precision? Are 8-bit integers faster
than 16-bit integers?

Does anybody know of a resource where I can get an idea of the relative
difference in speed between the various arithmetic operations on
different data types on "typical" current-generation CPUs?
Post a reply to this message
Invisible <voi### [at] devnull> wrote:
> Is integer addition faster or slower than floating-point addition? How
> about multiplication? How do trigonometric functions compare? Is single
> precision any faster than double precision? Are 8-bit integers faster
> than 16-bit integers?
>
> Does anybody know of a resource where I can get an idea of the relative
> difference in speed between the various arithmetic operations on
> different data types on "typical" current-generation CPUs?
Have a look here:
http://siyobik.info/index.php?module=x86
I suppose the latency figures at the end of each instruction's page show
how many clock cycles the CPU needs for that instruction.
Post a reply to this message
Just found this:
http://software.intel.com/en-us/articles/instruction-latencies-in-assembly-code-for-64-bit-intel-architecture
and this:
http://swox.com/doc/x86-timing.pdf
Post a reply to this message
Invisible <voi### [at] devnull> wrote:
> Is integer addition faster or slower than floating-point addition?
Impossible to say. It depends on the processor type, the state of
the pipelines, the combination of instructions, and tons of other things.
It also depends on what you are doing. If you are loading the values of
two variables into the FPU, calculating the addition, and then storing the
result into a variable, then that's definitely slower than integer addition
because the loads and stores consume a lot more clock cycles for the FPU.
If you are measuring purely the clock cycles taken by one single addition,
disregarding everything else, then they are probably equally fast (although
with modern Intel/AMD processors I cannot even say that for sure, as they
have micro-ops which take fractions of a clock cycle and weirdness like
that).
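To make the first point concrete, here is a rough C++ sketch (the
function names are made up for illustration). Both functions perform the
same additions, but in the first the compiler must assume that 'out'
may alias 'a', so the running sum is loaded from and stored to memory on
every iteration; in the second it stays in a register:

#include <cstddef>

// Both functions sum n doubles; only the memory traffic differs.
double sum_via_memory(const double* a, std::size_t n, double* out) {
    *out = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        *out += a[i];          // load, add, store on every iteration
    return *out;
}

double sum_in_register(const double* a, std::size_t n) {
    double s = 0.0;            // lives in a register
    for (std::size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}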
> How about multiplication?
It depends on the processor. Some processors have 1-clock-cycle FPU
multiplication, while others don't. Some have special circuitry for
calculating CPU register multiplication in 1 clock cycle, others have
a small fraction of that circuitry which calculates it in 2 or a few
clock cycles, and yet others calculate it with the FPU (which curiously
makes integer multiplication slower than floating point multiplication).
> How do trigonometric functions compare?
What do you think?
> Is single precision any faster than double precision?
Only insofar as single-precision values take up less pipeline and cache capacity.
> Are 8-bit integers faster than 16-bit integers?
It depends on the processor.
--
- Warp
Post a reply to this message
>> Is integer addition faster or slower than floating-point addition?
>
> Impossible to say. It depends on the processor type, the state of
> the pipelines, the combination of instructions, and tons of other things.
>
> If you are measuring purely the clock cycles taken by one single addition,
> disregarding everything else, then they are probably equally fast (although
> with modern Intel/AMD processors I cannot even say that for sure, as they
> have micro-ops which take fractions of a clock cycle and weirdness like
> that).
Interesting. I have always heard that floating-point arithmetic is much
slower than integer arithmetic. (That's why they invented the FPU, but
it's still slower.) So you're saying they're actually roughly the same
speed now?
>> How about multiplication?
>
> It depends on the processor. Some processors have 1-clock-cycle FPU
> multiplication, while others don't. Some have special circuitry for
> calculating CPU register multiplication in 1 clock cycle, others have
> a small fraction of that circuitry which calculates it in 2 or a few
> clock cycles, and yet others calculate it with the FPU (which curiously
> makes integer multiplication slower than floating point multiplication).
From one table I saw, integer multiplication is significantly slower
than integer addition, and integer division is markedly slower again. I
don't know if the same holds for floating-point though, or how fast/slow
floating-point arithmetic is compared to integer arithmetic in general.
>> How do trigonometric functions compare?
>
> What do you think?
I think they're slower, but how much? 2x slower? 20,000x slower? I have
no idea.
>> Is single precision any faster than double precision?
>
> Only insofar as single-precision values take up less pipeline and cache capacity.
>
>> Are 8-bit integers faster than 16-bit integers?
>
> It depends on the processor.
OK, cool. So basically there is no way I can tell whether implementing
an algorithm one way or the other will yield the best speed. Yay, me. :-/
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Post a reply to this message
Orchid XP v8 wrote:
> OK, cool. So basically there is no way I can tell whether implementing
> an algorithm one way or the other will yield the best speed. Yay, me. :-/
It won't tell you everything about the speed of the final program, but I
can't imagine it'd be hard to write some test programs and then time
them to determine some of these answers for yourself.
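For example, a minimal harness along these lines would do (a C++ sketch
assuming C++11 <chrono>; the iteration count and constants are
arbitrary, and the volatile variables force a store and a load on every
pass, so read the output as rough ratios rather than per-instruction
timings):

#include <chrono>
#include <cmath>
#include <cstdio>

static double elapsed(std::chrono::steady_clock::time_point t0) {
    return std::chrono::duration<double>(
        std::chrono::steady_clock::now() - t0).count();
}

int main() {
    const int N = 50000000;
    // 'volatile' stops the compiler from deleting or folding the loops.
    volatile int    isink = 0;
    volatile double dsink = 1.0;

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i) isink = isink + 1;          // integer add
    std::printf("int add:    %.3f s\n", elapsed(t0));

    t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i) dsink = dsink + 1.0;        // double add
    std::printf("double add: %.3f s\n", elapsed(t0));

    t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i) dsink = dsink * 1.0000001;  // double mul
    std::printf("double mul: %.3f s\n", elapsed(t0));

    t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i) dsink = dsink / 1.0000001;  // double div
    std::printf("double div: %.3f s\n", elapsed(t0));

    t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < N / 10; ++i) dsink = std::sin(dsink); // sin
    std::printf("sin (N/10): %.3f s\n", elapsed(t0));
    return 0;
}

Compile with optimizations on, and be suspicious of any loop that comes
out implausibly fast: compilers happily delete work whose result is
never used, which is what the volatiles are there to prevent.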
Post a reply to this message
Kevin Wampler wrote:
> Orchid XP v8 wrote:
>> OK, cool. So basically there is no way I can tell whether implementing
>> an algorithm one way or the other will yield the best speed. Yay, me. :-/
>
> It won't tell you everything about the speed of the final program, but I
> can't imagine it'd be hard to write some test programs and then time
> them to determine some of these answers for yourself.
I guess I'm still used to the Old Days of computing, when the speed of
the CPU was the primary bottleneck. Of course, these days the memory
subsystem is the primary bottleneck - to the point where algorithms
which are "less efficient" on paper can actually run faster in reality
if they have superior cache behaviour.
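The textbook example is summing the same N numbers from a contiguous
array versus from a linked list: identical O(n) work on paper, but every
step through the list chases a pointer to a potentially uncached
address. A C++ sketch:

#include <cstddef>
#include <vector>

struct Node { double value; Node* next; };

// Same arithmetic, very different memory behaviour.
double sum_list(const Node* head) {
    double s = 0.0;
    for (const Node* p = head; p != NULL; p = p->next)
        s += p->value;         // each hop can be a cache miss
    return s;
}

double sum_array(const std::vector<double>& v) {
    double s = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += v[i];             // sequential; the prefetcher keeps up
    return s;
}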
Obviously, cache behaviour is something I have absolutely no control
over, so there's no point worrying about it.
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Post a reply to this message
Orchid XP v8 wrote:
> OK, cool. So basically there is no way I can tell whether implementing
> an algorithm one way or the other will yield the best speed. Yay, me. :-/
Not only that, but modern processors can get a significant speed boost
based on the order in which instructions are presented. E.g. if you have
two instructions, one dependent on the other, it's often possible to
move an independent instruction between those two to take advantage of
the processor's pipelining. But then you have to know which instructions
each pipeline can handle, and the two instructions you pair up cannot
share any resources. E.g. you may have something like:
(my x86 assembler is a bit rough so my syntax may be off.)
mov ecx, 10h
add ecx, eax
mov ebx, [var1]
add ebx, 16
The above could be rearranged to give better performance:
mov ecx, 10h    ; pipe 1
mov ebx, [var1] ; pipe 2
add ecx, eax    ; pipe 1
add ebx, 16     ; pipe 2
--
~Mike
Post a reply to this message
Orchid XP v8 wrote:
> Obviously, cache behaviour is something I have absolutely no control
> over, so there's no point worrying about it.
Not entirely true; if you want good speed I think it's worth trying to
code things so that all the necessary memory fits within your cache. Of
course this puts a limit on the size of problem you can solve without
slowing things down a bit. I'm sure someone like Warp could give you
more detailed pointers, although if you're doing this in Haskell it may
not lend itself to this sort of optimization as well as, say, C++.
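As a rough illustration of what "fitting in cache" means (a C++ sketch;
the block size is a made-up figure that would have to be tuned to the
real L1/L2 sizes): instead of making several full passes over a large
array, each of which evicts the previous pass's data, make all the
passes over one cache-sized block before moving on.

#include <algorithm>
#include <cstddef>
#include <vector>

// 4096 doubles = 32 KB, about half of a typical L1 data cache.
const std::size_t BLOCK = 4096;

void scale_then_offset(std::vector<double>& v) {
    for (std::size_t start = 0; start < v.size(); start += BLOCK) {
        std::size_t end = std::min(start + BLOCK, v.size());
        for (std::size_t i = start; i < end; ++i) v[i] *= 2.0;  // pass 1
        for (std::size_t i = start; i < end; ++i) v[i] += 1.0;  // pass 2
        // Pass 2 finds the block still cache-resident.
    }
}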
Post a reply to this message
>> Obviously, cache behaviour is something I have absolutely no control
>> over, so there's no point worrying about it.
>
> Not entirely true;
Well, no. You can use large arrays and access them in a specific order
in an attempt to improve cache locality. But beyond that, in a GC
language where the runtime randomly allocates and rearranges data in RAM
from time to time, you really have very little control.
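For the array case, that "specific order" is often just loop order. A
C++ sketch of the classic case, for a rows x cols matrix stored row by
row in one contiguous vector:

#include <cstddef>
#include <vector>

double sum_row_major(const std::vector<double>& m,
                     std::size_t rows, std::size_t cols) {
    double s = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            s += m[r * cols + c];  // consecutive addresses: cache-friendly
    return s;
}

double sum_col_major(const std::vector<double>& m,
                     std::size_t rows, std::size_t cols) {
    double s = 0.0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            s += m[r * cols + c];  // jumps a whole row per step: cache-hostile
    return s;
}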
--
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*
Post a reply to this message