POV-Ray : Newsgroups : povray.off-topic : Processor speed
From: Warp
Subject: Re: Processor speed
Date: 28 Oct 2008 13:25:04
Message: <49074aef@news.povray.org>
Invisible <voi### [at] devnull> wrote:
> Is integer addition faster or slower than floating-point addition?

  Impossible to say. It depends on the processor type, the integrity of
the pipelines, the combination of instructions, and tons of other things.

  It also depends on what you are doing. If you are loading the values of
two variables into the FPU, calculating the addition, and then storing the
result into a variable, then that's definitely slower than integer addition
because the loads and stores consume a lot more clock cycles for the FPU.
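
  For instance (just a sketch in C++; the exact numbers depend entirely
on the processor and the compiler), compare a loop which keeps its sum in
a register with one which is forced to store and reload it on every
iteration:

#include <cstddef>

// The accumulator stays in a register for the whole loop: the only
// memory traffic is reading data[i].
double sum_in_register(const double* data, std::size_t n)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += data[i];
    return sum;
}

// 'volatile' forces the compiler to store the accumulator to memory and
// load it back on every iteration, so the loads and stores dominate even
// though the additions themselves are exactly the same.
double sum_through_memory(const double* data, std::size_t n)
{
    volatile double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum = sum + data[i];
    return sum;
}

  The additions are identical in both; the second version spends most of
its time moving the value to and from memory.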

  If you are measuring purely the clock cycles taken by one single addition,
disregarding everything else, then they are probably equally fast (although
with modern Intel/AMD processors I cannot even say that for sure, as they
have microcodes which take fractions of a clock cycle and weirdness like
that).

> How about multiplication?

  It depends on the processor. Some processors have 1-clock-cycle FPU
multiplication, while others don't. Some have special circuitry for
calculating CPU register multiplication in 1 clock cycle, others have
a small fraction of that circuitry which calculates it in 2 or a few
clock cycles, and yet others calculate it with the FPU (which curiously
makes integer multiplication slower than floating point multiplication).

> How do trigonometric functions compare?

  What do you think?

> Is single precision any faster than double precision?

  Only in terms of pipeline and cache capacity requirements.

> Are 8-bit integers faster than 16-bit integers?

  It depends on the processor.

-- 
                                                          - Warp



From: Orchid XP v8
Subject: Re: Processor speed
Date: 28 Oct 2008 15:28:55
Message: <490767f7@news.povray.org>
>> Is integer addition faster or slower than floating-point addition?
> 
>   Impossible to say. It depends on the processor type, the integrity of
> the pipelines, the combination of instructions, and tons of other things.
> 
>   If you are measuring purely the clock cycles taken by one single addition,
> disregarding everything else, then they are probably equally fast (although
> with modern Intel/AMD processors I cannot even say that for sure, as they
> have microcodes which take fractions of a clock cycle and weirdness like
> that).

Interesting. I have always heard that floating-point arithmetic is much 
slower than integer arithmetic. (That's why they invented the FPU, but 
it's still slower.) So you're saying they're actually roughly the same 
speed now?

>> How about multiplication?
> 
>   It depends on the processor. Some processors have 1-clock-cycle FPU
> multiplication, while others don't. Some have special circuitry for
> calculating CPU register multiplication in 1 clock cycle, others have
> a small fraction of that circuitry which calculates it in 2 or a few
> clock cycles, and yet others calculate it with the FPU (which curiously
> makes integer multiplication slower than floating point multiplication).

 From one table I saw, integer multiplication is significantly slower 
than integer addition, and integer division is markedly slower again. I 
don't know if the same holds for floating-point though, or how fast/slow 
floating-point arithmetic is compared to integer arithmetic in general.

>> How do trigonometric functions compare?
> 
>   What do you think?

I think they're slower, but how much? 2x slower? 20,000x slower? I have 
no idea.

>> Is single precision any faster than double precision?
> 
>   Only in terms of pipeline and cache capacity requirements.
> 
>> Are 8-bit integers faster than 16-bit integers?
> 
>   It depends on the processor.

OK, cool. So basically there is no way I can tell whether implementing 
an algorithm one way or the other will yield the best speed. Yay, me. :-/

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*



From: Kevin Wampler
Subject: Re: Processor speed
Date: 28 Oct 2008 15:31:47
Message: <490768a3@news.povray.org>
Orchid XP v8 wrote:
> OK, cool. So basically there is no way I can tell whether implementing 
> an algorithm one way or the other will yield the best speed. Yay, me. :-/

It won't tell you everything about the speed of the final program, but I 
can't imagine it'd be hard to write some test programs and then time 
them to determine some of these answers for yourself.
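
Something along these lines would already give you a ballpark figure.
(Just a sketch using std::chrono; the volatile 'step' variables are only
there to stop the compiler folding the loops away, and you'd still want
to glance at the generated assembly before trusting the numbers.)

#include <chrono>
#include <cstdio>

int main()
{
    const long N = 100000000L;      // additions per test

    // volatile keeps the optimizer from collapsing the loops entirely
    volatile long   int_step = 1;
    volatile double fp_step  = 1.0;

    auto t0 = std::chrono::steady_clock::now();

    long isum = 0;
    for (long i = 0; i < N; ++i)
        isum += int_step;

    auto t1 = std::chrono::steady_clock::now();

    double fsum = 0.0;
    for (long i = 0; i < N; ++i)
        fsum += fp_step;

    auto t2 = std::chrono::steady_clock::now();

    auto ms = [](auto a, auto b) {
        return std::chrono::duration_cast<std::chrono::milliseconds>(b - a).count();
    };
    std::printf("integer adds: %lld ms (sum %ld)\n", (long long)ms(t0, t1), isum);
    std::printf("float adds:   %lld ms (sum %f)\n",  (long long)ms(t1, t2), fsum);
}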



From: Orchid XP v8
Subject: Re: Processor speed
Date: 28 Oct 2008 15:34:04
Message: <4907692c@news.povray.org>
Kevin Wampler wrote:
> Orchid XP v8 wrote:
>> OK, cool. So basically there is no way I can tell whether implementing 
>> an algorithm one way or the other will yield the best speed. Yay, me. :-/
> 
> It won't tell you everything about the speed of the final program, but I 
> can't imagine it'd be hard to write some test programs and then time 
> them to determine some of these answers for yourself.

I guess I'm still used to the Old Days of computing, when the speed of 
the CPU was the primary bottleneck. Of course, these days the memory 
subsystem is the primary bottleneck - to the point where algorithms 
which are "less efficient" on paper can actually run faster in reality 
if they have superior cache behaviour.
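
The standard example (sketched in C++, since that's where I've seen it
demonstrated) is keeping a sorted sequence in a contiguous vector versus
a linked list:

#include <algorithm>
#include <list>
#include <vector>

// "Efficient on paper": once the position is found, list insertion is
// O(1) -- but finding it means chasing pointers through scattered nodes,
// so nearly every step is a cache miss.
void insert_sorted(std::list<int>& xs, int value)
{
    auto pos = std::find_if(xs.begin(), xs.end(),
                            [&](int x) { return x > value; });
    xs.insert(pos, value);
}

// "Inefficient on paper": vector insertion has to shift elements along,
// but both the search and the shift walk contiguous memory, so for
// realistic sizes this usually ends up faster.
void insert_sorted(std::vector<int>& xs, int value)
{
    auto pos = std::upper_bound(xs.begin(), xs.end(), value);
    xs.insert(pos, value);
}

The list wins the big-O argument; the vector usually wins the stopwatch.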

Obviously, cache behaviour is something I have absolutely no control 
over, so there's no point worrying about it.

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*



From: Mike Raiford
Subject: Re: Processor speed
Date: 28 Oct 2008 15:36:26
Message: <490769ba@news.povray.org>
Orchid XP v8 wrote:

> OK, cool. So basically there is no way I can tell whether implementing 
> an algorithm one way or the other will yield the best speed. Yay, me. :-/

Not only that, but modern processors can get a significant speed boost 
based on the order in which instructions are presented. For example, if 
you have two instructions, one dependent on the previous, it's entirely 
possible to move an independent instruction between those two to take 
advantage of the processor's pipelining. But then you have to know which 
instructions each pipeline can handle, and those two instructions cannot 
share any resources. You may have something like:

(my x86 assembler is a bit rough so my syntax may be off.)

mov ecx, 10h
add ecx, eax

mov ebx, [var1]
add ebx, 16


The above could be rearranged to give better performance (p1/p2 marking 
which pipe each instruction would go down):

mov ecx, 10h    ; p1
mov ebx, [var1] ; p2

add ecx, eax    ; p1
add ebx, 16     ; p2

-- 
~Mike



From: Kevin Wampler
Subject: Re: Processor speed
Date: 28 Oct 2008 15:41:54
Message: <49076b02$1@news.povray.org>
Orchid XP v8 wrote:
> Obviously, cache behaviour is something I have absolutely no control 
> over, so there's no point worrying about it.

Not entirely true; if you want good speed I think it's worth trying to 
code things so that all the necessary memory will fit within your cache.  
Of course this will put a limit on the size of the problem you can solve 
without slowing things down a bit.  I'm sure that someone like Warp 
could give you more detailed pointers, although if you're doing this in 
Haskell it may not lend itself to this sort of optimization as well as, 
say, C++.
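
As a sketch of what I mean (the block size here is a made-up number that
you'd have to tune to your actual cache sizes):

#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical block size, chosen so a block fits comfortably in cache.
constexpr std::size_t BLOCK = 8192;

// Instead of making several full passes over a huge array (each pass
// re-fetching everything from main memory), work on one cache-sized
// block at a time so the data is still hot when the second operation
// touches it.
void process_blocked(std::vector<double>& v)
{
    for (std::size_t start = 0; start < v.size(); start += BLOCK) {
        const std::size_t end = std::min(start + BLOCK, v.size());
        for (std::size_t i = start; i < end; ++i) v[i] *= 2.0;
        for (std::size_t i = start; i < end; ++i) v[i] += 1.0;
    }
}

Doing the same two operations as two whole-array passes would drag
everything through main memory twice; blocked, the second loop mostly
hits the cache.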



From: Orchid XP v8
Subject: Re: Processor speed
Date: 28 Oct 2008 15:50:10
Message: <49076cf2$1@news.povray.org>
>> Obviously, cache behaviour is something I have absolutely no control 
>> over, so there's no point worrying about it.
> 
> Not entirely true;

Well, no. You can use large arrays and access them in a specific order 
in an attempt to improve cache locality. But beyond that, in a GC'd 
language where the runtime allocates and rearranges data in RAM whenever 
it likes, you really have very little control.
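
(By "a specific order" I mean things like the classic row-versus-column
traversal -- in C++ terms, just as a sketch:)

#include <cstddef>
#include <vector>

// Assumes m holds rows*cols doubles stored row by row.

// Row order: consecutive iterations touch consecutive addresses, so each
// cache line fetched from memory is used in full.
double sum_by_rows(const std::vector<double>& m,
                   std::size_t rows, std::size_t cols)
{
    double sum = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            sum += m[r * cols + c];
    return sum;
}

// Column order over the same data: each access lands in a different
// cache line, so for a large matrix almost every access is a miss.
double sum_by_columns(const std::vector<double>& m,
                      std::size_t rows, std::size_t cols)
{
    double sum = 0.0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            sum += m[r * cols + c];
    return sum;
}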

-- 
http://blog.orphi.me.uk/
http://www.zazzle.com/MathematicalOrchid*



From: Tom Austin
Subject: Re: Processor speed
Date: 28 Oct 2008 16:24:10
Message: <490774ea$1@news.povray.org>
Orchid XP v8 wrote:
> Kevin Wampler wrote:
>> Orchid XP v8 wrote:
>>> OK, cool. So basically there is no way I can tell whether 
>>> implementing an algorithm one way or the other will yield the best 
>>> speed. Yay, me. :-/
>>
>> It won't tell you everything about the speed of the final program, but 
>> I can't imagine it'd be hard to write some test programs and then time 
>> them to determine some of these answers for yourself.
> 
> I guess I'm still used to the Old Days of computing, when the speed of 
> the CPU was the primary bottleneck. Of course, these days the memory 
> subsystem is the primary bottleneck - to the point where algorithms 
> which are "less efficient" on paper can actually run faster in reality 
> if they have superior cache behaviour.
> 
> Obviously, cache behaviour is something I have absolutely no control 
> over, so there's no point worrying about it.
> 


Depends on what language you are using.

Even today some programs are written to use only internal registers, 
touching external memory only when absolutely necessary.

Also, depending on what processor you are using, it can be faster to 
count down to zero than up to a given number, based on the machine 
instructions available and the time cost of each instruction.
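
For example (just a sketch; whether the second form actually wins is
entirely down to the processor and the compiler):

// Counting up: the loop test needs an explicit comparison against n on
// every iteration.
long sum_up(const long* data, long n)
{
    long sum = 0;
    for (long i = 0; i < n; ++i)
        sum += data[i];
    return sum;
}

// Counting down to zero: on some processors the decrement itself sets
// the flags the branch needs, so the separate comparison disappears.
long sum_down(const long* data, long n)
{
    long sum = 0;
    for (long i = n; i-- > 0; )
        sum += data[i];
    return sum;
}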


When you get into high-level languages you are going to lose some 
execution efficiency, because the focus shifts towards ease of 
programming rather than execution speed.

You get to a point where the program just runs and the compiler takes 
little care over optimizing the execution speed.  It's easier to 
program, but at a cost.



Tom



From: Warp
Subject: Re: Processor speed
Date: 28 Oct 2008 19:13:15
Message: <49079c8b@news.povray.org>
Mike Raiford <"m[raiford]!at"@gmail.com> wrote:
> Not only that, but modern processors can get a significant speed boost 
> based on the order of instructions presented. e.g. if you have 2 
> instructions, one dependent on the previous, it's entirely possible to 
> move an instruction between those two to take advantage of the 
> processor's pipelining.

  Intel processors have reordered instructions automatically since the
Pentium Pro (AFAIR).

-- 
                                                          - Warp



From: Warp
Subject: Re: Processor speed
Date: 28 Oct 2008 19:24:29
Message: <49079f2d@news.povray.org>
Orchid XP v8 <voi### [at] devnull> wrote:
> Interesting. I have always heard that floating-point arithmetic is much 
> slower than integer arithmetic.

  An FPU addition has taken 1 clock cycle since probably the 486 (the
version which had the integrated FPU), or at the latest since the Pentium.

  The first Intel processor to have a 1-clock-cycle FPU multiplication
was, if I remember correctly, the Pentium. (As you can imagine, this
requires a rather large percentage of the entire area of the chip.)

  Of course the Pentium was not the first processor to have a 1-clock-cycle
floating point multiplication. Many RISC processors had that probably 10
years earlier. It's old technology.

  Division is much more complicated than multiplication, which is why
it has always taken (and will most probably always take) quite a few
clock cycles to compute.

  Of course the actual throughput of the FPU in most programs is lower
than that because of all the data which has to be transferred between
memory and the FPU. You can write specialized code (usually in asm) which
takes full advantage of the FPU by calculating as much as possible without
constantly loading and storing values from/to memory, but even today
compilers are not very good at making this kind of optimization. If you
examine the assembler output of a compiler, you will usually notice that
it loads and stores FPU registers *a lot*, often more than would be
necessary.
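
  To make that concrete (a hand-written sketch, not actual compiler
output), here is the same polynomial with the intermediates kept in
registers versus spilled to memory at every step:

// All the intermediates stay in registers: load x once, do the
// arithmetic register-to-register, store the result once.
double horner(double x)
{
    return ((2.0 * x + 3.0) * x + 4.0) * x + 5.0;
}

// The volatile forces every intermediate out to memory and back, which
// is roughly what overly cautious FPU code generation looks like:
// load, compute, store, repeat.
double horner_spilled(double x)
{
    volatile double t = 2.0 * x;
    t = t + 3.0;
    t = t * x;
    t = t + 4.0;
    t = t * x;
    t = t + 5.0;
    return t;
}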

  This may be one of the reasons why the XMM registers, SSE and other SIMD
extensions have been developed, i.e. so that with a new instruction set it
would be possible to write code which utilizes the floating-point hardware
better.

> So you're saying they're actually roughly the same speed now?

  When counting in clock cycles they have been about the same speed since
the 486. The FPU is just a bit hindered by the more complicated data
transfer between memory and the FPU.

-- 
                                                          - Warp


