POV-Ray : Newsgroups : povray.general : Real benefit of a 64 bit Pov binary on a 64 bit CPU in a 64 bit opsys? (Message 61 to 68 of 68)
From: Nicolas George
Subject: Re: Real benefit of a 64 bit Pov binary on a 64 bit CPU in a 64 bit opsys?
Date: 27 Aug 2006 22:06:08
Message: <44f24f90@news.povray.org>
Warp  wrote in message <44f2271e@news.povray.org>:
>   I don't know if this makes a difference in speed in different processors,
> but it *might* be that the AMD64 can smartly perform those operations
> (which always yield '0' as the answer) faster with its 64-bit-mode
> enhancements. But not necessarily. It's just a guess.

No, it does not change anything at all.

I originally made that benchmark to compare floating-point arithmetic and
64-bit integer arithmetic, to select which one to use in a situation where
both would be acceptable. At first, I just used "r += i * i", but I added
the division to get a result that fits in 64 bits, and to be able to
compare it against a trusted result. Adding it barely changed the timings at all.

>   The 'register' keyword is a no-op. It doesn't do anything and is just
> a useless backwards-compatible drag of history.

I tried it because I suspected the 80-bit / 64-bit problem: maybe gcc
would feel freer to keep the double in 80-bit precision. But it did not
change anything.


Post a reply to this message

From: Warp
Subject: Re: Real benefit of a 64 bit Pov binary on a 64 bit CPU in a 64 bit opsys?
Date: 27 Aug 2006 22:16:03
Message: <44f251e2@news.povray.org>
Nicolas George <nicolas$george@salle-s.org> wrote:
> I tried it because I suspected the 80 bits / 64 bits problem: maybe gcc
> would feel freer to keep the double in 80 bits precision.

  'double' is 64-bit on x86 architectures (and in fact on most other
architectures too). If you want an 80-bit fp number, use 'long double'.

> But it did not change anything.

  As I said, the 'register' keyword is a no-op. I would bet that most,
if not all, modern compilers simply ignore it completely (except when
checking for correct syntactic usage, just for backwards compatibility;
after that check they most probably just drop it completely).

-- 
                                                          - Warp


Post a reply to this message

From: Thorsten Froehlich
Subject: Re: Real benefit of a 64 bit Pov binary on a 64 bit CPU in a 64 bit opsys?
Date: 28 Aug 2006 08:24:35
Message: <44f2e083$1@news.povray.org>
Nicolas George wrote:
> I made that benchmark in the first time to compare floating point arithmetic
> and 64 bits integer arithmetic, to select which one to use for a situation
> where both would be acceptable. At first, I just used "r += i * i", but I
> added the division to get a result that fits in 64 bits, and to be able to
> compare it to a trusted result. Adding it barely changed the timings at all.

You do realise that "unsigned" declares an "unsigned int", which will be 32
bits in both 32- and 64-bit modes of compilation? Thus, the only thing your
code as posted tests is how well the compiler handles the slowest operations
in there, which are the 32-bit integer division and the conversion of a
32-bit integer to a double. Neither of these tells you much about the
performance of anything but those two specific operations, which are special
cases in and of themselves.

In fact, the integer division is slow due to the iterative algorithm (in
hardware!) needed to execute it. It would in fact be slower for most 64-bit
integers, assuming the numbers you divide are sufficiently big (outside the
32-bit range), because the time an integer division takes is not constant
but rather depends on the number of one- and zero-bits, on any reasonable
processor released in the last three decades.

	Thorsten


Post a reply to this message

From: Thorsten Froehlich
Subject: Re: Real benefit of a 64 bit Pov binary on a 64 bit CPU in a 64 bit opsys?
Date: 28 Aug 2006 08:38:28
Message: <44f2e3c4$1@news.povray.org>
Warp wrote:
>   As I said, the 'register' keyword is a no-op. I would bet that most
> if not all modern compilers simply ignore it completely (except when
> checking for correct syntactical usage, just for backwards compatibility;
> after that check they most probably just drop it completely).

Please be advised that your statement is incorrect. 'register' is not
defined to be a "no-op" at all. See for example ISO/IEC 14882:2003 (or 1998
if you like, either way it is the ISO C++ standard) section 7.1.1 number 3.
Specifically, it is a hint to the compiler that "the object so declared will
be heavily used". Of course, it is true that some modern compilers
completely ignore it, but that is more due to their register allocators not
being able to handle external priority hints than to the keyword being
specified as a "no-op" anywhere. Also note that it is completely legal to
declare a float or double as 'register', and it is completely reasonable to
expect a compiler to be able to use this hint to optimise the usage of
floating-point registers (except that an x87 FPU does not have registers
but a stack; most RISC processors do of course have a full complement of
floating-point registers).
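As an illustration of that legality, a sketch (my own example, not code from the thread):

```c
/* 'register' is only a hint; it is legal on floating-point types.
   The one hard rule in C: you may not take the address of a
   register variable. */
double dot3(const double *a, const double *b)
{
    register double acc = 0.0;  /* hint: heavily used accumulator */
    for (int i = 0; i < 3; i++)
        acc += a[i] * b[i];
    return acc;
}
```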

	Thorsten


Post a reply to this message

From: Warp
Subject: Re: Real benefit of a 64 bit Pov binary on a 64 bit CPU in a 64 bit opsys?
Date: 28 Aug 2006 09:07:08
Message: <44f2ea7c@news.povray.org>
Thorsten Froehlich <tho### [at] trfde> wrote:
> Please be advised that your statement is incorrect. 'register' is not
> defined to be a "no-op" at all. See for example ISO/IEC 14882:2003 (or 1998
> if you like, either way it is the ISO C++ standard) section 7.1.1 number 3.
> Specifically, it is a hint to the compiler that "the object so declared will
> be heavily used".

  I meant "it's in practice a no-op". I didn't mean it's "officially a
deprecated keyword".

  Of course in theory it gives a hint to the compiler, and in really old
C compilers it really had an effect, but AFAIK compilers have ignored it
completely for over a decade. They will use their own internal optimization
algorithms regardless of that keyword.

  It's the same as with the 'inline' keyword with regard to optimization:
when optimizing, compilers usually ignore that keyword completely. It
doesn't have any effect on the output. (But of course 'inline' has other
meaningful uses, which make it a non-no-op.)

-- 
                                                          - Warp


Post a reply to this message

From: Warp
Subject: Re: Real benefit of a 64 bit Pov binary on a 64 bit CPU in a 64 bit opsys?
Date: 28 Aug 2006 09:10:11
Message: <44f2eb33@news.povray.org>
Thorsten Froehlich <tho### [at] trfde> wrote:
> In fact, the integer division is slow due to the iterative algorithm (in
> hardware!) needed to execute it. It would in fact be slower for most 64 bit
> integers assuming the numbers you divide are sufficiently big (outside 32
> bit range) because the time an integer division takes is not constant but
> rather depends on the number of one- and zero-bits with any reasonable
> processor released in the last three decades.

  Are you sure it's an iterative algorithm at transistor level?

  One would think that, for example, adding two integers is iterative
(how else could you know whether the sum of two bits overflows, creating
the need to add an extra 1 to the next bit?). However, it is perfectly
possible to add integers of any number of bits in one clock cycle.

-- 
                                                          - Warp


Post a reply to this message

From: Thorsten Froehlich
Subject: Re: Real benefit of a 64 bit Pov binary on a 64 bit CPU in a 64 bit opsys?
Date: 28 Aug 2006 09:42:30
Message: <44f2f2c6$1@news.povray.org>
Warp wrote:
> Thorsten Froehlich <tho### [at] trfde> wrote:
>> In fact, the integer division is slow due to the iterative algorithm (in
>> hardware!) needed to execute it. It would in fact be slower for most 64 bit
>> integers assuming the numbers you divide are sufficiently big (outside 32
>> bit range) because the time an integer division takes is not constant but
>> rather depends on the number of one- and zero-bits with any reasonable
>> processor released in the last three decades.
> 
>   Are you sure it's an iterative algorithm at transistor level?

Yes. I don't have a good reference at hand, but a very quick search reveals
<http://www.intel.com/technology/itj/2006/volume10issue02/art01_Intro_to_Core_Duo/p03_improved_cores.htm>.
Just search for "IDIV" in that document and notice the remark about the
algorithm being iterative and their use of an "early exit", which accounts
for the variable execution time (i.e. it depends on the number of zeros and
ones in the input). Given that this talks about Intel's most modern CPU
core, the implications should be clear.

Of course, all this is much better explained in books about computer
hardware (plenty of those around). The good ones present several variations
of algorithms available in detail. However, in essence the algorithms remain
rather similar to a binary version of the common decimal "by hand" division
method we all once learned in school. As you may recall, that "algorithm" is
also iterative ;-)
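The schoolbook method translates into the classic "restoring" division: one iteration per bit of the dividend, which is the idea hardware dividers refine. This is an illustrative software sketch, not Intel's actual circuit:

```c
#include <stdint.h>

/* Restoring binary division: bring down one bit of the dividend per
   step and subtract the divisor whenever it fits -- base-2 long
   division, exactly as done by hand in decimal. */
uint32_t div_binary(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1);  /* bring down the next bit */
        if (r >= d) {                   /* divisor fits: subtract, set bit */
            r -= d;
            q |= UINT32_C(1) << i;
        }
    }
    *rem = r;
    return q;
}
```

An "early exit" in the sense of the IDIV remark would stop once the remaining dividend bits can no longer change the result, which is why the latency depends on the operand values.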

	Thorsten


Post a reply to this message

From: Nicolas George
Subject: Re: Real benefit of a 64 bit Pov binary on a 64 bit CPU in a 64 bit opsys?
Date: 28 Aug 2006 10:53:31
Message: <44f3036b@news.povray.org>
Thorsten Froehlich  wrote in message <44f2e083$1@news.povray.org>:
> You do realise that "unsigned" declares an "unsigned int", which will be 32
> bit in both 32 and 64 bit modes of compilation? Thus, the only thing your
> code as posted tests is how well the compiler handles the slowest operations
> in there, which are the 32 bit integer division and the conversion of a 32
> bit integer to a double. Neither of which tell you much about performance of
> anything but those two specific operations, which are special cases in and
> by themselves.

[...] served me. I stumbled on the difference for floats as a side effect,
and just reported it in this thread because it seemed relevant.

After digging into the assembly code, I found the exact cause, which is a
combination of architecture improvements and bad compiler optimisation. As
you guessed, the bottleneck is the integer-to-float conversion (the
division, on the other hand, is meaningless, since it is the very same in
32- and 64-bit mode).

In 64-bit mode, gcc uses cvtsi2sdq to convert the integer into an SSE2
register. In 32-bit mode, the default is to use the FPU, and the conversion
goes through the stack; in 32-bit mode with SSE2, the conversion goes
through the stack, through the FPU, and again through the stack, which is a
(known) very bad optimisation by gcc. Changing the code to use cvtsi2sd
gives results similar to the 64-bit version.

As a conclusion, it is possible to sum up with:

- (Already explained here) 64-bit processors are also a new generation of
  processors, with new features and a better architecture, which lead to
  better performance in both 64-bit and 32-bit mode.

- (New remark) 64-bit processors need a rewrite of the compiler's code
  generation phase, which can provide improved usage of the new features.



[...] and the linear correlation coefficient of video images, where the most
repeated operation is something like "t += *d * *d". Since there is a very
large number of pixels, t needs to be either a 64-bit integer or a
double-precision float. I found that 64-bit integers were slightly faster on
64-bit CPUs, and not too much slower on 32-bit CPUs.
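The accumulation described could be sketched like this (the pixel type and function names are my assumptions):

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of squared 8-bit samples: with millions of pixels the total
   overflows 32 bits, so the accumulator is either a 64-bit integer... */
uint64_t sum_sq_u64(const uint8_t *d, size_t n)
{
    uint64_t t = 0;
    for (size_t i = 0; i < n; i++)
        t += (uint64_t)d[i] * d[i];
    return t;
}

/* ...or a double-precision float. */
double sum_sq_dbl(const uint8_t *d, size_t n)
{
    double t = 0.0;
    for (size_t i = 0; i < n; i++)
        t += (double)d[i] * d[i];
    return t;
}
```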


Post a reply to this message


Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.