  Re: Real benefit of a 64 bit Pov binary on a 64 bit CPU in a 64 bit opsys?  
From: Nicolas George
Date: 28 Aug 2006 10:53:31
Message: <44f3036b@news.povray.org>
Thorsten Froehlich  wrote in message <44f2e083$1@news.povray.org>:
> You do realise that "unsigned" declares an "unsigned int", which will be 32
> bit in both 32 and 64 bit modes of compilation? Thus, the only thing your
> code as posted tests is how well the compiler handles the slowest operations
> in there, which are the 32 bit integer division and the conversion of a 32
> bit integer to a double. Neither of which tell you much about performance of
> anything but those two specific operations, which are special cases in and
> by themselves.


served me. I stumbled on the difference for floats as a side effect, and
just reported it in this thread because it seemed relevant.

After digging into the assembly code, I found the exact cause, which is a
combination of architecture improvements and poor compiler optimisation. As
you guessed, the bottleneck is the integer to float conversion (the
division, on the other hand, does not matter for the comparison, since it
is exactly the same in 32 and 64 bits).

In 64 bits, gcc uses cvtsi2sdq to convert the integer directly into an SSE2
register. In 32 bits, the default is to use the FPU, and the conversion goes
through the stack; in 32 bits with SSE2, the conversion goes through the
stack, through the FPU and through the stack again, which is a (known)
missed optimisation in gcc. Changing the 32 bits SSE2 code to use cvtsi2sd
gives results similar to the 64 bits version.
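
For anyone who wants to look at the generated code themselves, here is a
minimal sketch of a loop dominated by the unsigned-to-double conversion. It
is not the benchmark posted earlier in the thread; the function name
sum_as_double and its parameters are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical kernel (an assumption, not the code from the thread):
 * each iteration converts one "unsigned" (32 bits in both modes) to a
 * double and accumulates it, so the conversion dominates the loop. */
double sum_as_double(const unsigned *v, size_t n)
{
    double t = 0.0;
    for (size_t i = 0; i < n; i++)
        t += (double)v[i];   /* integer to double conversion */
    return t;
}
```

Compiling it with "gcc -O2 -S", then again with "-m32" and with
"-m32 -msse2 -mfpmath=sse" (a multilib gcc is needed for -m32), should show
the three code paths described above. The exact instructions emitted depend
on the gcc version.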

To sum up:

- (Already explained here) 64 bits processors are also a new generation of
  processors, with new features and a better architecture, which leads to
  better performance in both 64 bits and 32 bits mode.

- (New remark) 64 bits processors required a rewrite of the compiler's code
  generation phase, which can also make better use of the new features.



and linear correlation coefficient of video images, where the most repeated
operation is something like "t += *d * *d". Since there is a very large
number of pixels, t needs to be either a 64 bits integer or a double
precision float. I found that 64 bits integers were slightly faster on a 64
bits CPU, and not too much slower on a 32 bits CPU.
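
A minimal sketch of the two accumulator variants, for comparison. The 8 bits
sample type and the function names sum_sq_u64 and sum_sq_double are
assumptions made for illustration; the actual code is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of squares with a 64 bits integer accumulator.
 * Assumes 8 bits samples (an assumption, not stated in the post). */
uint64_t sum_sq_u64(const uint8_t *d, size_t n)
{
    uint64_t t = 0;
    for (; n > 0; n--, d++)
        t += (uint64_t)*d * *d;   /* widen before multiplying */
    return t;
}

/* Same sum with a double precision accumulator. */
double sum_sq_double(const uint8_t *d, size_t n)
{
    double t = 0.0;
    for (; n > 0; n--, d++)
        t += (double)*d * *d;     /* one int to double conversion per pixel */
    return t;
}
```

Both return the same value as long as the total stays below 2^53, the point
where a double can no longer represent every integer exactly.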


