  Re: Radiosity Status: Giving Up...  
From: clipka
Date: 2 Jan 2009 13:50:01
Message: <web.495e611fcd9d1e759fcd4c570@news.povray.org>
Warp <war### [at] tagpovrayorg> wrote:
>   And you are missing my point. All you wrote is correct, but irrelevant
> with respect to what I said. Even though what you said is correct, it still
> doesn't make it any more sensical for an OS to deliberately boycott 99% of
> programs out there by restricting their access to a piece of hardware which
> *is* there and is perfectly usable at virtually no cost.

Quotes from some AMD article I just (re-)discovered:



instructions in 64-bit threads."
(http://developer.amd.com/Pages/62720069_4.aspx)

So contrary to how I understood it so far, we're actually only talking about
that brand new 64-bit software, not the 99% of old 32-bit stuff.

(2) "AMD64 64-bit mode doubles the number of XMM registers from eight to
sixteen. [...] This option makes it much easier to create efficient sequences
of arithmetical operations and for compilers to produce highly optimized code.
As an example, math library implementation of trigonometric functions such as
sin, cos, and tan can somewhat surprisingly, significantly outperform the x87
built-in instructions for these functions."

So according to AMD, 64-bit mode comes with extensions that make the x87 FPU
obsolete. (Which explains why your P4 tests indicated otherwise.)

Hard to believe, but not impossible. For example, the AMD Phenom processor docs
specify a latency of 93 clock cycles for the x87 FCOS instruction. Some paper I
found on the net about some highly sophisticated algorithm for various
transcendental functions including cos gives 124 clock cycles for a
hypothetical processor, and 276 clock cycles for a real-world RISC processor
(http://perso.ens-lyon.fr/nathalie.revol/publis/RY00.pdf). Obviously even this
highly sophisticated algorithm takes more steps than the x87 FPU needs clock
cycles - but then again it may be possible to parallelize enough of these steps
to get from 124 to below 93.
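
To make the parallelization idea concrete, here is a rough sketch in C of how a software cosine can be arranged so that several multiply-adds do not depend on each other. This is not AMD's math library code and not the algorithm from the paper above; it is just plain Taylor coefficients grouped Estrin-style, with a simple one-constant range reduction that is only reasonable for moderate arguments. All names (cos_sketch, cos_kernel, sin_kernel) are made up for illustration.

```c
/* Hypothetical illustration only: a polynomial cosine whose sub-expressions
   are independent, so a superscalar SSE2 pipeline could overlap them. */

static const double PIO2        = 1.5707963267948966;  /* pi/2 */
static const double TWO_OVER_PI = 0.6366197723675814;  /* 2/pi */

/* cos(r) for |r| <= pi/4, as a degree-7 polynomial in z = r*r. */
static double cos_kernel(double r)
{
    double z  = r * r;
    double z2 = z * z;
    double z4 = z2 * z2;
    /* The four pairs below do not depend on one another. */
    double p01 = 1.0             + z * (-1.0 / 2.0);
    double p23 = 1.0 / 24.0      + z * (-1.0 / 720.0);
    double p45 = 1.0 / 40320.0   + z * (-1.0 / 3628800.0);
    double p67 = 1.0 / 479001600.0 + z * (-1.0 / 87178291200.0);
    return (p01 + z2 * p23) + z4 * (p45 + z2 * p67);
}

/* sin(r) for |r| <= pi/4, as r times a degree-7 polynomial in z = r*r. */
static double sin_kernel(double r)
{
    double z  = r * r;
    double z2 = z * z;
    double z4 = z2 * z2;
    double p01 = 1.0              + z * (-1.0 / 6.0);
    double p23 = 1.0 / 120.0      + z * (-1.0 / 5040.0);
    double p45 = 1.0 / 362880.0   + z * (-1.0 / 39916800.0);
    double p67 = 1.0 / 6227020800.0 + z * (-1.0 / 1307674368000.0);
    return r * ((p01 + z2 * p23) + z4 * (p45 + z2 * p67));
}

/* Cosine for moderate |x|: write x = k*(pi/2) + r with |r| <= pi/4,
   then pick the kernel and sign from the quadrant k mod 4. */
double cos_sketch(double x)
{
    /* Round x*(2/pi) to the nearest integer without calling libm. */
    long k = (long)(x * TWO_OVER_PI + (x >= 0.0 ? 0.5 : -0.5));
    double r = x - (double)k * PIO2;
    int q = (int)(((k % 4) + 4) % 4);   /* quadrant in 0..3, also for k < 0 */

    switch (q) {
    case 0:  return  cos_kernel(r);
    case 1:  return -sin_kernel(r);
    case 2:  return -cos_kernel(r);
    default: return  sin_kernel(r);
    }
}
```

With Horner's rule the seven multiply-adds of each kernel would form one dependency chain; grouped as above, the longest chain is only a few operations deep, which is the kind of overlap that could bring a nominal operation count below the FCOS latency. Whether it actually does on a given CPU would have to be measured.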


So for 64-bit apps, it may indeed be advantageous to use SSE2 instead of the x87
FPU ("somewhat surprisingly", as AMD states). And 32-bit apps do not seem to be
in any danger of losing x87 FPU support any time soon.



Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.