  Re: Radiosity Status: Giving Up...  
From: Warp
Date: 1 Jan 2009 06:38:35
Message: <495cab3b@news.povray.org>
Thorsten Froehlich <tho### [at] trfde> wrote:
> Look in the gcc or icc standard library implementations how they do it. Or 
> compile and disassemble the compiled code.

  Maybe it's different on an x86-64 architecture, but at least on my P4,
according to my tests, code where the compiler uses the FPU opcodes
directly is faster than anything else.

  I made a little function to try to test the speed of trigonometric
functions with different compiler options:

#include <cmath>

// Sums sin(angle) + cos(angle) for angle = 0, 0.0001, 0.0002, ... up to d.
double foo(double d)
{
    double retval = 0;
    for(double angle = 0; angle <= d; angle += 0.0001)
        retval += std::sin(angle) + std::cos(angle);
    return retval;
}

  Then I call it with "foo(10000);"
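
  For anyone who wants to repeat the measurement, here is a minimal sketch
of a timing harness. It is not the code used for the numbers in this post:
the names trig_sum and average_seconds are made up for illustration, and
timing with std::chrono is an assumption (running the program under the
shell's "time" command would do as well).

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Same loop as foo(), repeated here so the snippet compiles on its own.
double trig_sum(double limit)
{
    double retval = 0;
    for(double angle = 0; angle <= limit; angle += 0.0001)
        retval += std::sin(angle) + std::cos(angle);
    return retval;
}

// Hypothetical helper: average wall-clock seconds of trig_sum(limit)
// over `runs` calls. The results are accumulated into `sink` so that
// the compiler cannot discard the calls as dead code.
double average_seconds(double limit, int runs, double& sink)
{
    assert(runs > 0);
    using clock = std::chrono::steady_clock;
    double total = 0;
    for(int i = 0; i < runs; ++i)
    {
        const auto start = clock::now();
        sink += trig_sum(limit);
        total += std::chrono::duration<double>(clock::now() - start).count();
    }
    return total / runs;
}
```

  Calling "average_seconds(10000, 4, sink)" and printing both the return
value and sink corresponds to the "average of 4 runs" setup described
below.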

  It performs quite a few other operations as well, so the trigonometric
functions get a bit buried among the others. However, it still produces
measurable differences in execution speed with different options.

  I use the optimization options "-O3 -march=native" for all tests. For
some reason if I use the option "-ffast-math", gcc will produce a direct
fsincos opcode, but if I don't specify it, it will produce a library call
instead. I don't really understand why, but that suits me just fine for
this test. Here are some results (average of 4 runs, rounded to 1 decimal):

-O3 -march=native : 8.2 seconds
-O3 -march=native -ffast-math : 7.1 seconds
-O3 -march=native -mfpmath=sse : 8.1 seconds

  It might be possible to run the test with pure software FP calculations
by using the -msoft-float option, but apparently my gcc (or, more precisely,
libgcc) has not been compiled with support for it, so I can't test it.
It's a pity. It would have been interesting to see how much slower it would
have been.

  I haven't really looked at what the gcc sincos library call is doing,
but it might well be that it just executes an fsincos opcode, and that
the time difference is coming from the overhead of the function call.
Anyway, whatever the reason, at least on 32-bit x86 it just seems to
be faster to execute an fsincos directly.
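
  To make "execute an fsincos directly" concrete, here is a hedged sketch
of issuing the opcode by hand with gcc's extended inline asm, using the
x87 stack constraints "t" (st(0)) and "u" (st(1)). The wrapper name
sincos_direct is made up for illustration, and this is not what gcc emits
with -ffast-math, just the same instruction requested manually. On other
compilers or architectures the sketch falls back to the library functions.

```cpp
#include <cmath>

// Hypothetical wrapper: computes sin(x) and cos(x) in one step.
// fsincos is only valid for |x| < 2^63; no range check is done here.
inline void sincos_direct(double x, double& s, double& c)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    // fsincos replaces st(0) with sin(x) and then pushes cos(x), so
    // afterwards st(0) holds the cosine and st(1) holds the sine.
    __asm__("fsincos" : "=t"(c), "=u"(s) : "0"(x));
#else
    // Portable fallback: two separate library calls.
    s = std::sin(x);
    c = std::cos(x);
#endif
}
```

  Replacing the two calls in the loop of foo() with one call to this
wrapper would be one way to check whether the function call overhead
explains the difference.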

-- 
                                                          - Warp

