Warp wrote:
> Thorsten Froehlich <tho### [at] trf de> wrote:
>> Look in the gcc or icc standard library implementations how they do it. Or
>> compile and disassemble the compiled code.
<snip>
> I haven't really looked at what the gcc sincos library call is doing,
> but it might well be that it just executes an fsincos opcode, and that
> the time difference is coming from the overhead of the function call.
You asked what a fast SSE trigonometry implementation would look like, not
what code your compiler generates when targeting a P4. So clearly you should
not be looking at the x87 implementation that uses the fsincos opcode when you
want to know what the SSE code would look like!
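
To make the distinction concrete, here is a minimal sketch of the general shape such SSE code tends to take: no fsincos at all, just packed multiplies and adds evaluating a polynomial on four floats at once. This is not what gcc or icc actually ship. The name sincos4_sketch is made up for illustration, the coefficients are plain Taylor terms rather than tuned minimax ones, and it assumes every input has already been reduced to [-pi/2, pi/2] (the range reduction step is left out entirely).

```c
#include <xmmintrin.h>  /* SSE intrinsics (__m128, _mm_*_ps) */

/* Hypothetical helper: sine and cosine of four floats at once.
   Assumption: each lane of x is already reduced to [-pi/2, pi/2].
   Taylor coefficients are used for clarity, not for best accuracy. */
static void sincos4_sketch(__m128 x, __m128 *s, __m128 *c)
{
    const __m128 x2 = _mm_mul_ps(x, x);

    /* sin(x) ~ x * (1 - x^2/6 + x^4/120 - x^6/5040), Horner form in x^2 */
    __m128 ps = _mm_set1_ps(-1.0f / 5040.0f);
    ps = _mm_add_ps(_mm_mul_ps(ps, x2), _mm_set1_ps(1.0f / 120.0f));
    ps = _mm_add_ps(_mm_mul_ps(ps, x2), _mm_set1_ps(-1.0f / 6.0f));
    ps = _mm_add_ps(_mm_mul_ps(ps, x2), _mm_set1_ps(1.0f));
    *s = _mm_mul_ps(ps, x);

    /* cos(x) ~ 1 - x^2/2 + x^4/24 - x^6/720 + x^8/40320, Horner form in x^2 */
    __m128 pc = _mm_set1_ps(1.0f / 40320.0f);
    pc = _mm_add_ps(_mm_mul_ps(pc, x2), _mm_set1_ps(-1.0f / 720.0f));
    pc = _mm_add_ps(_mm_mul_ps(pc, x2), _mm_set1_ps(1.0f / 24.0f));
    pc = _mm_add_ps(_mm_mul_ps(pc, x2), _mm_set1_ps(-0.5f));
    pc = _mm_add_ps(_mm_mul_ps(pc, x2), _mm_set1_ps(1.0f));
    *c = pc;
}
```

You would load four angles with _mm_loadu_ps, call the function, and store the two results with _mm_storeu_ps. A real implementation would add range reduction and better coefficients, but the point stands: it is all packed arithmetic, and nothing in it touches the x87 unit.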
Thorsten