POV-Ray: Newsgroups: povray.unofficial.patches: SSE2 optymalization of Intersect_Triangle function: Re: SSE2 optymalization of Intersect

From: Nicolas Calimet
Date: 26 Jan 2005 10:00:48
Message: <41f7b0a0@news.povray.org>

> We made this little patch for POV-Ray. It's an optimized version of
> the Intersect_Triangle function using SSE2.

	As Warp suggests, it would be really interesting to see an actual
demonstration of the speedup from using your assembler code rather
than the code gcc generates on a Pentium 4 machine (with the
optimization flags that ./configure sets for it).  You claim a 20%
speedup, which seems reasonable but needs to be supported by
reproducible test cases.  I will try it myself if time permits (also
on a K8 architecture).
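For the kind of reproducible test case asked for above, a timing harness might look roughly like the sketch below. It is only an assumption of how such a test could be set up: the routine being timed is a standard Möller–Trumbore ray/triangle test used as a stand-in, not POV-Ray's Intersect_Triangle(), and the names vec3, intersect_tri() and time_intersections() are hypothetical. The ray set is deterministic, so the C build and the assembler build can be compared on equal footing.

```c
#include <math.h>
#include <time.h>

/* Hypothetical minimal vector type (not POV-Ray's own VECTOR). */
typedef struct { double x, y, z; } vec3;

static vec3 sub(vec3 a, vec3 b) { vec3 r = {a.x - b.x, a.y - b.y, a.z - b.z}; return r; }
static double dot(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static vec3 cross(vec3 a, vec3 b)
{
    vec3 r = {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
    return r;
}

/* Möller–Trumbore ray/triangle test, used here only as a stand-in for the
   routine under test. Returns 1 and stores the hit distance in *t on a hit. */
static int intersect_tri(vec3 orig, vec3 dir, vec3 v0, vec3 v1, vec3 v2, double *t)
{
    const double eps = 1e-12;
    vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    vec3 p = cross(dir, e2);
    double det = dot(e1, p);
    if (fabs(det) < eps) return 0;           /* ray parallel to the plane */
    double inv = 1.0 / det;
    vec3 s = sub(orig, v0);
    double u = dot(s, p) * inv;              /* first barycentric coordinate */
    if (u < 0.0 || u > 1.0) return 0;
    vec3 q = cross(s, e1);
    double v = dot(dir, q) * inv;            /* second barycentric coordinate */
    if (v < 0.0 || u + v > 1.0) return 0;
    *t = dot(e2, q) * inv;
    return *t > eps;                         /* only hits in front of the origin */
}

/* Runs n tests over a fixed grid of parallel rays and returns the CPU time
   in seconds; the number of hits is stored in *hits as a sanity check. */
static double time_intersections(long n, long *hits)
{
    vec3 v0 = {0, 0, 0}, v1 = {1, 0, 0}, v2 = {0, 1, 0};
    vec3 dir = {0, 0, -1};
    long h = 0;
    double t;
    clock_t start = clock();
    for (long i = 0; i < n; i++) {
        /* deterministic origins on a 2x2 grid above the triangle */
        vec3 o = {(double)(i % 1000) / 500.0 - 0.5,
                  (double)((i / 1000) % 1000) / 500.0 - 0.5, 1.0};
        h += intersect_tri(o, dir, v0, v1, v2, &t);
    }
    *hits = h;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_intersections() with the same n for each build, compiled with the same flags, gives comparable CPU times, and the hit count should be identical across builds if both compute the same thing.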

	A few general comments after a very quick look at your code
(I'm not an assembler guru though):

- your assembly code looks a lot like what gcc-3.4.2 outputs, but
I didn't check things very carefully: could you point out what you
actually optimized?
- separating Intersect_Triangle() from triangle.cpp makes you lose
the inlining gcc performs on it and on the other function calls within
All_Triangle_Intersections();
- you seem to call fabs(), whereas gcc apparently inlines the
corresponding assembly code as well.
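On the fabs point: an inlined fabs reduces to a single instruction (an AND with a sign mask under SSE2 math, or the x87 fabs instruction otherwise), so an out-of-line call is pure overhead. The sketch below shows the SSE2 bitmask version with intrinsics; abs_pd() and abs_sd() are hypothetical helper names, and it assumes an x86 target with SSE2 (emmintrin.h).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Absolute value of two packed doubles by clearing the sign bit:
   -0.0 has only the sign bit set, and andnot computes (~mask) & x. */
static inline __m128d abs_pd(__m128d x)
{
    const __m128d sign_mask = _mm_set1_pd(-0.0);
    return _mm_andnot_pd(sign_mask, x);
}

/* Scalar wrapper: |a| computed through the SSE2 path (low lane only). */
static inline double abs_sd(double a)
{
    return _mm_cvtsd_f64(abs_pd(_mm_set_sd(a)));
}
```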

	Overall, the way you proceed with this optimization could itself be
optimized, e.g. by inlining the assembly code within triangle.cpp
(here you kind of mess with the build system to insert your own code).
That should also save you from writing some unnecessary code related
to, e.g., the triangle structs.  If your code does improve speed as you
suggest, I'd be interested to see a rewrite of your patch according to
the points mentioned above.
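As for keeping the code inside triangle.cpp, one alternative to inline assembly (a swapped-in technique, not what the patch does) is to write the SSE2 code with compiler intrinsics in a static inline helper, which gcc can inline into its callers like ordinary C. A minimal sketch, assuming the vector type is a plain array of three doubles; VEC3 and dot_sse2() are hypothetical names:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

typedef double VEC3[3];  /* hypothetical stand-in for a 3-double vector type */

/* Dot product with SSE2 intrinsics. Being static inline in the same source
   file as its callers, it can be inlined; no separate .s file is needed. */
static inline double dot_sse2(const VEC3 a, const VEC3 b)
{
    __m128d xy = _mm_mul_pd(_mm_loadu_pd(a), _mm_loadu_pd(b));        /* ax*bx, ay*by */
    __m128d z  = _mm_mul_sd(_mm_load_sd(a + 2), _mm_load_sd(b + 2));  /* az*bz, 0 */
    __m128d hi = _mm_unpackhi_pd(xy, xy);                             /* ay*by, ay*by */
    __m128d s  = _mm_add_sd(_mm_add_sd(xy, hi), z);                   /* low lane holds the sum */
    return _mm_cvtsd_f64(s);
}
```

Because the sum is formed in the same order as the scalar expression a[0]*b[0] + a[1]*b[1] + a[2]*b[2], the result should match the plain C version (barring FMA contraction in the scalar build), which makes it easy to test one against the other.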

	- NC
