On Fri, 02 Aug 2002 20:58:33 -0400, Spider wrote:
> export CFLAGS="optimziation"
Are you quite sure on your spelling, there? ;-)
-Mark Gordon
Apache wrote:
> If people continue to make algorithmic improvements that are compiler- and
> platform-independent, wouldn't it be a good thing to keep track of those
> changes and work on a 3.5.1 or a 3.6 version?
>
>
Yeah, and it seems like the p.programming newsgroup helps with that.
You've probably already noticed it but if not, Micha has posted the
patch he was talking about to that newsgroup.
-Roz
In article <3d4b8aec@news.povray.org>, Micha Riser <mri### [at] gmxnet>
wrote:
> You probably mean Sqr((sin(EPoint[X])+sin(EPoint[Y])+sin(EPoint[Z]))/3);
Yeah, I screwed up the parentheses and the ".0" isn't necessary.
> Temporary variables mostly do not affect performance, though. The
> compiler has to use several registers anyway.
I know, and a good compiler would probably optimize it to the same
machine code, but there is no reason to use them, not even readability.
(Do modern compilers even pay any attention to "register"?)
My point was it doesn't conform to any kind of "strict guidelines" other
than maximum compatibility (it was apparently for compiling with an old
386 compiler). The "noise" variable I just don't understand...I guess it
helped some compiler with optimization or something.
> But to show the vector nature of this calculation it should be
> written as:
> DBL value=0;
> for(int i=0; i<3; i++) value+=sin(EPoint[i]);
> return Sqr(value/3.0);
>
> Of course this assumes that X, Y, Z are the indices 0-2.
You mean for helping the compiler detect something that can be
vectorized and doing it automatically? It won't pick out the possibility
in the one-line version?
Would it get this version?
DBL value = 0;
value += sin(EPoint[0]);
value += sin(EPoint[1]);
value += sin(EPoint[2]);
return Sqr(value/3.0);
Or this (assuming a struct with x, y, and z components):
value += sin(EPoint.x);
value += sin(EPoint.y);
value += sin(EPoint.z);
I'm not surprised the for loop wasn't used in the existing
version...SIMD stuff wasn't even a factor, so of course the code wasn't
designed for it, and compilers probably weren't good enough at
optimizing to get rid of the for() loop.
--
Christopher James Huff <chr### [at] maccom>
POV-Ray TAG e-mail: chr### [at] tagpovrayorg
TAG web site: http://tag.povray.org/
Christopher James Huff wrote:
>
> You mean for helping the compiler detect something that can be
> vectorized and doing it automatically? It won't pick out the possibility
> in the one-line version?
> Would it get this version?
> DBL value = 0;
> value += sin(EPoint[0]);
> value += sin(EPoint[1]);
> value += sin(EPoint[2]);
> return Sqr(value/3.0);
No, unfortunately the Intel compiler (the only one I have that does
vectorisation) does not recognize this. You have to use a loop explicitly.
I have rewritten vector.h in this way, but I have no Pentium 4 to
test it on :(
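To illustrate what such a rewrite could look like, here is a rough sketch of one vector operation in both styles. The names vadd_unrolled and vadd_loop are hypothetical and this is not the actual vector.h code; it only assumes that a vector is an array of three DBL (double) components.

```c
typedef double DBL;  /* assumption: DBL is a double */

/* Component-by-component style: one statement per axis. */
void vadd_unrolled(DBL a[3], const DBL b[3], const DBL c[3])
{
    a[0] = b[0] + c[0];
    a[1] = b[1] + c[1];
    a[2] = b[2] + c[2];
}

/* Loop style: the same operation written as an explicit loop over the
   components, which is the form a loop-based vectorizer looks for. */
void vadd_loop(DBL a[3], const DBL b[3], const DBL c[3])
{
    int i;
    for (i = 0; i < 3; i++)
        a[i] = b[i] + c[i];
}
```

Both functions produce identical results, so switching between the two styles should only affect the generated code, not the output.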
> I'm not surprised the for loop wasn't used in the existing
> version...SIMD stuff wasn't even a factor, so of course the code wasn't
> designed for it, and compilers probably weren't good enough at
> optimizing to get rid of the for() loop.
Today's g++ 3.1 does a good job of loop unrolling. But with my modified
loop-using 'vector.h' it still produces slightly slower code (OK,
maybe I also made some mistakes in the conversion...)
- Micha
--
http://objects.povworld.org - the POV-Ray Objects Collection
Micha Riser wrote:
>
> loop-using 'vector.h' it still produces slightly slower code (OK,
I have to correct this. A growing number of background applications
had influenced the test. g++ 3.1 is equally fast when using a 'looped'
vector.h.
- Micha
--
http://objects.povworld.org - the POV-Ray Objects Collection
I only remember an article written by Thorsten about the issue. I don't
remember the specifics.
--
#macro N(D)#if(D>99)cylinder{M()#local D=div(D,104);M().5,2pigment{rgb M()}}
N(D)#end#end#macro M()<mod(D,13)-6mod(div(D,13)8)-3,10>#end blob{
N(11117333955)N(4254934330)N(3900569407)N(7382340)N(3358)N(970)}// - Warp -
In article <3d4c2076@news.povray.org>, Micha Riser <mri### [at] gmxnet>
wrote:
> No, unfortunately the Intel compiler (the only one I have that does
> vectorisation) does not recognize this. You have to use a loop explicitly.
> I have rewritten vector.h in this way, but I have no Pentium 4 to
> test it on :(
So it will only help operations on arrays then...seems like a stupid
limitation. There isn't some compiler directive that could tell it what
can be optimized?
> Today's g++ 3.1 does a good job of loop unrolling. But with my modified
> loop-using 'vector.h' it still produces slightly slower code (OK,
> maybe I also made some mistakes in the conversion...)
Well, today's g++ 3.1 didn't exist 10 years ago. ;-)
With a modern compiler, I wouldn't expect it to be much slower, but how
could you get any improvement? The SIMD instructions can't handle double
precision math as far as I know...they would help colors, but not
vectors.
--
Christopher James Huff <chr### [at] maccom>
POV-Ray TAG e-mail: chr### [at] tagpovrayorg
TAG web site: http://tag.povray.org/
Christopher James Huff wrote:
> So it will only help operations on arrays then...seems like a stupid
> limitation. There isn't some compiler directive that could tell it what
> can be optimized?
SIMD instructions only work on contiguous blocks of memory, so you're likely
to be using arrays where they're useful.
>
> With a modern compiler, I wouldn't expect it to be much slower, but how
> could you get any improvement? The SIMD instructions can't handle double
> precision math as far as I know...they would help colors, but not
> vectors.
Yes, SSE can help with colour calculations. With it you can do operations on
4 floats simultaneously. But SSE2 (which the Pentium 4 and 64-bit AMD CPUs
support) works on double precision! That's why I am looking for someone with a
Pentium 4...
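As a rough illustration of what SSE2 offers for doubles: one 128-bit register holds two double-precision values, so a single add instruction handles two components at once. The sketch below uses the standard SSE2 intrinsics from <emmintrin.h> when the compiler defines __SSE2__ and falls back to plain scalar code otherwise. The function name add_doubles is made up for this example and has nothing to do with the actual patch.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_add_pd, ... */
#endif

/* Computes out[i] = a[i] + b[i] for n doubles.
   With SSE2 available, elements are processed two at a time;
   any leftover element is handled by the scalar loop at the end. */
void add_doubles(double *out, const double *a, const double *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);  /* unaligned load of 2 doubles */
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    for (; i < n; i++)  /* scalar tail, or the whole array without SSE2 */
        out[i] = a[i] + b[i];
}
```

Note that a 3-component vector does not fill two registers evenly, so one component always ends up in the scalar tail here. That is one reason the gain for vectors may be smaller than for 4-component colours.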
- Micha
--
http://objects.povworld.org - the POV-Ray Objects Collection