Thorsten Froehlich wrote:
> In article <3d05de17@news.povray.org> , Thomas Willhalm
> <tho### [at] uni-konstanzde> wrote:
>
>> icc 6 IV -O3 -tpp7 -xW -unroll -ip
>
> How about adding any one of these: "-ipo", "-wp_ipo", "-prefetch", "-rcd"?
For some strange reason, icc (version 6) doesn't recognize the option
"-prefetch" although it is listed in the documentation.
If I compile with "-rcd", I get a segmentation fault when I try to run the
program. This doesn't necessarily mean that icc is buggy; it may just as well
be that some of the experimental code in megapovplus is not clean.
So, I've compiled it with -wp_ipo and should be able to post the result
tomorrow morning.
By the way, I tried to avoid dependencies on the OS-provided libraries by
switching off the display and file output.
Thomas
Ole Laursen wrote:
> "Thorsten Froehlich" <tho### [at] trfde> writes:
>> In article <87l### [at] bachcomposers> , Ole Laursen
>>
>> This is under Linux, of course. It may just be a library issue that
>> doesn't
>> exist under Windows. gcc surely is not such a good compiler...
>
> This is povray.unix, so who cares how ICC performs on Windows? Get a
> real operating system. :-)
Didn't Thorsten work on a Mac, anyway? :-)
> Anyway, AFAIK ICC doesn't include a C library, so they use the same
> libraries.
There are some dependencies:
$ ldd megapovplus
        libvgagl.so.1 => /usr/lib/libvgagl.so.1 (0x40026000)
        libvga.so.1 => /usr/lib/libvga.so.1 (0x40035000)
        libz.so.1 => /lib/libz.so.1 (0x4008d000)
        libpng.so.2 => /usr/lib/libpng.so.2 (0x4009c000)
        libm.so.6 => /lib/libm.so.6 (0x400ce000)
        libX11.so.6 => /usr/X11R6/lib/libX11.so.6 (0x400f0000)
        libcxa.so.1 => /opt/intel/compiler60/ia32/lib/libcxa.so.1 (0x401b1000)
        libc.so.6 => /lib/libc.so.6 (0x4021f000)
        libdl.so.2 => /lib/libdl.so.2 (0x40345000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
Thomas
Ole Laursen wrote:
> It was rendering time in seconds - and you probably want shorter
> rendering times, right? :-)
I am currently doing a lot of benchmarking work and everything is starting
to look "loops per second"... even my food!
--
Alessandro Coppo
a.coppo@<REMOVE_ME>iol.it
www.geocities.com/alexcoppo
Thomas Willhalm wrote:
>
> So, I've compiled it with -wp_ipo and should be able to post the result
> tomorrow morning.
The rendering times were almost the same, even slightly longer.
(IMO a variation of one or two percent should be accepted.)
Thomas
"Thorsten Froehlich" <tho### [at] trfde> wrote:
> This is under Linux, of course. It may just be a library issue that
> doesn't exist under Windows. gcc surely is not such a good compiler...
... but not such a bad compiler, either. In general, VC surpasses GCC in
terms of code quality (especially fp), of course, but consider this sample
code (I stumble upon such things, at times, as I actively use both gcc and
VC++), gcc 2.95.3-5 vs. MSC/C++ 12.00.8168 (VS 6.0):
----- test.c:
typedef struct { short a; char b, c; } D;
D func( void )
{
    D d = { 1, 2, 3 };
    return d;
}
----- test.bat:
@echo off
gcc -c -S -O2 -fomit-frame-pointer test.c
cl -c -FA -Ox -nologo test.c
----- test.s:
_func:
        movl    $50462721,%eax
        ret
----- test.asm (_d$ = -4):
_func   PROC NEAR
        push    ecx
        mov     WORD PTR _d$[esp+4], 1
        mov     BYTE PTR _d$[esp+6], 2
        mov     BYTE PTR _d$[esp+7], 3
        mov     eax, DWORD PTR _d$[esp+4]
        pop     ecx
        ret     0
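For readers wondering where the constant in the gcc output comes from: it appears to be the whole struct folded into one 32-bit value at compile time, whereas the VC listing builds the struct on the stack field by field and then loads it. A minimal Python sketch of that packing, assuming the little-endian x86 layout with no padding between the fields (the names `raw` and `value` are just for illustration):

```python
import struct

# D = { short a; char b, c; } initialised with { 1, 2, 3 }.
# "<hbb" = little-endian, one 16-bit short followed by two 8-bit chars.
raw = struct.pack("<hbb", 1, 2, 3)

# Read the same four bytes back as a single little-endian 32-bit integer,
# which is what ends up in %eax when the struct is returned in a register.
value = int.from_bytes(raw, "little")

print(raw.hex())   # bytes in memory order
print(value)       # decimal immediate
print(hex(value))  # same value in hex
```

The decimal value printed matches the immediate in `movl $50462721,%eax` above.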
On Wed, 12 Jun 2002 10:40:12 +0200, Thomas Willhalm wrote:
> Thorsten Froehlich wrote:
>
>> In article <3d05de17@news.povray.org> , Thomas Willhalm
>> <tho### [at] uni-konstanzde> wrote:
>>
>>> finally, I've found the time to compare the different compilations of
>>> povray on a Pentium IV. I used megapovplus and modified povbench.pov
>>> from povray 3.5 beta to run on it.
>>>
>>> Running time in seconds:
>>>                 P-IV   Athlon
>>> gcc 2.95.3     13354     7035
>>> gcc 3.0.1      11319     6555
>>> gcc 3.1         8971     5901
>>> icc 6          15907     5679
>>> icc 6 IV       10589
>>
>> What are the results of the Windows version on the same system?
>
> I'm sorry. Windows isn't installed on any of these computers. (Well, to
> be completely honest, there is Vmware running Windows NT4 on the Athlon,
> but IMHO a benchmark of this won't tell us anything.)
>
> THomas
VMware does have very low overhead for some types of programs.
If you render without output, the only overhead you get is from page faults,
and that should cost no more than 1-4 percent of native speed.
Once you put things on the screen or use system functions like writing to or
reading from disk, you lose.
These numbers were quite interesting. Could you please run one more?
Try CFLAGS="-march=athlon-xp -O3 -finline-functions -ffast-math
-foptimize-sibling-calls -ansi -march=i686 -DCPU=686" for the Athlon XP
and see how much it differs from the output with -march=i686.
gcc 3.0.x had -march=athlon.
gcc 3.1 has -march=athlon-tbird, athlon-xp, athlon-mp and athlon-4. I
haven't dug any deeper to see just what is different between them, except
for some submodel changes.
Another thing to test would be -mfpmath="sse,387", which will attempt
to use both the SSE and the i387 FP units at the same time, thus
doubling(!) the number of registers accessible. I've got a slight feeling
this may do some interesting things for applications like POV.
Also, since we're not debugging here, it is worth considering
-fomit-frame-pointer on gcc, which frees up another register. That is not
always desirable or noticeable in desktop applications, but this is a
"special case", so it should be OK :)
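One thing worth checking in the suggested CFLAGS line: it carries both -march=athlon-xp and -march=i686. Assuming gcc lets the last occurrence of a repeated option win (its usual behaviour, but worth confirming against the gcc manual for the version in use), the later -march=i686 would override the Athlon setting. A small Python sketch of that check, where `effective_march` is a hypothetical helper and not part of any tool:

```python
import shlex

def effective_march(cflags):
    """Return the value of the last -march= option in a flag string, or None.

    Assumption: the compiler honours the last occurrence of a repeated option.
    """
    march = None
    for token in shlex.split(cflags):
        if token.startswith("-march="):
            march = token[len("-march="):]
    return march

suggested = ("-march=athlon-xp -O3 -finline-functions -ffast-math "
             "-foptimize-sibling-calls -ansi -march=i686 -DCPU=686")

print(effective_march(suggested))
```

Under that assumption, dropping the trailing -march=i686 is what actually enables the Athlon XP code generation.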
Regards,
Spider
...a memory...
On Tue, 11 Jun 2002 13:26:34 +0200, Thomas Willhalm wrote:
> Hello,
>
> finally, I've found the time to compare the different compilations of
> povray on a Pentium IV. I used megapovplus and modified povbench.pov from
> povray 3.5 beta to run on it.
>
> Running time in seconds:
>                 P-IV   Athlon
> gcc 2.95.3     13354     7035
> gcc 3.0.1      11319     6555
> gcc 3.1         8971     5901
> icc 6          15907     5679
> icc 6 IV       10589
>
> "P-IV" is a Intel(R) Pentium(R) 4 CPU 1.60GHz
> running SuSE Linux with kernel 2.4.16-4GB
>
> "Athlon" is a AMD Athlon(TM) XP 1500+ (1343.051 MHz)
> running SuSE Linux with kernel 2.4.10-4GB
>
> Compiling options were:
> gcc 2.95.3  -O3 -finline-functions -ffast-math -ansi -march=i686 -DCPU=686
> gcc 3.0.1   -O3 -finline-functions -ffast-math -foptimize-sibling-calls
>             -ansi -march=i686 -DCPU=686
> gcc 3.1     -O3 -finline-functions -ffast-math -foptimize-sibling-calls
>             -ansi -march=i686 -DCPU=686
> icc 6       -O3 -tpp6 -xK -unroll -ip
> icc 6 IV    -O3 -tpp7 -xW -unroll -ip
>
> The last version is optimized for Pentium-IV. That's why the binary doesn't
> run on the Athlon.
>
> Best regards
> Thomas
--
begin .signature
This is a .signature virus! Please copy me into your .signature!
See Microsoft KB Article Q265230 for more information.
end
Spider wrote:
> These numbers were quite interesting. Could you please run one more?
> Try CFLAGS="-march=athlon-xp -O3 -finline-functions -ffast-math
> -foptimize-sibling-calls -ansi -march=i686 -DCPU=686" for the athlon-xp
> and see if it differs some or more from the output with march=i686
> Another thing to be tested would be -mfpmath="sse,387" which will attempt
> also, since we're not using debugging here, it should be considered to use
> -fomit-frame-pointer on gcc, thus freeing up another register, not always
> desirable or noticeable in desktop applications, but this is a "special
> case" so it should be ok :)
Good points, at least the running time says so:
Running time in seconds:
gcc 2.95.3    7048s
gcc 3.0.1     6574s
gcc 3.1       5908s
gcc 3.1       5749s (new options)
icc 6         5699s
For the record: it's an AMD Athlon(TM) XP 1500+ (1343.051 MHz)
running SuSE Linux with kernel 2.4.10-4GB
Compiling options were:
gcc 2.95.3
    -O3 -finline-functions -ffast-math -ansi -march=i686 -DCPU=686
gcc 3.0.1
    -O3 -finline-functions -ffast-math -foptimize-sibling-calls -ansi
    -march=i686 -DCPU=686
gcc 3.1
    -O3 -finline-functions -ffast-math -foptimize-sibling-calls
    -ansi -march=i686 -DCPU=686
gcc 3.1 (new options)
    -march=athlon-xp -O3 -finline-functions -ffast-math
    -foptimize-sibling-calls
    -DCPU=686 -mfpmath="sse,387" -fomit-frame-pointer
icc 6
    -O3 -tpp6 -xK -unroll -ip
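To put the Athlon timings above in relative terms, here is a small Python sketch that computes how much render time each build saves against a baseline. The numbers are copied from the table in this post; the dictionary and the helper `percent_saved` are illustrative names only.

```python
# Athlon XP 1500+ running times in seconds, as posted above.
athlon_seconds = {
    "gcc 2.95.3": 7048,
    "gcc 3.0.1": 6574,
    "gcc 3.1": 5908,
    "gcc 3.1 (new options)": 5749,
    "icc 6": 5699,
}

def percent_saved(baseline, seconds):
    """Percentage of the baseline running time that was saved."""
    return 100.0 * (baseline - seconds) / baseline

baseline = athlon_seconds["gcc 2.95.3"]
for name, seconds in athlon_seconds.items():
    saved = percent_saved(baseline, seconds)
    print(f"{name:24s} {seconds:5d} s  {saved:5.1f}% less than gcc 2.95.3")

# Effect of the new options alone, and the remaining gap to icc 6.
print(f"{percent_saved(5908, 5749):.1f}% saved by the new gcc 3.1 options")
print(f"{percent_saved(5749, 5699):.1f}% still separating gcc 3.1 from icc 6")
```

By this measure the new options save about 2.7% over plain gcc 3.1, and the remaining gap to icc 6 is under 1%, which is inside the one or two percent of run-to-run variation mentioned earlier in the thread.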
Best regards
Thomas
begin quote
On Mon, 01 Jul 2002 10:01:17 +0200
Thomas Willhalm <tho### [at] uni-konstanzde> wrote:
>
> > also, since we're not using debugging here, it should be considered
> > to use -fomit-frame-pointer on gcc, thus freeing up another register,
> > not always desirable or noticeable in desktop applications, but this
> > is a "special case" so it should be ok :)
>
> Good points, at least the running time says so:
> Running time in seconds:
>
> gcc 2.95.3 7048s
> gcc 3.0.1 6574s
> gcc 3.1 5908s
> gcc 3.1 5749s (new options)
> icc 6 5699s
>
Hmm, that's an interesting change in runtime. It's still not down at icc's
level, which may not be reachable at all, but it's definitely closing in
here :)
Since both gcc 3.1 and ICC support profile-guided optimization, that
could be another interesting thing to test, although this may
border on doing it merely to squeeze out the most possible rather than
for its practical usefulness ;)
Here's an excerpt from our ebuild where we use icc PGO and normal icc.
I'm not editing it, since the comments may be nice for others who
follow this thread. Please note: this is not copyrighted by me, but
GPL'ed (cute, isn't it), and the copyright belongs to Gentoo Technologies Inc.
if [ "`use icc`" ]; then
    # ICC CFLAGS
    echo "s/gcc/icc/" >> makefile.sed
    # Should pull from /etc/make.conf
    # If you have a P4 add -tpp7 after the -O3
    # If you want lean/mean replace -axiMKW with -x? (see icc docs for -x)
    # Note: -ipo breaks povray
    # Note: -ip breaks povray on a P3
    echo "s/^CFLAGS =/CFLAGS = -O3 -axiMKW /" >> makefile.sed
    # This is optimized for my Pentium 2:
    #echo "s/^CFLAGS =/CFLAGS = -O3 -xM -ip /" >> makefile.sed
    # This is optimized for Pentium 3 (semi-untested, I don't own one):
    #echo "s/^CFLAGS =/CFLAGS = -O3 -xK /" >> makefile.sed
    # This is optimized for Pentium 4 (untested, I don't own one):
    #echo "s/^CFLAGS =/CFLAGS = -O3 -xW -ip -tpp7 /" >> makefile.sed
    if [ "`use icc-pgo`" ]; then
        IPD=${BUILDDIR}/icc-pgo
        echo "s:^CFLAGS =:CFLAGS = -prof_dir ${IPD} :" >> makefile.sed
        if [ ! -d "${IPD}" ]; then
            mkdir -m 777 -p ${IPD}
            echo "s/^CFLAGS =/CFLAGS = -prof_gen /" >> makefile.sed
        else
            echo "s/^CFLAGS =/CFLAGS = -prof_use /" >> makefile.sed
        fi
    fi
else
    # GCC CFLAGS
    echo "s/^CFLAGS =/CFLAGS = -finline-functions -ffast-math /" >> makefile.sed
    echo "s/^CFLAGS =/CFLAGS = ${CFLAGS} /" >> makefile.sed
fi
sed -f makefile.sed makefile.orig > makefile
Well, this should be pretty much self-explanatory. BUILDDIR is where the
profile data is stored, along with our compile-time data.
This has been re-edited so that it doesn't get wrapped too much in mail, but
it still needs checking for that.
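For anyone puzzled by how several "s/^CFLAGS =/.../" rules can coexist in one makefile.sed: sed runs every rule in order on each line, and because each replacement starts with "CFLAGS =" again, the next rule still matches and prepends its own flags. A rough Python sketch of that stacking, using re.sub in place of sed; the directory /tmp/icc-pgo and the starting line "CFLAGS = -O2" are made-up stand-ins for ${IPD} and the real makefile content:

```python
import re

def apply_rules(line, rules):
    """Apply sed-style s/pattern/replacement/ rules in order, once each."""
    for pattern, replacement in rules:
        line = re.sub(pattern, replacement, line, count=1)
    return line

# Rules in the order the ebuild would append them on a first icc-pgo run.
rules = [
    (r"^CFLAGS =", "CFLAGS = -O3 -axiMKW "),
    (r"^CFLAGS =", "CFLAGS = -prof_dir /tmp/icc-pgo "),  # hypothetical path
    (r"^CFLAGS =", "CFLAGS = -prof_gen "),
]

result = apply_rules("CFLAGS = -O2", rules)
print(result.split())
```

Each later rule lands in front of the earlier ones, so on the first run -prof_gen ends up first; on a second run, once the profile directory exists, the last rule would insert -prof_use instead.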
//Spider
--
begin .signature
This is a .signature virus! Please copy me into your .signature!
See Microsoft KB Article Q265230 for more information.
end
On Wed, 12 Jun 2002 07:18:23 +1200, Nicolas Calimet wrote:
> Interesting. Looks like the last gcc is worth installing.
> Well, I would have been glad to see gcc-3.0.4 instead of 3.0.1 :o) Do
> you have any idea about the advantage of -foptimize-sibling-calls ? And
> what is icc by the way ?
Hi all, I just installed gcc 3.1 and rebuilt povray 3.1 (from a RH 7.2 source
RPM package).
On a scene I am playing with, render time is 124 seconds; the previous version,
built with gcc 3.0.4, took 144 seconds. It's still a small scene, but
that's roughly a 14% shorter render time (about 16% faster) on a PII 300.
Just another benefit of open source software...