POV-Ray: Newsgroups: povray.unofficial.patches: PovRay faster

From: Peter J Holzer
Subject: Re: PovRay faster
Date: 4 Feb 2001 12:01:48
Message: <slrn97qvc2.hut.hjp-usenet@teal.h.hjp.at>

On 2001-02-01 16:12, Daniel Jungmann <DSJ### [at] gmxnet> wrote:
>For 1 I need not to rewrite the compiler, I need to write a lot of macros
>( not exactly the same, a lot of work) and I need to analyse the source
>code, because the code must be "parallel" so the SIMD instructions can
>work.

No, this was possibility number 2, not 1. Please read what people write,
or ask if something isn't expressed clearly enough.

Number 1 was the ideal case where you don't have to change the source
code at all, because the compiler is smart enough to figure out inherent
parallelisms.

Number 2 was the less ideal case, where the compiler knows about SIMD
instructions, but can use them only if the code is arranged in a certain
way. 

In reality, early versions of optimizers will recognize only a few
constructs (situation number 2), and each successive version will
recognize a few more. At the same time, programmers will adapt their
programming style to the capabilities of the optimizers, so the
situation will asymptotically approach situation number 1.
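
To make "arranged in a certain way" a bit more concrete, here is a minimal sketch in C. The function names are made up for illustration, and whether a particular compiler actually emits SIMD instructions for the first loop is an assumption that depends on the compiler and its options.

```c
#include <stddef.h>

/* Hypothetical example: scale-and-add over arrays.
 * The loop has a simple trip count, unit stride, no function calls and no
 * dependency between iterations, and "restrict" (C99) promises that the
 * arrays do not overlap. These are properties a vectorizing optimizer
 * typically looks for; nothing here forces it to vectorize. */
void scale_add(float *restrict out, const float *restrict a,
               const float *restrict b, float s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * s + b[i];
}

/* Harder to parallelize as written: every iteration needs the result of
 * the previous one, so the elements cannot simply be processed side by
 * side. */
void prefix_sum(float *data, size_t n)
{
    for (size_t i = 1; i < n; i++)
        data[i] += data[i - 1];
}
```

The first function is the kind of code situation number 2 asks the programmer to write; the second is the kind of code that needs restructuring before any SIMD unit can help.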

	hp

-- 
   _  | Peter J. Holzer    | All Linux applications run on Solaris,
|_|_) | Sysadmin WSR       | which is our implementation of Linux.
| |   | hjp### [at] wsracat      | 
__/   | http://www.hjp.at/ |	-- Scott McNealy, Dec. 2000

From: Thorsten Froehlich
Subject: Re: PovRay faster
Date: 4 Feb 2001 21:04:09
Message: <3a7e0a19@news.povray.org>

In article <3a7d7dbf@news.povray.org> , "Daniel Jungmann" <DSJ### [at] gmxnet> 
wrote:

> Which compiler does these things?

Every compiler should do basic optimisations.  Sometimes they don't tell you
exactly what they do.  What you are looking for can go by any of the terms
below, but other names or methods are possible [1].  The list of
optimizations below is neither complete nor precise, but it should give you
an idea of what to look for in your compiler documentation.

> 1) loops

- loop-invariant code motion (takes out code that doesn't change in a loop)
- loop unrolling (at the expense of code size, reduces branches for loops)
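
As a rough illustration of these two loop optimisations, here is a sketch of a loop as a programmer might write it, followed by a hand-written version of what an optimizer may turn it into. The function names and the unroll factor of four are assumptions chosen for the example; a real compiler picks its own.

```c
#include <stddef.h>

/* As written: "scale * offset" does not change inside the loop. */
long sum_scaled(const int *v, size_t n, int scale, int offset)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += (long)v[i] * (scale * offset);
    return total;
}

/* Roughly what the optimizer may produce: the invariant product is
 * computed once (loop-invariant code motion) and the body is unrolled by
 * four, with a cleanup loop for the remaining elements. */
long sum_scaled_transformed(const int *v, size_t n, int scale, int offset)
{
    const long k = scale * offset;      /* hoisted out of the loop */
    long total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {        /* four elements per branch */
        total += v[i]     * k;
        total += v[i + 1] * k;
        total += v[i + 2] * k;
        total += v[i + 3] * k;
    }
    for (; i < n; i++)                  /* leftover elements */
        total += v[i] * k;
    return total;
}
```

Both functions return the same value; the second is simply harder to read, which is the argument for leaving this work to the compiler.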

> 2) float divisions (replace multiple divisions with one division and
> multiplications)

- Replacing division by constants with multiplication

See important issues regarding applying other optimisations to
floating-point numbers in [1, section 12.3.2]!
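
One reason for the caution with floating-point numbers: replacing x / d by x * (1.0 / d) is only exact when 1.0 / d can be represented exactly, for example when d is a power of two. The sketch below counts how often the two forms disagree; the function name is hypothetical and the code is only meant to show the effect.

```c
#include <stddef.h>

/* Counts the integers x in 1..n (converted to double) for which
 * x / d and x * (1.0 / d) give different results. */
size_t count_mismatches(double d, size_t n)
{
    const double recip = 1.0 / d;   /* rounded unless d is a power of two */
    size_t mismatches = 0;

    for (size_t i = 1; i <= n; i++) {
        double x = (double)i;
        if (x / d != x * recip)
            mismatches++;
    }
    return mismatches;
}
```

For d = 4.0 the count is zero, because 0.25 is exact in binary floating point. For a divisor such as 3.0 the count can be non-zero, since 1.0 / 3.0 is itself already rounded, which is why a compiler will normally not make this substitution unless told that it may relax floating-point accuracy.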

> 3) if then / ? :

Taken care of by the intermediate code representation in any compiler.

> 4) integer multiplications and divisions (replace them with shift or
> additions)

- Strength reduction (this is exactly what you are suggesting)
- Algebraic simplification and reassociation (associativity, commutativity,
distributivity)
- Value numbering (eliminating one of two equivalent calculations), note
that some compilers can do this on a global level!
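
A small sketch of the first and last items, with made-up function names. It shows the source form next to the form the compiler can derive from it, not what any particular compiler emits.

```c
/* Strength reduction: for unsigned operands, multiplying or dividing by a
 * power of two is the same as shifting. */
unsigned mul8(unsigned x)       { return x * 8u; }
unsigned mul8_shift(unsigned x) { return x << 3; }
unsigned div8(unsigned x)       { return x / 8u; }
unsigned div8_shift(unsigned x) { return x >> 3; }

/* Signed division rounds toward zero (C99), so for negative values a plain
 * shift is not enough; the compiler has to add a small correction. This is
 * one more detail that is easy to get wrong by hand. */
int sdiv8(int x) { return x / 8; }

/* Value numbering: (a + b) appears twice but only needs to be computed
 * once. */
int square_of_sum(int a, int b)             { return (a + b) * (a + b); }
int square_of_sum_transformed(int a, int b) { int t = a + b; return t * t; }
```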

> 5) multiple integer calculations

- constant folding (evaluates constant expressions at compile time)

In addition, compilers can do things that are hard to do for a human in
assembly language, such as register coloring (reusing registers to hold
different variables that are both in scope but not used at the same time),
instruction scheduling according to the processor manufacturer's instruction
execution time tables (this is one of the things why you can pick the

instruction scheduling (avoid stalls in processor pipeline).

Now, all these listed optimisations and many more are probably in every
compiler you have, but to be specific, all of them are supported by the
Intel compiler according to [1].  And I am sure most of them are in Visual
C++, too (whether they actually work seems to be another matter, I am told).

  Thorsten

____________________________________________________
Thorsten Froehlich
e-mail: mac### [at] povrayorg

I am a member of the POV-Ray Team.
Visit POV-Ray on the web: http://mac.povray.org

[1] Muchnick, Steven S., Advanced Compiler Design and Implementation
    ISBN 1-55860-320-4

____________________________________________________
Thorsten Froehlich, Duisburg, Germany
e-mail: tho### [at] trfde

Visit POV-Ray on the web: http://mac.povray.org

From: Thorsten Froehlich
Subject: Re: PovRay faster
Date: 5 Feb 2001 19:56:29
Message: <3a7f4bbd$1@news.povray.org>

In article <3a7e0a19@news.povray.org> , "Thorsten Froehlich" 
<tho### [at] trfde> wrote:

> Now, all these listed optimisations and many more are probably in every
> compiler you have, but to be specific, all of them are supported by the
> Intel compiler according to [1].  And I am sure most of them are in Visual
> C++, too (whether they actually work seems to be another matter, I am told).

To add to this, Intel actually has a readable and up-to-date feature list on
their site at <http://www.intel.com/software/products/compilers/c50/opt.pdf>


     Thorsten


____________________________________________________
Thorsten Froehlich, Duisburg, Germany
e-mail: tho### [at] trfde

Visit POV-Ray on the web: http://mac.povray.org

From: Warp
Subject: Re: PovRay faster
Date: 6 Feb 2001 08:17:57
Message: <3a7ff985@news.povray.org>

Thorsten Froehlich <tho### [at] trfde> wrote:
: It is a complete waste of time to optimize these by hand.  Every compiler
: can do a much better job on these and the code stays readable on your side!

  This is true, of course.

  It's important to know what kind of code the compiler generates for
certain operations, so that you can concentrate on the important parts when
optimizing and forget the parts the compiler already handles.
  For example, trying to optimize an operation where an integer variable
is multiplied or divided by an integer constant by replacing it with shifts
and additions is a complete waste of time, since the compiler will do
it itself anyway.
  On some platforms it can even be slower to write all those shifts and
additions by hand instead of letting the compiler optimize the code. That is,
the compiler-generated code for this:

  a = b*1040;

may on some platforms be faster than the compiler-generated code for this:

  a = (b<<10)+(b<<4);

because the compiler might be able to use platform-dependent optimizations
for the multiplication which it can't apply to the shifts (for example, on
some platforms an integer multiplication takes just 1 clock while the two
shifts and the addition take 3 clocks).
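
A note on the shift version: 1040 is 1024 + 16, so the two shifted terms have to be added. Combining them with a bitwise OR gives the same result only while the shifted copies have no set bits in common, which is guaranteed for b below 64 but not in general. A small sketch, with function names made up for illustration:

```c
/* Reference: the plain multiplication. */
unsigned mul1040(unsigned b)     { return b * 1040u; }

/* 1040 = 1024 + 16, written as two shifts and an addition. */
unsigned mul1040_add(unsigned b) { return (b << 10) + (b << 4); }

/* Same shifts combined with OR. Correct for b < 64, where (b << 4) uses
 * bits 4..9 and (b << 10) uses bits 10..15. For b = 65 both terms have
 * bit 10 set, so the OR loses a carry and the result is too small. */
unsigned mul1040_or(unsigned b)  { return (b << 10) | (b << 4); }
```

For b = 65 the multiplication and the shift-and-add form both give 67600, while the OR form gives 66576. This is one more argument for writing b*1040 and letting the compiler choose the instructions.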

  By the way, sometimes the compiler can go too far in this.
  This is the case with gcc. When generating the assembler code, it will
always convert the multiplication of an integer variable and an integer
constant to shifts and ors/additions/subtractions, no matter what the
constant value is.
  For example, something like a*123456789 generates about 10-20 assembler
operations.
  One could think that beyond a certain number of operations a plain integer
multiplication would be faster (especially with current processors).
  Go figure.

-- 
char*i="b[7FK@`3NB6>B:b3O6>:B:b3O6><`3:;8:6f733:>::b?7B>:>^B>C73;S1";
main(_,c,m){for(m=32;c=*i++-49;c&m?puts(""):m)for(_=(
c/4)&7;putchar(m),_--?m:(_=(1<<(c&3))-1,(m^=3)&3););}    /*- Warp -*/

From: Peter J Holzer
Subject: Re: PovRay faster
Date: 6 Feb 2001 18:02:40
Message: <slrn980sf6.fb6.hjp-usenet@teal.h.hjp.at>

On 2001-02-06 13:17, Warp <war### [at] tagpovrayorg> wrote:
>  By the way, sometimes the compiler can go too far in this.
>  This is the case with gcc. When generating the assembler code, it will
>always convert the multiplication of an integer variable and an integer
>constant to shifts and ors/additions/subtractions, no matter what the
>constant value is.
>  For example, something like a*123456789 generates about 10-20 assembler
>operations.

Not always. E.g., egcs-2.91.66 for Intel will compile

    int foo (int a) {
        return a*123456789;
    }

    int bar (int a) {
        return a*4;
    }


into:


	    .file	"foo.c"
	    .version	"01.01"
    gcc2_compiled.:
    .text
	    .align 4
    .globl foo
	    .type	 foo,@function
    foo:
	    pushl %ebp
	    movl %esp,%ebp
	    imull $123456789,8(%ebp),%eax
	    leave
	    ret
    .Lfe1:
	    .size	 foo,.Lfe1-foo
	    .align 4
    .globl bar
	    .type	 bar,@function
    bar:
	    pushl %ebp
	    movl %esp,%ebp
	    movl 8(%ebp),%edx
	    leal 0(,%edx,4),%eax
	    leave
	    ret
    .Lfe2:
	    .size	 bar,.Lfe2-bar
	    .ident	"GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"

As you can see, the first multiplication is translated to an imull
instruction, while the second one is translated to a leal (which is
essentially a shift-and-add operation).

As far as I can remember, gcc has behaved like this since at least the
1.3x releases.

	hp

-- 
   _  | Peter J. Holzer    | All Linux applications run on Solaris,
|_|_) | Sysadmin WSR       | which is our implementation of Linux.
| |   | hjp### [at] wsracat      | 
__/   | http://www.hjp.at/ |	-- Scott McNealy, Dec. 2000

From: Warp
Subject: Re: PovRay faster
Date: 7 Feb 2001 08:19:17
Message: <3a814b55@news.povray.org>

What optimization options did you use?

-- 
char*i="b[7FK@`3NB6>B:b3O6>:B:b3O6><`3:;8:6f733:>::b?7B>:>^B>C73;S1";
main(_,c,m){for(m=32;c=*i++-49;c&m?puts(""):m)for(_=(
c/4)&7;putchar(m),_--?m:(_=(1<<(c&3))-1,(m^=3)&3););}    /*- Warp -*/

From: Peter J Holzer
Subject: Re: PovRay faster
Date: 7 Feb 2001 20:02:23
Message: <slrn983nat.pho.hjp-usenet@teal.h.hjp.at>

On 2001-02-07 13:19, Warp <war### [at] tagpovrayorg> wrote:
>  What optimization options did you use?

-O3, but it produces the same code with -O and -O2. 

Even without any optimization the same imull and leal instructions are
generated, there are only a few superfluous moves and jmps.

	hp

-- 
   _  | Peter J. Holzer    | All Linux applications run on Solaris,
|_|_) | Sysadmin WSR       | which is our implementation of Linux.
| |   | hjp### [at] wsracat      | 
__/   | http://www.hjp.at/ |	-- Scott McNealy, Dec. 2000

From: Warp
Subject: Re: PovRay faster
Date: 8 Feb 2001 07:39:39
Message: <3a82938a@news.povray.org>

Peter J. Holzer <hjp### [at] hjpat> wrote:
: Even without any optimization the same imull and leal instructions are
: generated, there are only a few superfluous moves and jmps.

  It may be that you have a newer version of gcc than I do (the gcc version
here is actually quite old).

-- 
char*i="b[7FK@`3NB6>B:b3O6>:B:b3O6><`3:;8:6f733:>::b?7B>:>^B>C73;S1";
main(_,c,m){for(m=32;c=*i++-49;c&m?puts(""):m)for(_=(
c/4)&7;putchar(m),_--?m:(_=(1<<(c&3))-1,(m^=3)&3););}    /*- Warp -*/

From: Thorsten Froehlich
Subject: Re: PovRay faster
Date: 8 Feb 2001 09:46:40
Message: <3a82b150$1@news.povray.org>

In article <3a82938a@news.povray.org> , Warp <war### [at] tagpovrayorg>  wrote:

> Peter J. Holzer <hjp### [at] hjpat> wrote:
> : Even without any optimization the same imull and leal instructions are
> : generated, there are only a few superfluous moves and jmps.
>
>   It may be that you have a newer version of gcc than I do (the gcc version
> here is actually quite old).

gcc 2.95.2 is current (and has been for years).  It looks like there will
finally be an update: version 3.0 is due within a few weeks according to the
official gcc website.


      Thorsten


____________________________________________________
Thorsten Froehlich, Duisburg, Germany
e-mail: tho### [at] trfde

Visit POV-Ray on the web: http://mac.povray.org
