POV-Ray: Newsgroups: povray.general: Improving POV-Ray.: Re: Improving POV-Ray.

From: Nieminen Mika
Date: 19 Feb 1999 08:16:15
Message: <36cd641f.0@news.povray.org>
Roland Mas <rol### [at] casimirrezelenstfr> wrote:
: No.  At least with dece^H^H^H^Hrecent compilers, the generated code is
: as fast as (if not faster than) the one done by hand.

  This is and isn't true.
  It isn't true, because compilers are not magic programs that can produce
asm code no human could write. Compilers are made by humans, and the
optimizations they perform are coded by humans, so a compiler can't do
anything a human couldn't do.
  Actually, the optimizations made by a compiler are very schematic: it
only has a limited set of ways to optimize code. Humans don't have this
limit. They can use their creativity to write asm that is optimal for one
specific piece of code. Compilers can't do anything they are not
programmed to do. Humans can.
  But the statement is also true. Although humans can write better code,
they are only humans, with human limits. Humans can't optimize 10^6 lines
of asm code, while compilers can. Humans can optimize a small piece of
code, but nothing more. Also, the amount of knowledge needed to write
perfectly optimal code is tremendous, and almost no human can learn that
much. Compilers can.

  Another problem with optimizing programs is the limited expressive power
of programming languages. For example (on a PC with C) you can't rotate the
contents of a variable with a single operation (if you write
a=(a<<10)|(a>>22); some compilers recognize the rotation and compile it to
a single asm instruction, but those compilers are very rare), you can't
multiply two 32-bit integers and get the upper 32 bits of the result (which
is a 64-bit number stored in two registers), you can't use the carry flag
in any way (for example for adding two numbers which are larger than 32
bits), etc. With asm you can.
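
  For reference, the three operations mentioned above can at least be
spelled out in portable C, even though the language has no single operator
for any of them. This is only a sketch: the function names are made up for
illustration, it assumes a compiler that provides <stdint.h> with a 64-bit
integer type, and whether each function ends up as one machine instruction
is entirely up to the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n bits using the shift-and-or idiom from the text.
   Only valid for 0 < n < 32 (shifting by 32 is undefined in C). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);
    return (x << n) | (x >> (32 - n));
}

/* Upper 32 bits of the 64-bit product of two 32-bit integers. */
static uint32_t mulhi32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Add two 64-bit numbers held as 32-bit halves. C has no carry flag,
   so the carry out of the low half is recomputed with a comparison. */
static void add64(uint32_t alo, uint32_t ahi, uint32_t blo, uint32_t bhi,
                  uint32_t *lo, uint32_t *hi)
{
    uint32_t sum_lo = alo + blo;       /* wraps around modulo 2^32 */
    uint32_t carry = sum_lo < alo;     /* 1 if the low add overflowed */
    *lo = sum_lo;
    *hi = ahi + bhi + carry;
}
```

  In asm the carry in add64 would come for free from the first add; in C it
has to be reconstructed, which is the kind of gap the paragraph above is
about.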

  I once tested asm vs C. I wrote a C program which calculated the
Mandelbrot fractal into memory. I tried to make the C code as optimal as
possible. Then I wrote the same thing in asm. The asm code was about twice
as fast as the C code.
  I used many optimizing tricks in the asm code which are impossible to
achieve in C (because they use Intel CPU specific features).
For example, I could keep _all_ the required floating point values in
the FPU and didn't have to load and store them on each iteration (as the
C compiler did). The C compiler was unable to see that it didn't actually
have to temporarily store the values from the FPU to memory. (The
Mandelbrot fractal is a somewhat unfair example, since you need only about
7 values to calculate it and there are 8 registers in the FPU. It seems
that the compiler can't see this.) Of course this is compiler-dependent.
Perhaps another compiler could see this.
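
  To make the "about 7 values" concrete, here is a sketch of what such an
inner loop can look like in C. It is not the code from the test described
above; the function name and the use of double are assumptions made for
illustration. The floating point values that are live inside the loop are
cr, ci, zr, zi, zr2, zi2 and the constant 4.0, which is seven.

```c
#include <assert.h>

/* Number of iterations of z = z*z + c performed before |z|^2 exceeds 4,
   or max_iter if that never happens within max_iter iterations. */
static int mandel_iterations(double cr, double ci, int max_iter)
{
    assert(max_iter > 0);
    double zr = 0.0, zi = 0.0;
    int n = max_iter;
    do {
        double zr2 = zr * zr;
        double zi2 = zi * zi;
        if (zr2 + zi2 > 4.0)          /* escaped: leave the loop */
            break;
        zi = 2.0 * zr * zi + ci;      /* imaginary part of z*z + c */
        zr = zr2 - zi2 + cr;          /* real part of z*z + c */
        n--;
    } while (n > 0);
    return max_iter - n;
}
```

  Whether a given compiler keeps all seven values in registers across the
loop is exactly the compiler-dependent part.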
  Another funny trick was that with Intel asm I could test two things
with just one jump instruction. In the inner loop you have to test whether
a value is bigger than 4. If so, you have to end the loop. You also have to
test whether you have reached the maximum number of iterations, i.e.
something like this:
  n=MaxIterations;
  do
  { ...
    if(a>4) break;
    n--;
  } while(n>0);

  I could do it this way:

  cmp 4,[a]  ; (Actually it's not 4 but the floating point representation of 4)
  dec Cl     ; equivalent to n--;
  ja Loop    ; if [a]<=4 and Cl!=0 then loop

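(I'm not completely sure about the 'ja' instruction, but at least this is
the idea)
  This is possible because the 'dec' instruction sets the zero flag but
doesn't change the carry flag. So after both instructions the carry flag
still holds the result of the comparison and the zero flag tells whether
the counter has reached zero, and 'ja' jumps only when both flags are clear.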
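
  The flag logic can be checked with a small model in C. This is only a
simulation of the two flags involved, not real asm; the function name is
made up, and it assumes (following the comment in the asm above) that the
'cmp' computes 4 minus [a], so the carry flag is set when 4 is below [a] in
an unsigned comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Models "cmp 4,[a]; dec cl; ja Loop" and returns true if the jump
   would be taken. four_bits and a_bits are the two compared values as
   unsigned integers; *cl is the 8-bit loop counter, which is decremented. */
static bool ja_taken(uint32_t four_bits, uint32_t a_bits, uint8_t *cl)
{
    assert(cl != NULL);
    bool cf = four_bits < a_bits;   /* cmp: borrow, i.e. 4 < a */
    *cl = (uint8_t)(*cl - 1);       /* dec: does not touch cf */
    bool zf = (*cl == 0);           /* dec: overwrites the zero flag */
    return !cf && !zf;              /* ja: taken if CF == 0 and ZF == 0 */
}
```

  Note that the zero flag set by the 'cmp' is lost, which is harmless here:
when the two values are equal the loop should continue anyway.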
  And here is another optimization. As I commented, the value with which I
compare 'a' is not 4 but the floating point representation of it (I
don't remember what it is); of course 'a' is also in floating point format.
Since 'a' is always positive (it's the sum of two squared values) and the
floating point format on the PC is the IEEE standard format, I can compare
them directly, as if they were just integers.
  The compiler can't do this. That comparison doesn't work if 'a' is
negative, but the compiler has no way to know that it will never be
a negative value, so it just can't optimize that. The
2-compares-in-1-jump trick is also almost impossible for a compiler to see.
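
  The integer comparison of floating point values can be tried out in C
with a sketch like the following. The function names are made up, and it
assumes that float is the 32-bit IEEE single precision format (true on the
PC, but not guaranteed by C). Under that assumption the bit pattern of 4.0
is 0x40800000.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit pattern of a float as an unsigned 32-bit integer.
   Assumes float is IEEE single precision, 32 bits wide. */
static uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* x > y for floats whose sign bit is clear, decided by comparing the
   bit patterns as integers. Not valid for negative values or NaNs. */
static int greater_nonneg(float x, float y)
{
    uint32_t xb = float_bits(x), yb = float_bits(y);
    assert((xb >> 31) == 0 && (yb >> 31) == 0);   /* sign bits clear */
    return xb > yb;
}
```

  The assert spells out the precondition that the programmer knows and the
compiler doesn't: both values are never negative.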


  Of course this is just a very very short program. It isn't affected by
cache miss penalties or anything. With a bigger program which uses more
memory it would be a lot harder to beat the compiler.

  As someone said, a good algorithm will give a bigger speedup with less
work than optimizing the code by hand.
  Of course a Perfectly Optimized Povray would use the best algorithms and
would be written entirely in perfectly optimized asm. Making such a
program would take centuries... (just for one platform).

-- 
main(i){char*_="BdsyFBThhHFBThhHFRz]NFTITQF|DJIFHQhhF";while(i=
*_++)for(;i>1;printf("%s",i-70?i&1?"[]":" ":(i=0,"\n")),i/=2);} /*- Warp -*/