POV-Ray: Newsgroups: povray.programming: Fixing mandelx

POV-Ray : Newsgroups : povray.programming : Fixing mandelx_pattern		Server Time 30 Jun 2025 16:49:24 EDT (-0400)

Goto Latest 10 Messages

Next 4 Messages >>>

From: Algo
Subject: Fixing mandelx_pattern
Date: 6 Jun 2008 19:35:00
Message: <web.4849c8f7f7b98f16d9b2aafa0@news.povray.org>

Hello

I have been using POV on and off since the early 90s.  It was a fabulous product
then, and it is fabulous now.

Small as it is, I would like to offer a contribution of my own: fixing
mandelx_pattern.  I have hacked together a demonstration of my proposed
improvement:

1.) Accuracy is not damaged.  In fact it is improved.  In a typical test case:

2.) Speed is improved.  The enclosed graph shows measured performance

2x to 20x]
3.) Maintainability  is improved.  The code is easier to read and, for instance,
the binomial_coeff stuff is no longer needed.

If you would be interested, drop me a note.

Algo

Post a reply to this message

From: Warp
Subject: Re: Fixing mandelx_pattern
Date: 7 Jun 2008 03:22:56
Message: <484a3750@news.povray.org>

Algo <Gem### [at] hotmailcom> wrote:
> Small as it is, I would like to offer a contribution of my own: fixing
> mandelx_pattern.

  What was broken with it, that needed fixing?

> I have hacked together a demonstration of my proposed improvement:

  Where is it?

-- 
                                                          - Warp

Post a reply to this message

From: Algo
Subject: Re: Fixing mandelx_pattern
Date: 8 Jun 2008 16:30:01
Message: <web.484c40b056e5854eaeb59a6c0@news.povray.org>

Warp <war### [at] tagpovrayorg> wrote:
> Algo <Gem### [at] hotmailcom> wrote:
> > Small as it is, I would like to offer a contribution of my own: fixing
> > mandelx_pattern.
>
>   What was broken with it, that needed fixing?

[Speed-ups ranging from 2x to 20x]
[Also accuracy is improved]
[Less source code, too]

>
> > I have hacked together a demonstration of my proposed improvement:
>
>   Where is it?
>

Do you want to see source code or an algorithm description?

Algo

Post a reply to this message

From: Algo
Subject: Re: Fixing mandelx_pattern
Date: 9 Jun 2008 16:20:00
Message: <web.484d8f3356e5854ed9b2aafa0@news.povray.org>

PROPOSED NEW VERSION OF mandelx_pattern

The basis of this improvement is the following algorithm.  This algorithm could
not possibly be original to me, though I did not get it from any source.

PROPOSED ALGORITHM
How to compute powers in logarithmic time.  The POV docs inaccurately imply that
computing the pth power takes p-1 multiplications.  In fact it always takes
fewer than 2*log2(p) multiplications.  Here is how that speedup works:

/////////////////////////////////////////////////////////////////
// Let z=a+ib, want zToP = z^p
//    for example, take p=37
//
// Compute z, z^2, z^4, z^8, ... by successive squaring.
// Call this sequence zTo2ToN.
//
// Write these below binary expansion of p:
//
//           1      0      0      1      0      1
//           z^32   z^16   z^8    z^4    z^2    z^1
//    zToP = z^32                *z^4          *z^1  = z^37
//
// Notice that this takes:
//    5 complex multiplies to get the sequence zTo2ToN
//    2 complex multiplies accumulating -> zNew
//  ----------------------
//    7 complex multiplies total
//
// i,e., WAY less than the 36 required if computed via
//
//           z^1, z^2, z^3,..., z^34, z^35
//
// In general, to compute a pth power, it takes
//
//    CountZeros(p)+2*CountOnes(p)-2 complex multiplies
//
// For certain values of p (like 15 or 30), there are methods
// that are somewhat faster, but the extra code burden would
// probably not be worth it.
/////////////////////////////////////////////////////////////////



================================================================================
This obviates a LOT of code.....

I propose that the code below replaces:
    1.) mandelx_pattern
    2.) mandel_pattern
    3.) mandel3_pattern
    4.) mandel4_pattern
    5.) MANDEL_PATTERN
    6.) MANDEL3_PATTERN
    7.) MANDEL4_PATTERN

and all that machinery.

The same trick applies to the Julia Set as well, NEW_juliax_pattern eliminating:
    8.) juliax_pattern
    9.) julia_pattern
    10.) julia3_pattern
    11.) julia4_pattern
    12.) JULIA_PATTERN
    13.) JULIA3_PATTERN
    14.) JULIA4_PATTERN

and all that machinery.

When these are both done, there is no further need for:
    15.) binomial_coeff
    16.) InitializeBinomialCoefficients
    17.) BinomialCoefficients
    18.) BinomialCoefficientsInited

and all that machinery.

================================================================================
And it goes a LOT faster.  The enclosed figure, ALGO_TIMES.JPG, shows the
speedups.
Since I cannot post it to this forum, I will place it in some more permissive
forum on the POVRAY site.

================================================================================
Finally, it is a LOT more accurate.  This is mostly a matter of peace-of-mind
for coders.  I doubt very much that users will notice any artistic improvement
from this, but, for exponent=30, there actually were two pixels different even
on my 401x401 hacked up grid (but no smaller exponent differed at all).


================================================================================
================================================================================
Here is a hacked-up version of the code.  I have never operated it in situ (for
some reason [missing DLL?] I am unable to execute my POV compilations).  But I
am pretty sure that it works.

/********NEW_mandelx_pattern begins********************************************
#define CMPLX_SQ(A,B)      {DBL w = A*A - B*B; B = 2.0*A*B;   A = w;}
#define CMPLX_MUL(A,B,C,D) {DBL w = A*C - B*D; B = A*D + B*C; A = w;}

static DBL NEW_mandelx_pattern(double EPoint[3])
{//  assert(exponent>0);
  int col,ExpShifted;
  DBL ReZ, ImZ, ReX, ImX, MinAbsZ2;

  ReZ = ReX = EPoint[X];  // Think of ReZ+iImZ as z
  ImZ = ImX = EPoint[Y];  // Think of ReX+iImX as x, start with z=x
  MinAbsZ2 = ReZ*ReZ + ImZ*ImZ;

  for(col = 0; col < it_max; col++)
  {
      DBL ReZTo2ToN = ReZ; // Will become z^(2^bit_pos)
      DBL ImZTo2ToN = ImZ;

      for(ExpShifted=exponent; (ExpShifted&1)==0; ExpShifted >>= 1)
          CMPLX_SQ(ReZTo2ToN,ImZTo2ToN); // Next trailing zero in exponent

      DBL ReZToE = ReZTo2ToN; // Least significant 1 in exponent,
      DBL ImZToE = ImZTo2ToN; // This will become z^exponent

      for(ExpShifted >>= 1   ; ExpShifted!=0    ; ExpShifted >>= 1)
      {// Next bit-position in exponent
          CMPLX_SQ(ReZTo2ToN,ImZTo2ToN); // Square previous z^(2^bit_pos)
          if((ExpShifted&1)!=0)          // This bit-position = 1?
              CMPLX_MUL(ReZToE,ImZToE,ReZTo2ToN,ImZTo2ToN); // -> z^exponent
      }

      ReZ = ReZToE + ReX; // z <= z^exponent + x
      ImZ = ImZToE + ImX;

      DBL AbsZ2 = ReZ*ReZ + ImZ*ImZ;

      if(AbsZ2 < MinAbsZ2) MinAbsZ2 = AbsZ2;
      if(AbsZ2 > 4.0)
          return(fractal_exterior_color(col, ReZ, ImZ));
  }

  return(fractal_interior_color(col, ReZ, ImZ, MinAbsZ2));
}
********NEW_mandelx_pattern ends**********************************************/

Post a reply to this message

From: Warp
Subject: Re: Fixing mandelx_pattern
Date: 10 Jun 2008 07:44:40
Message: <484e6928@news.povray.org>

Sounds cool.

  You also claim that your version is faster than mandel_pattern().
Given that this function does nothing more than calculate the regular
z^2+c mandelbrot, I find it strange that a generic version which calculates
the same for any exponent will be faster. Are you sure?

-- 
                                                          - Warp

Post a reply to this message

From: Algo
Subject: Re: Fixing mandelx_pattern
Date: 12 Jun 2008 16:25:01
Message: <web.4851849556e5854ed9b2aafa0@news.povray.org>

No I didn't claim that it is faster than the exponent=2 special code.  I haven't
measured that.  I have measured vs. the mandelx with exponent = 2 and it turned
out to be a few x faster there.

If speed is the only consideration, then one picks the fastest choice,
obviously.  I will measure the new code vs. the exponent=2 special code.

If there is a threshhold for trading off some amount of speed in exchange for
some amount of code reduction, then it's parameters.  Better judged by experts
like yourself who know the values of such trade-offs.

I'm an enthusiastic code pruner, myself.

Algo

Post a reply to this message

From: Warp
Subject: Re: Fixing mandelx_pattern
Date: 12 Jun 2008 19:21:28
Message: <4851af78@news.povray.org>

Algo <Gem### [at] hotmailcom> wrote:
> No I didn't claim that it is faster than the exponent=2 special code.  I haven't
> measured that.  I have measured vs. the mandelx with exponent = 2 and it turned
> out to be a few x faster there.

> If speed is the only consideration, then one picks the fastest choice,
> obviously.  I will measure the new code vs. the exponent=2 special code.

> If there is a threshhold for trading off some amount of speed in exchange for
> some amount of code reduction, then it's parameters.  Better judged by experts
> like yourself who know the values of such trade-offs.

> I'm an enthusiastic code pruner, myself.

  As you may have noticed, the current source has specializations for
the exponents 2, 3 and 4, and the larger exponents are calculated with
the generalized function. Your improvement would basically replace the
latter.

  It may be good to measure the speed of your generic function to the
speed of the specialized exponent 4 function to see which is faster
(my guess is that the specialized one is, but I can't know without
actually measuring). If the specialized function results to be slower,
then it may be removed as well.

-- 
                                                          - Warp

Post a reply to this message

From: Algo
Subject: Re: Fixing mandelx_pattern
Date: 15 Jun 2008 18:15:01
Message: <web.4855939456e5854eaeb59a6c0@news.povray.org>

A beautiful sunny Sunday in Seattle.  Outside my window, ships go through the
locks, salmon swim up the ladder, a bagpipe band plays in the garden and I am
inside coding.  Actually I love coding.  And I have kept it down to two hours.

NEW_mandelx_pattern was identified all along as a hack.  Perfectly adequate for
demonstrating the improved power computations, but for timing purposes
representing a worst-case promise rather than any basis for quantitative
comparison.

NEWER_mandelx_pattern is a speed-optimized version.  Ready for quantitative
comparisons.  My main avenue of attack was to unroll my awful interior loops.
Short, badly branch-predicted.  Ideal candidates.  The measured result is
posted among the images.

Also I have extended NEWER_mandelx_pattern to treat exponents up to 255.

The old mandel_pattern.  A fine code.  I would have said speed optimal, in fact.
 I was hoping to come within 10% of its speed.    But the measurements are
showing NEWER_mandelx_pattern to be 6.8% FASTER, for some unexpected reason.  I
will not be satisfied until I have tracked down why this is.

Will post NEWER_mandelx_pattern after I understand it vs. mandel_pattern and
prune its code way down.

Algo

Post a reply to this message

From: Warp
Subject: Re: Fixing mandelx_pattern
Date: 16 Jun 2008 10:58:12
Message: <48567f84@news.povray.org>

Algo <Gem### [at] hotmailcom> wrote:
> Also I have extended NEWER_mandelx_pattern to treat exponents up to 255.

  Since I assume that the extra exponents come "for free" (ie. without
requiring any additional resources such as arrays), that's completely ok.
However, I really doubt anyone would use exponents that large, as the
resulting image starts more and more approaching just a circle (and
the border of the fractal becomes more and more boring as the exponent
grows).

  OTOH, I have never seen nor zoomed into a z = z^255+c mandelbrot.
Could be interesting, just out of curiosity. :)

-- 
                                                          - Warp

Post a reply to this message

From: Algo
Subject: Re: Fixing mandelx_pattern
Date: 22 Jun 2008 17:35:00
Message: <web.485ec47e56e5854ed9b2aafa0@news.povray.org>

How did NEWER_mandelx_pattern of exponent=2 manage to go 7% faster than
mandel_pattern?

THIS EXPLANATION IS HERE BECAUSE I SAID I WOULD FIGURE IT OUT.
FEEL FREE NOT TO READ THIS.  IT IS VERY TECHNICAL MATERIAL OF LITTLE
INTEREST EXCEPT TO HARD-CORE CODERS.  ALL THE LESS RELEVANT NOW,
SINCE MY FINAL CODE NOW BEATS mandel_pattern BY 19%.  MY ADVICE:
STUDY THAT INSTEAD.

The answer is by a combination of 75% Good Luck and 25% Right Living.  The
following are the disassembled code executing in the FPU on each pass through
the col loop (when neither internal "if" succeeds).

This code is MS VS 2008 C++, "release".

As expected, both codes succeed in keeping the loop totally free
from memory access.

Recall that "fld st(n)" is about three times faster than fadd, fmul or
fsub.

Recall that the first "fxch" is free as long as it executes after an
fadd, fmul or fsub.


             NEWER_mandelx_pattern:
                       NEXT_SQUARE;
             004017BE  fsubrp      st(1),st     add#1
             004017C0  fxch        st(3)
             004017C2  fadd        st(0),st     add#2
             004017C4  fmulp       st(1),st     mul#1
                       ReZ = ZTo2ToN[2] + ReX;
             004017C6  fxch        st(2)
             004017C8  fadd        st,st(3)     add#3
                       ImZ = ZTo2ToN[3] + ImX;
             004017CA  fxch        st(2)
             004017CC  fadd        st,st(1)     add#4
                       DBL AbsZ2 = ReZ*ReZ + ImZ*ImZ;
             004017CE  fld         st(0)        fld#1
             004017D0  fmul        st,st(1)     mul#2
             004017D2  fld         st(3)        fld#2
             004017D4  fmul        st,st(4)     mul#3
             004017D6  fld         st(0)        fld#3
             004017D8  fadd        st,st(2)     add#5


             mandel_pattern:
             b  = 2.0 * a * b + y;
             0040102B  fxch        st(4)        xch#1
             0040102D  fadd        st(0),st     add#1
             0040102F  fmulp       st(1),st     mul#1
             00401031  fadd        st,st(1)     add#2
                 a  = a2 - b2 + x;
             00401033  fxch        st(2)
             00401035  fsubrp      st(3),st     add#3
             00401037  fxch        st(2)
             00401039  fadd        st,st(3)     add#4

                 a2 = Sqr(a);
             0040103B  fld         st(0)        fld#1
             0040103D  fmul        st,st(1)     mul#2
                 b2 = Sqr(b);
             0040103F  fld         st(2)        fld#2
             00401041  fmul        st,st(3)     mul#3

             0040101F  fxch        st(2)
             00401021  fxch        st(4)        xch#2
             00401023  fxch        st(1)        xch#3
             00401025  fxch        st(3)        xch#4
             00401027  fxch        st(1)        xch#5
             00401029  fxch        st(2)        xch#6

A rough cycle count comparison is:
                                add     mul     fld     fxch    total
        NEWER_mandelx_pattern:  5*3     3*3     3*1     0       27
        mandel_pattern:         4*3     3*3     2*1     6*1     29

The time difference seems to be due mandel_pattern's chain of 6 fxch ops
at the end of the loop.  This is due to the cunning re-use of a^2 and b^2
from one pass through the loop to the next, upping the number of re-used
register values from 3 to 5 (out of 8).  These must be returned to the same
positions from pass to pass and that requires a bunch of register exchanges,
wasting more time than was saved by squeezing out that last FPU arithmetic op.

An infinitely clever compiler could have avoided these by unrolling the loop
in mandel-pattern.

---Algo

Post a reply to this message

Goto Latest 10 Messages

Next 4 Messages >>>