Sounds cool.
You also claim that your version is faster than mandel_pattern().
Given that this function does nothing more than calculate the regular
z^2+c mandelbrot, I find it strange that a generic version which calculates
the same for any exponent will be faster. Are you sure?
--
- Warp
No, I didn't claim that it is faster than the exponent=2 special code; I haven't
measured that. I have measured it against mandelx with exponent = 2, and it
turned out to be a few times faster there.
If speed is the only consideration, then one picks the fastest choice,
obviously. I will measure the new code against the exponent=2 special code.
If there is a threshold for trading off some amount of speed in exchange for
some amount of code reduction, then that's a matter of parameters, better judged
by experts like yourself who know the values of such trade-offs.
I'm an enthusiastic code pruner, myself.
Algo
Algo <Gem### [at] hotmailcom> wrote:
> No I didn't claim that it is faster than the exponent=2 special code. I haven't
> measured that. I have measured vs. the mandelx with exponent = 2 and it turned
> out to be a few x faster there.
> If speed is the only consideration, then one picks the fastest choice,
> obviously. I will measure the new code vs. the exponent=2 special code.
> If there is a threshhold for trading off some amount of speed in exchange for
> some amount of code reduction, then it's parameters. Better judged by experts
> like yourself who know the values of such trade-offs.
> I'm an enthusiastic code pruner, myself.
As you may have noticed, the current source has specializations for
the exponents 2, 3 and 4, and the larger exponents are calculated with
the generalized function. Your improvement would basically replace the
latter.
It may be good to measure the speed of your generic function against the
speed of the specialized exponent-4 function to see which is faster
(my guess is that the specialized one is, but I can't know without
actually measuring). If the specialized function turns out to be slower,
then it may be removed as well.
--
- Warp
A beautiful sunny Sunday in Seattle. Outside my window, ships go through the
locks, salmon swim up the ladder, a bagpipe band plays in the garden and I am
inside coding. Actually I love coding. And I have kept it down to two hours.
NEW_mandelx_pattern was identified all along as a hack. Perfectly adequate for
demonstrating the improved power computations, but for timing purposes
representing a worst-case promise rather than any basis for quantitative
comparison.
NEWER_mandelx_pattern is a speed-optimized version. Ready for quantitative
comparisons. My main avenue of attack was to unroll my awful interior loops.
Short, badly branch-predicted. Ideal candidates. The measured result is
posted among the images.
Also I have extended NEWER_mandelx_pattern to treat exponents up to 255.
The old mandel_pattern is a fine piece of code. I would have said speed-optimal,
in fact. I was hoping to come within 10% of its speed. But the measurements are
showing NEWER_mandelx_pattern to be 6.8% FASTER, for some unexpected reason. I
will not be satisfied until I have tracked down why this is.
Will post NEWER_mandelx_pattern after I understand it vs. mandel_pattern and
prune its code way down.
Algo
Algo <Gem### [at] hotmailcom> wrote:
> Also I have extended NEWER_mandelx_pattern to treat exponents up to 255.
Since I assume that the extra exponents come "for free" (i.e. without
requiring any additional resources such as arrays), that's completely OK.
However, I really doubt anyone would use exponents that large, as the
resulting image comes closer and closer to just a circle (and
the border of the fractal becomes more and more boring as the exponent
grows).
OTOH, I have never seen nor zoomed into a z = z^255+c mandelbrot.
Could be interesting, just out of curiosity. :)
--
- Warp
How did NEWER_mandelx_pattern of exponent=2 manage to go 7% faster than
mandel_pattern?
THIS EXPLANATION IS HERE BECAUSE I SAID I WOULD FIGURE IT OUT.
FEEL FREE NOT TO READ THIS. IT IS VERY TECHNICAL MATERIAL OF LITTLE
INTEREST EXCEPT TO HARD-CORE CODERS. ALL THE LESS RELEVANT NOW,
SINCE MY FINAL CODE NOW BEATS mandel_pattern BY 19%. MY ADVICE:
STUDY THAT INSTEAD.
The answer is a combination of 75% Good Luck and 25% Right Living. The
following is the disassembled code executing in the FPU on each pass through
the col loop (when neither internal "if" succeeds).
This code is from MS VS 2008 C++, "release" configuration.
As expected, both codes succeed in keeping the loop totally free
from memory access.
Recall that "fld st(n)" is about three times faster than fadd, fmul or
fsub.
Recall that the first "fxch" is free as long as it executes after an
fadd, fmul or fsub.
NEWER_mandelx_pattern:
NEXT_SQUARE;
004017BE fsubrp st(1),st add#1
004017C0 fxch st(3)
004017C2 fadd st(0),st add#2
004017C4 fmulp st(1),st mul#1
ReZ = ZTo2ToN[2] + ReX;
004017C6 fxch st(2)
004017C8 fadd st,st(3) add#3
ImZ = ZTo2ToN[3] + ImX;
004017CA fxch st(2)
004017CC fadd st,st(1) add#4
DBL AbsZ2 = ReZ*ReZ + ImZ*ImZ;
004017CE fld st(0) fld#1
004017D0 fmul st,st(1) mul#2
004017D2 fld st(3) fld#2
004017D4 fmul st,st(4) mul#3
004017D6 fld st(0) fld#3
004017D8 fadd st,st(2) add#5
mandel_pattern:
b = 2.0 * a * b + y;
0040102B fxch st(4) xch#1
0040102D fadd st(0),st add#1
0040102F fmulp st(1),st mul#1
00401031 fadd st,st(1) add#2
a = a2 - b2 + x;
00401033 fxch st(2)
00401035 fsubrp st(3),st add#3
00401037 fxch st(2)
00401039 fadd st,st(3) add#4
a2 = Sqr(a);
0040103B fld st(0) fld#1
0040103D fmul st,st(1) mul#2
b2 = Sqr(b);
0040103F fld st(2) fld#2
00401041 fmul st,st(3) mul#3
0040101F fxch st(2)
00401021 fxch st(4) xch#2
00401023 fxch st(1) xch#3
00401025 fxch st(3) xch#4
00401027 fxch st(1) xch#5
00401029 fxch st(2) xch#6
A rough cycle count comparison is:

                        add   mul   fld   fxch   total
NEWER_mandelx_pattern:  5*3   3*3   3*1   0      27
mandel_pattern:         4*3   3*3   2*1   6*1    29
The time difference seems to be due to mandel_pattern's chain of 6 fxch ops
at the end of the loop. This comes from the cunning re-use of a^2 and b^2
from one pass through the loop to the next, upping the number of re-used
register values from 3 to 5 (out of 8). These must be returned to the same
stack positions from pass to pass, and that requires a bunch of register
exchanges, wasting more time than was saved by squeezing out that last FPU
arithmetic op. An infinitely clever compiler could have avoided these by
unrolling the loop in mandel_pattern.
---Algo
This is my algorithmically final mandel_julia_pattern.
The posted performance graph tells it all.
It includes several innovations over the previous:
1.)Mandelbrot and Julia sets unified into a single subroutine.
2.)mandel_julia_pattern allows artistic innovation of smooth
interpolation between pure Mandelbrot and pure Julia sets
via new parameter, MJFract.
3.)My previous awful interior loops have been eliminated. Their setup
is now parsed once into static data, and their repetitive execution
is reduced to a simple switch/case.
4.)I have added special small exponent code. This is a debatable
advantage: For instance, it now is a pretty impressive 19% faster
than mandel_pattern for Exponent=2. On the other hand, it adds an
ugly outer layer of switch/case. If you deem the latter too awful,
the default of the outer switch will handle all Exponents with no
worse than a factor of 2 slow-down. The best solution would be to
figure out how to move the outer switch inside the col loop without
harming performance too much. But a factor of 2 is a factor of 2 and
19% is 19% and these sorts of speedups are quite fragile. A better
coder than myself could probably do it.
5.)Exponents up to 255 are now treated. Of course these big exponents
take longer than smaller ones. But their presence does not handicap
the smaller ones in any way in a uniprocessor situation. If you want
to discuss problems of multi-core parallelization, I think I can add
views based on experience.
6.)I have added an exit based on the initial value of MinAbsZ2.
I do not have a scientific basis for this, but it appears to
improve logical coherence.
What follows should be read as a speed-optimized algorithm, not finished code.
In fact, I have edited it for clarity, so that there is some possibility of its
not even compiling.
I would be glad to include this algorithm into an object oriented version
of POV.
==========================================================================
#define NEXT_SQUARE {p[2]=p[0]*p[0]-p[1]*p[1]; p[3]=2.0*p[0]*p[1]; p+=2;}
#define ACCUM_POWER(N) {DBL work=ReZToE*p[Nth1[N]]-ImZToE*p[Nth1[N]+1]; \
                        ImZToE=ReZToE*p[Nth1[N]+1]+ImZToE*p[Nth1[N]]; ReZToE=work;}
static DBL mandel_julia_pattern(double EPoint[3], int Exponent, DBL MJFract)
{
static int LastExponent=-1;
int col;
/////////////////////////////////////////////////////////////////////////////
/// The following is done only once whenever the value of Exponent changes//
/// This means that Exponent should be in a class (perhaps in TPATTERN,/////
/// perhaps in a new class that embraces both mandel_patterns and///////////
/// julia_patterns). What here are "static ints" should be members of//////
/// that class, set whenever Exponent is changed!///////////////////////////
static int NBits,NOnes; ///
static int Nth1[8]; //N.B.: Loc (in reals) within complex array! ///
if(Exponent!=LastExponent) ///
{//Exponent changed, parse out bits. ///
assert((1<Exponent)&&(Exponent<256)); ///
LastExponent = Exponent; ///
NOnes=0; ///
for(NBits=0; NBits<8; NBits++) ///
if((Exponent>>NBits)==0) ///
break; ///
else if(Exponent&(1<<NBits)) ///
Nth1[NOnes++] = 2*NBits; //2 = Stride (in DBLS) of complex ///
assert(NBits>1); ///
assert(NOnes>0); ///
} ///
/// End of hacked up version of Exponent parsing////////////////////////////
/////////////////////////////////////////////////////////////////////////////
DBL ReZ, ImZ, ReX, ImX, MinAbsZ2, ReZToE, ImZToE;
DBL ReC=0.353, ImC=0.288; //default values, un-comment 2 lines below...
ReZ = EPoint[X]; // Think of ReZ+iImZ as z
ImZ = EPoint[Y]; // Think of ReX+iImX as x, x=z in Mandelbrot
/////////////////////////////////////////////////////////////////////////////
/// Interpolate between pure Mandelbrot and pure Julia sets://///////////////
/// ///
/// MJFract = 0 Pure Mandelbrot ///
/// MJFract = 1 Pure Julia ///
/// ///
// ReC = TPat->Vals.Fractal.Coord[U]; //Uncomment this!!! ///
// ImC = TPat->Vals.Fractal.Coord[V]; //Uncomment this!!! ///
ReX = (1.0-MJFract)*ReZ + MJFract*ReC; ///
ImX = (1.0-MJFract)*ImZ + MJFract*ImC; ///
/////////////////////////////////////////////////////////////////////////////
MinAbsZ2 = ReZ*ReZ + ImZ*ImZ;
if(MinAbsZ2 > 4.0)
return(fractal_exterior_color(0, ReZ, ImZ)); //bailed out before iterating; col not yet set
DBL ZTo2ToN[16]; // ZTo2ToN[2*n] + i*ZTo2ToN[2*n+1] will become z^(2^n)
switch(Exponent)
{
case 2:
{
for(col = 0; col < it_max; col++)
{
double *p=ZTo2ToN;
p[0] = ReZ; // z^1
p[1] = ImZ;
NEXT_SQUARE; // z^2
ReZ = ZTo2ToN[2] + ReX;
ImZ = ZTo2ToN[3] + ImX;
DBL AbsZ2 = ReZ*ReZ + ImZ*ImZ;
if(AbsZ2 > 4.0)
return(fractal_exterior_color(col, ReZ, ImZ));
//Look, Ma, no ifs!
MinAbsZ2 = 0.5*(MinAbsZ2+AbsZ2-fabs(MinAbsZ2-AbsZ2));
}
break;
}
case 3:
{
for(col = 0; col < it_max; col++)
{
double *p=ZTo2ToN;
p[0] = ReZ; // z^1
p[1] = ImZ;
NEXT_SQUARE; // z^2
ReZ = ZTo2ToN[0]*ZTo2ToN[2] - ZTo2ToN[1]*ZTo2ToN[3] + ReX;
ImZ = ZTo2ToN[0]*ZTo2ToN[3] + ZTo2ToN[1]*ZTo2ToN[2] + ImX;
DBL AbsZ2 = ReZ*ReZ + ImZ*ImZ;
if(AbsZ2 > 4.0)
return(fractal_exterior_color(col,ReZ,ImZ));
MinAbsZ2 = 0.5*(MinAbsZ2+AbsZ2-fabs(MinAbsZ2-AbsZ2));
}
break;
}
case 4:
{
for(col = 0; col < it_max; col++)
{
double *p=ZTo2ToN;
p[0] = ReZ; // z^1
p[1] = ImZ;
NEXT_SQUARE; // z^2
NEXT_SQUARE; // z^4
ReZ = ZTo2ToN[4] + ReX;
ImZ = ZTo2ToN[5] + ImX;
DBL AbsZ2 = ReZ*ReZ + ImZ*ImZ;
if(AbsZ2 > 4.0)
return(fractal_exterior_color(col, ReZ, ImZ));
MinAbsZ2 = 0.5*(MinAbsZ2+AbsZ2-fabs(MinAbsZ2-AbsZ2));
}
break;
}
default:
for(col = 0; col < it_max; col++)
{
double *p=ZTo2ToN;
p[0] = ReZ; // z^1
p[1] = ImZ;
switch (NBits)
{//Fills up NBits complex entries of ZTo2ToN
case 8: NEXT_SQUARE; //Unneeded if Exp < 128
case 7: NEXT_SQUARE; //Unneeded if Exp < 64
case 6: NEXT_SQUARE; //Unneeded if Exp < 32
case 5: NEXT_SQUARE; //Unneeded if Exp < 16
case 4: NEXT_SQUARE; //Unneeded if Exp < 8
case 3: NEXT_SQUARE; //Unneeded if Exp < 4
default: NEXT_SQUARE; //Always needed because Exp>1
}
p=ZTo2ToN;
ReZToE = ZTo2ToN[Nth1[0] ]; //One power comes in for free!
ImZToE = ZTo2ToN[Nth1[0]+1];
switch (NOnes)
{//Multiplies in the remaining selected powers z^(2^n), one per set bit
case 8: ACCUM_POWER(7); //Unneeded if NOnes < 8
case 7: ACCUM_POWER(6); //Unneeded if NOnes < 7
case 6: ACCUM_POWER(5); //Unneeded if NOnes < 6
case 5: ACCUM_POWER(4); //Unneeded if NOnes < 5
case 4: ACCUM_POWER(3); //Unneeded if NOnes < 4
case 3: ACCUM_POWER(2); //Unneeded if NOnes < 3
case 2: ACCUM_POWER(1); //Unneeded if NOnes < 2
}
ReZ = ReZToE + ReX; // z <= z^Exp + x
ImZ = ImZToE + ImX;
DBL AbsZ2 = ReZ*ReZ + ImZ*ImZ;
if(AbsZ2 > 4.0)
return(fractal_exterior_color(col, ReZ, ImZ));
MinAbsZ2 = 0.5*(MinAbsZ2+AbsZ2-fabs(MinAbsZ2-AbsZ2));
}
}
return(fractal_interior_color(col, ReZ, ImZ, MinAbsZ2));
}
Algo wrote:
> A rough cycle count comparison is:
> add mul fld fxch total
> NEWER_mandelx_pattern: 5*3 3*3 3*1 0 27
> mandel_pattern: 4*3 3*3 2*1 6*1 29
Just so I understand you correctly: Are you suggesting that because your code
can trick a specific compiler on a specific architecture into producing slightly
better code for some processors of that architecture, the current code in POV-Ray
should be changed? Or is your code universally better on an algorithmic level?
Thorsten
Thorsten Froehlich <tho### [at] trfde> wrote:
> So just that I get you right: Are you suggesting because your code can trick
> a specific compiler on a specific architecture to produce slightly better
> code for some processors of that architecture, the current code in POV-Ray
> should be changed? Or is your code universally better on an algorithmic level?
The current mandel_pattern() function could be kept, and his version
called for all the other variants.
If enough compilers and platforms show that his variant is always faster
than the regular mandel_pattern(), then the latter could be dropped at some
point.
--
- Warp
New Fractal
Just as Warp said, the large exponents lead to standard Mandelbrot and Julia
patterns that are nearly round and quite boringly repetitive around their
perimeters.
However, the new Mandelbrot/Julia hybrids (especially with values of MJFract
extrapolating past pure Julia, i.e. > 1) lead to football shapes with one
Mandelbrot-ish side and one Julia-ish side. The interest comes at the cusps
where the two regimes interfere.
I am unable to post the 1-Giga-pixel image that took about 20 minutes to render:
Exponent = 255
JuliaX = [0.353,0.288]
MJFract = 2.0
Rgn Rect = [[-0.2566,+0.8909],[-0.1316,+1.0159]]
Res = [32K,32K]
Of course, the Exponent=2 case is very interesting everywhere. And this
suggests how to proceed: A more subtle hybridization which has its number of
cusps equal to its exponent.
A detail from it (Exponent=31) has now been posted. I think it is a rather
interesting pattern.
--Algo