Sounds cool.
You also claim that your version is faster than mandel_pattern().
Given that this function does nothing more than calculate the regular
z^2+c mandelbrot, I find it strange that a generic version which calculates
the same for any exponent will be faster. Are you sure?
--
- Warp
No, I didn't claim that it is faster than the exponent=2 special code; I haven't
measured that. I have measured it against mandelx with exponent = 2, and it
turned out to be a few times faster there.
If speed is the only consideration, then one picks the fastest choice,
obviously. I will measure the new code against the exponent=2 special code.
If there is a threshold for trading off some amount of speed in exchange for
some amount of code reduction, then that's a matter of parameters, better judged
by experts like yourself who know the values of such trade-offs.
I'm an enthusiastic code pruner, myself.
Algo
Algo <Gem### [at] hotmailcom> wrote:
> No I didn't claim that it is faster than the exponent=2 special code. I haven't
> measured that. I have measured vs. the mandelx with exponent = 2 and it turned
> out to be a few x faster there.
> If speed is the only consideration, then one picks the fastest choice,
> obviously. I will measure the new code vs. the exponent=2 special code.
> If there is a threshhold for trading off some amount of speed in exchange for
> some amount of code reduction, then it's parameters. Better judged by experts
> like yourself who know the values of such trade-offs.
> I'm an enthusiastic code pruner, myself.
As you may have noticed, the current source has specializations for
the exponents 2, 3 and 4, and the larger exponents are calculated with
the generalized function. Your improvement would basically replace the
latter.
It may be good to measure the speed of your generic function against the
speed of the specialized exponent-4 function to see which is faster
(my guess is that the specialized one is, but I can't know without
actually measuring). If the specialized function turns out to be slower,
then it may be removed as well.
--
- Warp
A beautiful sunny Sunday in Seattle. Outside my window, ships go through the
locks, salmon swim up the ladder, a bagpipe band plays in the garden and I am
inside coding. Actually I love coding. And I have kept it down to two hours.
NEW_mandelx_pattern was identified all along as a hack. Perfectly adequate for
demonstrating the improved power computations, but for timing purposes
representing a worst-case promise rather than any basis for quantitative
comparison.
NEWER_mandelx_pattern is a speed-optimized version. Ready for quantitative
comparisons. My main avenue of attack was to unroll my awful interior loops.
Short, badly branch-predicted. Ideal candidates. The measured result is
posted among the images.
Also I have extended NEWER_mandelx_pattern to treat exponents up to 255.
The old mandel_pattern is a fine piece of code. I would have said speed-optimal,
in fact. I was hoping to come within 10% of its speed. But the measurements are
showing NEWER_mandelx_pattern to be 6.8% FASTER, for some unexpected reason. I
will not be satisfied until I have tracked down why this is.
Will post NEWER_mandelx_pattern after I understand it vs. mandel_pattern and
prune its code way down.
Algo
Algo <Gem### [at] hotmailcom> wrote:
> Also I have extended NEWER_mandelx_pattern to treat exponents up to 255.
Since I assume that the extra exponents come "for free" (i.e. without
requiring any additional resources such as arrays), that's completely OK.
However, I really doubt anyone would use exponents that large, as the
resulting image comes closer and closer to just a circle (and
the border of the fractal becomes more and more boring as the exponent
grows).
OTOH, I have never seen nor zoomed into a z = z^255+c mandelbrot.
Could be interesting, just out of curiosity. :)
--
- Warp
How did NEWER_mandelx_pattern of exponent=2 manage to go 7% faster than
mandel_pattern?
THIS EXPLANATION IS HERE BECAUSE I SAID I WOULD FIGURE IT OUT.
FEEL FREE NOT TO READ THIS. IT IS VERY TECHNICAL MATERIAL OF LITTLE
INTEREST EXCEPT TO HARD-CORE CODERS. ALL THE LESS RELEVANT NOW,
SINCE MY FINAL CODE NOW BEATS mandel_pattern BY 19%. MY ADVICE:
STUDY THAT INSTEAD.
The answer is a combination of 75% Good Luck and 25% Right Living. The
following is the disassembled code executing in the FPU on each pass through
the col loop (when neither internal "if" succeeds).
This code is from MS VS 2008 C++, "release" configuration.
As expected, both codes succeed in keeping the loop totally free
from memory access.
Recall that "fld st(n)" is about three times faster than fadd, fmul or
fsub.
Recall that the first "fxch" is free as long as it executes after an
fadd, fmul or fsub.
NEWER_mandelx_pattern:
NEXT_SQUARE;
004017BE fsubrp st(1),st add#1
004017C0 fxch st(3)
004017C2 fadd st(0),st add#2
004017C4 fmulp st(1),st mul#1
ReZ = ZTo2ToN[2] + ReX;
004017C6 fxch st(2)
004017C8 fadd st,st(3) add#3
ImZ = ZTo2ToN[3] + ImX;
004017CA fxch st(2)
004017CC fadd st,st(1) add#4
DBL AbsZ2 = ReZ*ReZ + ImZ*ImZ;
004017CE fld st(0) fld#1
004017D0 fmul st,st(1) mul#2
004017D2 fld st(3) fld#2
004017D4 fmul st,st(4) mul#3
004017D6 fld st(0) fld#3
004017D8 fadd st,st(2) add#5
mandel_pattern:
b = 2.0 * a * b + y;
0040102B fxch st(4) xch#1
0040102D fadd st(0),st add#1
0040102F fmulp st(1),st mul#1
00401031 fadd st,st(1) add#2
a = a2 - b2 + x;
00401033 fxch st(2)
00401035 fsubrp st(3),st add#3
00401037 fxch st(2)
00401039 fadd st,st(3) add#4
a2 = Sqr(a);
0040103B fld st(0) fld#1
0040103D fmul st,st(1) mul#2
b2 = Sqr(b);
0040103F fld st(2) fld#2
00401041 fmul st,st(3) mul#3
0040101F fxch st(2)
00401021 fxch st(4) xch#2
00401023 fxch st(1) xch#3
00401025 fxch st(3) xch#4
00401027 fxch st(1) xch#5
00401029 fxch st(2) xch#6
A rough cycle count comparison is:

                        add   mul   fld   fxch   total
NEWER_mandelx_pattern:  5*3   3*3   3*1   0      27
mandel_pattern:         4*3   3*3   2*1   6*1    29
The time difference seems to be due to mandel_pattern's chain of 6 fxch ops
at the end of the loop. This comes from the cunning re-use of a^2 and b^2
from one pass through the loop to the next, upping the number of re-used
register values from 3 to 5 (out of 8). These must be returned to the same
stack positions from pass to pass, and that requires a bunch of register
exchanges, wasting more time than was saved by squeezing out that last FPU
arithmetic op. An infinitely clever compiler could have avoided these by
unrolling the loop in mandel_pattern.
---Algo
This is my algorithmically final mandel_julia_pattern.
The posted performance graph tells it all.
It includes several innovations over the previous:
1.)Mandelbrot and Julia sets unified into a single subroutine.
2.)mandel_julia_pattern allows artistic innovation of smooth
interpolation between pure Mandelbrot and pure Julia sets
via new parameter, MJFract.
3.)My previous awful interior loops have been eliminated. Their setup
is now parsed once into static data, and their repetitive execution
is reduced to a simple switch/case.
4.)I have added special small exponent code. This is a debatable
advantage: For instance, it now is a pretty impressive 19% faster
than mandel_pattern for Exponent=2. On the other hand, it adds an
ugly outer layer of switch/case. If you deem the latter too awful,
the default of the outer switch will handle all Exponents with no
worse than a factor of 2 slow-down. The best solution would be to
figure out how to move the outer switch inside the col loop without
harming performance too much. But a factor of 2 is a factor of 2 and
19% is 19% and these sorts of speedups are quite fragile. A better
coder than myself could probably do it.
5.)Exponents up to 255 are now treated. Of course these big exponents
take longer than smaller ones. But their presence does not handicap
the smaller ones in any way in a uniprocessor situation. If you want
to discuss problems of multi-core parallelization, I think I can add
views based on experience.
6.)I have added an exit based on the initial value of MinAbsZ2.
I do not have a scientific basis for this, but it appears to
improve logical coherence.
What follows should be read as a speed-optimized algorithm, not finished code.
In fact, I have edited it for clarity, so that there is some possibility of its
not even compiling.
I would be glad to include this algorithm into an object oriented version
of POV.
==========================================================================
#define NEXT_SQUARE {p[2]=p[0]*p[0]-p[1]*p[1]; p[3]=2.0*p[0]*p[1]; p+=2;}
#define ACCUM_POWER(N) {DBL work=ReZToE*p[Nth1[N]]-ImZToE*p[Nth1[N]+1]; \
                        ImZToE=ReZToE*p[Nth1[N]+1]+ImZToE*p[Nth1[N]]; ReZToE=work;}
static DBL mandel_julia_pattern(double EPoint[3], int Exponent, DBL MJFract)
{
static int LastExponent=-1;
int col;
/////////////////////////////////////////////////////////////////////////////
/// The following is done only once whenever the value of Exponent changes//
/// This means that Exponent should be in a class (perhaps in TPATTERN,/////
/// perhaps in a new class that embraces both mandel_patterns and///////////
/// julia_patterns). What here are "static ints" should be members of//////
/// that class, set whenever Exponent is changed!///////////////////////////
static int NBits,NOnes; ///
static int Nth1[8]; //N.B.: Loc (in reals) within complex array! ///
if(Exponent!=LastExponent) ///
{//Exponent changed, parse out bits. ///
assert((1<Exponent)&&(Exponent<256)); ///
LastExponent = Exponent; ///
NOnes=0; ///
for(NBits=0; NBits<8; NBits++) ///
if((Exponent>>NBits)==0) ///
break; ///
else if(Exponent&(1<<NBits)) ///
Nth1[NOnes++] = 2*NBits; //2 = Stride (in DBLS) of complex ///
assert(NBits>1); ///
assert(NOnes>0); ///
} ///
/// End of hacked up version of Exponent parsing////////////////////////////
/////////////////////////////////////////////////////////////////////////////
DBL ReZ, ImZ, ReX, ImX, MinAbsZ2, ReZToE, ImZToE;
DBL ReC=0.353, ImC=0.288; //default values, un-comment 2 lines below...
ReZ = EPoint[X]; // Think of ReZ+iImZ as z
ImZ = EPoint[Y]; // Think of ReX+iImX as x, x=z in Mandelbrot
/////////////////////////////////////////////////////////////////////////////
/// Interpolate between pure Mandelbrot and pure Julia sets://///////////////
/// ///
/// MJFract = 0 Pure Mandelbrot ///
/// MJFract = 1 Pure Julia ///
/// ///
// ReC = TPat->Vals.Fractal.Coord[U]; //Uncomment this!!! ///
// ImC = TPat->Vals.Fractal.Coord[V]; //Uncomment this!!! ///
ReX = (1.0-MJFract)*ReZ + MJFract*ReC; ///
ImX = (1.0-MJFract)*ImZ + MJFract*ImC; ///
/////////////////////////////////////////////////////////////////////////////
MinAbsZ2 = ReZ*ReZ + ImZ*ImZ;
if(MinAbsZ2 > 4.0)
return(fractal_exterior_color(0, ReZ, ImZ)); //bailed out before iterating; col not yet set
DBL ZTo2ToN[16]; // ZTo2ToN[2*n] + i*ZTo2ToN[2*n+1] will become z^(2^n)
switch(Exponent)
{
case 2:
{
for(col = 0; col < it_max; col++)
{
double *p=ZTo2ToN;
p[0] = ReZ; // z^1
p[1] = ImZ;
NEXT_SQUARE; // z^2
ReZ = ZTo2ToN[2] + ReX;
ImZ = ZTo2ToN[3] + ImX;
DBL AbsZ2 = ReZ*ReZ + ImZ*ImZ;
if(AbsZ2 > 4.0)
return(fractal_exterior_color(col, ReZ, ImZ));
//Look, Ma, no ifs!
MinAbsZ2 = 0.5*(MinAbsZ2+AbsZ2-fabs(MinAbsZ2-AbsZ2));
}
break;
}
case 3:
{
for(col = 0; col < it_max; col++)
{
double *p=ZTo2ToN;
p[0] = ReZ; // z^1
p[1] = ImZ;
NEXT_SQUARE; // z^2
ReZ = ZTo2ToN[0]*ZTo2ToN[2] - ZTo2ToN[1]*ZTo2ToN[3] + ReX;
ImZ = ZTo2ToN[0]*ZTo2ToN[3] + ZTo2ToN[1]*ZTo2ToN[2] + ImX;
DBL AbsZ2 = ReZ*ReZ + ImZ*ImZ;
if(AbsZ2 > 4.0)
return(fractal_exterior_color(col,ReZ,ImZ));
MinAbsZ2 = 0.5*(MinAbsZ2+AbsZ2-fabs(MinAbsZ2-AbsZ2));
}
break;
}
case 4:
{
for(col = 0; col < it_max; col++)
{
double *p=ZTo2ToN;
p[0] = ReZ; // z^1
p[1] = ImZ;
NEXT_SQUARE; // z^2
NEXT_SQUARE; // z^4
ReZ = ZTo2ToN[4] + ReX;
ImZ = ZTo2ToN[5] + ImX;
DBL AbsZ2 = ReZ*ReZ + ImZ*ImZ;
if(AbsZ2 > 4.0)
return(fractal_exterior_color(col, ReZ, ImZ));
MinAbsZ2 = 0.5*(MinAbsZ2+AbsZ2-fabs(MinAbsZ2-AbsZ2));
}
break;
}
default:
for(col = 0; col < it_max; col++)
{
double *p=ZTo2ToN;
p[0] = ReZ; // z^1
p[1] = ImZ;
switch (NBits)
{//Fills up NBits complex entries of ZTo2ToN
case 8: NEXT_SQUARE; //Unneeded if Exp < 128
case 7: NEXT_SQUARE; //Unneeded if Exp < 64
case 6: NEXT_SQUARE; //Unneeded if Exp < 32
case 5: NEXT_SQUARE; //Unneeded if Exp < 16
case 4: NEXT_SQUARE; //Unneeded if Exp < 8
case 3: NEXT_SQUARE; //Unneeded if Exp < 4
default: NEXT_SQUARE; //Always needed because Exp>1
}
p=ZTo2ToN;
ReZToE = ZTo2ToN[Nth1[0] ]; //One power comes in for free!
ImZToE = ZTo2ToN[Nth1[0]+1];
switch (NOnes)
{//Multiplies in the remaining selected powers z^(2^n), one per set bit
case 8: ACCUM_POWER(7); //Unneeded if NOnes < 8
case 7: ACCUM_POWER(6); //Unneeded if NOnes < 7
case 6: ACCUM_POWER(5); //Unneeded if NOnes < 6
case 5: ACCUM_POWER(4); //Unneeded if NOnes < 5
case 4: ACCUM_POWER(3); //Unneeded if NOnes < 4
case 3: ACCUM_POWER(2); //Unneeded if NOnes < 3
case 2: ACCUM_POWER(1); //Unneeded if NOnes < 2
}
ReZ = ReZToE + ReX; // z <= z^Exp + x
ImZ = ImZToE + ImX;
DBL AbsZ2 = ReZ*ReZ + ImZ*ImZ;
if(AbsZ2 > 4.0)
return(fractal_exterior_color(col, ReZ, ImZ));
MinAbsZ2 = 0.5*(MinAbsZ2+AbsZ2-fabs(MinAbsZ2-AbsZ2));
}
}
return(fractal_interior_color(col, ReZ, ImZ, MinAbsZ2));
}
Algo wrote:
> A rough cycle count comparison is:
> add mul fld fxch total
> NEWER_mandelx_pattern: 5*3 3*3 3*1 0 27
> mandel_pattern: 4*3 3*3 2*1 6*1 29
Just so I understand you correctly: Are you suggesting that because your code
can trick a specific compiler on a specific architecture into producing slightly
better code for some processors of that architecture, the current code in POV-Ray
should be changed? Or is your code universally better on an algorithmic level?
Thorsten
Thorsten Froehlich <tho### [at] trfde> wrote:
> So just that I get you right: Are you suggesting because your code can trick
> a specific compiler on a specific architecture to produce slightly better
> code for some processors of that architecture, the current code in POV-Ray
> should be changed? Or is your code universally better on an algorithmic level?
The current mandel_pattern() function could be kept, and his version
called for all the other variants.
If enough compilers and platforms show that his variant is always faster
than the regular mandel_pattern(), then the latter could be dropped at some
point.
--
- Warp
New Fractal
Just as Warp said, the large exponents lead to standard Mandelbrot and Julia
patterns that are nearly round and quite boringly repetitive around their
perimeters.
However, the new Mandelbrot/Julia hybrids (especially with values of MJFract
extrapolating past pure Julia, i.e. > 1) lead to football shapes with one
Mandelbrot-ish side and one Julia-ish side. The interest comes at the cusps
where the two regimes interfere.
I am unable to post the 1-Giga-pixel image that took about 20 minutes to render:
Exponent = 255
JuliaX = [0.353,0.288]
MJFract = 2.0
Rgn Rect = [[-0.2566,+0.8909],[-0.1316,+1.0159]]
Res = [32K,32K]
Of course, the Exponent=2 case is very interesting everywhere. And this
suggests how to proceed: A more subtle hybridization which has its number of
cusps equal to its exponent.
A detail from it (Exponent=31) has now been posted. I think it is a rather
interesting pattern.
--Algo