POV-Ray : Newsgroups : povray.beta-test : Fwd: Stack Size Testers Wanted Server Time
26 Dec 2024 19:10:16 EST (-0500)
  Fwd: Stack Size Testers Wanted (Message 1 to 10 of 12)  
Goto Latest 10 Messages Next 2 Messages >>>
From: clipka
Subject: Fwd: Stack Size Testers Wanted
Date: 15 Feb 2019 19:44:47
Message: <5c675cff$1@news.povray.org>
Forgot to cross-post to `povray.beta-test`:

-------- Weitergeleitete Nachricht --------
Betreff: Stack Size Testers Wanted
Datum: Sat, 16 Feb 2019 01:40:46 +0100
Von: clipka <ano### [at] anonymousorg>
Newsgruppen: povray.macintosh

Hi folks,

I'm looking for guinea pigs for a particular test.


Here's the background:

A while ago we (well, some of you) were having issues with crashes due 
to insufficient thread stack size on Mac OS X, or in one case even on 
Linux; we solved these issues with a workaround to override the 
per-thread stack size (or, in the Linux case, increase that override).

In the meantime, some changes have been made to the code which should 
have reduced POV-Ray's stack requirements, so my hope is that the 
workaround is no longer necessary - which would be great news, because 
we could get rid of the entire boost thread library.


Here's where you come in:

If you have ever experienced problems related to the thread stack size, 
and still have a scene available that ran into these issues, then I'd 
like to enlist your help.

@dick balaska:

I'm not sure if you're aware, but I know you were affected; see your 
post on 2017-02-12 in povray.beta-test titled "crash in origin/master" 
(http://news.povray.org/povray.beta-test/thread/%3C58a1e32f%241%40news.povray.org%3E/)


Here's what I'd like you to do:


(1) Reproduce the old problem and workaround [optional]

- Grab the source code of a sufficiently OLD v3.8.0-alpha (BEFORE 
v3.8.0-alpha.9436902; anything built BEFORE December 2017 should do), 
v3.7.1-alpha/beta or even the latest v3.7.0.

- In the file `source/backend/configbackend.h`, place the following 
lines at the end of the file:

     #undef POV_THREAD_STACK_SIZE
     #define POV_THREAD_STACK_SIZE (512 * 1024) // 512 KiB

- Build POV-Ray.

- Run whatever scene you remember crashing on you.

- Verify that the scene does indeed crash. (If not, your scene does not 
seem to be a suitable test candidate.)

- If you want to go the extra mile, increase POV_THREAD_STACK_SIZE to 
see at which size the scene ceases to crash. (I recommend doubling the 
value; you shouldn't have to go any further than 8*1024*1024.)


(2) Test whether the workaround is still required

- Grab the source code of a sufficiently NEW v3.8.0-alpha (AT LEAST 
v3.8.0-alpha.9436902 or newer; anything built in 2018 or 2019 should do; 
I'd recommend the newest tagged alpha though).

- Apply the same changes to `source/backend/configbackend.h` as 
described above.

- Build POV-Ray.

- Run your test scene.

- Observe whether the scene crashes or not.

- If you want to go the extra mile, change POV_THREAD_STACK_SIZE to see 
at which size the behaviour changes: If the scene crashes, increase the 
value until it ceases to; if the scene does not crash, decrease the 
value until it does. (I recommend doubling / halving the value.)


(3) Report your observations.


Your help is very much appreciated!


Post a reply to this message

From: William F Pokorny
Subject: Re: Fwd: Stack Size Testers Wanted
Date: 16 Feb 2019 10:17:34
Message: <5c68298e$1@news.povray.org>
On 2/15/19 7:44 PM, clipka wrote:
> Forgot to cross-post to `povray.beta-test`:
> 
> -------- Weitergeleitete Nachricht --------
> Betreff: Stack Size Testers Wanted
> Datum: Sat, 16 Feb 2019 01:40:46 +0100
> Von: clipka <ano### [at] anonymousorg>
> Newsgruppen: povray.macintosh
> 
> Hi folks,
> 
> I'm looking for guinea pigs for a particular test.
> 
> 
> Here's the background:
> 
> A while ago we (well, some of you) were having issues with crashes due 
> to insufficient thread stack size on Mac OS X, or in one case even on 
> Linux; we solved these issues with a workaround to override the 
> per-thread stack size (or, in the Linux case, increase that override).
> 
...

I've attached an attractors.zip file to a comment to the github issue:

https://github.com/POV-Ray/povray/issues/239

which is a related open issue. One which should be closed if recent 
updates have better addressed the problem.

The zip contains a set of test cases from ThH (Thorsten) at the bottom 
should others want to test in their environment(s). Two or more of which 
failed still after the updates in early 2017. Third down being one of 
these.

These now all run cleanly for me. Ubuntu 18.04 at master (v38) at commit 
054e75c.

And 3rd fails still for me going back to v3.71 release branch at commit 
9808f53 (Jun 24 2017 with updates).

Christoph, Remember we at some point ran across some information 
suggesting windows runs with stack monitoring and splitting/growing the 
stack. Perhaps why windows could always run with a smaller stack and why 
we never saw such fails on windows. Stack splitting as needed due growth 
could be turned on with the gnu compiler - at a performance hit. This 
suggested we might want to turn off such splitting in windows for a 
performance gain. All two year old looks though, and nothing ever tried 
as far as I know, so who knows how things stand today. If just a windows 
compiler setting for you though, might be worth just trying a windows 
compile with splitting off if we are in good shape thread-stack size 
wise.

This leads to me close with being somewhat worried about the suggested 
test size of 512KB. We still have some largish stack allocations - the 
sturm solver when we increased the max order from 15 to 35(why ?) in 
v3.7 now causes a 600KB plus (??? don't recall exactly) allocation on 
the stack - at a considerable performance penalty I'll add(1) - no 
matter the actual incoming equation order. This alone might cause your 
suggested size of 512KB to fail. Well, except perhaps on windows where 
presumably splitting still in place.

I did not run master with 512KB as you suggested or a clang compile with 
the current larger default. I'm focused elsewhere at the moment and 
changes to the more common headers are painfully slow for me to compile 
and try.

Bill P.

(1) - My C++ ish attempts to address this have all been really slow or 
not worked at all when I try to get fancy/fast with raw memory 
allocations and pointers. C and some C++ compilers (IBM's XLC being one) 
support the needed dynamic structure array-size allocation mechanism 
with high performance. This a reason - among others - why I'm toying 
some with taking the common solver code back to straight C. We will 
soon, I think? - move to a mixed C++/C mode in picking up FreeType 
which, as I remember, is actually a C library - so perhaps not so crazy.


Post a reply to this message

From: clipka
Subject: Re: Fwd: Stack Size Testers Wanted
Date: 16 Feb 2019 16:39:11
Message: <5c6882ff$1@news.povray.org>
Am 16.02.2019 um 16:17 schrieb William F Pokorny:

> This leads to me close with being somewhat worried about the suggested 
> test size of 512KB.

The "target" size of 512 kiB is an invariant - that's the per-thread 
stack size we would get on OS X by default.

> We still have some largish stack allocations - the 
> sturm solver when we increased the max order from 15 to 35(why ?) in 
> v3.7 now causes a 600KB plus (??? don't recall exactly) allocation on 
> the stack - at a considerable performance penalty I'll add(1) - no 
> matter the actual incoming equation order. This alone might cause your 
> suggested size of 512KB to fail. Well, except perhaps on windows where 
> presumably splitting still in place.

Where would those 600 kB be allocated?

The only out-of-the-ordinary local variable I see in all of the 
polynomial solver code is `sseq` in `polysolve()`, which is 36 elements 
of type `polynomial`, which in turn consists of 1 integer and 36 
doubles. That's about 10 kB, not 600.

> (1) - My C++ ish attempts to address this have all been really slow or 
> not worked at all when I try to get fancy/fast with raw memory 
> allocations and pointers.

If the variable is local to a function that is guaranteed to never be 
called recursively, the easiest solution to guarantee speedy allocation 
and not hogging the stack is to change the variable declaration to 
`thread_local`. This is essentially the same as `static`, but with each 
thread having its own copy of the variable.

> C and some C++ compilers (IBM's XLC being one) 
> support the needed dynamic structure array-size allocation mechanism 
> with high performance. This a reason - among others - why I'm toying 
> some with taking the common solver code back to straight C. We will 
> soon, I think? - move to a mixed C++/C mode in picking up FreeType 
> which, as I remember, is actually a C library - so perhaps not so crazy.

Veto to that: All of POV-Ray should be valid C++11 code, so that POV-Ray 
can be compiled with only a single C++ compiler. 3rd party libraries are 
another matter; we expect those to be present in binary form (except on 
Windows, where we expect their source code to be compatible with MSVC's 
dialect of C/C++), and the headers of modern C libs are almost 
invariably designed as hybrid C/C++ headers, in the sense that they use 
pre-processor features to behave as either C or C++ code, depending on 
the compiler used.

There is currently only one genuine C source file ihn POV-Ray proper, 
namely `povms.c`, and even that one is designed as a C/C++ hybrid and 
compiled by including it from a `.cpp` file. (The rationale for this C 
file's existence being that it - in theory - allows 3rd party genuine C 
programs to drive the POV-Ray back-end via POVMS.)


Post a reply to this message

From: clipka
Subject: Re: Fwd: Stack Size Testers Wanted
Date: 16 Feb 2019 19:06:37
Message: <5c68a58d$1@news.povray.org>
Just to clear up some confusion:

On Windows, stack size is _not_ "virtually unlimited" - nor does Windows 
support split stacks. Rather, each thread's stack occupies a contiguous 
fixed block of address space. The only thing that's dynamic about it is 
that only a small portion is initially backed by physical (or even 
virtual) memory.

The per-thread stack size is configured at link time, and just happens 
to be set to a humongously large size in the POV-Ray project.


Post a reply to this message

From: clipka
Subject: Re: Fwd: Stack Size Testers Wanted
Date: 16 Feb 2019 20:53:51
Message: <5c68beaf$1@news.povray.org>
Am 16.02.2019 um 16:17 schrieb William F Pokorny:

> I've attached an attractors.zip file to a comment to the github issue:
> 
> https://github.com/POV-Ray/povray/issues/239

Thanks - yes, that seems to be the right stuff for testing this.

To reduce parsing and rendering time, I'm testing a slightly modified 
version with 100 instead of 1000 iterations, i.e.:

clifford (1.6,1.9, 0.5,-1.4, -0.4,-0.1, -1.6,1.8, 100, 0.0005, 0.00375, 
cliff_mat, cliff_trans)


On my Windows machine and the x64 Debug version (which should be a bit 
more memory-hungry than the others), I've just compared commit accc779d 
(which did away with the `FixedSimpleVector`, back in 2017-11-10) with 
the commit immediately before, with the following results:

- Before: Required a POV_THREAD_STACK_SIZE of 2 MiB.
- After: Requires a lousy 16 kiB.

I also ran the last test with 1000 iterations just for giggles, with the 
same result: 16 kiB was still sufficient.


I also see no trouble with the Sturmian root solver at 16 kiB (tested 
with `objects/torus1.pov` sample scene, with `sturm` added to torus 
definitions).


So, bottom line: Judging from this first test, even OS X's 512 kiB 
should be enough for everybody now ;)


Post a reply to this message

From: William F Pokorny
Subject: Re: Fwd: Stack Size Testers Wanted
Date: 17 Feb 2019 08:07:41
Message: <5c695c9d$1@news.povray.org>
On 2/16/19 4:39 PM, clipka wrote:
>> We still have some largish stack allocations - the sturm solver when 
>> we increased the max order from 15 to 35(why ?) in v3.7 now causes a 
>> 600KB plus (??? don't recall exactly) allocation on the stack - at a 
>> considerable performance penalty I'll add(1) - no matter the actual 
>> incoming equation order. This alone might cause your suggested size of 
>> 512KB to fail. Well, except perhaps on windows where presumably 
>> splitting still in place.
> 
> Where would those 600 kB be allocated?

No clue. :-( My mind not being so good again of late is my bet... When I 
look at the comments I've added to polynomialsolver.h they read:

/// @def MAX_ORDER
/// Maximum supported polynomial order.
///
/// @todo
///     This currently carries a large, fixed per polysolve() call
///     memory allocation on the stack. Size is on the order of
///     (MAX_ORDER+1)*int + PRECISE_FLOAT * (MAX_ORDER+1)^2
///     which impacts performance and performance stability especially
///     for threads > cores. Allocation based on current equation order
///     would be better. My C++ attempts at this all badly slower. C itself
///     supports a struct "flexible array member" feature, however this
///     not a C++11 standard though some C++ compilers support it.
///
#define MAX_ORDER 35

which aligns with what you state below so at some point last year I had 
it more or less right...

> 
> The only out-of-the-ordinary local variable I see in all of the 
> polynomial solver code is `sseq` in `polysolve()`, which is 36 elements 
> of type `polynomial`, which in turn consists of 1 integer and 36 
> doubles. That's about 10 kB, not 600.
> 
>> (1) - My C++ ish attempts to address this have all been really slow or 
>> not worked at all when I try to get fancy/fast with raw memory 
>> allocations and pointers. 

I have in my head I tried thread_local... I'll try this again.

However, a substantial part of the performance degrade looked to be 
related to impacts to caching behavior - at least on my i3 machine. I've 
tested hard coding to known orders. Doing this I see jumps in 
performance improvement as the size drops which seem to be related to 
how well the allocations fit to some memory alignment.

Said another way, it looks like sseg being 36 doubles + int in size for 
each polynomial means multiple equations are not optimally folded into 
some memory alignment. Allocating int + 5 doubles for the sseq size when 
that's all you need is substantially faster(2-3% as I remember - for 
what my memory is worth these days).

And still I wonder about the value of having a max order 35 over 15... 
Did someone actually need 35? Accuracy degrades as order increases - 
even 15 is pushing it hard in the general case working with floats. If 
the >15 order polynomials are very sparse as presented to polysolve() or 
your working with some polynomials that happen to be stable - maybe 
things work acceptably beyond 15, but I have doubts... All very high 
order solvers I've looked at are using exact - or extendable accuracy - 
math (ie integers) for coefficient representation and calculation.

Bill P.


Post a reply to this message

From: clipka
Subject: Re: Fwd: Stack Size Testers Wanted
Date: 17 Feb 2019 14:30:34
Message: <5c69b65a$1@news.povray.org>
Am 17.02.2019 um 14:07 schrieb William F Pokorny:

> /// @def MAX_ORDER
> /// Maximum supported polynomial order.
> ///
> /// @todo
> ///     This currently carries a large, fixed per polysolve() call
> ///     memory allocation on the stack. Size is on the order of
> ///     (MAX_ORDER+1)*int + PRECISE_FLOAT * (MAX_ORDER+1)^2
> ///     which impacts performance and performance stability especially
> ///     for threads > cores. Allocation based on current equation order
> ///     would be better. My C++ attempts at this all badly slower. C itself
> ///     supports a struct "flexible array member" feature, however this
> ///     not a C++11 standard though some C++ compilers support it.
> ///
> #define MAX_ORDER 35

BTW, that "flexible array member" feature won't help you speed up 
things, as it requires dynamic allocation on the heap, which is 
super-expensive; whatever you allocate on the stack must necessarily 
have a fixed size.

The feature is nothing more than syntactical sugar for accessing the 
memory right after the struct as a pointer to a particular type, i.e. 
allowing to rewrite code like this:

     typedef struct { /* ... */ } MyStructT;
     MyStructT* myStructPtr;
     ...
     unsigned char* startOfStruct = (unsigned char*)(myStructPtr);
     unsigned char* endOfStruct   = startOfStruct + sizeof(*myStructPtr);
     T* myArray = (T*)(endOfStruct);
     Foo = myArray[i];

as:

     typedef struct { /* ... */ T myArray[]; } MyStructT;
     ...
     Foo = myStructPtr->myArray[i];

but it doesn't magically solve the problem of how to allocate the proper 
amount of memory for the object.


Also, "threads > cores" is not a sane use case (unless you mean "threads 
 > physical cores"), as there's virtually no way this might increase 
performance by any considerable amount.


> However, a substantial part of the performance degrade looked to be 
> related to impacts to caching behavior - at least on my i3 machine. I've 
> tested hard coding to known orders. Doing this I see jumps in 
> performance improvement as the size drops which seem to be related to 
> how well the allocations fit to some memory alignment.
> 
> Said another way, it looks like sseg being 36 doubles + int in size for 
> each polynomial means multiple equations are not optimally folded into 
> some memory alignment. Allocating int + 5 doubles for the sseq size when 
> that's all you need is substantially faster(2-3% as I remember - for 
> what my memory is worth these days).

I can see two ways how to address these things:

(A) Use more C++ instead of less, turning the solver into a template 
with a maximum order parameter.

(B) Refactor the code to use a flat array, and place the data into the 
first N*N elements.


Also, constructing `sseq` in such a way that its `polynomial` elements 
end up aligned with cache line boundaries might already do the trick.

This might be achieved by declaring `polynomial` to be aligned to 
multiples of 64 or even 128 bytes, using e.g. `struct alignas(64) 
polynomial { ... };`.

Alternatively, we could reduce MAX_ORDER to the next lower value such 
that sizeof(polynomial) ends up a multiple of 64 or 128, and hope that 
the compiler is smart enough to align sseq on a cache line boundary. 
This means we would want MAX_ORDER+2 to be a multiple of 8 or 16 
(presuming sizeof(int)<=sizeof(double) and 
alignof(double)==sizeof(double)), which would be satisfied by MAX_ORDER=30.


> And still I wonder about the value of having a max order 35 over 15... 
> Did someone actually need 35? Accuracy degrades as order increases - 
> even 15 is pushing it hard in the general case working with floats. If 
> the >15 order polynomials are very sparse as presented to polysolve() or 
> your working with some polynomials that happen to be stable - maybe 
> things work acceptably beyond 15, but I have doubts... All very high 
> order solvers I've looked at are using exact - or extendable accuracy - 
> math (ie integers) for coefficient representation and calculation.

IIRC there was a use case for a `polynomial` primitive that needed an 
order of 16, and Jerome decided it would be nonsense to increase it by 1 
just to wait for the next use case to require an order of 17; so he went 
ahead and increased the limit to something that was not totally 
arbitrary. 35 was a natural choice because above this threshold the 
`binomials` array of constants would overflow the data type on some 
machines (32-bit `unsigned int`).

If we want to use a lower maximum order, 19 would be a logical choice, 
as it's the threshold at which we could change the datat type of the 
`binomials` array from `unsigned int` to `unsigned short` to save some 
memory.


Last not least, don't forget that a performance change of 2% in the root 
solver does not translate to a change of 2% in total render time. You 
probably have a clearer picture of how much time POV-Ray spends in the 
root solver, but remember that there is such a thing as over-optimization.


Post a reply to this message

From: William F Pokorny
Subject: Re: Fwd: Stack Size Testers Wanted
Date: 17 Feb 2019 18:41:58
Message: <5c69f146$1@news.povray.org>
On 2/17/19 2:30 PM, clipka wrote:
> Am 17.02.2019 um 14:07 schrieb William F Pokorny:
> 
...
> 
> BTW, that "flexible array member" feature won't help you speed up 
> things, as it requires dynamic allocation on the heap, which is 
> super-expensive; whatever you allocate on the stack must necessarily 
> have a fixed size.

OK. Surprised the heap allocation is slower than one on the stack. I 
figured the allocation and free mechanisms were more or less the same - 
just being a difference of where in memory it happened. Guess not?

Also surprised allocations on the stack have to be of fixed size. Is it 
so on the function call just one allocation can happen on the stack for 
all the dynamic variables in the function?

> 
> 
> Also, "threads > cores" is not a sane use case (unless you mean "threads 
>  > physical cores"), as there's virtually no way this might increase 
> performance by any considerable amount.
> 

Yes, "threads > physical cores."

> 

I'm working with the solver branch today and was able to run the 
following without too much effort.

My i3 has two physical cores, 2 threads each. lemonSturm.pov scene long 
used for solver work. Below comparing today's MAX_ORDER 35 vs the 
necessary MAX_ORDER 4 allocation. Commands being of the form:

/usr/bin/time povray lemonSturm.pov -j +wt3 -cc -fn -d -p

+wt4 (what POV-Ray would use by default)
--------------------------------------------
22.99 -> 21.49 ---> -6.52%

+wt3
----
19.11 -> 18.11 ---> -5.23%

+wt2
----
14.32 -> 13.80 ---> -3.63%

+wt1
----
14.27 -> 13.74 ---> -3.71%


> 
> I can see two ways how to address these things:
> 
> (A) Use more C++ instead of less, turning the solver into a template 
> with a maximum order parameter.
> 
> (B) Refactor the code to use a flat array, and place the data into the 
> first N*N elements.
> 
> 
> Also, constructing `sseq` in such a way that its `polynomial` elements 
> end up aligned with cache line boundaries might already do the trick.
> 
> This might be achieved by declaring `polynomial` to be aligned to 
> multiples of 64 or even 128 bytes, using e.g. `struct alignas(64) 
> polynomial { ... };`.
> 
> Alternatively, we could reduce MAX_ORDER to the next lower value such 
> that sizeof(polynomial) ends up a multiple of 64 or 128, and hope that 
> the compiler is smart enough to align sseq on a cache line boundary. 
> This means we would want MAX_ORDER+2 to be a multiple of 8 or 16 
> (presuming sizeof(int)<=sizeof(double) and 
> alignof(double)==sizeof(double)), which would be satisfied by MAX_ORDER=30.
> 

Thanks for the ideas. I add them to my notes to try at some point.

> 
>> And still I wonder about the value of having a max order 35 over 15... 
...
> 
> IIRC there was a use case for a `polynomial` primitive that needed an 
> order of 16, and Jerome decided it would be nonsense to increase it by 1 
> just to wait for the next use case to require an order of 17; so he went 
> ahead and increased the limit to something that was not totally 
> arbitrary. 35 was a natural choice because above this threshold the 
> `binomials` array of constants would overflow the data type on some 
> machines (32-bit `unsigned int`).
> 
> If we want to use a lower maximum order, 19 would be a logical choice, 
> as it's the threshold at which we could change the datat type of the 
> `binomials` array from `unsigned int` to `unsigned short` to save some 
> memory.
> 

Thanks for the background. I'll put it on my list to look at what 19 
would give us performance wise.

> 
> Last not least, don't forget that a performance change of 2% in the root 
> solver does not translate to a change of 2% in total render time. You 
> probably have a clearer picture of how much time POV-Ray spends in the 
> root solver, but remember that there is such a thing as over-optimization.

Understand. My measurements are almost always in total render time.

I admit to chasing performance harder than some might. Comes from my 
belief you need to stay after it. Otherwise you get fooled, due degraded 
performance everywhere over time, that certain improvements don't matter 
enough to do them. A reason I think about that 3-6% being more than it 
is as measured is I know there are at least two other substantial 
performance issues in the current sturm solver. If those can be 
addressed, we'll be talking about a good bit more than 3-6% improvement 
for exact-order allocations.

Bill P.


Post a reply to this message

From: dick balaska
Subject: Re: Fwd: Stack Size Testers Wanted
Date: 17 Feb 2019 21:05:09
Message: <5c6a12d5$1@news.povray.org>
On 2/17/19 6:41 PM, William F Pokorny wrote:

> 
> OK. Surprised the heap allocation is slower than one on the stack. I
> figured the allocation and free mechanisms were more or less the same -
> just being a difference of where in memory it happened. Guess not?

The compiler figures out how much stack space a function call needs, and
allocating that memory is simply a matter of moving the stack pointer
*once*, a single register operation, upon entry to a function.

Heap requires a separate function call for each object to be allocated.
That function examines its tables trying to find a "best fit" for the
size requested, mark those tables as allocated, and return a pointer.
For the reverse, free'ing tries to merge freed memory [1].  Much, much
more expensive than stack.

[1] Old MS-DOS systems you could kill it by allocating all of memory as
1KB blocks, free all those blocks, then try to allocate a 2KB block. Out
of memory.

> 
> Also surprised allocations on the stack have to be of fixed size. Is it
> so on the function call just one allocation can happen on the stack for
> all the dynamic variables in the function?

See above.

-- 
dik
Rendered 1024 of 921600 pixels (0%)


Post a reply to this message

From: clipka
Subject: Re: Fwd: Stack Size Testers Wanted
Date: 17 Feb 2019 21:39:02
Message: <5c6a1ac6$1@news.povray.org>
Am 18.02.2019 um 00:41 schrieb William F Pokorny:

>> BTW, that "flexible array member" feature won't help you speed up 
>> things, as it requires dynamic allocation on the heap, which is 
>> super-expensive; whatever you allocate on the stack must necessarily 
>> have a fixed size.
> 
> OK. Surprised the heap allocation is slower than one on the stack. I 
> figured the allocation and free mechanisms were more or less the same - 
> just being a difference of where in memory it happened. Guess not?

Definitely not.

On the stack, allocation and release only ever happens at the top of the 
stack, and the free area is always a single contiguous block; that's 
trivial to keep track of: A single pointer to the current start of the 
free region (with most CPUs providing a dedicated register for that: the 
Stack Pointer), and some means to detect when you're running past the 
end of the memory section you have set aside for the stack (typically 
done by setting up a "guard page" at the end of the stack, and crying 
bloody murder if that guard page is ever accessed).

At least that's the classic stack implementation; new fancy stuff like 
split stacks change the picture, but that's why they're slower.

On the heap, release can happen anywhere anytime, and you may end up 
with a gazillion separate blocks of free memory you have to keep track 
of, test if you can merge any of them, and search through whenever you 
need to allocate memory again to find a region of suitable size. "Heap 
fragmentation" being the buzzword there.


> Also surprised allocations on the stack have to be of fixed size. Is it 
> so on the function call just one allocation can happen on the stack for 
> all the dynamic variables in the function?

It's essentially a matter of the programm language used. If you were to 
implement a function in assembler, there is nothing stopping you from 
advancing the stack pointer by a computed value (which is all that's 
necessary to allocate memory on the stack), provided you later make sure 
you revert that change to the stack pointer (and also follow some other 
conventions; e.g. some conventions require the stack pointer to be 
16-byte aligned when invoking some other function).

As a matter of fact, as I just discovered, C99 _does_ support such a 
thing; but it's not "flexible array member" but "variable length 
arrays"; and they can't be nested in structs, and C++ did not adopt them.

One reason why C++ did not adopt them is some entirely unrelated 
fundamental stumbling block rooted in the C++ type concept. But I guess 
they might also mess up - or at least complicate - exception handling; 
"stack unwinding" would be the buzzword there.


Post a reply to this message

Goto Latest 10 Messages Next 2 Messages >>>

Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.