Re: Thread Stacks
- From: "Eric P." <eric_pattison@xxxxxxxxxxxxxxxxxx>
- Date: Wed, 30 Nov 2005 16:44:08 -0500
Jeremy Linton wrote:
>
> Eric P. wrote:
>
> > Dave Hansen wrote:
> >
> >>On Tue, 29 Nov 2005 10:40:24 -0500 in comp.arch, "Eric P."
> >><eric_pattison@xxxxxxxxxxxxxxxxxx> wrote:
> >>>I conclude therefore that the correct solution is to specify the
> >>>reserved linear stack size for each individual thread at create
> >>>and that means it is as an argument to the ThreadCreate function.
> >>>Note that this is NOT how Win32 works. As an optimization one could
> >>
> >>What, then, is the purpose of the dwStackSize parameter to the Win32
> >>function CreateThread?
> >
> >
> > It sets the Commit size. I have no idea why MS thinks I would want
> > to since the commit pages are automatically expanded.
>
> For performance of course... That way you don't have to worry about
> taking individual page faults for each 4k page in the stack.
Which wouldn't be an issue if they didn't expand the commit
space one page at a time in the first place.
> But you're
> partially wrong about the commit vs reserve issue (which should be
> thought of as the MAX stack size ever). The SDK documentation says:
>
> "To change the initially committed stack space, use the dwStackSize
> parameter of the CreateThread, CreateRemoteThread, or CreateFiber
> function. This value is rounded up to the nearest page. Generally, the
> reserve size is the default reserve size specified in the executable
> header. However, if the initially committed size specified by
> dwStackSize is larger than the default reserve size, the reserve size is
> this new commit size rounded up to the nearest multiple of 1 MB. "
You are correct. My apologies.
The CreateThread docs have been updated and I missed that.
That info was previously buried at the end of a knowledge base article.
I was going on what CreateThread used to say:
"dwStackSize: Specifies the size, in bytes, of the stack for the new
thread. If 0 is specified, the stack size defaults to the same size
as that of the primary thread of the process. <...> CreateThread tries
to commit the number of bytes specified by dwStackSize, and fails if
the size exceeds available memory."
Even so, I believe my concerns below still apply.
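
As a rough sketch of the rule in the SDK text quoted above, the reserve size for a thread could be modeled like this. It is only my reading of that paragraph, not Microsoft's code: the function names are made up for illustration, and the 4 KB page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES ((size_t)4096)        /* assumed page size */
#define ONE_MB     ((size_t)1 << 20)

/* Round n up to the next multiple of unit. */
size_t round_up_to(size_t n, size_t unit)
{
    assert(unit != 0);
    return ((n + unit - 1) / unit) * unit;
}

/* Hypothetical helper: reserve size that results from asking for
   `requested_commit` bytes of initial commit, given the default
   reserve size from the executable header. */
size_t reserve_for_commit(size_t requested_commit, size_t default_reserve)
{
    /* "This value is rounded up to the nearest page." */
    size_t commit = round_up_to(requested_commit, PAGE_BYTES);

    /* Only a commit larger than the default reserve changes the reserve,
       and then it is rounded up to a multiple of 1 MB. */
    if (commit > default_reserve)
        return round_up_to(commit, ONE_MB);
    return default_reserve;
}
```

Under this reading, asking for a small commit never shrinks the reserve, which is the heart of my complaint: dwStackSize can only push the reserve up, never down.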
> So basically the linker option sets the default, which is always used
> unless someone specifies a stack size larger, in which case it is grown.
>
> Now 1Meg seems a tiny amount of stack today, but it's been that way for a
> long time, and I'm sure that back in '93 or so when everyone had 32-bit
> machines with 8 megs or so it seemed like a lot of space for a stack.
> It's easy to change, just flip the linker switch and give yourself a few
> gigs with windows-64, but it also directly influences the number of
> threads that can be created. So, it is sort of a trade off, how big is
> your max stack vs how many threads you can create. You can't grow past
> the reserve amount because there is a good chance some other thread has
> placed its stack right before yours. On 32-bit linux, it seems the
> decision (not true anymore) had been made to limit the system to 256
> threads so the stack size could be bigger. The application I currently
> work on needs a lot of threads (large SMP hardware, with lots of blocked
> threads just sitting around) so it would have been nice to have more,
> since we consume very little stack space. So, the problem can go both
> ways. At least in windows, it's easy to mix and match the stack sizes
> based on the thread requirements rather than having them fixed (unless
> there is a way to get linux to dynamically set the thread size, that I
> don't know about). So, this is an instant win for windows in my book....
My concern is what happens when you make the default smaller
so you can run more threads. Any CreateThread calls in library
functions that specified a commit size of 0 will also change.
I have had situations where I know that my threads do not require
1 MB. But I cannot lower the stack size for just those threads,
and if I change the global default it may affect all threads,
even ones my code did not create.
Based on this experience I conclude that it would have been a better
design for CreateThread to require the caller to specify both the
commit & reserve sizes as arguments for each thread.
The global process defaults look easy, but seem just plain dangerous
to me because they allow stacks to be adjusted without regard to usage.
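
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes a 2 GB user-mode address space for a 32-bit process and pretends that thread stacks are the only thing using it, so the result is an upper bound and not a prediction. The function name is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bound on the number of threads if every thread reserves
   `reserve_bytes` of stack and nothing else uses the address space. */
size_t max_threads_by_reserve(unsigned long long address_space_bytes,
                              unsigned long long reserve_bytes)
{
    assert(reserve_bytes != 0);
    return (size_t)(address_space_bytes / reserve_bytes);
}
```

With a 1 MB reserve, `max_threads_by_reserve(2ULL << 30, 1ULL << 20)` gives 2048; with a 64 KB reserve the same address space gives 32768. The gain is real, but lowering the process-wide default applies it to every thread, including ones that may actually need the full 1 MB.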
> >>Windoze commits only a page at a time (by default) to a thread's stack
> >>as each page boundary is exceeded. How is this different from what
> >>you describe?
> >>
> >>I'm not saying you are wrong. I'm trying to understand how your
> >>solution differs from Win32.
> >
> >
> > Currently, for every allocation > 4 KB or by alloca, it touches every
> > page one by one. That not only wastes time in a loop touching pages
> > over and over, as far as I can tell it only extends the stack one
> > page at a time. Dumb. Dumb. Dumb.
> Huh? That is what the commit flags to ZwAllocateVirtualMemory() (exposed
> through VirtualAllocEx()) do for you... The NT kernel doesn't really
> seem to know anything about user space stacks. That is all done through
> the win32 subsystem. So in that regard, it's sort of silly to complain
> about the OS just demand paging the one page you ask for.
I know that, but I am not referring to the commit space.
I am referring to how the committed stack grows into the reserve space.
It was not actually clear, to me anyway, which code is doing this.
The documentation does not say, but tends to imply that it should
be the Win32 user mode code. However I have looked for exceptions
generated into user mode that would trigger the stack expansion and
find none. That is why I put the 'kernel(?)' in my previous message.
(It was long winded enough as it was.)
Anyway, I think you should try single stepping through the assembler
for 'alloca' or a _chkstk call. Alloca calls __alloca_probe which is
almost identical to the code of __chkstk that is also called whenever
you declare a local object larger than 4 KB on the stack.
Both just loop touching pages one by one. Every time it is invoked.
If a routine ever declared a 20 MB or 50 MB array...
void Foo (void)
{ char vec[20*(1<<20)];  /* 20 MB local array; an int array this long would be 80 MB */
}
every call to Foo () does the same check.
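
To give a feel for the cost, here is a portable model of a page-by-page probe loop. It is not the actual `__chkstk` or `__alloca_probe` code, only a sketch of the behavior described above, and it assumes 4 KB pages. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PROBE_PAGE ((size_t)4096)   /* assumed page size */

/* Count how many page touches a page-by-page probe loop performs
   to cover `frame_bytes` of newly allocated stack. */
size_t probes_for_frame(size_t frame_bytes)
{
    size_t probes = 0;

    while (frame_bytes >= PROBE_PAGE) {
        frame_bytes -= PROBE_PAGE;
        probes++;                   /* stands in for touching the next lower page */
    }
    if (frame_bytes > 0)
        probes++;                   /* the final partial page */
    return probes;
}
```

For a 20 MB frame, `probes_for_frame((size_t)20 << 20)` comes to 5120 touches, and the loop has no memory of earlier calls, so the same 5120 touches happen on every entry to the routine even when all the pages are already committed.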
> >
> > To do this optimally is simple.
> > It requires 3 TEB values: Stack Top, Bottom and Low Water Mark.
> > Top and Bottom cover the reserved range.
> > Mark is the low water commit point, between top and bottom.
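
A minimal sketch of how those three values could be used, written as a portable model and not as real TEB code. The struct, the names, and the "commit the whole gap in one step" policy are assumptions made for illustration; guard pages are ignored, `bottom` is assumed to be page aligned, and 4 KB pages are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE ((uintptr_t)4096)    /* assumed page size */

typedef struct {
    uintptr_t top;      /* highest address of the reserved range */
    uintptr_t bottom;   /* lowest address of the reserved range (page aligned) */
    uintptr_t mark;     /* low water mark: lowest committed address */
    unsigned  commits;  /* number of commit operations, kept for illustration */
} StackModel;

enum { STACK_OK_FAST = 0, STACK_OK_GREW = 1, STACK_OVERFLOW = -1 };

/* Check that the stack may extend down to new_sp, committing if needed. */
int stack_extend_to(StackModel *s, uintptr_t new_sp)
{
    if (new_sp >= s->mark)
        return STACK_OK_FAST;               /* already committed: one compare, no page touching */
    if (new_sp < s->bottom)
        return STACK_OVERFLOW;              /* would run past the reserved range */

    s->mark = new_sp & ~(MODEL_PAGE - 1);   /* round down to a page boundary */
    s->commits++;                           /* the whole gap is committed in one step */
    return STACK_OK_GREW;
}
```

The point of the mark is that a large frame costs one compare on every call after the first, and the first call extends the commit once for the whole range instead of once per page.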
> How well this works is probably application dependent. You could
> probably implement your own version using SetThreadStackGuarantee(). The
> other option is to call ZwWriteWatch() on your stack region and check to
> see which pages have been accessed, and allocate/commit extra space as
> you see necessary.
That is great, but SetThreadStackGuarantee is only in Win64.
I also notice an update to CreateThread for XP that allows
you to explicitly specify the reserve size by passing the
STACK_SIZE_PARAM_IS_A_RESERVATION flag, somewhat similar to what I
was ranting about. Neither helps all the billions of lines of existing code.
Eric