Re: integer pthread_t vs. DCE threads
- From: David Schwartz <davids@xxxxxxxxxxxxx>
- Date: Mon, 02 Jul 2007 19:56:17 -0700
On Jul 2, 2:17 pm, Hallvard B Furuseth <h.b.furus...@xxxxxxxxxxx>
wrote:
David Schwartz writes:
On Jul 2, 6:09 am, Hallvard B Furuseth <h.b.furus...@xxxxxxxxxxx>
wrote:
I'm not sure why thread-locals are so wrong though.
They are wrong because they associate state with a thread even though
that state is not logically associated with that thread.
You seem to think of a thread as something different or maybe more
general than I do:
A thread is simply an execution vehicle. It runs the code. Anything
that applies to the code currently running belongs, logically, to the
thread.
They can save us from passing a heap of extra parameters or
structs around to routines which will rarely need it.
Right, but that is state that is associated with whatever that thread
is doing right now, not with the thread itself. Storing such state as
thread local storage means that which thread is doing this particular
job cannot be changed, which is totally illogical.
Nope, totally logical. The thread holds the stack. The stack holds
much of the state. Disassociating the thread from the task can be
useful, but that's extra work, not the default way it works. At least
to my point of view. Though admittedly I haven't thought much about it.
You should generally try to minimize how much state you keep on the
stack. There are a large number of reasons, but the primary reason is
that the stack is difficult to impossible to sensibly virtualize and
manipulate.
Worse, you can't stop what you're doing and keep your stack. So you
can't put things on the stack if you need to keep them while you stop
what you're doing and come back to it later. It's nearly impossible to
save a stack for later.
If you later need to do more things at once than you can sensibly run
threads for, you are stuck.
Suppose, for example, you've stored some state in a thread-local
variable. Somewhere in the call chain, the library needs to put down
what it's doing and pick it back up later. But it can't, because that
'later' might be in another thread. So the library has no choice but
to block this thread until 'later' comes, because the work must be
done with this thread because some of its state is in a thread-
specific variable. UGH.
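
To make the alternative concrete, here is a minimal sketch, assuming a made-up job (summing an array in small steps) whose state lives in an explicit context struct rather than in thread-specific data. The names `struct job`, `job_step`, `finish_job` and `run_on_two_threads` are hypothetical and not from any library. Because nothing about the job is tied to the thread, it can be put down on one thread and picked up on another:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical job: all of its state is in this struct, none in TSD. */
struct job {
    const int *data;
    size_t len;
    size_t next;   /* index of the next element to process */
    long total;    /* partial result so far */
};

/* Process up to `budget` elements, then return.
 * Returns 1 once the whole array has been consumed, 0 otherwise. */
static int job_step(struct job *j, size_t budget)
{
    assert(j != NULL);
    while (budget-- > 0 && j->next < j->len)
        j->total += j->data[j->next++];
    return j->next == j->len;
}

/* Thread entry point: finish whatever job it is handed. */
static void *finish_job(void *arg)
{
    struct job *j = arg;
    while (!job_step(j, 2))
        ;
    return NULL;
}

/* Start the job on the calling thread, finish it on a second thread.
 * Returns the sum, or -1 if the second thread could not be created. */
static long run_on_two_threads(const int *data, size_t len)
{
    struct job j = { data, len, 0, 0 };
    pthread_t t;

    job_step(&j, 2);                        /* first part runs here */
    if (pthread_create(&t, NULL, finish_job, &j) != 0)
        return -1;
    pthread_join(t, NULL);                  /* the rest ran on thread t */
    return j.total;
}
```

The join is only there to keep the sketch short. If `next` and `total` had been kept in thread-specific variables, `finish_job` would have started from nothing on the second thread.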
I don't get it. To release the thread so that another thread can pick
up where it left off, the task has to save any state it has on the
stack. Why is it harder to save and restore thread-local variables?
The whole point of thread-local variables is so that you can call
function B and not have to pass things to it from function A. If you
pass stuff around, there's no need for it to be thread local. So how
can function B know what it needs to save when the whole point of the
thread local variable was to hide that?
If you use a lot
of them, that's an UGH. But the same can be said of a lot of info on
the stack to save.
Exactly. Keeping state that's associated with a job on the stack is
asking for trouble too. The stack and global variables and thread-
specific data should be limited to things that are nastier done some
other way.
For that matter, I'd partly say the same about errno. Threads didn't
exist when it was invented, so globals were not such a design bug.
Though signals did, which are. Anyway, errno saves a parameter for
routines that return non-integers, and thus simplifies and probably
speeds up execution when there is no error at the expense of when there
is an error. The bummer was when that became an excuse for EOF which
could be == a character value and such things.
Except that 'errno' is a nightmare even without threads. One function
can change it such that you don't get the error you expect from a
previous function. This is one of the problems with globals.
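
A small sketch of that failure mode, and of the usual defence of copying 'errno' immediately. The helpers `noisy_log` and `parse_long` are hypothetical; `noisy_log` sets 'errno' by hand purely to stand in for any library call that happens to modify it:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical logger. The explicit assignment stands in for any
 * library call that changes errno as a side effect. */
static void noisy_log(const char *msg)
{
    (void)msg;
    errno = EINVAL;
}

/* Parse a decimal long. Returns 0 on success or an errno-style code.
 * The error is reported through the return value, not through errno. */
static int parse_long(const char *s, long *out)
{
    char *end;
    int saved;

    assert(s != NULL && out != NULL);
    errno = 0;
    *out = strtol(s, &end, 10);
    saved = errno;                  /* capture before anything else runs */

    noisy_log("parsed a number");   /* errno no longer describes strtol */

    if (saved != 0)
        return saved;               /* e.g. ERANGE on overflow */
    if (end == s || *end != '\0')
        return EINVAL;              /* no digits, or trailing junk */
    return 0;
}
```

Had `parse_long` tested 'errno' after the logging call instead of the saved copy, every input, valid or not, would appear to fail with the logger's error code.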
Yes, you have to pick it up right away or not at all. And do a lot of
save/restore for low-level functions, I guess. And its definition is
all wrong with the 'a function may update or not as it pleases'
definition.
Yes, we're stuck with 'errno'. But it's a good example of why TSD can
be a bad idea in some cases. It's also a good example of why TSD can
be a good workaround to a thorny problem.
I have a dark suspicion that this was a feature and not a bug. "Error
checking, what's that?" Then, if the program eventually got into enough
trouble to actually notice, there would at least be _some_ error
indication which with a bit of luck would say what was wrong.
Except there might not be because the error code could be lost or
(potentially worse) replaced with a meaningless error code. This is a
better argument to make in support of OpenSSL's error stack. Though
I've already bitched about the problems with that.
Pretty much every design choice has its good and bad points. That's
why good programming is hard. I often take a contrary view to make
sure the other side gets aired. It doesn't mean that what I'm
complaining about is necessarily wrong in some particular case.
So with thread-specific globals, you get all the problems of globals,
plus the limitation that you can't switch threads if you need to. It's
kind of a lose-lose.
The speed up of execution is bone-headed optimization. I doubt any
alleged savings is even measurable, and probably could be achieved
without the problems of a global.
Possibly, though I think you are speaking from the wrong decade.
And it's an "optimization" of the interface as well as speed.
Fair enough. It's too bad the C language didn't have some sane way to
have multiple return values from a function or an error reporting
mechanism like C++'s, although that has its problems too. (Surprise,
surprise.)
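
For what it's worth, one way to approximate multiple return values in plain C is to return a small struct that carries both the status and the result. This is only a sketch under that assumption; `struct div_result` and `checked_div` are made-up names, not an existing API:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical result type: an error code plus the value. */
struct div_result {
    int err;      /* 0 on success, otherwise an errno-style code */
    long value;   /* meaningful only when err == 0 */
};

/* Divide without touching any global or thread-specific state. */
static struct div_result checked_div(long num, long den)
{
    struct div_result r = { 0, 0 };

    if (den == 0)
        r.err = EDOM;                     /* division by zero */
    else if (num == LONG_MIN && den == -1)
        r.err = ERANGE;                   /* quotient would overflow */
    else
        r.value = num / den;
    return r;
}
```

The error travels with the result, so a later call cannot overwrite it and it does not matter which thread looks at it. The cost is that every caller has to unpack the struct, which is exactly the interface overhead 'errno' was meant to avoid.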
OpenSSL's thread error stacks are probably a good example of such
optimization gone wrong. The reports are seldom useful because you
have to sift through reams of non-errors from previous operations.
I'll take your word for it. Don't even know what you are talking about.
OpenSSL has an "error stack" for each thread. Errors accumulate on the
stack, and you can check the stack to get a very detailed list of what
went wrong. The problem is that "soft errors" accumulate on the stack,
so when you go to get the stack, it's full of non-errors. You can
clear the stack all the time, but that's a pessimization.
It really is worth it to get
threading right, and alarm bells should go off in your head if thread-
specific data is used to hold data that is not logically associated
with the thread. *Especially* if it is logically associated with what
the thread happens to be doing at that time.
What _is_ logically associated with a thread, in your view?
Whatever the thread is *currently* doing. If it holds a lock, that
lock is associated with that thread. If it's working on behalf of a
particular client, that client is associated with that thread for as
long as it continues to do so. Ideally, light short-term associations.
It's generally best to consider threads fluid and interchangeable.
You don't want to keep things associated with threads when you don't
have to because this ties the scheduler's hands and creates extra
context switches when the 'wrong' thread is running.
DS