Re: lazy pthread_cond_timedwait (linux)
- From: Eric Sosman <Eric.Sosman@xxxxxxx>
- Date: Mon, 30 Jul 2007 16:27:31 -0400
antoine.bouthors@xxxxxxxxx wrote On 07/30/07 15:32,:
I'm using pthread_cond_timedwait to let a child thread do some
computation for a given amount of time. pthread_cond_timedwait often
takes several seconds to several *minutes* (!!) after abstime to
unblock, even though it had plenty of opportunities to lock the mutex.
(Code snippet below.)
Some details:
- processes are run by MPI on several SMP (2 CPU) nodes (pretty much
like the MPI with pthreads example). I've tried with one process per
CPU and one process per node, it always happens. Even with only one
process all in all. So MPI does not appear to be the cause of that.
- the child thread (Simulation) *does* (printf-proven...) unlock the
mutex a lot of times after abstime has passed, but the parent thread
(ControlLoop) does not seem to want to lock it. Looks like it prefers
to take its time...
So now to the questions:
- Why does it behave this way? Shouldn't pthread_cond_timedwait try
to take back the mutex right after abstime has passed (and not 2
minutes after that)?
It should start *trying* to re-lock the mutex "soon"
after abstime passes. The amount of time it takes to
succeed in re-locking is unpredictable.
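
To make the two phases concrete, here is a minimal sketch of a timed wait. The helper name wait_done_or_timeout, the done flag, and the millisecond parameter are assumptions for illustration; none of them come from the snipped code. The point is that pthread_cond_timedwait cannot return, even with ETIMEDOUT, until it has re-acquired the mutex, so the timeout only bounds when it starts competing for the lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper (not from the original post): wait until *done is
 * nonzero or timeout_ms milliseconds have elapsed.
 * Returns 0 if *done was set, otherwise the error code (ETIMEDOUT on timeout). */
int wait_done_or_timeout(pthread_mutex_t *m, pthread_cond_t *cv,
                         const int *done, long timeout_ms)
{
    struct timespec abstime;

    /* abstime is an absolute CLOCK_REALTIME deadline, not a duration. */
    clock_gettime(CLOCK_REALTIME, &abstime);
    abstime.tv_sec  += timeout_ms / 1000;
    abstime.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (abstime.tv_nsec >= 1000000000L) {
        abstime.tv_sec  += 1;
        abstime.tv_nsec -= 1000000000L;
    }

    int rc = pthread_mutex_lock(m);
    assert(rc == 0);

    /* Loop to cope with spurious wakeups; stop on the first error. */
    while (!*done && rc == 0)
        rc = pthread_cond_timedwait(cv, m, &abstime);

    /* Whatever rc is, the mutex is held again at this point. If another
     * thread keeps it locked, the return above is delayed accordingly. */
    int result = *done ? 0 : rc;
    pthread_mutex_unlock(m);
    return result;
}
```

With nobody signalling the condition and the mutex otherwise free, a call such as wait_done_or_timeout(&m, &cv, &done, 50) should come back with ETIMEDOUT after roughly 50 ms. With a thread that hogs the mutex, the same call can come back arbitrarily late.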
- Does calling sched_yield() have any use here?
Probably not, but possibly. Much depends on the nature
of the thread scheduler, on the relative priorities of the
threads, and maybe on the phase of the moon.
- It looks like adding usleep(1) in Simulation() after unlocking the
mutex solves the problem. But I find it odd to have to force the
thread to go to sleep to allow the other to run. What's the point of
scheduling, then?
This supports my suspicion: Your Simulation thread
unlocks the mutex but then re-locks it almost immediately.
The mutex is unclaimed for maybe a few hundred instructions
at most -- and that's probably not enough for the other
CPU to notice the unlock, take ControlThread out of stasis,
load up its MMU registers and stuff, and context-switch out
to userland where it can start contending for the mutex.
By the time all this running around is done, the Simulation
thread (already running and "hot" on its own CPU) has had
plenty of time to re-acquire the mutex.
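
Since the posted code is snipped, the following is only a guess at the shape of the loop I am describing. The function name hot_relock_loop and the work_done counter are hypothetical, and the loop is bounded so that it terminates. Note how short the unlocked window is compared with the time spent holding the mutex.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical reconstruction; these names are illustrative only. */
static long work_done;   /* stands in for the simulation state */

/* Runs `steps` iterations, holding `m` for the whole of each step.
 * Returns the total number of steps performed so far. */
long hot_relock_loop(pthread_mutex_t *m, long steps)
{
    int rc = pthread_mutex_lock(m);
    assert(rc == 0);

    for (long i = 0; i < steps; i++) {
        work_done++;               /* the long computation, mutex held   */
        pthread_mutex_unlock(m);   /* the window opens ...               */
        pthread_mutex_lock(m);     /* ... and closes again almost at once */
    }

    pthread_mutex_unlock(m);
    return work_done;
}
```

A waiter that needs the mutex has to be woken and scheduled inside that tiny window, which the already-running thread will usually win. This also fits the usleep(1) observation: sleeping right after the unlock widens the window.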
How to fix it? Well, why does Simulation hold the
SimMutex for such a long time, thus keeping ControlThread
from making progress? I can't see a reason for holding
SimMutex for macroscopic time -- but then, your code is
so heavily snipped that I can't even see what SimMutex
is supposed to protect, so there's lots I may not know
about.
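
One possible restructuring, as a sketch under heavy assumptions: the struct sim, the stop flag, and the function names below are all invented, because I cannot see what SimMutex really protects. The idea is that the mutex guards only a small piece of shared state and is held just long enough to read or write it, while each simulation step runs unlocked. For brevity the controlling thread uses nanosleep here rather than pthread_cond_timedwait.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* All names are hypothetical; the original code was not posted. */
struct sim {
    pthread_mutex_t lock;   /* protects stop and steps only */
    int stop;
    long steps;
};

static void *simulation(void *arg)
{
    struct sim *s = arg;
    long local_steps = 0;

    for (;;) {
        pthread_mutex_lock(&s->lock);
        int stop = s->stop;             /* brief critical section */
        pthread_mutex_unlock(&s->lock);
        if (stop)
            break;
        local_steps++;                  /* one step of work, done unlocked */
    }

    pthread_mutex_lock(&s->lock);
    s->steps = local_steps;             /* publish the result */
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Lets the simulation run for about `ms` milliseconds.
 * Returns the number of steps completed, or -1 on failure. */
long run_simulation_for(long ms)
{
    assert(ms >= 0);

    struct sim s;
    if (pthread_mutex_init(&s.lock, NULL) != 0)
        return -1;
    s.stop = 0;
    s.steps = 0;

    pthread_t tid;
    if (pthread_create(&tid, NULL, simulation, &s) != 0) {
        pthread_mutex_destroy(&s.lock);
        return -1;
    }

    struct timespec delay = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&delay, NULL);

    /* The mutex is free most of the time, so this lock succeeds promptly. */
    pthread_mutex_lock(&s.lock);
    s.stop = 1;
    pthread_mutex_unlock(&s.lock);

    pthread_join(tid, NULL);
    long steps = s.steps;
    pthread_mutex_destroy(&s.lock);
    return steps;
}
```

Whether this fits depends on what the real simulation shares with the control thread. If each step must update shared data, the step can compute into private storage and take the mutex only to copy the result out.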
--
Eric.Sosman@xxxxxxx