Re: lazy pthread_cond_timedwait (linux)



antoine.bouthors@xxxxxxxxx wrote On 07/30/07 15:32,:
> I'm using pthread_cond_timedwait to let a child thread do some
> computation for a given amount of time. pthread_cond_timedwait often
> takes several seconds to several *minutes* (!!) delay after abstime to
> unblock, even though it had plenty of opportunities to lock the mutex.
> (Code snippet below.)
> Some details:
> - processes are run by MPI on several SMP (2 CPU) nodes (pretty much
> like the MPI with pthreads example). I've tried with one process per
> CPU and one process per node, it always happens. Even with only one
> process all in all. So MPI does not appear to be the cause of that.
> - the child thread (Simulation) *does* (printf-proven...) unlock the
> mutex a lot of times after abstime has passed, but the parent thread
> (ControlLoop) does not seem to want to lock it. Looks like it prefers
> to take its time...
>
> So now to the questions:
> - Why does it behave this way? Shouldn't pthread_cond_timedwait try
> to take back the mutex right after abstime has passed? (and not 2
> minutes after that)

It should start *trying* to re-lock the mutex "soon"
after abstime passes. The amount of time it takes to
succeed in re-locking is unpredictable.
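
To make the distinction concrete, here is a minimal sketch (not the
original poster's code) of a timed wait that reports how far past
abstime it actually returned. The names deadline_after_ms, ms_past,
wait_until and demo_timed_wait are made up for illustration, and the
sketch assumes the condition variable uses the default CLOCK_REALTIME
clock. With an uncontended mutex the lateness should be small; under
contention, any extra lateness is time spent re-acquiring the mutex,
which pthread_cond_timedwait must do before it returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: absolute CLOCK_REALTIME deadline `ms` from now. */
static struct timespec deadline_after_ms(long ms)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec  += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {      /* normalise the nanoseconds */
        ts.tv_sec  += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/* Milliseconds by which "now" is past `abstime` (negative if earlier). */
static long ms_past(const struct timespec *abstime)
{
    struct timespec now;
    clock_gettime(CLOCK_REALTIME, &now);
    long long ns = (long long)(now.tv_sec - abstime->tv_sec) * 1000000000LL
                 + (now.tv_nsec - abstime->tv_nsec);
    return (long)(ns / 1000000LL);
}

/* Caller holds `m`. Waits until *done is true or the deadline passes.
   Returns 0 or an error such as ETIMEDOUT; `m` is held again on return. */
static int wait_until(pthread_cond_t *c, pthread_mutex_t *m,
                      const bool *done, const struct timespec *abstime)
{
    int rc = 0;
    while (!*done && rc == 0)             /* loop guards against spurious wakeups */
        rc = pthread_cond_timedwait(c, m, abstime);
    return *done ? 0 : rc;
}

/* Demo: nobody signals and nobody contends, so the wait times out.
   Stores in *late_ms how long after abstime the call returned. */
static int demo_timed_wait(long timeout_ms, long *late_ms)
{
    pthread_mutex_t m;
    pthread_cond_t  c;
    bool done = false;

    pthread_mutex_init(&m, NULL);
    pthread_cond_init(&c, NULL);
    struct timespec abstime = deadline_after_ms(timeout_ms);

    pthread_mutex_lock(&m);
    int rc = wait_until(&c, &m, &done, &abstime);
    assert(rc == 0 || rc == ETIMEDOUT);
    *late_ms = ms_past(&abstime);         /* measured with the mutex re-held */
    pthread_mutex_unlock(&m);

    pthread_cond_destroy(&c);
    pthread_mutex_destroy(&m);
    return rc;
}
```

Calling demo_timed_wait(50, &late) should return ETIMEDOUT with a
non-negative `late`. How large `late` gets in a real program depends on
who else is holding the mutex at that moment.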

> - Does calling sched_yield() have any use here?

Probably not, but possibly. Much depends on the nature
of the thread scheduler, on the relative priorities of the
threads, and maybe on the phase of the moon.

> - It looks like adding usleep(1) in Simulation() after unlocking the
> mutex solves the problem. But I find it odd to have to force the
> thread to go to sleep to allow the other to run. What's the point of
> scheduling, then?

This supports my suspicion: Your Simulation thread
unlocks the mutex but then re-locks it almost immediately.
The mutex is unclaimed for maybe a few hundred instructions
at most -- and that's probably not enough for the other
CPU to notice the unlock, take ControlThread out of stasis,
load up its MMU registers and stuff, and context-switch out
to userland where it can start contending for the mutex.
By the time all this running around is done, the Simulation
thread (already running and "hot" on its own CPU) has had
plenty of time to re-acquire the mutex.

How to fix it? Well, why does Simulation hold the
SimMutex for such a long time, thus keeping ControlThread
from making progress? I can't see a reason for holding
SimMutex for macroscopic time -- but then, your code is
so heavily snipped that I can't even see what SimMutex
is supposed to protect, so there's lots I may not know
about.
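
One way to keep the critical section short might look like the
sketch below. Since the original code is snipped, it simply assumes
that the mutex only needs to protect a stop flag and a step counter;
struct sim, compute_step, simulation and run_simulation_for are
hypothetical names, and compute_step is a stand-in for the real work.
The long computation runs with the mutex released, so the control
thread finds the mutex free almost all of the time when its timed
wait expires.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical shared state; the mutex protects stop, steps and last. */
struct sim {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            stop;    /* set by the control thread */
    long            steps;   /* published by the simulation thread */
    double          last;    /* latest published result */
};

/* Stand-in for one unit of real work; runs WITHOUT the mutex held. */
static double compute_step(double x)
{
    for (int i = 0; i < 100000; i++)
        x = x * 0.999 + 1.0;
    return x;
}

static void *simulation(void *arg)
{
    struct sim *s = arg;
    double x = 0.0;
    bool stop = false;

    while (!stop) {
        x = compute_step(x);              /* long part: no lock held */

        pthread_mutex_lock(&s->mutex);    /* short part: publish, check flag */
        s->steps++;
        s->last = x;
        stop = s->stop;
        pthread_mutex_unlock(&s->mutex);
    }
    return NULL;
}

/* Lets the simulation run for about run_ms, then stops and joins it.
   Returns the steps completed; *late_ms is the lateness of the wake-up. */
static long run_simulation_for(long run_ms, long *late_ms)
{
    struct sim s = { .stop = false, .steps = 0, .last = 0.0 };
    pthread_mutex_init(&s.mutex, NULL);
    pthread_cond_init(&s.cond, NULL);

    struct timespec abstime, now;
    clock_gettime(CLOCK_REALTIME, &abstime);
    abstime.tv_sec  += run_ms / 1000;
    abstime.tv_nsec += (run_ms % 1000) * 1000000L;
    if (abstime.tv_nsec >= 1000000000L) {
        abstime.tv_sec  += 1;
        abstime.tv_nsec -= 1000000000L;
    }

    pthread_t tid;
    int rc = pthread_create(&tid, NULL, simulation, &s);
    assert(rc == 0);

    pthread_mutex_lock(&s.mutex);
    do {                                  /* nobody signals here: wait for timeout */
        rc = pthread_cond_timedwait(&s.cond, &s.mutex, &abstime);
    } while (rc == 0);
    s.stop = true;                        /* mutex is held again at this point */
    clock_gettime(CLOCK_REALTIME, &now);
    pthread_mutex_unlock(&s.mutex);

    pthread_join(tid, NULL);
    long long ns = (long long)(now.tv_sec - abstime.tv_sec) * 1000000000LL
                 + (now.tv_nsec - abstime.tv_nsec);
    *late_ms = (long)(ns / 1000000LL);

    pthread_cond_destroy(&s.cond);
    pthread_mutex_destroy(&s.mutex);
    return s.steps;
}
```

Whether this shape fits depends on what SimMutex really protects. If
the simulation state itself must be shared, copying a snapshot under
the lock and computing on the copy is one possible variation.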

--
Eric.Sosman@xxxxxxx