Re: Recursive mutex that can be waited upon (pthread)



TGOS wrote:
In article <44cbbce0$1@xxxxxxxxxxxxxxxxxxx>,
Dave Butenhof <david.butenhof@xxxxxx> wrote:

Yeah, well this is the stupidest aspect of Java's threading support, so you're better off NOT trying to copy its mistakes.

I see nothing stupid in it. It is working just fine in the Java world.

No, but not all code shows a problem and when it does appear the diagnosis is so difficult that it's quite likely to be written off as a freak incident that can't be reproduced. (I'm sure you've never had one of them. I certainly haven't. [It's OK, I've got my fingers crossed.])

The important point is that safety depends on ASSUMPTIONS about how the interacting methods are coded that cannot be enforced or even tested by the compiler (or runtime environment) given the information it possesses. That's an enormous hole.

If the "outer" method has no dependencies on the state of data across a call to an "inner" method, or if the inner method doesn't change the state, or changes it only in ways anticipated by the caller; AND if the outer method never calls the inner method with any "intermediate state" (broken invariants) that would disrupt the inner method if called independently, then you're OK. But each one of those phrases is a disaster waiting to happen.

The problem with all of those conditions is that you can only be sure you're safe if the method implementations are tightly coupled. If I'm coding a set of tightly coupled methods I'd never use recursive mutexes because they represent only a minor convenience with substantial costs; but that's certainly a personal bias to which you need not subscribe. As long as you're positive that you're avoiding all of the real problems.

I guess some people here have a misunderstanding of how I'm using the mutex. While you can use a mutex to avoid that data is changed, for me having a mutex does not mean that data is not changed, it only means that data is not changed by a different thread. So no matter where in my code I have a mutex locked, that does not mean that functions I call may not change the data, it only means that no other threads my change the data. My own thread may of course change the data, hence functions I call may want to change the data and if they do so, they must be sure that these changes are atomically, hence they must lock the object and they simply can't rely that I locked the object before -> thus I need recursive locks.

Perhaps "some people" have misunderstood, but I understand quite well. I also understand that when two decoupled modular routines (whether C routines or C++ or Java methods on the same object) interact, you're generally safer assuming they're independent. If you're going to assume that the outer method presents a viable initial state for the inner method, and that the inner method can't disturb any state assumed by the outer, you're committing to tightly coupled design and maintenance. There's no "OK, I need a method with this signature, and I can fill in the blanks later or let someone else do it because it doesn't matter HOW it's done". If you're aware of that, and it's OK for you, great.

Without recursive locks, my code would end up like this:

void onEvent() {
LOCK(myObject);
// Do zillions of things
ULOCK(myObject);
}

And that's how every other thread would look as well, since they are all calling functions along the way that may change the data or not and if these all perform no locking on their own, I have to perform the locking all myself and then I have to lock the object the whole time.

But this again destroys any purpose of threads. No two threads could ever run at the same time. Then I could as well throw out threads of my code and just use a single thread going through an event queue. Might make no difference on a single-CPU/single-Core machine, but with multi-core and multi-CPU systems, that makes a *huge* difference.

Since you brought it up, that's another problem with many if not most uses of recursive mutexes. It's awfully easy to inadvertently extend lock regions ("critical sections") further into your code than "anyone ever imagined", seriously reducing or even precluding concurrency. You can avoid this risk, like all the others, by design; but unlike with non-recursive locking you can't do anything at all about the problem with strictly local/modular design because you've got a maze of dependencies twining throughout the code.

Java's mistake, to be clear, is not necessarily the "full release" wait, but the fact that the language has no support at all for invariants and cannot possibly detect any problems.

Give me an example where you'd need an invariant. Despite the fact that if you really need a non-recursive lock, that is very, very easy to be implemented in Java, thanks to the synchronized statement.

The "synchronized" statement in Java is no more non-recursive than the synchronized method attribute. Or at least, in all literature I've seen the semantics are equivalent, just more localized.

If you're waiting, you have a predicate condition on an invariant. If your intent is to make the current thread sleep for some arbitrary time until some other component feels like waking it up, just for the heck of it, you might as well terminate the thread. Otherwise, there's a predicate expression describing the desired state of shared data (an invariant) under which the thread will continue.

And detecting problems, I don't know of any system that can do that. That's usually what a programmer does in a debugger ;-)

With primitive concurrent languages that don't understand predicates and invariants, yeah. That is, for the problems you actually manage to provoke yourself, or that prove reproducible at customer sites. (But they're more often written off as non-reproducible one-time "freaks".) You can do better with some less-common languages, but most of the time that's not a viable solution. So, yes, "that's usually what a programmer does". Doesn't mean that's how it SHOULD be. When the Java team grafted thread syntax and semantics onto their language they COULD have done better. They didn't. I'm disappointed. You're right that people "get by". But then, we could be "getting by" without objects or even high level languages. Doesn't mean I should want to.

Of course there's another more basic problem with recursive mutexes. You usually use them when you don't know the locking requirements of the routines you call (or that call you).

Correct. That is the whole point of independent software design by multiple developers. See, the issue is you are thinking too much in C right now. In C you have got data and you have got mutexes, both are independent data objects you can pass around independently and you can handle them independently.

No so in our code. In our code a mutex is part of an object, every object has a mutex (though it's not initialized before the mutex is used for the first time), so by passing an object around, you also always pass a mutex around to lock on that object.

"Data invariants", known and consistent states of the shared data protected by a mutex, are generally only defined at the point where a lock is acquired, and at the point where it's released. A queue header, for example, must point at a valid queue entry containing a valid queue link. But sometimes those conditions do not (and cannot) hold in the middle of the lock region. Certainly they WILL not for the queue insert or remove operations. IF you call another method on the queue object in the midst of an insert or remove, then the method you call must be carefully coded to recognize that the invariant may be broken; that the queue header might point nowhere, or the object to which it points might not have a valid link. You can DECIDE not to do that, you can sometimes convince yourself by testing that the current code doesn't HAPPEN to do that; but those are quite different from saying the problem can't occur.

Can't happen to you? Great programming discipline and design process. But if so, you ought to know that it's not always trivial to accomplish; even when data is shared only within a single object. You can argue all you want that it's not a problem for YOU, but I'm only interested in making sure that everyone understands how it CAN be a problem of which they should be aware.
.



Relevant Pages

  • ktr/alq/witness failure.
    ... ref4# sysctl debug.ktr.mask=0x2fffffff ... 14 vm page queue mutex -- ... ATA disk bioqueue lock -- ...
    (freebsd-current)
  • Re: crash via vm_page_sleep_if_busy() and contigmalloc
    ... and unlock its mutex. ... > page queue mutex was relinquished. ... I think callers need to either lock the vm_object in every case ...
    (freebsd-hackers)
  • crash via vm_page_sleep_if_busy() and contigmalloc
    ... and unlock its mutex. ... page queue mutex was relinquished. ... from other callers of vm_page_sleep_if_busysuch that they know the ... I think callers need to either lock the vm_object in every case ...
    (freebsd-hackers)
  • Queue can result in nested monitor deadlock
    ... I think there's a slight design flaw in the Queue class that makes it ... The problem is that the mutex ... so you are subject to nested monitor deadlock. ... "Optionally takes a lock to ...
    (comp.lang.python)
  • [patch 6/8] mutex subsystem, core
    ... mutex implementation, core files: just the basic subsystem, no users of it. ... straightforward mutexes with strict semantics: ... + * Block on a lock - add ourselves to the list of waiters. ...
    (Linux-Kernel)