Re: Library design for canceling a running operation?



David Schwartz wrote:
On May 15, 5:05 am, Jef Driesen <jefdrie...@xxxxxxxxxxxxxxxxxxx>
wrote:
.
Isn't the reference counting supposed to solve this problem?

Sure, but you're over-simplifying.

int
main ()
{
cancel_t *cancel;
cancel_create (&cancel, ...);

/* Increase the refcount for the worker thread */
cancel_ref (cancel);

pthread_t thread;
pthread_create (&thread, NULL, worker, cancel);

So, right here you passed the 'cancel' object to another thread at
super-high cost (creating a thread). If you're going to create a
thread for every event, avoiding a fast lock is silly.

Now, if you're not going to create a thread, you're going to dispatch
to an existing thread. That's going to require taking some lock. While
you have that lock, you can just bump the cancel count without an
atomic operation.

I won't create a thread for every single sub-operation, but rather one thread for the entire series of operations (downloading, processing, etc). But your comment remains valid of course.

Since downloading data is typically a one time operation (cfr a digital camera), I won't have an existing thread available for doing the work. At least that's how I would do it. But users of my library should be free to do things differently of course.

/* This could be a GUI event loop. */
while (worker_thread_is_running) {
if (user_requested_cancel)
cancel_set_cancelled (cancel, 1)
}

cancel_unref (cancel);

Your 'while (worker_thread_is_running)' loop is going to have to
acquire some lock to check on the status of the worker thread, right?
So what your code does is it *releases* a lock and the does an atomic
operation. Do you see how silly that is? Why not just do the cancel
decrement while you hold the lock, then the overhead of an atomic lock
is not needed.

If you are choosing an atomic operation to avoid the overhead of a
lock, you are losing on all fronts. You get the overhead of the lock
with operations like checking on the status of another thread and then
you do the atomic operation *anyway*.

True, but this is not the place where I want to avoid the lock overhead.

BTW, acquiring the lock for checking the status of the worker thread will probably never block because the worker thread will only have to acquire that lock once to set the is_running flag to false.

void *
worker (void *arg)
{
cancel_t *cancel= (cancel_t *) arg;

device_t *device = NULL;
device_open (&device, ...);

/* Attach the cancel object to the device handle,
which increases the refcount again, and decreases
automatically when the device handle is destroyed.*/
device_set_cancel (device, cancel);

At some point, this function must make the cancellation object visible
to another thread, so it's going to have to take some lock either here
or later.

I'm not really sure I understand what you are saying here. The device_set_cancel function would simply increase the refcount and store the pointer in the device handle. The main thread still has a reference to the object, to be able to set it to the canceled state.

/* We don't need the cancel object any longer */
cancel_unref (cancel);

So why waste the overhead of an atomic operation here? We just took a
lock.

Increasing the refcount in the previous step would lock, increase the refcount and immediately unlock again. So I won't have the lock anymore.

/* Call a slow library function that
support cancellation through the
attached cancel object. */
device_read (device, ...);

device_close (device);

}

In this case, the cancel object remains alive for as long as someone
holds a reference to it, regardless of whether that's the main thread,
the worker thread or the device handle. Even if I wouldn't wait for the
worker thread to finish, the cancel object remains valid.

Right, but the whole point of the atomic operations was to avoid
having to take a lock. As your code shows, you have to take a lock
*anyway* to coordinate the thread that calls the function with the
thread that cancels it. So the atomic operation is pure overhead. You
have to hold a lock anyway, so just do a normal increment/decrement
while you hold the lock.

All true, but the point where the atomic operation would matter most in my code is inside the device_read() function. This is the function that does the slow I/O and thus will need to check the attached cancel object repeatedly. This is also where the timing requirements are located. The ref/unref'ing needs to be done only a few times.

device_read (device_t *device, ...)
{
while (!cancel_get_cancelled (device->cancel)) {
/* do some slow stuff */
}
}

But at that point I don't have the lock already. So I'll have to take the lock or use an atomic operation.

I think that the main issue is that you seem to assume that the same mutex can be accessed both by the application (for monitoring the worker thread) and the library (for the cancel object). But that's not the case. The cancel object would be implemented as an opaque object, which means an application won't be able to access its internals, including the internal mutex. But on the other hand, if the application already uses a mutex, my library doesn't know anything about that one. How would it know whether the application has a lock or not?. Locking the app mutex for the entire duration is obviously not a good idea either. So how can this work?

I think that even when I implement the cancel object with a mutex instead of atomic operations, the application will still need to have its own mutex to monitor the worker thread. Or am I missing something?
.



Relevant Pages

  • Re: Library design for canceling a running operation?
    ... right here you passed the 'cancel' object to another thread at ... That's going to require taking some lock. ... decrement while you hold the lock, then the overhead of an atomic lock ... If you are choosing an atomic operation to avoid the overhead of a ...
    (comp.programming.threads)
  • Re: PreFast and NDISUIO
    ... acquire the cancel lock ... since the cancel routine is called with the cancel lock held, ... cancel spinlock for any type of synchronization. ... A driver must never call IoCompleteRequest while ...
    (microsoft.public.win32.programmer.kernel)
  • Re: IRP cancelation with IOM supplied device queue
    ... Another important role for the lock (cancel spinlock or your private lock) ... > Yes, in IOM queue, it's superflous. ...
    (microsoft.public.development.device.drivers)
  • Re: IRP cancelation with IOM supplied device queue
    ... > Another important role for the lock (cancel spinlock or your private lock) ... > private queuing, the synchronization for cancellation will be taken care ... in IOM queue implementation... ...
    (microsoft.public.development.device.drivers)
  • Re: PreFast and NDISUIO
    ... Remove the IRP from the pending structure _before_ releasing the ... > your own lock before releasing the cancel spinlock, ...
    (microsoft.public.win32.programmer.kernel)