Re: Thread pool; trouble dividing tasks into work items.



Wow, thanks; this is a really great response with a lot of good
advice. I really appreciate it. I have some remaining questions about
the network threads, and about the deferral queue + connection pooling
implementation:


On Jan 4, 11:15 am, David Schwartz <dav...@xxxxxxxxxxxxx> wrote:
On Jan 4, 4:36 am, JC <jason.cipri...@xxxxxxxxx> wrote:
[snip]
When a task cannot make further forward progress until it's possible
to read from a socket, do the following:

1) Set a timer to expire the task should the socket not become ready.

2) Register your timer and socket with the network monitoring thread.

3) If no action is possible, the timer will fire and you can fail the
task.

4) If the network monitoring thread gets a 'select' hit on the socket,
it will fire your timer for you, and you can succeed the task.
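
As a rough illustration of steps 1 to 4, here is a minimal Python sketch
built on the standard selectors module. The NetworkMonitor class, its
method names, and the on_ready/on_timeout callbacks are all made up for
this example and are not from any code in this thread. It also polls
from the calling thread to keep things short; a real monitor would run
poll_once() in a loop on its own thread and hand the callbacks to the
thread pool.

```python
import selectors
import socket
import time


class NetworkMonitor:
    """Hypothetical monitor: watches sockets and expires tasks on a deadline."""

    def __init__(self):
        self._sel = selectors.DefaultSelector()
        self._tasks = {}  # socket -> (deadline, on_ready, on_timeout)

    def register(self, sock, timeout, on_ready, on_timeout):
        # Steps 1 and 2: record the deadline and start watching the socket.
        self._tasks[sock] = (time.monotonic() + timeout, on_ready, on_timeout)
        self._sel.register(sock, selectors.EVENT_READ)

    def pending(self):
        return len(self._tasks)

    def _finish(self, sock):
        self._sel.unregister(sock)
        return self._tasks.pop(sock)

    def poll_once(self):
        if not self._tasks:
            return
        # Sleep in select() no longer than the nearest deadline.
        earliest = min(deadline for deadline, _, _ in self._tasks.values())
        wait = max(0.0, earliest - time.monotonic())
        for key, _ in self._sel.select(wait):
            # Step 4: the socket became readable, so the task succeeds.
            _, on_ready, _ = self._finish(key.fileobj)
            on_ready(key.fileobj)
        now = time.monotonic()
        expired = [s for s, (d, _, _) in self._tasks.items() if d <= now]
        for sock in expired:
            # Step 3: nothing arrived before the deadline, so the task fails.
            _, _, on_timeout = self._finish(sock)
            on_timeout(sock)


def demo():
    """One socket receives data; the other stays silent and times out."""
    a, a_peer = socket.socketpair()
    b, b_peer = socket.socketpair()
    events = []
    monitor = NetworkMonitor()
    monitor.register(a, 5.0,
                     lambda s: events.append(("ready", s.recv(16))),
                     lambda s: events.append(("timeout", "a")))
    monitor.register(b, 0.05,
                     lambda s: events.append(("ready", s.recv(16))),
                     lambda s: events.append(("timeout", "b")))
    a_peer.sendall(b"hi")
    while monitor.pending():
        monitor.poll_once()
    for s in (a, a_peer, b, b_peer):
        s.close()
    return events
```

Calling demo() registers two sockets, sends data to one of them, and
polls until both tasks have either succeeded or expired.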

And for sending, the connection is typically going to be busy:

If the connection is typically going to be busy, and you can't send
the data when you're ready anyway (because you're waiting for previous
operations), then it doesn't much matter.

Hrm; currently I have a single TCP connection to the remote machine
with a single socket and a "session ID" that I assign to data items to
associate groups of related data being sent. I have full control over
remote server implementation. I am already kind of doing the steps you
describe but I'm registering timers and session IDs with the main
timer queue, and using that to expire tasks.

From a threading standpoint, maybe it's better to establish a separate
connection and separate socket per "session"? That way a network
monitor thread can sit there and select(), and each work item can be
responsible for actually receiving data from the network, rather than
having the single network monitor thread receive and pass data to new
work items.

This reduces thread context switches (right?), but the trade-off is
that multiple threads receive data on the same physical connection at
the same time (I don't know enough about networking to know if this is
a performance hit at a lower protocol level). Using separate
connections also seems like it could hurt send performance: the send
connection is busy, and I'd much rather have threads do something else
and wait their turn than all rush to send data at the same time, each
at a fraction of the total bandwidth.

*If* I switch to separate connections per session, does this seem
reasonable for a send implementation:

1. Create a work item to send data over network, add to thread pool.
2. That work item "can't run" if another work item is currently
sending data. Put it on the deferral queue.

That way, I eliminate the network send thread, and threads in the
thread pool are sending data, one at a time. I'm not sure if this
provides an advantage or not. It does seem to add some consistency to
the application, though, by moving more work to the thread pool
instead of specialized threads.

Then, the only special thread I need is the network input monitoring
thread, which maintains a list of sockets that it's looking for data
on, and creating work items to read data from those sockets as
necessary.

I like the idea of using separate connections and having the work
items read the data from a stability standpoint as well. If something
goes wrong on the remote machine (you never know) and it, say, sends
some corrupt data for a session, it won't affect any other open
sessions or running threads on this machine.

I'm not sure why I'm always coming back to the networking side of
this. I may be overthinking things.

The problem with having a single "send data" thread is that you force
a context switch to that thread every time you want to send data. If
any thread that needs to can simply acquire a lock, queue the data,
and if possible send it, then context switches will be reduced. This
may or may not be significant.
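
One way to read "acquire a lock, queue the data, and if possible send
it" is sketched below in Python. The SharedSender class and its
send_func parameter are hypothetical names for this example; send_func
stands in for whatever actually writes to the socket (for instance
sock.sendall). The thread that finds nobody sending becomes the sender
and drains the queue, so no dedicated send thread is needed and only
one thread writes at a time.

```python
import threading
from collections import deque


class SharedSender:
    """Hypothetical sender: any thread may queue data; one thread drains."""

    def __init__(self, send_func):
        self._send = send_func          # assumption: blocking "write it all" call
        self._lock = threading.Lock()
        self._pending = deque()
        self._sending = False

    def submit(self, data):
        """Queue data. Returns True if this thread did the sending itself."""
        with self._lock:
            self._pending.append(data)
            if self._sending:
                return False            # the active sender will pick it up
            self._sending = True
        try:
            while True:
                with self._lock:
                    if not self._pending:
                        self._sending = False
                        return True
                    chunk = self._pending.popleft()
                self._send(chunk)       # send outside the lock
        except BaseException:
            # Do not leave the sender flag stuck if the send fails.
            with self._lock:
                self._sending = False
            raise
```

The lock is only held while touching the queue, never during the send
itself, so threads that lose the race return immediately and go do
other work.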

Ah, good idea. Unless I'm missing something, I think that having work
items in the thread pool be responsible for network sends, but putting
them on the deferral queue if another work item is already sending,
ends up providing the same benefits, and is also nicely covered by the
existing thread pool logic. I might be wrong.


2. There is a significantly lower number of database connections
available than there are threads in the thread pool. There *may* be a
lower number of connections available than data items being
simultaneously processed, but that is something I have no idea about
yet. B3 and B4 can both happen at the same time, but does it actually
make sense to separate B3 and B4 (a database A lookup followed by a
database B lookup; these have no effect on each other) into separate
work items? Or does it make more sense to have a single work item
perform B3 then B4?

It sounds like each of these tasks is two tasks. When you do a
database lookup, don't you have to first send the query and then
process the response? These sound like four work items, not two. Or is
it required that a thread wait for the data? (If so, try to get around
that requirement. It will force *massive* extra context switches.)

The database access API that I am using may lend itself to that, but
my data entity layer abstracts the fact that a database is being used
away from the application, and provides lazy loading for certain
fields. I can't split it up like that, but I personally have explicit
knowledge of when lazy database access is possible.


If the item cannot run:

1) Check if there's already a deferral queue for this type of item. If
there is, put the item on the tail of the deferral queue.

2) If there isn't, create a deferral queue for this type of item. Put
this item on that queue. Arrange to have the queue serviced (or the
head item from the queue serviced) when possible.
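
The two steps above could look something like this in Python. This is
only a sketch under assumptions: the DeferringScheduler class, the
"kind" key for the item type, and the can_run callback are invented for
the example, and service() stands in for whatever "arrange to have the
queue serviced" ends up meaning in the real application.

```python
import threading
from collections import deque


class DeferringScheduler:
    """Hypothetical scheduler with one deferral queue per item type."""

    def __init__(self):
        self._lock = threading.Lock()
        self.run_queue = deque()   # items ready for the thread pool
        self.deferred = {}         # item type -> deque of waiting items

    def submit(self, kind, item, can_run):
        with self._lock:
            if can_run():
                self.run_queue.append(item)
                return "queued"
            waiting = self.deferred.get(kind)
            if waiting is not None:
                # Step 1: a deferral queue already exists, join its tail.
                waiting.append(item)
            else:
                # Step 2: first blocked item of this type, create the queue.
                self.deferred[kind] = deque([item])
            return "deferred"

    def service(self, kind):
        """Move the head deferred item of this type to the run queue."""
        with self._lock:
            waiting = self.deferred.get(kind)
            if waiting is None:
                return None
            item = waiting.popleft()
            if not waiting:
                del self.deferred[kind]   # drop empty deferral queues
            self.run_queue.append(item)
            return item
```

Whoever frees the resource an item type is waiting on would call
service() for that type.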


This is a really great idea. I didn't think of it that way at all,
thanks.


So, to return a database connection to the pool you:

1) Put the connection in the pool.
2) Check if there's a deferral queue waiting for connections.
3) If so, mark the queue ready. (Or move the head item from the queue
to the regular queue.)
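
A minimal Python sketch of those three steps, using the "move the head
item to the regular queue" variant. The ConnectionPool class and its
method names are hypothetical, and connections and work items are just
opaque objects here.

```python
import queue
import threading
from collections import deque


class ConnectionPool:
    """Hypothetical pool that wakes one deferred work item per release."""

    def __init__(self, connections):
        self._lock = threading.Lock()
        self._free = deque(connections)
        self._waiting = deque()         # deferral queue for this resource
        self.run_queue = queue.Queue()  # stand-in for the regular work queue

    def acquire_or_defer(self, work_item):
        """Return a connection, or defer the work item and return None."""
        with self._lock:
            if self._free:
                return self._free.popleft()
            self._waiting.append(work_item)
            return None

    def release(self, conn):
        with self._lock:
            self._free.append(conn)             # 1) put it back in the pool
            if not self._waiting:               # 2) anyone waiting?
                return
            item = self._waiting.popleft()      # 3) wake the head item
        self.run_queue.put(item)
```

Note that the woken item still has to call acquire_or_defer() again
when it runs, and it may lose the race to another item and be deferred
once more. Handing the connection directly to the woken item would
avoid that, at the cost of coupling the pool to the work items.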


Would you recommend this approach (the connection pool actively
changing queue status) over something like having work items perform
quick checks to see if any connections are available in order to
determine if they can run or not?

I suppose I could share a semaphore between the connection pool and
deferral queue, having a worker thread check to see if it can acquire
the semaphore, and if it can, run a work item from that queue using an
available connection. Then the connection pool can release the
semaphore when a connection becomes available.

This means a "deferral queue" consists of a "type", a queue of work
items, and a semaphore, and shares that semaphore with whatever is
managing the resource that is holding up the deferral queue in the
first place. Is that a typical approach?
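
To make the question concrete, here is roughly what I mean, as a Python
sketch. The DeferralQueue class and its method names are made up for
this example; the only library pieces are threading.Semaphore and its
non-blocking acquire. The semaphore's count is assumed to equal the
number of free connections.

```python
import threading
from collections import deque


class DeferralQueue:
    """Hypothetical deferral queue: a type, a queue of items, a semaphore."""

    def __init__(self, kind, semaphore):
        self.kind = kind
        self.semaphore = semaphore   # shared with the resource manager
        self._items = deque()
        self._lock = threading.Lock()

    def defer(self, item):
        with self._lock:
            self._items.append(item)

    def try_take(self):
        """Return a work item if one is waiting and a permit is free."""
        if not self.semaphore.acquire(blocking=False):
            return None              # no connection available right now
        with self._lock:
            if self._items:
                return self._items.popleft()
        self.semaphore.release()     # nothing to run, give the permit back
        return None
```

A worker thread would call try_take() when looking for work, and the
connection pool would call semaphore.release() each time a connection
is returned.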


Good luck.

Thanks again! I owe you a virtual beer or something. :-)

Jason