Re: very strange pthread problem (solved)



Mark wrote:

Jim Langston wrote:

Comments in line.

"Mark" <spamrs_goodbye@xxxxxx> wrote in message
news:kFaFg.13312$Z1.6964@xxxxxxxxxxx

We have a single-threaded system at work that
we're extending so that worker threads can do some
of the hard work on other cpus.

I use a boss/worker design: a main thread does the majority of the
processing, which is the code that is difficult to run
in parallel because of the nature of the processing. Then I use mutexes
to pause the main thread and broadcast to the 4 worker threads, which
have been paused in a condition wait since the last processing
cycle, so that they start. They all have to run to completion and pause,
and I use pthread_cond_signal to get the main thread out of pause. By
testing, I'm convinced this all works and that the main thread and the
workers are alternately working and pausing.
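
For readers following the thread, the boss/worker cycle described above can be sketched roughly as follows. This is not the poster's code: `Pool`, `WorkerArg`, `worker` and `run_cycles` are made-up names, the "work" is a stand-in addition, return codes are left unchecked for brevity, and the sketch joins its threads at shutdown where the original keeps detached threads alive for the whole run.

```cpp
#include <pthread.h>
#include <vector>

// Hypothetical names throughout; a sketch of the pattern, not the original code.
struct Pool {
    pthread_mutex_t mtx;
    pthread_cond_t  start_cv;   // boss -> workers: a new cycle has begun
    pthread_cond_t  done_cv;    // workers -> boss: the last worker has finished
    int  cycle;                 // incremented once per processing cycle
    int  remaining;             // workers still busy in the current cycle
    bool quit;
    std::vector<int> results;   // one slot per worker, written only by that worker
};

struct WorkerArg { Pool* pool; int id; };

void* worker(void* raw) {
    WorkerArg* a = static_cast<WorkerArg*>(raw);
    Pool* pool = a->pool;
    int seen = 0;                                   // last cycle this worker handled
    for (;;) {
        pthread_mutex_lock(&pool->mtx);
        while (pool->cycle == seen && !pool->quit)  // loop guards against spurious wakeups
            pthread_cond_wait(&pool->start_cv, &pool->mtx);
        if (pool->quit) { pthread_mutex_unlock(&pool->mtx); return nullptr; }
        seen = pool->cycle;
        pthread_mutex_unlock(&pool->mtx);

        pool->results[a->id] += seen;               // stand-in for the real work

        pthread_mutex_lock(&pool->mtx);
        if (--pool->remaining == 0)                 // last one out wakes the boss
            pthread_cond_signal(&pool->done_cv);
        pthread_mutex_unlock(&pool->mtx);
    }
}

// Runs ncycles boss/worker cycles and returns the sum of all workers' results.
int run_cycles(int nworkers, int ncycles) {
    Pool pool;
    pthread_mutex_init(&pool.mtx, nullptr);
    pthread_cond_init(&pool.start_cv, nullptr);
    pthread_cond_init(&pool.done_cv, nullptr);
    pool.cycle = 0; pool.remaining = 0; pool.quit = false;
    pool.results.assign(nworkers, 0);

    std::vector<pthread_t> tids(nworkers);
    std::vector<WorkerArg> args(nworkers);          // must outlive the threads
    for (int i = 0; i < nworkers; ++i) {
        args[i].pool = &pool; args[i].id = i;
        pthread_create(&tids[i], nullptr, worker, &args[i]);
    }
    for (int c = 0; c < ncycles; ++c) {
        pthread_mutex_lock(&pool.mtx);
        pool.remaining = nworkers;
        ++pool.cycle;
        pthread_cond_broadcast(&pool.start_cv);     // release all workers
        while (pool.remaining > 0)                  // boss pauses until they finish
            pthread_cond_wait(&pool.done_cv, &pool.mtx);
        pthread_mutex_unlock(&pool.mtx);
    }
    pthread_mutex_lock(&pool.mtx);
    pool.quit = true;
    pthread_cond_broadcast(&pool.start_cv);
    pthread_mutex_unlock(&pool.mtx);

    int total = 0;
    for (int i = 0; i < nworkers; ++i) {
        pthread_join(tids[i], nullptr);
        total += pool.results[i];
    }
    pthread_cond_destroy(&pool.done_cv);
    pthread_cond_destroy(&pool.start_cv);
    pthread_mutex_destroy(&pool.mtx);
    return total;
}
```

The cycle counter is what lets a worker tell a fresh broadcast from a spurious wakeup, and it also covers a worker that has not reached its wait yet when the first broadcast goes out.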

The worker threads need access to a class member function or two.
Since the thread function takes a void*, I pass the address of the
class to it via the 4th argument of pthread_create, which
is a pointer to a user defined structure:
typedef struct {
    int thread_id;
    Class* myclass;
} THREAD_ARG;

the thread then casts that data back to get the thread id and
class pointer so it can get to the various class
functions and by knowing its id number each
thread can find its list of data to work on. Each thread
is unlocked from the shared mutex so they can all be in the
common work function. I keep each thread's workqueue distinct by having an
STL vector in which each index points to the list of pointers
to the data that thread should work on, sort of like this:

vector<STRUCT> workqueue;
..
workqueue[thread_id].dataset;

dataset --> list<data*>*

so thread 0 can get its data from
workqueue[0].dataset->begin(), ..., and so on towards the end.
I figured this would prevent interactions between threads
since each has its own container. The main thread
puts 1/4th the data on each dataset list for a quad cpu host
or 1/2 for a dual....
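
A minimal sketch of the arrangement described above, assuming generic stand-in names (`Data`, `WorkSlot`, `Processor`, `ThreadArg`, `thread_main`, `run_split` are all invented here): the argument struct carries the thread id and the class pointer through the `void*` parameter, and each thread only touches the vector slot at its own index.

```cpp
#include <pthread.h>
#include <list>
#include <vector>

// All names are illustrative stand-ins for the generic ones in the post.
struct Data { double value; };

struct WorkSlot {
    std::list<Data*>* dataset;   // this thread's private list of items
    double sum;                  // result slot, written only by the owning thread
};

class Processor {
public:
    std::vector<WorkSlot> workqueue;   // sized once before threads start, never resized
    double weigh(const Data* d) const { return d->value * 2.0; }  // read-only member
};

struct ThreadArg {
    int thread_id;
    Processor* myclass;
};

void* thread_main(void* raw) {
    // Cast the 4th pthread_create argument back to what was passed in.
    ThreadArg* arg = static_cast<ThreadArg*>(raw);
    WorkSlot& slot = arg->myclass->workqueue[arg->thread_id];
    for (std::list<Data*>::iterator it = slot.dataset->begin();
         it != slot.dataset->end(); ++it)
        slot.sum += arg->myclass->weigh(*it);
    return nullptr;
}

// Splits nitems round-robin over nthreads lists and returns the combined result.
double run_split(int nthreads, int nitems) {
    Processor proc;
    proc.workqueue.resize(nthreads);
    std::vector<Data> storage(nitems);
    for (int t = 0; t < nthreads; ++t) {
        proc.workqueue[t].dataset = new std::list<Data*>;
        proc.workqueue[t].sum = 0.0;
    }
    for (int i = 0; i < nitems; ++i) {
        storage[i].value = i + 1;
        proc.workqueue[i % nthreads].dataset->push_back(&storage[i]);
    }
    std::vector<ThreadArg> args(nthreads);   // must stay alive while threads run
    std::vector<pthread_t> tids(nthreads);
    for (int t = 0; t < nthreads; ++t) {
        args[t].thread_id = t;
        args[t].myclass = &proc;
        pthread_create(&tids[t], nullptr, thread_main, &args[t]);
    }
    double total = 0.0;
    for (int t = 0; t < nthreads; ++t) {
        pthread_join(tids[t], nullptr);
        total += proc.workqueue[t].sum;
        delete proc.workqueue[t].dataset;
    }
    return total;
}
```

One thing worth checking in a design like this is the lifetime of the argument structs: if they live on a stack frame that returns while the threads are still running, the cast-back pointer dangles.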

Are you aware that std::vector is by no means thread safe? If you make
changes to the vector (push_back, erase), iterators and pointers
into the vector can become invalid. To see if this is your problem,
consider using an array instead, at least as a test.

No changes are made to the vector at all by the worker threads,
the vector is like this (generic names being used):

typedef struct {
    list <ClassX*>* indata;
    list <ClassY*>* outdata;
} THREAD_DATA;
vector <THREAD_DATA> workqueue;

at startup, the vector is loaded for four threads

workqueue[0].indata = new list<ClassX*>;
workqueue[0].outdata = new list<ClassY*>;
...
workqueue[3].indata = new list<ClassX*>;
workqueue[3].outdata = new list<ClassY*>;

So at the level of the vector nothing is really happening,
it's just storing the pointers to the STL lists and the vector
is only read from. The "outdata" STL list is pushed back onto, but
each list is unique to its thread via the thread id#.

I've had luck in the past when I wanted to use std::queue to pass
messages in a thread by writing a thread safe wrapper around it and using
locks around any call that could conceivably change the queue (push, pop,
etc..).
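
The kind of wrapper mentioned above might look something like this sketch. `LockedQueue`, `ProducerArg`, `producer` and `produce_and_consume` are names invented for illustration, and the blocking `pop` is one possible design among several. It assumes copying `T` does not throw; otherwise the mutex would be left locked.

```cpp
#include <pthread.h>
#include <queue>

// Hypothetical wrapper: every call that touches the std::queue holds the mutex.
template <typename T>
class LockedQueue {
public:
    LockedQueue() {
        pthread_mutex_init(&mtx_, nullptr);
        pthread_cond_init(&cv_, nullptr);
    }
    ~LockedQueue() {
        pthread_cond_destroy(&cv_);
        pthread_mutex_destroy(&mtx_);
    }
    LockedQueue(const LockedQueue&) = delete;
    LockedQueue& operator=(const LockedQueue&) = delete;

    void push(const T& v) {
        pthread_mutex_lock(&mtx_);
        q_.push(v);
        pthread_cond_signal(&cv_);       // wake one waiting consumer
        pthread_mutex_unlock(&mtx_);
    }

    // Blocks until an item is available, then removes and returns it.
    T pop() {
        pthread_mutex_lock(&mtx_);
        while (q_.empty())               // loop guards against spurious wakeups
            pthread_cond_wait(&cv_, &mtx_);
        T v = q_.front();
        q_.pop();
        pthread_mutex_unlock(&mtx_);
        return v;
    }

    // Non-blocking variant: returns false if the queue was empty.
    bool try_pop(T& out) {
        pthread_mutex_lock(&mtx_);
        bool ok = !q_.empty();
        if (ok) { out = q_.front(); q_.pop(); }
        pthread_mutex_unlock(&mtx_);
        return ok;
    }

private:
    std::queue<T>   q_;
    pthread_mutex_t mtx_;
    pthread_cond_t  cv_;
};

struct ProducerArg { LockedQueue<int>* q; int count; };

void* producer(void* raw) {
    ProducerArg* a = static_cast<ProducerArg*>(raw);
    for (int i = 1; i <= a->count; ++i) a->q->push(i);
    return nullptr;
}

// One producer thread pushes 1..count; the caller pops them all and sums them.
int produce_and_consume(int count) {
    LockedQueue<int> q;
    ProducerArg arg = { &q, count };
    pthread_t tid;
    pthread_create(&tid, nullptr, producer, &arg);
    int sum = 0;
    for (int i = 0; i < count; ++i) sum += q.pop();
    pthread_join(tid, nullptr);
    return sum;
}
```

Returning the element from `pop` by value, rather than exposing `front()` and `pop()` separately, avoids the window in which another thread could remove the element between the two calls.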

It really depends on how you're using the std::vector of course and it
may not be a problem for you. But if one thread is calling push_back on
the vector while another is trying to read the data, you'll get
undefined behavior.

As above, no pushing happens to the vector. I liked the vector
because it easily grows as the software is run on different
boxes with different numbers of cpus.

The problem is, the old software works fine (I can
manually turn off the threading feature and run it
the 'old' way as one thread). But one of the many
child functions in the main function gives different
results in threaded mode than in legacy mode.

"Inconsistently" isn't much data to go on. Does it produce garbage
output? Is it crashing? Does it say there is no data when there is?
Have you tried logging the threads to text files to see where the
inconsistency is being introduced?

The numbers calculated are different, but no crash occurs.
Kind of like if ptr->foo(A, B) returns 0.112 when it's
running correctly but 0.2111 when you do the same thing via
the threads


Some of the child functions used static variables for
speed reasons, but they're set up so they can be
changed back to local variables easily by recompiling,
so I made sure those were all back to being local
variables --> same result.
So right away I thought it was some kind of thread
interaction so I made another mutex and wrapped
it around the work function so only one thread
can be in there at a time --> same result.
To make really sure, I ran it so that pthread_create
was only making one worker thread --> same result.
Then I moved the whole thing to a single cpu Linux
machine and compiled and ran there with one worker
thread. On that box
it should be impossible for simultaneous interaction
since there's just one cpu and one thread -->same result.

So basically, in the thread function, the threads all
go into a "work" function, and in the work function is
a child function doing some mundane math (it computes
dot products and stuff to figure out whether two signals
seen by antennas can come from a single source) and
this subfunction somehow knows I'm using pthreads
and breaks, but then works fine when I'm not.

Again, "breaks" isn't enough to go on.
It even breaks when there's just one cpu and there's
only one pthread running in the work function. I've
looked over the work function and it looks thread safe,
all the operations are reading and not writing data
in there (there is writing but only to each thread's
local variables). There's writing of pointers to
the STL lists via push_back but there's a unique
list for each thread and that's not the data that
look bad anyway.

The only idea I had was stack overflow in the thread
so that data sometimes gets corrupted when running
with the thread but not with the original code
but when I run getstacksize() on linux it says 10+ megs
which is huge. I was thinking the corruption only happening
sometimes might be due to the stack usage changing because
the function can be left by various returns, so the lower
you go the more stack is used so maybe the screwed up
one was the path through the function that got low enough
down.
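
One way to test the stack theory is to query the stack size and then rerun the worker with a deliberately different one; if the numbers change with the stack size, the stack is a suspect. The sketch below assumes the query in the post corresponds to something like `pthread_attr_getstacksize`. The helper names `default_stack_size`, `run_with_stack` and `touch` are invented here.

```cpp
#include <pthread.h>
#include <cstddef>

// Stack size reported by a default-initialised attribute object, or 0 on error.
std::size_t default_stack_size() {
    pthread_attr_t attr;
    std::size_t size = 0;
    if (pthread_attr_init(&attr) != 0) return 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0) size = 0;
    pthread_attr_destroy(&attr);
    return size;
}

// Runs fn(arg) to completion on a thread with the requested stack size.
// Returns 0 on success or the first non-zero pthread error number.
int run_with_stack(void* (*fn)(void*), void* arg, std::size_t bytes) {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_attr_setstacksize(&attr, bytes);   // fails if bytes is too small
    pthread_t tid;
    if (rc == 0) rc = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    if (rc == 0) rc = pthread_join(tid, nullptr);
    return rc;
}

// Trivial thread body used only to show the call; sets the int it is given.
void* touch(void* raw) {
    *static_cast<int*>(raw) = 1;
    return nullptr;
}
```

For example, `run_with_stack(touch, &flag, 16u * 1024 * 1024)` runs the body on a 16 MB stack. Note that the value reported for a default attribute object is the size new threads get by default, not how much of it a running thread has used.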

The threads are all DETACHED, not JOINABLE,
because the threads stick around for the whole runtime
and don't get destroyed and recreated. When developing
all this I set up all the mutexes to be ERRORCHECK type
and found no error codes coming back from the pthread
calls. I've since changed them back to default/NORMAL
for speed, but I still check the return codes of
all the thread calls for != 0. Another thing
is that I've run the software on Solaris X86 (Sun V40Z)
and Linux (single cpu embedded board computer) using
Fedora Core 4 and get the same behavior.
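
For reference, setting up an ERRORCHECK mutex as described above could look like the following sketch (the helper name `init_errorcheck_mutex` is made up). With this mutex type, misuse such as relocking from the owning thread or unlocking a mutex that is not held is reported through the return code instead of deadlocking or going unnoticed.

```cpp
#include <pthread.h>
#include <cerrno>

// Initialises *m as an error-checking mutex.
// Returns 0 on success or a pthread error number.
int init_errorcheck_mutex(pthread_mutex_t* m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

// Locks the mutex twice from the calling thread and returns the result of
// the second attempt, which an error-checking mutex reports as EDEADLK.
int relock_result(pthread_mutex_t* m) {
    int rc = pthread_mutex_lock(m);
    if (rc != 0) return rc;
    int second = pthread_mutex_lock(m);   // would hang with a NORMAL mutex
    pthread_mutex_unlock(m);
    return second;
}
```

A clean run under ERRORCHECK, as reported in the post, rules out this class of locking mistake but says nothing about unsynchronised data access or plain logic errors.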

I was just wondering if anyone has some ideas.
Mark

Update: my problem is solved. There was a minor logic bug
in an if statement that was copied from the
non-threaded version to the new threaded version.
Mark


