Re: LAM/MPI MPICH-2 Compatibility



In article <1176525944.853140.172830@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
<ananghudaya@xxxxxxxxx> wrote:
Hi Massingill,

Here is the error code that I have obtained:

LAM 7.0.6/MPI 2 C++/ROMIO - Indiana University

MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
MPI_Recv: process in local group is dead (rank 2, MPI_COMM_WORLD)
MPI_Recv: process in local group is dead (rank 4, MPI_COMM_WORLD)
Rank (2, MPI_COMM_WORLD): Call stack within LAM:
Rank (2, MPI_COMM_WORLD): - MPI_Recv()
Rank (2, MPI_COMM_WORLD): - MPI_Bcast()
Rank (2, MPI_COMM_WORLD): - main()
Rank (1, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD): - MPI_Recv()
Rank (1, MPI_COMM_WORLD): - MPI_Bcast()
Rank (1, MPI_COMM_WORLD): - main()
Rank (4, MPI_COMM_WORLD): Call stack within LAM:
Rank (4, MPI_COMM_WORLD): - MPI_Recv()
Rank (4, MPI_COMM_WORLD): - MPI_Bcast()
Rank (4, MPI_COMM_WORLD): - main()
MPI_Recv: process in local group is dead (rank 16, MPI_COMM_WORLD)
Rank (16, MPI_COMM_WORLD): Call stack within LAM:
Rank (16, MPI_COMM_WORLD): - MPI_Recv()
Rank (16, MPI_COMM_WORLD): - MPI_Bcast()
Rank (16, MPI_COMM_WORLD): - main()


Hope that it could help...

Not as much as I had hoped -- apparently I'm not as good at
interpreting these messages as I might have thought -- but I Googled,
and GWMF. Here's a useful-looking FAQ:

http://lam-mpi.miscellaneousmirror.org/faq/category6.php3

which says that, for example, the first "process is dead" message
means that some process tried to receive from process 1 and couldn't,
because process 1 had ended already. I'm not sure that's entirely
consistent with the messages about call stack, which seem to indicate
that process 1 was trying to do an MPI_Recv and failed because the
process to be received from had ended. But surely it's one or the
other, and maybe this will help a bit in narrowing down the problem?

Is it something to do with my usage of
MPI_ANY_SOURCE?

I don't spot anything obviously wrong in the calls that use
MPI_ANY_SOURCE. At first I thought maybe it didn't make sense
to use this when your program logic appears to need to be able
to distinguish between messages, but then I noticed that you're
using tags for that. So okay.

I guess my suggestion at this point would be to try, as best you
can, to be sure that all the sends and receives match up -- i.e.,
for every MPI_Send there's exactly one corresponding MPI_Recv,
and vice versa. If it were my code, and a bit of rethinking about
matching sends/receives didn't find the problem, I'd start putting
in debug print statements to try to trace all sends and receives.
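
To make that concrete, here is a rough sketch of how such a trace could
be checked once you have it. It is not taken from your program: the
record layout (kind, source rank, destination rank, tag) and the name
unmatched() are hypothetical, and it assumes each process prints one
record per completed MPI_Send or MPI_Recv. For receives posted with
MPI_ANY_SOURCE, log the actual sender from status.MPI_SOURCE after the
call returns, so the record can be paired with its send.

```python
from collections import Counter

def unmatched(trace):
    """Return {(src, dst, tag): count} for sends and receives that do not pair up.

    trace is an iterable of (kind, src, dst, tag) records, where kind is
    "send" or "recv". A positive count means sends with no receive; a
    negative count means receives with no send. (Hypothetical record
    layout, for illustration only.)
    """
    balance = Counter()
    for kind, src, dst, tag in trace:
        key = (src, dst, tag)
        if kind == "send":
            balance[key] += 1
        elif kind == "recv":
            balance[key] -= 1
        else:
            raise ValueError("unknown record kind: %r" % (kind,))
    # Keep only the keys where sends and receives do not cancel out.
    return {key: n for key, n in balance.items() if n != 0}

if __name__ == "__main__":
    # Made-up records merged from the debug output of all ranks.
    trace = [
        ("send", 0, 1, 7),
        ("recv", 0, 1, 7),   # pairs with the send above
        ("send", 0, 2, 7),   # rank 2 never logged a receive for this
        ("recv", 3, 0, 9),   # rank 0 logged a receive nobody sent
    ]
    for (src, dst, tag), n in sorted(unmatched(trace).items()):
        what = "send(s) without a receive" if n > 0 else "receive(s) without a send"
        print("src=%d dst=%d tag=%d: %d %s" % (src, dst, tag, abs(n), what))
```

A receive that blocks forever never gets logged under this scheme, so
it shows up only indirectly, as a process whose last record is well
before everyone else's. Printing a line just before each call as well
would cover that case.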

Hope this helps, and good luck.

--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.