[warped-users] Re: Warped v1.02 simulation aborts
Ezequiel Glinsky
eglinsky at sce.carleton.ca
Wed Nov 10 16:01:44 EST 2004
Hi Dale,
Thanks a lot for your reply.
>Has the application successfully processed non-stragglers prior to this
>error?
>...
>(So I guess it would have processed some events OK to have all of these
>elements in the list.)
Exactly. The simulation works fine as long as no straggler messages are
received, but it terminates with the described error when a straggler is
detected.
>> p4_error: latest msg from perror: Bad file descriptor
>I assume this is post-abort? I think that's just MPI saying the other guy
>aborted?
Again, you're right. I just included it thinking that it might give you
more information.
>Have you tried loading your application into gdb on multiple nodes and
>seeing what is causing the abort by walking back up the stack? Or doing
>"gdb <application-exe> core" and "where" to get a stack dump?
I haven't yet, but I have actually isolated the place that triggers the
end of the simulation.
It is in the following function:
bool
LTSFInputQueue::insert(BasicEvent *toInsert, int id)
(see file LTSFInputQueue.cc)
More specifically, the problem arises when the new message is inserted:
during that process, msgCancel is never set to true. As a result, since
(msgCancel == false) holds, I see that
a) a call is made to print the event list (i.e., print(*outFile);)
b) the simulation aborts (i.e., abort();, which comes immediately after a)
in the original Warped source code).
I've made no changes to the LTSFInputQueue class.
Thanks again for your help. I hope this provides you with useful
information.
Cheers,
Ezequiel.