[warped-users] Re: Warped v1.02 simulation aborts

Ezequiel Glinsky eglinsky at sce.carleton.ca
Wed Nov 10 16:01:44 EST 2004


Hi Dale,

Thanks a lot for your reply.

>Has the application successfully processed non-stragglers prior to this
>error?
>...
>(So I guess it would have processed some events OK to have all of these
>elements in the list.)

Exactly. The simulation works fine as long as no straggler messages are
received, but it terminates with the described error as soon as a
straggler is detected.


>>     p4_error: latest msg from perror: Bad file descriptor
>I assume this is post-abort?  I think that's just MPI saying the other guy
>aborted?

Again, you're right. I just included it thinking that it might give you
more information.

>Have you tried loading your application into gdb on multiple nodes and
>seeing what is causing the abort by walking back up the stack?  Or doing
>"gdb <application-exe> core" and "where" to get a stack dump?

I haven't yet, but I have isolated the place that triggers the end of the
simulation.

It is in the following function:

bool
LTSFInputQueue::insert(BasicEvent *toInsert, int id)

(see file LTSFInputQueue.cc)

More specifically, the problem arises when the new message is inserted:
during that process, msgCancel is not set to true. As a result, since
(msgCancel == false) holds, I see that

a) a call is made to print the event list (i.e., print(*outFile);)
b) the simulation aborts (i.e., abort();, which comes immediately after a)
in the original Warped source code).
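
The failing path described above can be sketched roughly as follows. This
is not the Warped source: the Event struct, the std::list container, the
name insertSketch, the InsertResult enum, and above all the assumption that
msgCancel records whether an incoming anti-message found a matching
positive event to cancel are all hypothetical and for illustration only.
Only msgCancel, the print step and abort() come from the description above.

```cpp
#include <cassert>
#include <iostream>
#include <list>

// Hypothetical stand-in for BasicEvent; the fields are assumed.
struct Event {
    int sender;
    int recvTime;
    bool negative;  // true for an anti-message (assumption)
};

// Hypothetical result type: WouldAbort marks the point where the
// behaviour described above prints the list and calls abort().
enum class InsertResult { Inserted, Cancelled, WouldAbort };

InsertResult insertSketch(std::list<Event>& queue, const Event& toInsert,
                          std::ostream& out) {
    if (!toInsert.negative) {
        // Positive event: keep the list ordered by receive time.
        auto pos = queue.begin();
        while (pos != queue.end() && pos->recvTime <= toInsert.recvTime) {
            ++pos;
        }
        queue.insert(pos, toInsert);
        return InsertResult::Inserted;
    }

    // Anti-message: look for the positive event it should cancel.
    bool msgCancel = false;
    for (auto it = queue.begin(); it != queue.end(); ++it) {
        if (!it->negative && it->sender == toInsert.sender &&
            it->recvTime == toInsert.recvTime) {
            queue.erase(it);
            msgCancel = true;
            break;
        }
    }

    if (msgCancel == false) {
        // a) print the event list ...
        for (const Event& e : queue) {
            out << e.sender << " @ " << e.recvTime << '\n';
        }
        // b) ... and this is where abort() would follow.
        return InsertResult::WouldAbort;
    }
    return InsertResult::Cancelled;
}
```

Under these assumptions, an anti-message that arrives with no matching
positive event in the list is the only way to reach WouldAbort, which
would fit the fact that the error only shows up once stragglers (and hence
rollbacks) occur.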

I've made no changes to the LTSFInputQueue class.

Thanks again for your help. I hope this information is useful.

Cheers,
Ezequiel.



