Subject: warped update
From: Jorgen Dahl (dahlj@ECECS.UC.EDU)
Date: Tue Mar 13 2001 - 08:30:13 MST
Update of /home/paw/CVS/warped/kernel/src
In directory viking.ececs.uc.edu:/work/dahlj/warped/kernel/src
Modified Files:
CommunicationManagerImplementationBase.cpp
Log Message:
Fixed a start-up problem.
The termination problems we have experienced may actually have been a
start-up problem. When I was trying to debug the termination problem,
I had severe trouble just starting the application on several nodes.
This problem exists because of how the initializations currently work.
First, the CommunicationManagerImplementationBase will send initialization
messages to all its peers. This will establish simulationObjectProxies and
so forth in the TimeWarpSimulationManager. When simulation managers (id !=
0) have received init messages from all their peers, they will sit and wait
for a Start message from simulation manager 0. Simulation manager 0, on
the other hand, will start sending Start messages to all its peers as soon
as it has received init messages from all of them. Note that we currently
assume FIFO messaging in WARPED, but we can only guarantee that messages
sent from one node to another arrive in order. We do not have a total
ordering of messages in the system at any given time, and this is why
Start messages will often arrive before Init messages.
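
To make the ordering issue concrete, here is a minimal sketch with three
managers. It is not WARPED code: PairwiseFifoNetwork and startOvertakesInit
are hypothetical names, and the model assumes one FIFO queue per
(sender, receiver) pair with arbitrary interleaving between pairs. Under
those assumptions, manager 2 can see the Start message from manager 0 before
the Init message from manager 1, even though no single link reorders anything.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model (not a WARPED class): one FIFO queue per
// (sender, receiver) pair, with no ordering guarantee between pairs.
class PairwiseFifoNetwork {
public:
    void send(int from, int to, const std::string& kind) {
        channels_[{from, to}].push_back(kind);
    }

    // Deliver the oldest pending message on one link. The caller chooses
    // which link delivers next, which models arbitrary network delay.
    std::string deliver(int from, int to) {
        std::deque<std::string>& q = channels_[{from, to}];
        assert(!q.empty());
        std::string kind = q.front();
        q.pop_front();
        return kind;
    }

private:
    std::map<std::pair<int, int>, std::deque<std::string>> channels_;
};

// Returns the message kinds in the order manager 2 observes them.
std::vector<std::string> startOvertakesInit() {
    PairwiseFifoNetwork net;

    // Init messages relevant to this scenario.
    net.send(0, 2, "Init");
    net.send(1, 2, "Init");  // assume this link is slow
    net.send(1, 0, "Init");
    net.send(2, 0, "Init");

    // Manager 0 receives init messages from all its peers ...
    net.deliver(1, 0);
    net.deliver(2, 0);
    // ... and immediately sends Start, as in the old scheme.
    net.send(0, 2, "Start");

    std::vector<std::string> seen;
    seen.push_back(net.deliver(0, 2));  // Init from manager 0
    seen.push_back(net.deliver(0, 2));  // Start from manager 0
    seen.push_back(net.deliver(1, 2));  // Init from manager 1, arriving late
    return seen;
}
```

Each link delivers in FIFO order here, yet the sequence seen by manager 2 has
Start ahead of the last Init.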
In other words, we need to synchronize the communication managers in the
initialization phase before we synchronize the simulation managers with the
Start messages. I have done this by adding a step in the initialization
phase. After comm-manager 0 has received all InitializationMessages
it will send a CirculateInitializationMessage to the next comm-mgr. The
next comm-mgr is not allowed to send this circulate message until it has
received all its InitializationMessages. When the circulate message comes
back to comm-manager 0 we know that all comm-managers have received all
init messages and we are now ready to send Start messages.
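
The circulation step can be sketched roughly as follows. This is an
illustration only, not the code in CommunicationManagerImplementationBase.cpp:
InitRingSketch, receiveInit and readyToStart are hypothetical names, message
passing is reduced to direct function calls, and each of the n managers is
assumed to expect n - 1 InitializationMessages.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the extra initialization step (not WARPED code).
class InitRingSketch {
public:
    explicit InitRingSketch(std::size_t n) : managers_(n) { assert(n >= 2); }

    // One InitializationMessage arrives at comm-manager `id`.
    void receiveInit(std::size_t id) {
        assert(id < managers_.size());
        ++managers_[id].initsReceived;
        tryForward(id);
    }

    // True once the circulate message has come back to comm-manager 0,
    // i.e. when it would be safe to send Start messages.
    bool readyToStart() const { return readyToStart_; }

private:
    struct Manager {
        std::size_t initsReceived = 0;
        bool holdsCirculate = false;  // circulate message received, not yet sent on
    };

    bool allInits(std::size_t id) const {
        return managers_[id].initsReceived == managers_.size() - 1;
    }

    // Pass the circulate message along the ring as far as it may go.
    void tryForward(std::size_t id) {
        while (true) {
            Manager& m = managers_[id];
            // Comm-manager 0 originates the message; others must hold it.
            bool mayOriginate = (id == 0 && !originated_);
            if (!allInits(id) || !(m.holdsCirculate || mayOriginate)) return;
            if (id == 0) originated_ = true;
            m.holdsCirculate = false;
            std::size_t next = (id + 1) % managers_.size();
            if (next == 0) {  // message is back at comm-manager 0
                readyToStart_ = true;
                return;
            }
            managers_[next].holdsCirculate = true;
            id = next;
        }
    }

    std::vector<Manager> managers_;
    bool originated_ = false;
    bool readyToStart_ = false;
};
```

With three managers, readyToStart() stays false while any manager is still
missing an init message, even if the managers after it in the ring are
already complete; it becomes true only after the slowest manager has received
its last init message and passed the circulate message on.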
Note that we do not necessarily need to send Start messages anymore.
However, without the sending of Start messages, some simulation managers
will start to itch and want to send out termination tokens. This will be
handled and is not an error state, but it might be unnecessary to do this,
and we use the Start messages to avoid it.
With this check-in I think I have eliminated the termination problem as
well. Before, some application messages could have been lost in the
start-up phase. I haven't been able to reproduce the termination problem
with these changes in place. Please let me know, however, if you still
experience a termination problem. If so, send me a copy of your
simulation.config file, ssl file and procgroup so I can continue to
debug that problem.