[Warped-devel] Secondary rollbacks, Event lists

Philip A. Wilsey philip.wilsey at cliftonlabs.com
Wed Mar 30 11:41:16 EST 2005


William,

On Mon, Mar 28, 2005 at 06:15:57PM -0300, William Voorsluys wrote:
> 
> Some weeks ago, when we were running some simulations with the PHOLD
> model, we've noticed that the number of events in the system was
> increasing without bound. That seemed strange as the a PHOLD object
> sends only one event for each event it receives. Well, after some
> investigation we found out that Warped was not correctly propagating
> the rollbacks, dealing with anti-messages, etc. Basically, what we've
> done was to make Warped correctly rollback the objects when they
> received a anti-message containing information about an already
> processed event. To facilitate the task, we've made several small
> modifications in different places of the kernel. The solution itself
> was not so hard to implement, although it was kind of hard to find
> what was causing the problem (as it is everything in programming, I
> guess :) ). Now, it seems the secondary rollbacks are working as
> expected. As soon as we finish the necessary tests we can contribute
> with the new code.

Great.  Send me the patch and I'll apply it.

> Another thing I'd like to mention is about event lists management.
> When running some tests with large models (10,000+ objects) with the
> default event set, we've noticed that the operation of peeking a new
> event from the unprocessed events set is the major bottleneck of the
> system. More precisely, it corresponds to almost 99% of the total
> execution time. The following code has been inserted in the
> executeObjects() function to measure that time:
> 
> /* code */    
> StopWatch sw1;
> sw1.start();
> t1 = sw1.elapsed();
> nextEvent = mySchedulingManager->peekNextEvent();
> t2 = sw1.elapsed();
> /* end */
> 
> That leads to a very bad performance. The times obtained with the
> parallel simulation are much worse than the sequential version of the
> PHOLD model, on a 4-node cluster. After some in-depth analysis we've
> found out that the search operation on the string hashmap was
> responsible for much of the overhead. The objects names we being
> resolved many times in order to find the correspondent object pointer.
> Again, we've changed the code to avoid this operation. But there is
> still the sort operation to classify the events according to their
> receive times, which is pretty inefficient.
> But now, the main question is: is it viable to implement another type
> of event lists, such as calendar queues?
> I've noticed that there is some code related to "Append queues" in the
> warped distribution. Are append queues a working and efficient
> solution?

Queue managment is always a problem.  I believe that the append queue idea
is solid and that it will lead to a nice performance boost when it is
finally implemented.  Basically the idea of append queues is summarized in
this paper:

* J. Dahl, M. Chetlur, and P. A. Wilsey,  "Event List Management In
  Distributed Simulation," European Parallel Computing Conference
  (Euro-Par '01), August 2001
  http://www.ececs.uc.edu/~paw/lab/papers/pdes/europar01.ps.gz

Unfortunately the student that was working on it has moved to other things
and that code is not currently being worked on.  If you want to work on it,
I'll be happy to interact with you on the algorithm design.

regards,
p. wilsey
-- 
Philip A. Wilsey  --gpg/pgp keys available--  philip.wilsey at cliftonlabs.com
Clifton Labs, Inc.                            http://www.cliftonlabs.com
7450 Montgomery Rd, Suite 300                 513 771-3751/4479 (voice/fax)
Cincinnati, OH 45236                          513 518-5515 (cell)




More information about the warped-devel mailing list