The efficiency of parallel Discrete Event simulations that use the optimistic protocol is strongly dependent on the overhead incurred by rollbacks. This paper introduces a novel approach to rollback processing which l...
详细信息
The efficiency of parallel Discrete Event simulations that use the optimistic protocol is strongly dependent on the overhead incurred by rollbacks. This paper introduces a novel approach to rollback processing which limits the number of events rolled back as a result of a straggler or antimessage. The method, called Breadth-First Rollback (BFR), is suitable for spatially explicit problems where the space is discretized and distributed among processes and simulation objects move freely in the space. BFR uses incremental state saving, allowing the recovery of causal relationships between events during rollback. These relationships are then used to determine which events need to be rolled back. Our results demonstrate an almost linear speedup - a dramatic improvement over the traditional approach to rollback processing.
A distributed algorithm is introduced for the analysis of large continuous time Markov chains (CTMCs) by combining in some sense numerical solution techniques and simulation. CTMCs are described as a set of processes ...
详细信息
A distributed algorithm is introduced for the analysis of large continuous time Markov chains (CTMCs) by combining in some sense numerical solution techniques and simulation. CTMCs are described as a set of processes communicating via message passing. The state of a process is described by a probability distribution over a set of reachable states rather than by a single state. simulation is used to determine event times and messages types to be exchanged, whereas transitions are realized by vector matrix products as in iterative numerical analysis techniques. In this way, the state space explosion of numerical analysis is avoided, but it is still possible to determine more detailed results than with simulation. parallelization of the algorithm is realized applying a conservative synchronization scheme, which exploits the possibility of precomputing event times as already proposed for parallelsimulation of CTMCs. In contrast to a pure simulation approach, the amount of computation is increased, whereas the amount of communication keeps constant. Hence it is possible to achieve even on a workstation cluster a significant speedup.
We present a novel approach to parallel discrete event simulation based on the Cilk model of multithreaded computation. Cilk's runtime system not only manages the low-level aspects of program execution, but also p...
详细信息
We present a novel approach to parallel discrete event simulation based on the Cilk model of multithreaded computation. Cilk's runtime system not only manages the low-level aspects of program execution, but also provides the user with an algorithmic model of performance which can be used to predict the execution time of a parallelsimulation. Moreover, a Cilk application can `scale down' to run on a single processor with nearly the same performance as that of serial code. A conservative parallel discrete event simulation algorithm has been developed in which communication between logical processes is achieved using Cilk's virtual memory model, dag consistent shared memory. The simulation executes in cycles, where each cycle involves a divide and conquer computation. Although local lookahead information can be exploited, the algorithm is robust in that it also calculates a global simulation time for each cycle. It can therefore be used for applications where zero lookahead may occur.
We present a conservative strategy for spatially decomposed parallel discrete event battlefield simulation. The traditional null message algorithm provides a foundation from which a mapping to generic simulation attri...
详细信息
We present a conservative strategy for spatially decomposed parallel discrete event battlefield simulation. The traditional null message algorithm provides a foundation from which a mapping to generic simulation attributes can be made. We informally discuss preservation of logical correctness and freedom from deadlock. Experimental results demonstrate the potential execution time savings when load imbalance is not dominant;more importantly, they highlight improvement opportunities in spite of potential load imbalance. The net result is that a very reasonable performance gain can be delivered for little effort in a way that supports good simulation system design principles. The approach is straightforward and can be easily implemented as part of a more general sequential or parallelsimulation support environment. While the approach is expressed in terms of battlefield simulation, its essence applies to many simulation applications.
Multi-resolution representation of simulated entities is considered essential for a growing portion of distributedsimulations. Heretofore, modelers have represented entities at just one level of resolution, or have r...
详细信息
Multi-resolution representation of simulated entities is considered essential for a growing portion of distributedsimulations. Heretofore, modelers have represented entities at just one level of resolution, or have represented concurrent representations in an inconsistent manner. We address the question of the cost of maintaining multiple, concurrent representations. We present a brief overview of our concept of a Multiple Resolution Entity (MRE) and Attribute Dependency Graph (ADG) both originally described elsewhere, and then compare simulation and consistency costs of some approaches, including our own MRE/ADG-based approach, to multi-resolution modeling. The cost analysis presented here is the first known analysis of its type, and will provide a basis for simulation designers to determine the best, and most cost-effective approach to supporting simulation of entities at different levels of resolution concurrently.
Excessive rollback recoveries due to overoptimistic event execution in Time Warp simulators often degrade their runtime performance. This paper presents a two-sided throttling scheme to dynamically adjust the event ex...
详细信息
Excessive rollback recoveries due to overoptimistic event execution in Time Warp simulators often degrade their runtime performance. This paper presents a two-sided throttling scheme to dynamically adjust the event execution speed of Time Warp simulators. The proposed throttle is based on a new concept called global progress window, which allows the individual simulation process to be positioned on a global time scale, thereby to accelerate or suspend their event execution. As each simulation process can be throttled to a steady state, excessive rollback recoveries due to causality errors can be avoided. To quantify the effect of rollbacks and for purpose of comparing different Time Warp implementations, we propose two new measures called RPE (number of Rollback events Per committed Event), and Ε (relative Effectiveness in reducing rollback overhead). Our implementation results show that the proposed throttle effectively regulates the proceeding of each simulation process, resulting in a significant reduction in rollback thrashing and elapsed time.
Time Warp's optimistic scheduling requires the maintenance of simulation state history to support rollback in the event of causality violations. State history, and the ability to rollback the simulation, can provi...
详细信息
ISBN:
(纸本)0818679654
Time Warp's optimistic scheduling requires the maintenance of simulation state history to support rollback in the event of causality violations. State history, and the ability to rollback the simulation, can provide unique functionality for human-in-the-loop simulation environments. This paper investigates the use of Time Warp to output valid simulation state in a near real-time manner re-execute portions of the simulation, and interactively probe simulation values to ascertain underlying causes of transient behavior. A shared-memory, multi-threaded interactive simulation architecture is presented and the additional state saving requirements imposed by interactivity are examined. The shortcomings of existing state saving schemes lead us to propose Multiplexed State Saving (MSS). By interleaving checkpointing and incremental state logs MSS provides bounded rollback costs and asynchronous access to prior simulation state. The interaction algorithms and MSS form a scalable, bounded cost component suitable for use in a real-time interactive Time Warp system.
In traditional optimistic distributedsimulation protocols, a logical process (LP) receiving a straggler rolls back and sends out anti-messages. Receiver of an anti-message may also roll back and send out more anti-me...
详细信息
In traditional optimistic distributedsimulation protocols, a logical process (LP) receiving a straggler rolls back and sends out anti-messages. Receiver of an anti-message may also roll back and send out more anti-messages. So a single straggler may result in a large number of anti-messages and multiple rollbacks of some LPs. In our protocol, an LP receiving a straggler broadcasts its rollback. On receiving this announcement, other LPs may roll back but they do not announce their rollbacks. So each LP rolls back at most once in response to each straggler. Anti-messages are not used. This eliminates the need for output queues and results in simple memory management. It also eliminates the problem of cascading rollbacks and echoing, and results in faster simulation. All this is achieved by a scheme for maintaining transitive dependency information. The cost incurred includes the tagging of each message with extra dependency information and the increased processing time upon receiving a message. We also present the similarities between the two areas of distributedsimulation and distributed recovery. We show how the solutions for one area can be applied to the other area.
This paper is a reminder of the danger of allowing `risk' when synchronizing a parallel discrete-event simulation: a simulation code that runs correctly on a serial machine may, when run in parallel, fail catastro...
详细信息
This paper is a reminder of the danger of allowing `risk' when synchronizing a parallel discrete-event simulation: a simulation code that runs correctly on a serial machine may, when run in parallel, fail catastrophically. This can happen when Time Warp presents an `inconsistent' message to an LP, a message that makes absolutely no sense given the LP's state. Failure may result if the simulation modeler did not anticipate the possibility of this inconsistency. While the problem is not new, there has been little discussion of how to deal with it;furthermore the problem may not be evident to new users or potential users of parallelsimulation. This paper shows how the problem may occur, and the damage it may cause. We show how one may eliminate inconsistencies due to lagging rollbacks and stale state, but then show that so long as risk is allowed it is still possible for an LP to be placed in a state that is inconsistent with model semantics, again making it vulnerable to failure. We finally show how simulation code can be tested to ensure safe execution under a risk-free protocol. Whether risky or risk-free, we conclude that under current practice the development of correct and safe parallelsimulation code is not transparent to the modeler;certain protections must be included in model code or model testing that are not rigorously necessary if the simulation were executed only serially.
暂无评论