ISBN: 1565550552 (print)
The two main approaches to parallel discrete event simulation - conservative and optimistic - are likely to encounter limitations as the size and complexity of the simulated system increase. For such large-scale simulations, the conservative approach appears to be limited by blocking overhead and sensitivity to lookahead, whereas the optimistic approach may become prone to cascading rollbacks, state saving overhead, and demands for larger memory space. These drawbacks prevent synchronization schemes based on either approach from scaling up. A combined approach may resolve these limitations while preserving and exploiting the potential advantages of each method. However, the schemes proposed so far integrate the two views at the same level, i.e., local to a logical process, and hence may not be able to fully solve the problems. In this paper we propose the Local Time Warp method for parallel discrete-event simulation and present a novel synchronization scheme for it called HCTW. The new scheme hierarchically combines a Conservative Time Window algorithm with Time Warp and aims at reducing cascading rollbacks, sensitivity to lookahead, and scalability problems. Local Time Warp is believed to be suitable for parallel machines equipped with thousands of processors and thus an appropriate candidate for the simulation of large and complex systems.
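To make the hierarchical idea concrete, the sketch below shows clusters that execute optimistically inside a conservative time window measured from the slowest cluster; this is an illustrative Python sketch only, and names such as Cluster, lvt, and run_round are assumptions, not the paper's interface.

```python
# Minimal sketch of the two-level idea: clusters of logical processes execute
# optimistically (Time Warp) inside a conservative time window measured from
# the slowest cluster. All names here are illustrative assumptions.

class Cluster:
    def __init__(self, name, lvt=0.0):
        self.name = name
        self.lvt = lvt                       # local virtual time of the cluster

    def optimistic_step(self, limit):
        """Process events optimistically, but never beyond the window limit.
        A real cluster would execute events and roll back on stragglers; here
        the clock simply advances by a placeholder amount."""
        self.lvt = min(self.lvt + 1.0, limit)

def run_round(clusters, window):
    """One conservative round: no cluster may run ahead of the global minimum
    virtual time by more than `window` units."""
    gvt = min(c.lvt for c in clusters)       # conservative lower bound
    limit = gvt + window                     # edge of the time window
    for c in clusters:
        c.optimistic_step(limit)             # Time Warp only inside the window

clusters = [Cluster("A"), Cluster("B", lvt=0.5)]
for _ in range(3):
    run_round(clusters, window=2.0)
print([(c.name, c.lvt) for c in clusters])
```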
ISBN: 1565550552 (print)
The successful application of optimistic synchronization techniques in parallel simulation requires that rollback overheads be contained. The chief contributions to rollback overhead in a Time Warp simulation are the time required to save state information and the time required to restore a previous state. Two competing techniques for reducing rollback overhead are periodic checkpointing (Lin and Lazowska, 1989) and incremental state saving (Bauer et al., 1991). This paper analytically compares the relative performance of periodic checkpointing and incremental state saving. The analytical model derived for periodic checkpointing is based almost entirely on the model previously developed by Lin and Lazowska (1989). The analytical model for incremental state saving has been developed for this study. The comparison assumes an optimal checkpoint interval and shows under which simulation parameters each technique performs best.
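As an illustration of the two techniques being compared, the following sketch (illustrative Python, not the paper's analytical model) contrasts a periodic checkpointing helper, which snapshots the full state every few events, with an incremental state-saving log, which records old values so a rollback can undo writes in reverse order; all class and method names are assumptions made for this sketch.

```python
import copy

# PeriodicCheckpointing snapshots the entire state every `interval` events, so a
# rollback must restore the nearest earlier snapshot and re-execute (coast
# forward). IncrementalStateSaving logs the old value of each variable before it
# is overwritten, so a rollback undoes the log in reverse order.

class PeriodicCheckpointing:
    def __init__(self, interval):
        self.interval = interval
        self.checkpoints = []                    # (event_number, full snapshot)
        self.events = 0

    def after_event(self, state):
        self.events += 1
        if self.events % self.interval == 0:
            self.checkpoints.append((self.events, copy.deepcopy(state)))

    def latest_checkpoint_before(self, event_number):
        usable = [c for c in self.checkpoints if c[0] <= event_number]
        return usable[-1] if usable else None    # may require coast-forward

class IncrementalStateSaving:
    def __init__(self):
        self.log = []                            # (key, old_value) per write

    def before_write(self, state, key):
        self.log.append((key, state.get(key)))

    def undo(self, state, n_writes):
        for key, old in reversed(self.log[-n_writes:]):
            if old is None:
                state.pop(key, None)
            else:
                state[key] = old
        del self.log[-n_writes:]

# Example: undo the last two writes incrementally.
state, iss = {"x": 1}, IncrementalStateSaving()
iss.before_write(state, "x"); state["x"] = 2
iss.before_write(state, "y"); state["y"] = 7
iss.undo(state, 2)
print(state)                                     # back to {'x': 1}
```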
ISBN: 1565550552 (print)
The suitability of the Time Warp mechanism to perform simulations with real-time constraints is examined. A model for Time Warp is developed that accounts for overheads such as state saving, state restoration, and transmitting positive and negative messages. A criterion called R-schedulability is defined to indicate whether or not computations can meet real-time deadlines. It is shown that if false events (events that will be rolled back or cancelled later) are generated, and there are no committed events with timestamps equal to those of the false events, Time Warp cannot meet the R-schedulability criterion. Further, if aggressive cancellation is used, scheduling guarantees still cannot be made even in the absence of such false events. However, Time Warp using lazy cancellation is shown to be R-schedulable provided such false events do not exist. Finally, based on these results, bounds on the execution time of a Time Warp simulation are derived.
ISBN: 1565550552 (print)
In Time Warp parallel simulation, the state of each process must be saved (checkpointed) regularly in case a rollback is necessary. Although most existing Time Warp implementations checkpoint after every state transition, this is not necessary, and the checkpoint interval is in reality a tuning parameter of the simulation. Lin and Lazowska proposed a model to derive the optimal checkpoint interval by assuming that the rollback behavior of Time Warp is not affected by the frequency of checkpointing. An experimental study conducted by Preiss et al. indicates that rollback behavior is, in general, affected by the frequency of checkpointing, and that the Lin-Lazowska model may therefore not reflect real situations. This paper extends the Lin-Lazowska model to include the effect of the checkpoint interval on the rollback behavior. The relationship among the checkpoint interval, the rollback behavior, and the overhead associated with state saving and restoration is described. A checkpoint interval selection algorithm which quickly determines the optimal checkpoint interval during the execution of a Time Warp simulation is proposed. Empirical results indicate that the algorithm converges quickly and always selects the optimal checkpoint interval.
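A minimal sketch of on-line interval selection in this spirit is given below; the per-event cost model (amortized state saving plus expected coast-forward work after a rollback) and all names are illustrative assumptions rather than the paper's derivation.

```python
# Pick the checkpoint interval that minimizes a modeled per-event overhead. A
# running Time Warp system would re-evaluate this as measured rates change.

def overhead_per_event(interval, save_cost, event_cost, rollback_rate):
    """Expected overhead per processed event for a given checkpoint interval."""
    coast_forward = event_cost * (interval - 1) / 2.0   # re-execute about half an interval
    return save_cost / interval + rollback_rate * coast_forward

def select_interval(save_cost, event_cost, rollback_rate, candidates=range(1, 65)):
    """Return the candidate interval with the lowest modeled overhead."""
    return min(candidates,
               key=lambda k: overhead_per_event(k, save_cost, event_cost, rollback_rate))

# Expensive state saving and infrequent rollbacks favor a larger interval.
print(select_interval(save_cost=50.0, event_cost=1.0, rollback_rate=0.05))
```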
ISBN: 0897916336 (print)
Researchers already use high-performance simulators for algorithm development and architectural studies. Advances in simulation technology and workstation performance have made program development on top of simulators - debugging, testing, and some tuning - fast enough for real applications. Simulators provide many advantages over running directly on a multiprocessor, including versatility, trivial repeatability, and detailed nonintrusive data collection and debugging. We make the case for application development via simulation and address simulation's traditional disadvantages. We also propose a development methodology that integrates simulation into the multiprocessor development environment. Finally, we examine the software engineering issues required by this integration and the use of parallel simulators for development. We believe that most multiprocessor application development should be done on top of a high-performance simulator (running on a workstation) rather than directly on the multiprocessor. Researchers already use these high-performance simulators for algorithm development [CBDW91], architectural studies [HM92], and language and compiler design [WBC+91, HWW93]. The primary advantages of development via simulation are the cost effectiveness of workstations, the ability to exploit powerful sequential debuggers, the support for nonintrusive data collection and invariant checking, and the versatility of simulation. Although traditional multiprocessor simulators are too slow to run real applications, high-performance simulators, including TangoLite [DGH91], RPPT [CDJ+91], and our own system Proteus [BDCW91], achieve orders of magnitude speedup over traditional simulators; typical development simulations take only a few minutes. The rapid increases in workstation performance will continue to diminish the turnaround time. Although we designed Proteus to explore algorithms and language issues, our experience and that of other users indicate that development is
By offering a shared address space across a number of processors connected by a local area network, the distributed shared memory model offers an attractive way of programming parallel-distributed applications. Such p...
ISBN: 1565550552 (print)
The implementation of the pending event set (PES) is crucial to the execution speed of discrete event simulation programs. This paper studies the implementation of the PES in the context of simulations executing on parallel computers using the Time Warp mechanism. We present a scheme for implementing Time Warp's PES based on well-known data structures for priority queues. This scheme supports efficient management of future and past events, especially for rollback and fossil collection operations. A comparative study of several queue implementations is presented. Experiments with a Time Warp system executing on a Kendall Square Research multiprocessor (KSR1) demonstrate that the implementation of the input queue can have a dramatic impact on performance, as large as an order of magnitude, far greater than can be accounted for simply by the reduced execution time to access the data structure. In particular, it is demonstrated that an efficient input queue implementation can also significantly reduce the number of rollbacks and improve the efficiency of memory management policies such as Jefferson's cancelback protocol. In the context of this work we also present an improved version of the skew heap that allows dequeueing of arbitrary elements at low cost. In particular, the possibility of dequeueing arbitrary elements improves memory utilization; this ability is also important in applications where frequent rescheduling may occur, as in ready queues used to select the next logical process to execute.
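For reference, the following sketch shows a basic skew heap, the priority-queue structure that the improved pending event set builds on; it implements only the standard insert and delete-min operations, whereas the paper's extension additionally supports low-cost removal of arbitrary elements (which requires locating a node and re-merging its subtrees).

```python
# A minimal skew heap keyed by event timestamp. Only the standard operations
# (insert and delete-min via the skew merge) are shown.

class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def merge(a, b):
    """Skew merge: keep the smaller root, merge into its right subtree, then
    swap the children of the surviving root (the 'skew' step)."""
    if a is None:
        return b
    if b is None:
        return a
    if b.key < a.key:
        a, b = b, a
    a.right = merge(a.right, b)
    a.left, a.right = a.right, a.left
    return a

class SkewHeap:
    def __init__(self):
        self.root = None

    def insert(self, key):
        self.root = merge(self.root, Node(key))

    def pop_min(self):
        top = self.root
        self.root = merge(top.left, top.right)
        return top.key

# Pending events keyed by timestamp.
pes = SkewHeap()
for ts in [7.5, 1.0, 3.2, 0.4]:
    pes.insert(ts)
print(pes.pop_min(), pes.pop_min())   # 0.4 1.0
```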
ISBN: 1565550552 (print)
This work presents results from an experimental evaluation of the space-time tradeoffs in Time Warp augmented with the cancelback protocol for memory management. An implementation of the cancelback protocol on Time Warp is described that executes on a shared memory multiprocessor, a 32 processor Kendall Square Research machine (KSR1). The implementation supports canceling back more than one object when memory has been exhausted. The limited-memory performance of the system is evaluated for three different workloads with varying degrees of symmetry. These workloads provide interesting stress cases for evaluating limited-memory behavior. We, however, make certain simplifying assumptions (e.g., uniform memory requirements for all events in the system) to keep the experiments tractable. The experiments are extensively monitored to determine the extent to which various overheads affect performance. It is observed that (i) depending on the available memory and the asymmetry in the workload, canceling back several events at a time (the number is called the salvage parameter) may improve performance significantly by reducing certain overheads, and (ii) performance nearly equivalent to that with unlimited memory can be achieved with only a modest amount of memory, depending on the degree of asymmetry in the workload.
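The following sketch illustrates the idea of canceling back several events at once when memory is exhausted: the salvage highest-timestamp uncommitted events are returned to their senders, since work farthest in the simulated future is the safest to undo. The function and data layout are assumptions made for illustration, not the described implementation.

```python
import heapq

def cancelback(uncommitted_events, salvage):
    """uncommitted_events: list of (timestamp, event) tuples held in memory.
    Returns (kept, cancelled) with the `salvage` latest events removed."""
    cancelled = heapq.nlargest(salvage, uncommitted_events, key=lambda e: e[0])
    cancelled_ids = set(id(e) for e in cancelled)
    kept = [e for e in uncommitted_events if id(e) not in cancelled_ids]
    return kept, cancelled

events = [(4.2, "a"), (9.9, "b"), (1.3, "c"), (7.0, "d")]
kept, cancelled = cancelback(events, salvage=2)
print([ts for ts, _ in cancelled])   # the two latest timestamps: [9.9, 7.0]
```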
ISBN: 9781565550551 (print)
Synchronization is a significant cost in many parallel programs, and can be a major bottleneck if it is handled in a centralized fashion using traditional shared-memory constructs such as barriers. In a parallel time-stepped simulation, the use of global synchronization primitives limits scalability, increases the sensitivity to load imbalance, and reduces the potential for exploiting locality to improve cache performance. This paper presents the results of an initial one-application study quantifying the costs and performance benefits of distributed, nearest-neighbor synchronization. The application studied, MP3D, is a particle-based wind tunnel simulation. Our results for this one application on current shared-memory multiprocessors show a significant decrease in synchronization time using these techniques. We prototyped an application-independent library that implements distributed synchronization. The library allows a variety of parallel simulations to exploit these techniques without increasing the application programming effort beyond that of conventional approaches.
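A minimal sketch of the nearest-neighbor idea is shown below: instead of a global barrier between time steps, each worker waits only until the step counters of its two neighbors have caught up. The Python threading code and names are illustrative assumptions, not the prototyped library's interface.

```python
import threading

# Each worker advances to the next time step once its left and right neighbors
# have completed the current step; no global barrier is used.

N_WORKERS, N_STEPS = 4, 5
steps = [0] * N_WORKERS                 # last completed step of each worker
cond = threading.Condition()

def worker(i):
    left, right = (i - 1) % N_WORKERS, (i + 1) % N_WORKERS
    for step in range(N_STEPS):
        with cond:
            # Wait only for the two neighbors, not for every worker.
            cond.wait_for(lambda: steps[left] >= step and steps[right] >= step)
        # ... compute the time step for this worker's partition here ...
        with cond:
            steps[i] = step + 1
            cond.notify_all()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_WORKERS)]
for t in threads: t.start()
for t in threads: t.join()
print(steps)   # every worker has completed all steps: [5, 5, 5, 5]
```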