ISBN: 9780818679650 (print)
In traditional optimistic distributed simulation protocols, a logical process (LP) receiving a straggler rolls back and sends out anti-messages. The receiver of an anti-message may also roll back and send out more anti-messages, so a single straggler may result in a large number of anti-messages and multiple rollbacks of some LPs. In the authors' protocol, an LP receiving a straggler broadcasts its rollback. On receiving this announcement, other LPs may roll back, but they do not announce their rollbacks, so each LP rolls back at most once in response to each straggler. Anti-messages are not used. This eliminates the need for output queues and results in simple memory management. It also eliminates the problem of cascading rollbacks and echoing, and results in faster simulation. All this is achieved by a scheme for maintaining transitive dependency information. The cost incurred includes the tagging of each message with extra dependency information and the increased processing time upon receiving a message. The authors also present the similarities between the two areas of distributed simulation and distributed recovery, and show how the solutions for one area can be applied to the other.
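The broadcast-rollback idea can be illustrated with a minimal Python sketch. It is not the authors' protocol: the class `LP`, its methods, and the use of a plain per-LP dependency vector are assumptions made for illustration, and details a real protocol would need (for example, distinguishing successive incarnations of a rolled-back LP) are left out. Each message carries the sender's dependency vector, the receiver merges it into its own, and a rollback announcement `(origin, t)` makes every LP discard the saved states that depend, directly or transitively, on `origin` at time `t` or later.

```python
from dataclasses import dataclass, field


@dataclass
class LP:
    """A logical process that tracks transitive dependencies (illustrative only)."""
    pid: int
    n: int                                        # total number of LPs
    clock: float = 0.0
    dep: list = field(default_factory=list)       # dep[k]: latest time of LP k we depend on
    history: list = field(default_factory=list)   # saved (clock, dep) pairs, oldest first

    def __post_init__(self):
        self.dep = [-1.0] * self.n                # -1.0 means "no dependency yet"
        self._save()

    def _save(self):
        self.history.append((self.clock, list(self.dep)))

    def send(self, send_time):
        """Advance to send_time and return the dependency tag for the outgoing message."""
        self.clock = send_time
        self.dep[self.pid] = send_time
        self._save()
        return list(self.dep)

    def receive(self, recv_time, tag):
        """Process a message: merge the sender's tag into our dependency vector."""
        self.clock = recv_time
        self.dep = [max(a, b) for a, b in zip(self.dep, tag)]
        self.dep[self.pid] = recv_time
        self._save()

    def on_rollback_announcement(self, origin, t):
        """Discard every saved state that depends on `origin` at time >= t.

        Returns True if this LP rolled back. No further announcement is sent,
        so one straggler causes at most one rollback here.
        """
        keep = [s for s in self.history if s[1][origin] < t]
        rolled_back = len(keep) < len(self.history)
        self.history = keep
        self.clock, dep = self.history[-1]
        self.dep = list(dep)
        return rolled_back
```

In a chain where LP 0 sends to LP 1 and LP 1 then sends to LP 2, an announcement from LP 0 rolls back LP 2 directly, because LP 2's vector already records its indirect dependency on LP 0. LP 1 never has to announce anything.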
It is important to understand and efficiently predict the performance of large codes executing on massively parallel machines. However, these very large machines are scarce, expensive, and generally unavailable to large segments of the research community. It is therefore important to implement performance analysis tools for such machines on platforms that are readily available to the research community at large. To meet this need, we have ported LAPSE, a parallel direct-execution simulator, from the Intel Paragon to an ordinary cluster of workstations. The goal of this research is to give researchers the opportunity to study codes designed for execution on a massively parallel machine while physically executing on a workstation cluster. However, we encountered significant performance problems when moving to a workstation cluster, due primarily to high communication and context-switching costs. To reduce these costs, we implemented the virtual processors of the simulated system using lightweight threads rather than heavyweight Unix processes. In this paper, we discuss the issues involved in moving from a process-based to a thread-based simulator, and demonstrate up to a fourfold increase in performance by doing so.
Interactive simulation of battles is a valuable tool for training. The behavior and movement of hundreds or thousands of entities (tanks, trucks, airplanes, missiles, etc.) are currently simulated using dozens or more workstations on geographically distributed LANs connected by WANs. The simulated entities can move, fire weapons, receive "radio" messages, etc. The terrain that they traverse may change dynamically, for example due to rains turning dirt roads into mud or bombs forming craters. Thus the entities need to receive frequent information about the state of the terrain and the location and state of other entities. Typically, information is updated several times a second. As the number of simulated entities grows, the number of messages that need to be sent per unit of time can become unmanageable. One approach to reducing the number of messages is to keep track of which entities need to know about which other entities, and to send information only to the entities that need it. For example, tanks in Germany need not know about a change of course of a ship in the Pacific. This technique for reducing messages is known as interest management. Caltech and its Jet Propulsion Laboratory have implemented a simulation of this type on several large-scale parallel computers, exploiting both the compute power and the fast messaging fabric of such systems. The application is implemented using a heterogeneous approach. Some nodes are used to simulate entities, some to manage a database of terrain information, some to provide interest management functions, and some to route messages to the entities that do need to receive the information. Some of these tasks require more memory than others, and some require faster processing capability.
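Interest management as described above can be sketched with a grid-based filter. This is a minimal Python illustration, not the Caltech/JPL implementation: the grid cell size `CELL`, the class `InterestManager`, and the idea of expressing interest as a square around each entity are all assumptions chosen for brevity.

```python
from collections import defaultdict
from math import floor

CELL = 100.0  # hypothetical grid cell size, in terrain units


def cell_of(x, y):
    """Map a terrain coordinate to the grid cell that contains it."""
    return (floor(x / CELL), floor(y / CELL))


class InterestManager:
    def __init__(self):
        # cell -> ids of the entities that declared interest in that cell
        self.subscribers = defaultdict(set)

    def subscribe(self, entity_id, x, y, radius):
        """Register interest in every cell overlapping the square around (x, y)."""
        cx0, cy0 = cell_of(x - radius, y - radius)
        cx1, cy1 = cell_of(x + radius, y + radius)
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                self.subscribers[(cx, cy)].add(entity_id)

    def recipients(self, sender_id, x, y):
        """Return the entities that should receive an update originating at (x, y)."""
        return self.subscribers.get(cell_of(x, y), set()) - {sender_id}
```

An update is then delivered only to `recipients(...)` rather than broadcast, so a distant entity whose subscribed cells do not contain the sender's position receives nothing.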
Time Warp's optimistic scheduling requires the maintenance of simulation state history to support rollback in the event of causality violations. State history, and the ability to roll back the simulation, can provide unique functionality for human-in-the-loop simulation environments. This paper investigates the use of Time Warp to output valid simulation state in a near real-time manner, re-execute portions of the simulation, and interactively probe simulation values to ascertain underlying causes of transient behavior. A shared-memory, multi-threaded interactive simulation architecture is presented and the additional state saving requirements imposed by interactivity are examined. The shortcomings of existing state saving schemes lead us to propose Multiplexed State Saving (MSS). By interleaving checkpointing and incremental state logs, MSS provides bounded rollback costs and asynchronous access to prior simulation state. The interaction algorithms and MSS form a scalable, bounded-cost component suitable for use in a real-time interactive Time Warp system.
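The general idea of interleaving full checkpoints with incremental logs can be sketched as follows. This Python example is an illustration under stated assumptions, not the MSS design from the paper: the class `InterleavedStateSaver`, the fixed checkpoint `interval`, and the choice of forward change logs (events that only set keys, never delete them) are all hypothetical simplifications. It shows why the cost of reaching any earlier state is bounded: a lookup copies one checkpoint and replays fewer than `interval` log entries.

```python
import copy


class InterleavedStateSaver:
    def __init__(self, initial_state, interval=4):
        self.interval = interval            # full checkpoint every `interval` events
        self.state = dict(initial_state)
        self.checkpoints = {0: copy.deepcopy(self.state)}  # event count -> full copy
        self.log = []                       # log[i] holds the changes made by event i + 1

    def apply(self, changes):
        """Apply one event, log its changes, and checkpoint periodically."""
        self.state.update(changes)
        self.log.append(dict(changes))
        count = len(self.log)
        if count % self.interval == 0:
            self.checkpoints[count] = copy.deepcopy(self.state)

    def state_at(self, count):
        """Rebuild the state after the first `count` events without disturbing self.state."""
        if not 0 <= count <= len(self.log):
            raise IndexError("no saved state for that event count")
        base = (count // self.interval) * self.interval
        state = copy.deepcopy(self.checkpoints[base])
        for changes in self.log[base:count]:   # at most interval - 1 entries
            state.update(changes)
        return state

    def rollback(self, count):
        """Return to the state after `count` events and drop the later history."""
        self.state = self.state_at(count)
        del self.log[count:]
        self.checkpoints = {k: v for k, v in self.checkpoints.items() if k <= count}
```

Because `state_at` builds a fresh copy, a user interface could probe an earlier state while the simulation continues from the current one. In this sketch that is only a property of the data structure; no threading or synchronization is shown.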
The proceedings contain 24 papers. Topics discussed include load balancing in parallel simulation, asynchronous transfer mode networks, computer architecture, state saving and synchronization in parallel simulation, granularity and partitioning, logic circuits, queueing models, and VHDL simulation.
Presented is a conservative algorithm for the parallel simulation of billiard balls. A spatial approach to these simulations is commonly employed, in which the billiard table is partitioned into segments that are simulated by different processors. The conservative algorithm differs from previous approaches in that it makes use of shared variables to enable processors to ascertain the state of the computation at neighboring processors. Each shared variable corresponds to a region at the boundary of the table segments. By making use of shared variables, a significant speed-up is obtained.
Advances in massively parallel platforms are increasing the prospects for high performance discrete event simulation. Still, the difficulty of parallel programming persists, and there is increasing demand for high level support for building discrete event models to execute on such platforms. We present a parallel DEVS-based (Discrete Event System Specification) simulation environment that can execute on distributed memory multicomputer systems, together with benchmarking results for a class of high resolution, large scale ecosystem models. Underlying the environment is a parallel container class library that hides the details of message passing technology while providing high level abstractions for hierarchical, modular DEVS models. The C++ implementation working on the Thinking Machines CM-5 demonstrates that the desire for high level modeling support need not be irreconcilable with sustained high performance.
A new conservative algorithm for both parallel and sequential simulation of networks is described. The technique is motivated by the construction of a high performance simulator for ATM networks. It permits very fast execution of models of ATM systems, both sequentially and in parallel. A simple analysis of the performance of the system is made. Initial performance results from parallel and sequential implementations are presented and compared with comparable results from an optimistic Time Warp-based simulator. It is shown that the conservative simulator performs well when the 'density' of messages in the simulated system is high, a condition which is likely to hold in many interesting ATM scenarios.
We investigate conservative parallel discrete event simulations for logical circuits on shared-memory multiprocessors. For a first estimation of the possible speedup, we extend the critical path analysis technique by partitioning strategies. To incorporate overhead due to the management of data structures, we use a simulation on an ideal parallel machine (PRAM). This simulation can be directly executed on the SB-PRAM prototype, yielding both an implementation and a basis for data structure optimizations. One of the major tools to achieve these is the SB-PRAM's hardware support for parallel prefix operations. Our reimplementation of the PTHOR program on the SB-PRAM yields substantially higher speedups than before.
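To make the role of parallel prefix operations concrete, here is a minimal Python sketch of a prefix sum used to hand out slots in a shared event queue. It is an assumption-laden illustration, not the PTHOR reimplementation: the function names and the slot-allocation use case are hypothetical, and the scan is simulated sequentially in the step-doubling (Hillis-Steele) form, whereas hardware prefix support would carry out such an operation directly.

```python
def inclusive_scan(values):
    """Step-doubling prefix sum; the inner loop is what would run in parallel."""
    out = list(values)
    step = 1
    while step < len(out):
        prev = list(out)                 # every position reads the previous round
        for i in range(step, len(out)):
            out[i] = prev[i] + prev[i - step]
        step *= 2
    return out


def slot_offsets(counts):
    """Give each processor the first queue slot it may write to.

    counts[p] is the number of events processor p wants to insert. The
    exclusive prefix sum yields disjoint slot ranges, so no lock is needed.
    """
    inclusive = inclusive_scan(counts)
    return [total - c for total, c in zip(inclusive, counts)]
```

For insertion counts `[3, 1, 4, 1, 5]` the offsets are `[0, 3, 4, 8, 9]`: processor 0 writes slots 0 to 2, processor 1 writes slot 3, and so on.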
Most experimental studies of the performance of parallel simulation protocols use speedup or number of events processed per unit time as the performance metric. Although helpful in evaluating the usefulness of parallel simulation for a given simulation model, these metrics tell us little about the efficiency of the simulation protocol used. In this paper, we describe an Ideal Simulation Protocol (ISP), based on the concept of critical path, which experimentally computes the best possible execution time for a simulation model on a given parallel architecture. Since ISP computes the bound by actually executing the model on the given parallel architecture, it is much more realistic than that computed by a uniprocessor critical path analysis. The paper illustrates, using parameterized synthetic benchmarks, how an ISP-based performance evaluation can lead to much better insights into the performance of parallel simulation protocols than what would be gained from speedup graphs alone.
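For reference, the uniprocessor critical path analysis that ISP is contrasted with can be sketched in a few lines of Python. This is not ISP itself, which obtains its bound by executing the model on the parallel machine. The function name `critical_path_time` and the input format (a per-event cost and a map from each event to the events it must wait for) are assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path_time(cost, deps):
    """Return the length of the longest dependency chain, weighted by event cost.

    cost: event -> execution time
    deps: event -> collection of events that must finish first
    """
    graph = {event: deps.get(event, ()) for event in cost}
    finish = {}
    # static_order() yields every event after all of its predecessors
    for event in TopologicalSorter(graph).static_order():
        start = max((finish[d] for d in graph[event]), default=0.0)
        finish[event] = start + cost[event]
    return max(finish.values(), default=0.0)
```

With costs `{"a": 2, "b": 3, "c": 1, "d": 4}`, where `c` waits for `a` and `b` and `d` waits for `a`, the critical path is `a` then `d` with length 6. The total work is 10, so this analysis allows a speedup of at most 10/6 no matter how many processors are used. It ignores communication and scheduling overhead, which is the gap the abstract says ISP addresses.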