In a large-scale distributed simulation with thousands of dynamic objects, efficient communication of data among these objects is an important issue. The broadcasting mechanism specified by the Distributed Interactive Simulation (DIS) standards is not suitable for large-scale distributed simulations. In the High Level Architecture (HLA) paradigm, the Runtime Infrastructure (RTI) provides a set of services, such as data distribution management (DDM), among federates. The goal of the DDM module in the RTI is to make data communication more efficient by sending data only to those federates that need it, as opposed to the broadcasting mechanism employed by DIS. Several DDM schemes have appeared in the literature. We discuss grid-based DDM and develop a DDM model that uses grids for matching the publishing/subscription regions and for data filtering. We show that an appropriate choice of the grid-cell size is crucial in obtaining good performance. We develop an analytical model and derive a formula for identifying the optimal cell size in grid-based DDM.
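As a rough illustration of grid-based matching, the sketch below maps publication and subscription regions onto grid cells and treats a publisher and a subscriber as matched when they share at least one cell. It is not the model from the abstract: the 2D routing space, the half-open rectangles, and the names `cells_covered` and `grid_match` are all assumptions made for this example.

```python
from collections import defaultdict
from math import ceil, floor


def cells_covered(region, cell_size):
    """Return the grid cells overlapped by an axis-aligned region.

    region is (xmin, ymin, xmax, ymax), treated as half-open [min, max),
    with xmax > xmin and ymax > ymin (assumed, not checked).
    """
    xmin, ymin, xmax, ymax = region
    ix_first, iy_first = floor(xmin / cell_size), floor(ymin / cell_size)
    ix_last, iy_last = ceil(xmax / cell_size) - 1, ceil(ymax / cell_size) - 1
    return {
        (ix, iy)
        for ix in range(ix_first, ix_last + 1)
        for iy in range(iy_first, iy_last + 1)
    }


def grid_match(publications, subscriptions, cell_size):
    """Map each publisher id to the subscriber ids sharing a grid cell with it."""
    cell_to_subs = defaultdict(set)
    for sub_id, region in subscriptions.items():
        for cell in cells_covered(region, cell_size):
            cell_to_subs[cell].add(sub_id)

    matches = {}
    for pub_id, region in publications.items():
        receivers = set()
        for cell in cells_covered(region, cell_size):
            receivers |= cell_to_subs.get(cell, set())
        matches[pub_id] = receivers
    return matches
```

With a publication region `(0, 0, 4, 4)` and a subscription region `(6, 6, 9, 9)`, a cell size of 10 puts both in the same cell and reports a match even though the regions do not overlap, while a cell size of 5 separates them. Smaller cells filter more precisely but make each region cover more cells, which is the kind of trade-off behind the choice of cell size.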
Real-time distributed simulations, such as on-line gaming or military training simulations, are normally considered to be non-deterministic. Analysis of these simulations is therefore difficult, since it depends solely on logging and runtime observations. This paper explores an approach for removing one major source of non-determinism in these simulations, thereby allowing repeatable executions. Specifically, we use a synchronization protocol to ensure repeatable delivery of messages. Through limited instrumentation of the simulation code, we maintain a virtual time clock, by which message delivery is governed. The additional overhead imposed by the scheme is shown to be reasonable, although additional reductions in this overhead are anticipated. The results are demonstrated in the context of a simple combat model, whose only source of non-determinism is communications latency. The simulation is shown to be made repeatable, and the perturbation of the execution, compared to the non-repeatable execution, is shown to be small. The paper is one step in bridging the gap between the traditional PDES perspective and the real-time simulation world.
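The idea of letting a virtual clock, rather than wall-clock arrival order, govern message delivery can be sketched as follows. This is not the paper's synchronization protocol: the class `RepeatableInbox`, the `(timestamp, sender, seq)` ordering key, and the assumption that every message with a timestamp at or before the virtual clock has already arrived are all hypothetical choices made for this example.

```python
import heapq


class RepeatableInbox:
    """Buffer messages and release them in a fixed virtual-time order.

    Assumes each (timestamp, sender, seq) triple is unique, so the payload
    itself is never compared when ordering messages.
    """

    def __init__(self):
        self._heap = []

    def receive(self, timestamp, sender, seq, payload):
        # Arrival order depends on network latency; it is not used for ordering.
        heapq.heappush(self._heap, (timestamp, sender, seq, payload))

    def deliver_up_to(self, virtual_now):
        """Pop all messages with timestamp <= virtual_now, in key order."""
        delivered = []
        while self._heap and self._heap[0][0] <= virtual_now:
            delivered.append(heapq.heappop(self._heap))
        return delivered
```

Two runs that receive the same messages in different arrival orders produce the same delivery sequence, provided the virtual clock is only advanced once all messages up to that time are known to have arrived. Guaranteeing that condition is the job of a synchronization protocol and is outside this sketch.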
The paper examines issues that recur in consideration of simulation time-stamps, in the context of building very large simulation models from components developed by different groups at different times. A key problem here is "safety", loosely defined to mean that unintended model behavior does not occur due to unpredictable behavior of timestamp generation and comparisons. We revisit the problems of timestamp format and simultaneity, and then turn to the new problem of timestamp interoperability. We describe how a C++ simulation kernel can support the concurrent evaluation of submodels that internally use heterogeneous timestamps, and evaluate the execution time costs of doing so. We find that use of a safe timestamp format that explicitly allows different time scales costs less than 10% over a stock 64-bit integer format, whereas support for completely heterogeneous timestamps can cost as much as 50% in execution speed.
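To make the notion of a timestamp that explicitly carries its time scale concrete, here is a minimal sketch. The abstract concerns a C++ kernel; this Python class, its name `ScaledTime`, and its integer `ticks` / `ticks_per_second` representation are assumptions for illustration only, not the format evaluated in the paper.

```python
from dataclasses import dataclass
from fractions import Fraction
from functools import total_ordering


@total_ordering
@dataclass(frozen=True, eq=False)
class ScaledTime:
    """Timestamp stored as integer ticks plus an explicit scale."""

    ticks: int
    ticks_per_second: int

    def _cross(self, other):
        # Cross-multiply so both values share a common scale without
        # floating-point rounding.
        return (
            self.ticks * other.ticks_per_second,
            other.ticks * self.ticks_per_second,
        )

    def __eq__(self, other):
        mine, theirs = self._cross(other)
        return mine == theirs

    def __lt__(self, other):
        mine, theirs = self._cross(other)
        return mine < theirs

    def __hash__(self):
        # Equal instants hash equally regardless of their scale.
        return hash(Fraction(self.ticks, self.ticks_per_second))
```

Because comparison uses exact integer arithmetic, a millisecond-scale submodel and a microsecond-scale submodel agree on whether two events are simultaneous; for example, `ScaledTime(1500, 1000)` compares equal to `ScaledTime(1_500_000, 1_000_000)`.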
Load balancing is a crucial factor in achieving good performance for parallel discrete event simulations. We present a load balancing scheme that combines both static partitioning and dynamic load balancing. The static partitioning scheme maps simulation objects to logical processes before simulation starts while the dynamic load balancing scheme attempts to balance the load during runtime. The static scheme involves two steps. First, the simulation objects that contribute to small lookahead are merged together by using a merging algorithm. Then a partitioning algorithm is applied. The merging is needed to ensure a consistent performance for our dynamic scheme. Our dynamic scheme is tailor-made for an asynchronous simulation protocol that does not rely on null messages. The performance study on a supply chain simulation shows that the partitioning algorithm and dynamic load balancing are important in achieving good performance.
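The abstract does not specify its merging algorithm, so the following is only one plausible reading of the first static step: simulation objects joined by a link whose lookahead falls below a threshold are grouped together with a union-find structure, so that a later partitioning step never separates them. The function name `merge_small_lookahead`, the `links` dictionary, and the `threshold` parameter are hypothetical.

```python
def merge_small_lookahead(objects, links, threshold):
    """Group objects connected by links with lookahead below threshold.

    objects: iterable of hashable, sortable object ids.
    links: dict mapping (a, b) pairs to the lookahead between a and b.
    Returns a sorted list of sorted groups.
    """
    parent = {obj: obj for obj in objects}

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), lookahead in links.items():
        if lookahead < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for obj in parent:
        groups.setdefault(find(obj), set()).add(obj)
    return sorted(sorted(group) for group in groups.values())
```

The merged groups would then be handed to the partitioning algorithm as indivisible units, which keeps small-lookahead interactions inside a single logical process.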
ISBN: (print) 0769505007
In this paper we present the application of an approach for the performance prediction of message-passing programs to a PVM code implementing an iterative solver based on the Successive Over-Relaxation method. The approach, based on the integration of static program analysis and simulation techniques, is aimed at significantly reducing the time needed to simulate the execution of a message-passing program. We show how the proposed technique can provide the user, in a reasonable elaboration time, with a characterization of regular iterative programs such as the one considered, in terms of idle, CPU, communication, and synchronization time in heterogeneous and network computing environments.
ISBN: (print) 0769505007
The evaluation of network performance under real application loads is carried out by detailed time-intensive and resource-intensive simulations. Moreover, the use of ILP (instruction-level parallel) processors in cc-NUMA (cache-coherent non-uniform memory access) architectures introduces non-deterministic memory accesses; the resulting parallel system must be modeled by a detailed execution-driven simulation, further increasing the evaluation cost. This paper introduces a simulation methodology, based on network traces, to estimate the impact that a given network has on the execution time of parallel applications. This methodology allows the study of the network design space with a level of accuracy close to that of execution-driven simulations but with much shorter simulation times. The network trace, extracted from an execution-driven simulation, is processed to substitute the temporal dependencies produced by the simulated network with an estimation of the message dependencies caused by both the application and the applied cache-coherent protocol. This methodology has been tested on two direct networks, with 16 and 64 nodes respectively, running the FFT and Radix applications of the SPLASH2 suite. The trace-driven simulation is 3 to 4 times faster than the execution-driven one, with an average error of 4% in the total execution time.
Summary form only given. It has been a little less than ten years since modeling and simulation (M&S) hit the knee on the curve. During these past few years a great deal of marketing of the potential of M&S has occurred, which resulted in a significant influx of funding for research and development (R&D) projects, especially in the area of distributed simulation. One of the significant experiments in this area, the DARPA Synthetic Theater of War program, has officially ended, bringing to a close one of the more robust experiments in distributed simulation. We are now in a time where most of the M&S funding is targeted at production programs, with far fewer dollars going into R&D or experimentation. Although it is good that major programs are capitalizing on previous R&D efforts, it would not be true to say that the necessary R&D has been completed to realize the vision of distributed simulation. It would be true to say that we now have a much better understanding of the issues. What is needed now is a period of reflection on the vision, where we are, what was done right, what hasn't worked well, and where we should be headed for the next five to ten years.
Several scheduling algorithms have been proposed to determine the next event to be executed on a processor in a Time Warp parallel discrete event simulation. However, none of them is specifically designed for simulations where the execution time (or granularity) of different types of events has a large variance. We present a grain-sensitive scheduling algorithm which addresses this problem. In our solution, the scheduling decision depends on both timestamp and granularity values, with the aim of giving higher priority to small-grain events even if their timestamp is not the lowest one (i.e. the closest one to the commitment horizon of the simulation). This implicitly limits the optimism of the execution of large-grain events that, if rolled back, would produce a large waste of CPU time. The algorithm is adaptive in that it relies on the dynamic recalculation of the length of a simulated time window within which the timestamp of any good candidate event for scheduling must fall. If the window length is set to zero, then the algorithm behaves like the standard Lowest-Timestamp-First (LTF) scheduling algorithm. Simulation results of a classical benchmark in several different configurations are reported for a performance comparison with LTF; these results demonstrate the effectiveness of our algorithm.
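The selection rule described above can be sketched in a few lines. This is a simplified reading, not the published algorithm: the `Event` tuple, the function `pick_next_event`, and the rule of choosing the smallest-grain event inside the window are assumptions, and the adaptive recalculation of the window length is not modeled (the window is passed in as a fixed parameter).

```python
from collections import namedtuple

# Hypothetical event record: grain is the estimated execution time.
Event = namedtuple("Event", ["name", "timestamp", "grain"])


def pick_next_event(pending, window):
    """Pick the smallest-grain event within `window` of the lowest timestamp.

    With window == 0 only events at the minimum timestamp are candidates,
    so the choice reduces to Lowest-Timestamp-First.
    """
    if not pending:
        return None
    lowest = min(event.timestamp for event in pending)
    candidates = [e for e in pending if e.timestamp <= lowest + window]
    # Break grain ties by timestamp so the choice is deterministic.
    return min(candidates, key=lambda e: (e.grain, e.timestamp))
```

For pending events A (timestamp 10, grain 50), B (timestamp 12, grain 5), and C (timestamp 30, grain 1), a window of 0 selects A as LTF would, a window of 5 selects the cheaper event B, and a window of 100 selects C. A wider window favors small-grain events more aggressively at the cost of more optimistic execution.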