We discuss new synchronization algorithms for parallel and distributed discrete event simulations (PDES) which exploit the capabilities and behavior of the underlying communications network. Previous work in this area has assumed the network to be a black box which provides a one-to-one, reliable and in-order message passing paradigm. In our work, we utilize the broadcast capability of the ubiquitous Ethernet for synchronization computations, and both unreliable and reliable protocols for message passing, to achieve more efficient communications between the participating systems. We describe two new algorithms for computation of a distributed snapshot of global reduction operations on monotonically increasing values. The algorithms require O(N) messages (where N is the number of systems participating in the snapshot) in the normal case. We specifically target the use of these algorithms for distributed discrete event simulations to determine a global lower bound on time-stamp (LBTS), but expect them to have applicability outside the simulation community.
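To make the quantity being computed concrete, the following minimal sketch treats the lower bound on time-stamp as a global minimum reduction over per-system reports. It is not the broadcast-based algorithms of the paper, which the abstract does not spell out. The names `LocalReport`, `local_bound`, `in_transit` and `compute_lbts` are hypothetical, and the sketch assumes every system's report is already available in one place.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LocalReport:
    """What one participating system contributes to the snapshot (hypothetical)."""
    # Lower bound on the timestamp of any message this system may still send.
    local_bound: float
    # Timestamps of messages this system has sent that are not yet received.
    in_transit: List[float] = field(default_factory=list)


def compute_lbts(reports: List[LocalReport]) -> float:
    """Min-reduction over all local bounds and all in-transit timestamps."""
    candidates: List[float] = []
    for report in reports:
        candidates.append(report.local_bound)
        candidates.extend(report.in_transit)
    return min(candidates)


reports = [
    LocalReport(local_bound=10.0, in_transit=[12.0]),
    LocalReport(local_bound=8.5),
    LocalReport(local_bound=9.0, in_transit=[7.0]),
]
lbts = compute_lbts(reports)  # the in-transit message at 7.0 sets the bound
```

With one report per system, collecting the reports costs one message per participant, which is in line with the O(N) message count stated for the normal case.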
In a large scale distributed simulation with thousands of dynamic objects, efficient communication of data among these objects is an important issue. The broadcasting mechanism specified by the Distributed Interactive Simulation (DIS) standards is not suitable for large scale distributed simulations. In the High Level Architecture (HLA) paradigm, the Runtime Infrastructure (RTI) provides a set of services, such as data distribution management (DDM), among federates. The goal of the DDM module in the RTI is to make data communication more efficient by sending the data only to those federates that need it, as opposed to the broadcasting mechanism employed by DIS. Several DDM schemes have appeared in the literature. We discuss grid-based DDM and develop a DDM model that uses grids for matching the publishing/subscription regions, and for data filtering. We show that appropriate choice of the grid-cell size is crucial in obtaining good performance. We develop an analytical model and derive a formula for identifying the optimal cell size in grid-based DDM.
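The grid-based matching idea can be sketched as follows. This is an illustrative two-dimensional version under assumed conventions, not the model developed in the paper: regions are axis-aligned rectangles, and the names `cells_covered` and `match` are hypothetical. A publisher is matched to every subscriber that shares at least one grid cell with it.

```python
from collections import defaultdict
from typing import Dict, Iterator, Set, Tuple

Region = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Cell = Tuple[int, int]


def cells_covered(region: Region, cell_size: float) -> Iterator[Cell]:
    """Yield every grid cell that the rectangular region touches."""
    x_min, y_min, x_max, y_max = region
    for i in range(int(x_min // cell_size), int(x_max // cell_size) + 1):
        for j in range(int(y_min // cell_size), int(y_max // cell_size) + 1):
            yield (i, j)


def match(publications: Dict[str, Region],
          subscriptions: Dict[str, Region],
          cell_size: float) -> Dict[str, Set[str]]:
    """Map each publisher to the subscribers sharing at least one cell with it."""
    subscribers_by_cell: Dict[Cell, Set[str]] = defaultdict(set)
    for name, region in subscriptions.items():
        for cell in cells_covered(region, cell_size):
            subscribers_by_cell[cell].add(name)

    receivers: Dict[str, Set[str]] = {}
    for name, region in publications.items():
        found: Set[str] = set()
        for cell in cells_covered(region, cell_size):
            found |= subscribers_by_cell.get(cell, set())
        receivers[name] = found
    return receivers


pubs = {"tank": (0, 0, 4, 4)}
subs = {"radar": (3, 3, 9, 9), "ship": (6, 6, 9, 9)}
fine = match(pubs, subs, cell_size=5)     # only "radar" shares a cell with "tank"
coarse = match(pubs, subs, cell_size=10)  # "ship" is matched too, although its region does not overlap
```

The two calls hint at why cell size matters: large cells produce spurious matches that must be filtered later, while small cells increase the number of cells each region has to be registered in.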
Real-time distributed simulations, such as on-line gaming or military training simulations, are normally considered to be non-deterministic. Analysis of these simulations is therefore difficult, since it depends solely on logging and runtime observations. This paper explores an approach for removing one major source of non-determinism in these simulations, thereby allowing repeatable executions. Specifically, we use a synchronization protocol to ensure repeatable delivery of messages. Through limited instrumentation of the simulation code, we maintain a virtual time clock, by which message delivery is governed. The additional overhead imposed by the scheme is shown to be reasonable, although additional reductions in this overhead are anticipated. The results are demonstrated in the context of a simple combat model, whose only source of non-determinism is communications latency. The simulation is shown to be made repeatable, and the perturbation on the execution, compared to the non-repeatable execution, is shown to be small. The paper is one step in bridging the gap between the traditional PDES perspective and the real-time simulation world.
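One way to picture delivery governed by a virtual time clock is an inbox that buffers messages as they arrive and releases them in a fixed order. The sketch below is an assumption-laden illustration, not the paper's protocol: the class `RepeatableInbox` and its methods are hypothetical, each (sender, seq) pair is assumed unique, and the caller is assumed to advance `virtual_time` only once no earlier-stamped message can still arrive.

```python
import heapq
from typing import Any, List, Tuple


class RepeatableInbox:
    """Buffers messages and releases them in (timestamp, sender, seq) order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, int, Any]] = []

    def receive(self, timestamp: float, sender: int, seq: int, payload: Any) -> None:
        # Physical arrival order is ignored; only the virtual timestamp and
        # the (sender, seq) tie-breaker decide the delivery order.
        heapq.heappush(self._heap, (timestamp, sender, seq, payload))

    def deliver_up_to(self, virtual_time: float) -> List[Any]:
        """Release every buffered message stamped at or before virtual_time."""
        delivered: List[Any] = []
        while self._heap and self._heap[0][0] <= virtual_time:
            delivered.append(heapq.heappop(self._heap)[3])
        return delivered


# Two runs in which network latency reorders the same three messages.
run_a, run_b = RepeatableInbox(), RepeatableInbox()
for ts, sender, seq, payload in [(2.0, 1, 0, "fire"), (1.0, 2, 0, "move"), (2.0, 0, 0, "scan")]:
    run_a.receive(ts, sender, seq, payload)
for ts, sender, seq, payload in [(2.0, 0, 0, "scan"), (2.0, 1, 0, "fire"), (1.0, 2, 0, "move")]:
    run_b.receive(ts, sender, seq, payload)

order_a = run_a.deliver_up_to(2.0)
order_b = run_b.deliver_up_to(2.0)
```

Both runs hand the messages to the model in the same order, which is the property that makes the execution repeatable when communication latency is the only source of non-determinism.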
The paper examines issues that recur in consideration of simulation time-stamps, in the context of building very large simulation models from components developed by different groups at different times. A key problem here is "safety", loosely defined to mean that unintended model behavior does not occur due to unpredictable behavior of timestamp generation and comparisons. We revisit the problems of timestamp format and simultaneity, and then turn to the new problem of timestamp interoperability. We describe how a C++ simulation kernel can support the concurrent evaluation of submodels that internally use heterogeneous timestamps, and evaluate the execution time costs of doing so. We find that use of a safe timestamp format that explicitly allows different time scales costs less than 10% over a stock 64-bit integer format, whereas support for completely heterogeneous timestamps can cost as much as 50% in execution speed.
Load balancing is a crucial factor in achieving good performance for parallel discrete event simulations. We present a load balancing scheme that combines both static partitioning and dynamic load balancing. The static partitioning scheme maps simulation objects to logical processes before simulation starts while the dynamic load balancing scheme attempts to balance the load during runtime. The static scheme involves two steps. First, the simulation objects that contribute to small lookahead are merged together by using a merging algorithm. Then a partitioning algorithm is applied. The merging is needed to ensure a consistent performance for our dynamic scheme. Our dynamic scheme is tailor-made for an asynchronous simulation protocol that does not rely on null messages. The performance study on a supply chain simulation shows that the partitioning algorithm and dynamic load balancing are important in achieving good performance.
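The abstract does not describe the merging algorithm itself. Purely as an illustration of the idea, the sketch below groups simulation objects that are connected by links whose lookahead falls below a threshold, using a union-find structure. The function name `merge_small_lookahead`, the link representation and the threshold are all assumptions.

```python
from typing import Dict, List, Tuple


def merge_small_lookahead(objects: List[str],
                          links: List[Tuple[str, str, float]],
                          threshold: float) -> List[List[str]]:
    """Group objects joined by links whose lookahead is below the threshold."""
    parent: Dict[str, str] = {obj: obj for obj in objects}

    def find(obj: str) -> str:
        # Follow parent pointers to the group representative, halving the path.
        while parent[obj] != obj:
            parent[obj] = parent[parent[obj]]
            obj = parent[obj]
        return obj

    for a, b, lookahead in links:
        if lookahead < threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for obj in objects:
        groups.setdefault(find(obj), []).append(obj)
    return sorted(sorted(group) for group in groups.values())


merged = merge_small_lookahead(
    objects=["a", "b", "c", "d"],
    links=[("a", "b", 0.1), ("b", "c", 5.0), ("c", "d", 0.2)],
    threshold=1.0,
)
```

Each resulting group would then be treated as a single unit by the partitioning step, so that no small-lookahead link crosses a logical process boundary.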
ISBN: (print) 0769505007
In this paper we present the application of an approach for the performance prediction of message passing programs to a PVM code implementing an iterative solver based on the Successive Over-Relaxation method. The approach, based on the integration of static program analysis and simulation techniques, is aimed at significantly reducing the time needed for simulating the execution of a message passing program. We show how the proposed technique can provide the user, in a reasonable elaboration time, with a characterization of regular iterative programs such as the one considered, in terms of idle, CPU, communication and synchronization time in Heterogeneous and Network Computing environments.
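For readers unfamiliar with the solver being modelled, here is a textbook sequential sketch of Successive Over-Relaxation for a linear system Ax = b. It is not the PVM code studied in the paper, which is not reproduced in the abstract. The function name `sor`, the relaxation factor and the stopping rule are assumptions chosen for illustration.

```python
from typing import List


def sor(a: List[List[float]], b: List[float], omega: float = 1.25,
        tol: float = 1e-10, max_iter: int = 10_000) -> List[float]:
    """Solve a x = b by Successive Over-Relaxation, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            # Uses already-updated entries x[j] for j < i (Gauss-Seidel style).
            sigma = sum(a[i][j] * x[j] for j in range(n) if j != i)
            new_value = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / a[i][i]
            max_change = max(max_change, abs(new_value - x[i]))
            x[i] = new_value
        if max_change < tol:  # stop once a full sweep barely changes x
            break
    return x


solution = sor([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])  # exact answer is (1/11, 7/11)
```

The regular sweep structure of the loop is what makes such programs amenable to the static analysis mentioned above: each iteration performs the same computation pattern.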
This paper presents a statistical performance comparison between the cyclic moments-based and Wigner-Ville distribution-based instantaneous frequency estimators for linear FM signals in real-valued multiplicative and complex-valued additive noise. Theoretical results are used to compare the performance of the estimation algorithms over a wide range of conditions. Simulation results confirm our theoretical derivations.
ISBN: (print) 0769505007
The evaluation of network performance under real application loads is carried out by detailed time-intensive and resource-intensive simulations. Moreover, the use of ILP (instruction-level parallel) processors in cc-NUMA (cache-coherent non-uniform memory access) architectures introduces non-deterministic memory accesses; the resulting parallel system must be modeled by a detailed execution-driven simulation, further increasing the evaluation cost. This paper introduces a simulation methodology, based on network traces, to estimate the impact that a given network has on the execution time of parallel applications. This methodology allows the study of the network design space with a level of accuracy close to that of execution-driven simulations but with much shorter simulation times. The network trace, extracted from an execution-driven simulation, is processed to substitute the temporal dependencies produced by the simulated network with an estimation of the message dependencies caused by both the application and the applied cache-coherent protocol. This methodology has been tested on two direct networks, with 16 and 64 nodes respectively, running the FFT and Radix applications of the SPLASH2 suite. The trace-driven simulation is 3 to 4 times faster than the execution-driven one, with an average error of 4% in the total execution time.