It is well known that the critical path provides an absolute lower bound on the execution time of a conservative parallel discrete event simulation. It stands to reason that optimal execution time can only be achieved...
详细信息
It is well known that the critical path provides an absolute lower bound on the execution time of a conservative parallel discrete event simulation. It stands to reason that optimal execution time can only be achieved by immediately executing each event on the critical path. However, dynamically identifying the critical event is difficult, if not impossible. In this paper, we examine several heuristics that might help to determine the critical event, and conduct a performance study to determine the effectiveness of using these heuristics for preferential scheduling.
In this paper we describe an approach to exploit temporal uncertainty in parallel and distributedsimulation by utilizing time intervals rather than precise time stamps. Unlike previously published work that propose n...
详细信息
In this paper we describe an approach to exploit temporal uncertainty in parallel and distributedsimulation by utilizing time intervals rather than precise time stamps. Unlike previously published work that propose new message ordering semantics, our approach is based on conservative, time stamp order execution and enhancing the lookahead of the simulation by pre-drawing random numbers from a distribution that models temporal uncertainty. The advantages of this approach are that it allows time intervals to be exploited using a conventional Time Stamp Order (TSO) delivery mechanism, and it offers the modeler greater statistical control over the assigned time stamps. An implementation of this approach is described and initial performance measurements are presented.
Performance models exist that reliably describe the execution time and efficiency of parallel discrete-event simulations executed in a synchronous iterative fashion. These performance models incorporate the effects of...
详细信息
Performance models exist that reliably describe the execution time and efficiency of parallel discrete-event simulations executed in a synchronous iterative fashion. These performance models incorporate the effects of processor heterogeneity, other processor load due to shared computational resources, application workload imbalance, and the use of speculative computation. This includes modeling the effects of predictive optimism, a technique for improving the accuracy of speculative assumptions. We extend these models to incorporate correlated workloads across the set of processors and validate the models with two different applications.
In this paper we discuss new synchronization algorithms for parallel and distributed Discrete Event simulations (PDES) which exploit the capabilities and behavior of the underlying communications network. Previous wor...
详细信息
In this paper we discuss new synchronization algorithms for parallel and distributed Discrete Event simulations (PDES) which exploit the capabilities and behavior of the underlying communications network. Previous work in this area has assumed the network to be a Black Box which provides a one-to-one, reliable and in-order message passing paradigm. In our work, we utilize the Broadcast capability of the ubiquitous Ethernet for synchronization computations, and both unreliable and reliable protocols for message passing, to achieve more efficient communications between the participating systems. We describe two new algorithms for computation of a distributed snapshot of global reduction operations on monotonically increasing values. The algorithms require O(N) messages (where N is the number of systems participating in the snapshot) in the normal case. We specifically target the use of this algorithm for distributed discrete event simulations to determine a global lower bound on time-stamp (LBTS), but expect the algorithm has applicability outside the simulation community.
This paper investigates issues concerning federations of sequential and/or parallel simulators. An approach is proposed for creating federated simulations by defining a global conceptual model of the entire simulation...
详细信息
This paper investigates issues concerning federations of sequential and/or parallel simulators. An approach is proposed for creating federated simulations by defining a global conceptual model of the entire simulation, and then mapping individual entities of the conceptual model to implementations within individual federates. Proxy entities are defined as a means for linking entities that are mapped to different federates. Using this approach, an implementation of a federation of optimistic simulators is examined. Issues concerning the adaptation of optimistic simulators to a federated system are discussed. The performance of the federated system utilizing runtime infrastructure (RTI) software executing on a shared memory multiprocessor (SMP) is compared with a native (non-federated) SMP-based optimistic parallel simulator. It is demonstrated that a well designed federated simulation system can yield performance comparable to a native, parallelsimulation engine, but important implementation issues must be properly addressed.
Rapid progress in the design of fast CPU chips has outstripped progress in memory and cache performance. Optimistic algorithms would seem to be more vulnerable to poor memory performance because they require extra mem...
详细信息
Rapid progress in the design of fast CPU chips has outstripped progress in memory and cache performance. Optimistic algorithms would seem to be more vulnerable to poor memory performance because they require extra memory for state saving and anti-messages. We examine the performance of both optimistic and conservative protocols in controlled experiments to evaluate the effects of memory speed and cache size, using a variety of applications.
Many synchronous algorithms have been proposed for parallel and discrete simulations. However, the actual performance of these algorithms have been far from ideal, especially when event granularity is small. Barring t...
详细信息
Many synchronous algorithms have been proposed for parallel and discrete simulations. However, the actual performance of these algorithms have been far from ideal, especially when event granularity is small. Barring the case of low parallelism in the given simulation models, one of the main reasons of low speedups is in the uneven load distribution among processors. We present several new locality-preserving load balancing mechanisms for synchronous simulations on shared-memory multiprocessors. We show both theoretically and empirically that some of these mechanisms incur very low overhead. The results confirm that one of the new mechanisms is indeed more efficient and scalable than common existing approaches.
Circuit simulation has proven to be one of the most important computer aided design (CAD) methods for the analysis and validation of integrated circuit designs. A popular approach to describing circuits for simulation...
详细信息
Circuit simulation has proven to be one of the most important computer aided design (CAD) methods for the analysis and validation of integrated circuit designs. A popular approach to describing circuits for simulation purposes is to use a hardware description language such as VHDL. Similar efforts have also been carried out in the analog domain that has led to tools such as SPICE. However, with the growing trend of hardware designs that contain both analog and digital components, design environments that seamlessly integrate analog and digital circuitry are needed. simulation of such circuits is however, exacerbated by the higher resource (CPU and memory) demands that arise when analog and digital models are integrated in a mixed-mode (analog and digital) simulation. One solution to this problem is to use PDES algorithms on a distributed platform. However, a synchronization interface between the analog and digital simulation environment is required to achieve integrated mixed-mode simulation. In this paper, we present the issues involved in the construction of synchronization protocols which support mixed-mode simulation in a distributedsimulation environment. The proposed synchronization protocols provide an interface between an optimistic (Time Warp based) discrete-event simulation kernel and any continuous time simulation kernel. Empirical and formal analyses were conducted to ensure correctness and completeness of the protocols and the results of these analyses are also presented.
We have parallelized the Iowa Logic Simulator, a gate-level fine-grained discrete-event simulator, by employing an optimistic algorithm framework based on a global event queue implemented as a parallel heap. The origi...
详细信息
We have parallelized the Iowa Logic Simulator, a gate-level fine-grained discrete-event simulator, by employing an optimistic algorithm framework based on a global event queue implemented as a parallel heap. The original code and the basic data structures of the serial simulator remained unchanged. Wrapper data structures for the logical processes (gates) and the events are created to allow rollbacks, all the earliest events at each logical processes are stored into the parallel heap, and multiple earliest events are simulated repeatedly by invoking the simulate function of the serial simulator. The parallel heap allowed extraction of hundreds to thousands of earliest events in each queue access. On a bus-based shared-memory multiprocessor, simulation of synthetic circuits with 250,000 gates yielded speedups of 3.3 employing five processors compared to the serial execution time of the Iowa Logic Simulator, and limited the number of rollbacks to within 2,000. The basic steps of parallelization are well-defined and general enough to be employable on other discrete-event simulators.
Load balancing is a crucial factor in achieving good performance for parallel discrete event simulations. In this paper, we present a load balancing scheme that combines both static partitioning and dynamic load balan...
详细信息
Load balancing is a crucial factor in achieving good performance for parallel discrete event simulations. In this paper, we present a load balancing scheme that combines both static partitioning and dynamic load balancing. The static partitioning scheme maps simulation objects to logical processes before simulation starts while the dynamic load balancing scheme attempts to balance the load during runtime. The static scheme involves two steps. First, the simulation objects that contribute to small lookahead are merged together by using a merging algorithm. Then a partitioning algorithm is applied. The merging is needed to ensure a consistent performance for our dynamic scheme. Our dynamic scheme is tailor-made for an asynchronous simulation protocol that does not rely on null messages. The performance study on a supply-chain simulation shows that the partitioning algorithm and dynamic load balancing are important in achieving good performance.
暂无评论