Based on a linear ordering of vertices in a directed graph, a linear-time partitioning algorithm for parallel logic simulation is presented. Unlike most other partitioning algorithms, the proposed algorithm preserves circuit concurrency by assigning to processors circuit gates that can be evaluated at about the same time. As a result, the concurrency preserving partitioning (CPP) algorithm can provide better load balancing throughout a parallel simulation. This is especially important when the algorithm is used together with a Time Warp simulation, where a high degree of concurrency can lead to fewer rollbacks and better performance. The algorithm consists of three phases, and three conflicting goals can be considered separately in the three phases so as to reduce computational complexity. A parallel gate-level circuit simulator is implemented on an Intel Paragon machine to evaluate the performance of the CPP algorithm. The results are compared with those of two other partitioning algorithms to show that reasonable speedup may be achieved with the algorithm.
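The core idea, that gates which become active at about the same time should be spread across processors, can be illustrated with a minimal sketch. It is not the three-phase CPP algorithm. As an assumption, it uses topological levelization (longest-path level from the primary inputs) as the linear ordering. It then deals gates out round-robin in that order, so each level's gates land on different processors. The names `levelize` and `concurrency_partition` are hypothetical. The sketch also assumes an acyclic gate graph, meaning a sequential circuit's feedback would first have to be broken at its flip-flops.

```python
from collections import defaultdict, deque

def levelize(gates, fanout):
    """Longest-path level of each gate from the primary inputs (Kahn's algorithm).

    Assumes the gate graph is acyclic (feedback already broken at flip-flops).
    """
    indegree = {g: 0 for g in gates}
    for g in gates:
        for h in fanout.get(g, []):
            indegree[h] += 1
    level = {g: 0 for g in gates}
    ready = deque(g for g in gates if indegree[g] == 0)  # primary-input gates
    while ready:
        g = ready.popleft()
        for h in fanout.get(g, []):
            level[h] = max(level[h], level[g] + 1)
            indegree[h] -= 1
            if indegree[h] == 0:
                ready.append(h)
    return level

def concurrency_partition(gates, fanout, num_procs):
    """Assign gates to processors round-robin along the level ordering.

    Gates of the same level (likely to be evaluated at about the same time)
    end up on different processors, so every processor has work at each step.
    """
    level = levelize(gates, fanout)
    by_level = defaultdict(list)
    for g in gates:
        by_level[level[g]].append(g)
    assignment, next_proc = {}, 0
    for lvl in sorted(by_level):
        for g in by_level[lvl]:
            assignment[g] = next_proc % num_procs
            next_proc += 1
    return assignment
```

Consider a six-gate circuit where `a` and `b` feed `c` and `d`, which in turn feed `e` and `f`, partitioned over two processors. Each level contributes one gate to each processor. A partitioner that only minimized cut edges might instead put a whole level on one processor.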
ISBN (print): 0769519709
Hardware description languages (HDLs) are widely used to shorten the time-to-market of modern digital systems. With the help of an HDL, verification engineers can simulate hardware to verify its performance and correctness. However, simulation cannot keep pace with the growth in size and complexity of circuits and has become a bottleneck in the design process. Distributed HDL simulation on a cluster of workstations has the potential to provide a cost-effective solution to this problem. In this paper we describe the design and implementation of DVS, an object-oriented framework for distributed Verilog simulation. Verilog is an HDL that sees wide industrial use. DVS is an outgrowth of Clustered Time Warp, originally developed for logic simulation. The design of the framework emphasizes simplicity and extensibility and aims to accommodate experiments involving partitioning and dynamic load balancing. Preliminary results obtained by simulating a 16-bit multiplier are presented.
One of the main reasons why parallel discrete event simulation has not been adopted more widely in industry is that the terminology used by the parallel simulation community differs from that of industrial simulation practitioners. This paper shows how the gap between these two communities can be bridged by presenting a methodology for automating the parallelization of manufacturing simulations. Our approach provides a way of automatically generating a mapping from a sequential simulation model to an efficient parallel implementation. The results of this mapping can be expressed in a form that is independent of any particular parallel simulation system or language. Since it is then easy to generate code for different simulation systems, alternative parallel simulation protocols can be evaluated at an early stage of development. A prediction of performance can thus be obtained by studying the behaviour of an abstraction of the simulation model under various strategies or on different computing platforms.
In this paper we propose two new asynchronous parallel algorithms for test-set-partitioned fault simulation. The algorithms are based on a new two-stage approach to parallelizing fault simulation for sequential VLSI circuits, in which the test set is partitioned among the available processors. These algorithms produce the same results as the previous synchronous two-stage approach. However, because of their dynamic characteristics and because they perform very little redundant work, they run faster than the synchronous approach. A theoretical analysis comparing the various algorithms is also given to provide insight into them. The implementations were written in MPI and are therefore portable to many parallel platforms. Results are shown for a shared-memory multiprocessor.
We present a dynamic load balancing algorithm for parallel discrete event simulation of spatially explicit problems. In our simulations the space is discretized and divided into subareas, each of which is simulated by a Logical Process (LP). Load predictions are based on the future events scheduled for a given LP. Information about the load of the processes is gathered and distributed during the Global Virtual Time (GVT) calculation, and each LP then calculates the new load distribution of the system. The load is balanced by moving spatial data between neighboring LPs in one round of communication. In our problems, the LPs can be described as elements of a ring from the point of view of communication, and because of the spatial characteristics, load can be migrated only between neighboring LPs. We present an algorithm that performs load balancing in a ring and minimizes the maximum after-balance load.
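The ring constraint can be made concrete with a classical neighbor-transfer computation. This is a swapped-in textbook technique, not the paper's algorithm. It assumes load is perfectly divisible. Let x_i be the net amount LP i sends to LP i+1, where a negative value means it receives. Balance then requires load_i - x_i + x_{i-1} = mean, which fixes every x_i up to one shared constant. Choosing that constant as minus the median of the prefix imbalances minimizes the total amount of data moved. Every LP then ends exactly at the mean, which is the smallest possible maximum load. The function names are hypothetical. The sketch also does not model discrete spatial units, and it does not guarantee that the transfers fit into a single communication round.

```python
def ring_flows(loads):
    """Net transfer from LP i to LP (i + 1) % n that equalizes all loads.

    Assumes perfectly divisible load. A negative value means LP i receives
    from its right-hand neighbor instead of sending to it.
    """
    n = len(loads)
    mean = sum(loads) / n
    prefix, running = [], 0.0
    for load in loads:
        running += load - mean          # cumulative surplus up to and including LP i
        prefix.append(running)
    # Flows are x_i = c + prefix[i] for any constant c; the median choice
    # minimizes sum(|x_i|), i.e. the total amount of data moved.
    c = -sorted(prefix)[n // 2]
    return [c + p for p in prefix]

def apply_flows(loads, flows):
    """Loads after each LP sends flows[i] right and receives flows[i-1] from the left."""
    n = len(loads)
    return [loads[i] - flows[i] + flows[i - 1] for i in range(n)]
```

For loads `[0, 0, 9]`, the flows are `[0.0, -3.0, 3.0]`. LP 2 sends 3 units to each of its two neighbors, and every LP ends with 3.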
The IDES project at Sandia National Laboratories is developing a large-scale, portable parallel simulator for use in stockpile stewardship. IDES will use the Breathing Time Buckets synchronization protocol. To support IDES development, this paper studies a performance model and describes performance experiments based on expected workload and architectural parameters. A new parallel algorithm for terminating the window quickly is also described and analyzed.
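A sequential emulation of a single cycle may help in reading the performance model. In Breathing Time Buckets, as it is generally described, events are processed optimistically, but newly generated messages are held back until the cycle ends. The window closes at the event horizon, which is the earliest timestamp among the held-back messages. In a real parallel run, each processor computes a local horizon, the global horizon is their minimum, and work beyond it is rolled back. No anti-messages are needed because nothing was sent early. The minimal sketch below collapses that into one process. It is not the IDES implementation or the paper's window-termination algorithm. The names `btb_cycle` and `handler` are hypothetical, and event payloads are assumed to be comparable (for example, strings) so that heap ties resolve.

```python
import heapq

def btb_cycle(pending, handler):
    """Emulate one Breathing Time Buckets cycle on a single process.

    pending: list of (timestamp, event) tuples; it is heapified in place.
    handler(timestamp, event) returns a list of newly generated (timestamp, event).
    Returns (events processed this cycle, pending heap for the next cycle).
    """
    heapq.heapify(pending)
    horizon = float("inf")      # earliest timestamp among held-back messages
    held_back = []
    processed = []
    # Keep processing while the next event lies strictly before the horizon;
    # such events cannot be affected by any message generated in this cycle.
    while pending and pending[0][0] < horizon:
        t, event = heapq.heappop(pending)
        processed.append((t, event))
        for new in handler(t, event):
            held_back.append(new)            # not released until the cycle ends
            horizon = min(horizon, new[0])
    # End of cycle: release the held-back messages.
    for new in held_back:
        heapq.heappush(pending, new)
    return processed, pending
```

Suppose event `a` at time 1 schedules event `d` at time 3. The first cycle then processes the events at times 1 and 2 and stops before the event at time 5, because the horizon has "breathed in" to 3.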
We present an execution model for parallel simulation of a distributed shared memory architecture. The model captures the processor-memory interaction and abstracts the memory subsystem. Using this model, we show how parallel, on-line, partially ordered memory traces can be correctly predicted without interacting with the memory subsystem. We also outline a parallel optimistic memory simulator that uses these traces, finds a global order among all events, and returns correct data and timing to each processor. A first evaluation of the amount of concurrency that our model can extract for an ideal multiprocessor shows that processors may execute relatively long instruction sequences without violating the causality constraints. However, parallel simulation efficiency is highly dependent on the memory consistency model and the application characteristics.
ISBN (print): 0769524478
In this work we illustrate the design and implementation guidelines of a recently developed middleware that supports the parallel and distributed simulation of large-scale, complex, and dynamically interacting system models. The distributed simulation of complex system models may suffer from the overhead of the communication and synchronization required to maintain the causality constraints between distributed model components. We designed and implemented the ARTIS middleware as a new framework. It incorporates a set of features that allow adaptive optimization by exploiting many complex and dynamic characteristics of both the model and the distributed simulation. For example, a dynamic migration mechanism for the run-time adaptive allocation of model entities has been designed and exploited for dynamic load and communication balancing. Optimizations have been introduced to obtain the maximum advantage from heterogeneous and asymmetric communication systems, ranging from shared memory to LAN and Internet communication. Other optimizations exploit concurrent replications of parallel and distributed simulations in order to increase resource utilization and maximize the speedup of simulation processes. These solutions have been designed, implemented, and tuned to significantly reduce the communication and synchronization overheads between the physical execution units. They also increase model scalability and simulation speedup, even under worst-case modeling assumptions and simulation scenarios.
ISBN (print): 0769519709
We present a classification that groups lookback into four types: direct strong lookback, universal strong lookback, direct weak lookback, and universal weak lookback. They are defined in terms of absolute and dynamic impact times. We discuss the relationships between lookback types by considering when rollbacks and/or anti-messages are avoided. From the different types of lookback, we also derive three optimization techniques for optimistic simulation and point out their advantages over lazy cancellation. Finally, we show that all four types of lookback exist in the PCS network simulation and can be exploited by either lookback-based or optimistic protocols.
ISBN (print): 0769518532
This paper extends our previous work on formalizing event orderings using partially ordered sets and its application to space analysis in distributed simulation. We focus on the time and space trade-off in exploiting event parallelism. Event parallelism is divided into inherent (problem) parallelism, event ordering parallelism, and effective event parallelism. First, we analyze the performance cost of varying event ordering parallelism on memory requirements in open and closed systems. Second, we study the effects of the interconnection topology of a physical system on exploitable event ordering parallelism. Measurements were obtained from a time-space analyzer that we have developed.