A simulation-oriented language can significantly enhance the usability of parallel discrete event simulation (PDES) by hiding the complexities of the synchronization protocol used to ensure that events are processed in the correct order. The higher-level interface presented to the user by such a language also allows optimizations, such as granularity control, that are difficult and cumbersome to perform with current parallel simulators. APOSTLE is a new high-level simulation-oriented language for PDES, and in this paper we report that the APOSTLE granularity control mechanism reduced simulation run-times by as much as 80%. We also report that APOSTLE achieved a parallel speed-up of around 9 on 16 processors relative to its optimized sequential implementation and a parallel speed-up of around 6 on 16 processors relative to MODSIM II. Overall, we believe that the widespread success of PDES can only be achieved using a simulation-oriented language, and that APOSTLE has made a significant contribution towards this goal.
ISBN: (print) 0818673761
The proceedings contain 70 papers. The topics discussed include: programming shared virtual memory multiprocessors; parallel simulation of a multi-dimensional computational fluid dynamics problem; computing the singular values of the product of two matrices in distributed memory multiprocessors; a latency-hiding MIMD wavelet transform; simulation of chaotic iterative processes in speed-independent computing networks; sparse Householder QR factorization on a mesh; and the role of associative memory in virtual shared memory architectures: a price-performance comparison.
In this paper we study message flow processes in distributed simulators of open queueing networks. We develop and study queueing models for distributed simulators with maximum lookahead sequencing. We characterize the 'external' arrival process and the message feedback process in the simulator of a simple queueing network with feedback. We show that a certain 'natural' modelling construct for the arrival process is exactly correct, whereas an 'obvious' model for the feedback process is wrong; we then show how to develop the correct model. Our analysis throws light on the stability of distributed simulators of queueing networks with feedback. We show how the stability of such simulators depends on the parameters of the queueing network.
One way to reduce the time spent simulating VHDL designs is to parallelize the simulation. In this paper, we describe the implementation of an object-oriented Time Warp simulator for VHDL on an actor-based environment. The actor model of computation allows the exploitation of fine-grained parallelism in a truly asynchronous manner and allows for the overlap of computation with communication. Some preliminary results obtained by simulating a set of multipliers and some ISCAS benchmark circuits are provided. In addition, the importance of placing processes based on circuit partitioning techniques for improving runtimes and scalability is demonstrated. Results are reported on a Sun SPARCServer 1000 and an Intel Paragon.
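For readers unfamiliar with Time Warp, the rollback step that any such optimistic simulator relies on can be sketched as follows. This is a generic, minimal illustration and not the simulator described in the abstract: the class `LogicalProcess`, its toy `count` state, and the `receive` method are hypothetical names, and anti-messages, fossil collection, and the actor runtime are all omitted.

```python
import copy


class LogicalProcess:
    """Toy Time Warp logical process (hypothetical, for illustration only)."""

    def __init__(self):
        self.lvt = 0                # local virtual time
        self.state = {"count": 0}   # toy model state: number of events applied
        self.processed = []         # (timestamp, state saved before the event)

    def _execute(self, ts):
        # Save the state before applying the event so it can be undone later.
        self.processed.append((ts, copy.deepcopy(self.state)))
        self.state["count"] += 1
        self.lvt = ts

    def receive(self, ts):
        """Process an event; if it is a straggler, roll back first.

        Returns the number of events that had to be rolled back.
        """
        redo = []
        # Undo every processed event with a later timestamp than the new one.
        while self.processed and self.processed[-1][0] > ts:
            old_ts, saved = self.processed.pop()
            self.state = saved
            redo.append(old_ts)
        self.lvt = self.processed[-1][0] if self.processed else 0
        # Re-execute the straggler and the undone events in timestamp order.
        for t in sorted([ts] + redo):
            self._execute(t)
        return len(redo)


lp = LogicalProcess()
lp.receive(10)
lp.receive(20)
rolled_back = lp.receive(15)  # straggler: arrives after the event at time 20
```

Here the event at time 15 forces the event at time 20 to be undone and re-executed, which is the cost that good process placement tries to keep low.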
The partitioning of complex processor models on the gate and register-transfer level for parallel functional simulation based on the clock-cycle algorithm is considered. We introduce a hierarchical partitioning scheme that combines various partitioning algorithms within a competing strategy. By merging different partitioning results within one level using superpositions, we cross over to a mixture-of-experts strategy. This approach is improved by applying genetic algorithms. In addition, we present two new partitioning algorithms, both of which take cones as the fundamental units for building partitions.
ECATNets (Extended Concurrent Algebraic Term Nets) are a kind of high-level algebraic net used for specifying various aspects of distributed and parallel systems. We address the problem of developing parallel simulati...
Based on a linear ordering of vertices in a directed graph, a linear-time partitioning algorithm for parallel logic simulation is presented. Unlike most other partitioning algorithms, the proposed algorithm preserves circuit concurrency by assigning to processors circuit gates that can be evaluated at about the same time. As a result, the concurrency preserving partitioning (CPP) algorithm can provide better load balancing throughout the period of a parallel simulation. This is especially important when the algorithm is used together with a Time Warp simulation, where a high degree of concurrency can lead to fewer rollbacks and better performance. The algorithm consists of three phases, and three conflicting goals can be considered separately, one in each phase, so as to reduce computational complexity. A parallel gate-level circuit simulator is implemented on an Intel Paragon machine to evaluate the performance of the CPP algorithm. The results are compared with two other partitioning algorithms to show that reasonable speedup may be achieved with the algorithm.
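One reading of the concurrency-preserving idea can be sketched as below. This is not the three-phase CPP algorithm from the abstract, whose details are not given here. It is a simplified stand-in that assumes an acyclic circuit, orders gates by topological level, and deals the gates of each level out round-robin so that every processor holds work from every level. The function names and the example circuit are hypothetical, and communication cost is ignored.

```python
from collections import defaultdict, deque


def topological_levels(gates, edges):
    """Return {gate: level}, where level is the longest path from any input."""
    succ = defaultdict(list)
    indeg = {g: 0 for g in gates}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    level = {g: 0 for g in gates}
    queue = deque(g for g in gates if indeg[g] == 0)
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            level[v] = max(level[v], level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return level


def partition_by_level(gates, edges, num_procs):
    """Spread the gates of each level across processors round-robin."""
    level = topological_levels(gates, edges)
    by_level = defaultdict(list)
    for g in gates:
        by_level[level[g]].append(g)
    parts = [[] for _ in range(num_procs)]
    for lvl in sorted(by_level):
        for i, g in enumerate(by_level[lvl]):
            parts[i % num_procs].append(g)
    return parts


# Hypothetical six-gate circuit: two inputs, two middle gates, two outputs.
gates = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("c", "e"), ("d", "f")]
parts = partition_by_level(gates, edges, 2)
```

With two processors, each partition receives one gate from every level, so both processors have something to evaluate at each stage of the simulation.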
Presented is a dynamic load balancing algorithm developed for Clustered Time Warp, a hybrid approach which makes use of Time Warp between clusters of LPs and a sequential mechanism within the clusters. The load balancing algorithm focuses on distributing the load of the simulation evenly among the processors and then tries to reduce interprocessor communications. A triggering technique is used that is based on the throughput of the simulation system. The algorithm was implemented and its performance was measured using two of the largest benchmark digital circuits of the ISCAS '89 series. Results show that by dynamically balancing the load, the throughput was improved by 40-100% when compared to Time Warp.
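A throughput-triggered rebalancing step of this general kind might look like the sketch below. Everything in it is an assumption made for illustration: the 10% drop threshold, the per-cluster load numbers, and the function names `should_rebalance` and `rebalance_once` are hypothetical, and the second goal mentioned in the abstract, reducing interprocessor communication, is not modelled.

```python
def should_rebalance(prev_throughput, curr_throughput, drop_ratio=0.9):
    """Trigger when throughput falls below drop_ratio of its previous value."""
    return prev_throughput > 0 and curr_throughput < drop_ratio * prev_throughput


def rebalance_once(clusters_by_proc):
    """Move one cluster from the most loaded to the least loaded processor.

    clusters_by_proc maps processor id -> {cluster id: load}.
    Returns (cluster, source, destination), or None if no move helps.
    """
    totals = {p: sum(c.values()) for p, c in clusters_by_proc.items()}
    src = max(totals, key=totals.get)
    dst = min(totals, key=totals.get)
    gap = totals[src] - totals[dst]
    # Only a cluster with 0 < load < gap narrows the imbalance when moved.
    candidates = {c: l for c, l in clusters_by_proc[src].items() if 0 < l < gap}
    if not candidates:
        return None
    # Moving a load of gap / 2 would equalise the two processors exactly.
    cluster = min(candidates, key=lambda c: abs(candidates[c] - gap / 2))
    clusters_by_proc[dst][cluster] = clusters_by_proc[src].pop(cluster)
    return (cluster, src, dst)


# Hypothetical loads: processor 0 holds three clusters, processor 1 holds one.
placement = {0: {"A": 6, "B": 3, "C": 1}, 1: {"D": 2}}
move = None
if should_rebalance(prev_throughput=100.0, curr_throughput=80.0):
    move = rebalance_once(placement)
```

Moving whole clusters rather than individual LPs matches the clustered structure described above, since the LPs inside a cluster are scheduled sequentially and are cheapest to keep together.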
We present an execution model for parallel simulation of a distributed shared memory architecture. The model captures the processor-memory interaction and abstracts the memory subsystem. Using this model we show how parallel, on-line, partially-ordered memory traces can be correctly predicted without interacting with the memory subsystem. We also outline a parallel optimistic memory simulator that uses these traces, finds a global order among all events, and returns correct data and timing to each processor. A first evaluation of the amount of concurrency that our model can extract for an ideal multiprocessor shows that processors may execute relatively long instruction sequences without violating the causality constraints. However, parallel simulation efficiency is highly dependent on the memory consistency model and the application characteristics.
Discrete-event simulation is an important tool used for the performance evaluation of parallel systems. The space of tradeoffs is large, however, when attempting to balance model fidelity and simulation execution time. This paper describes a simulator, TAPS (Threaded Application Parallel System Simulator), that provides a spectrum of possibilities in this tradeoff space in the context of threaded parallel computations. TAPS is specifically designed to be parallelized; we discuss some crucial considerations regarding its parallelization.