A number of optimistic synchronization schemes for parallelsimulation rely upon a global synchronization. The problem is to determine when every processor has completed all its work, and there are no messages in tran...
详细信息
ISBN:
(纸本)1565550552
A number of optimistic synchronization schemes for parallelsimulation rely upon a global synchronization. The problem is to determine when every processor has completed all its work, and there are no messages in transit in the system that will cause more work. Most previous solutions to the problem have used distributed termination algorithms, which are inherently serial;other parallel mechanisms may be inefficient. In this paper we describe an efficient parallel algorithm derived from a common `barrier' synchronization algorithm used in parallel processing. The algorithm's principle attraction is speed, and generality - it is designed to be used in contexts more general than parallel discrete-event simulation. To establish our claim to speed, we compare our algorithm's performance with the standard barrier algorithm, and find that its additional costs are not excessive. Our experiments are conducted using up to 256 processors on the Intel Touchstone Delta.
Several mathematical and algorithmic problems that have arisen in discrete event simulations of large systems are described. The simulated systems belong to the areas of computational physics, queueing networks, and e...
详细信息
ISBN:
(纸本)1565550552
Several mathematical and algorithmic problems that have arisen in discrete event simulations of large systems are described. The simulated systems belong to the areas of computational physics, queueing networks, and econometric models.
We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallelsimulation of continuous-time Markov chains. This paper reviews the basic method an...
详细信息
ISBN:
(纸本)1565550552
We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallelsimulation of continuous-time Markov chains. This paper reviews the basic method and compares four different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
The major goal of this work has been to develop an implementation of a parallel partitioning algorithm which is suitable for use in a conservatively synchronized parallel Discrete Event simulation (PDES) environment. ...
详细信息
ISBN:
(纸本)1565550552
The major goal of this work has been to develop an implementation of a parallel partitioning algorithm which is suitable for use in a conservatively synchronized parallel Discrete Event simulation (PDES) environment. Effective partitioning is essential for performance and capacity consideration, for any PDES problem. The performance of the partitioning algorithm is very important, to the overall simulation performance. There are two possible approaches to improve performance for the partitioning step: algorithm modifications;and parallelize the partitioning process. In this work, an efficient parallelized version of the iterative improvement based partitioning algorithm (Fiduccia and Mattheyses, 1982) is developed. The basic algorithm has been modified, first for parallel execution with a similar quality of final partition;and then further modified to increase the parallelism of the algorithm, at the expense of partition quality.
In this paper we consider the effect of using bus interconnection structures on the overheads present in conservative parallelsimulations of multicomputer programs. We use a modified version of the Poker Programming ...
详细信息
ISBN:
(纸本)1565550552
In this paper we consider the effect of using bus interconnection structures on the overheads present in conservative parallelsimulations of multicomputer programs. We use a modified version of the Poker Programming Environment to empirically measure the overhead in three parallel algorithms using buses. We discuss the sources of overhead and compare them with those found using point-to-point communication. Preliminary results indicate that the overheads encountered using a bus interconnection structure were not predicted by our previous results using point-to-point communications.
An approach for high performance parallel logic simulation on a local area network of workstation computers is discussed in this paper. The single, shared transmission medium often found in such networks places limita...
详细信息
ISBN:
(纸本)1565550552
An approach for high performance parallel logic simulation on a local area network of workstation computers is discussed in this paper. The single, shared transmission medium often found in such networks places limitations on parallel execution, hence a reduction in the frequency of synchronization is pursued by combining a circuit partitioning methodology with a specific synchronization constraint. A consequence of the partitioning methodology is replication of objects between blocks of a partition. A partitioning procedure based on iterative improvement is described for reducing replication while preserving load balance. Two interprocessor synchronization techniques for parallelsimulation are studied: conservative and optimistic synchronization. Experiments conducted on three large sequential circuits indicate that reasonable speedup is achievable for well-balanced partitions, and that optimistic synchronization provides a modest improvement in performance over conservative synchronization.
Time-parallelsimulations exploit parallelism by partitioning the time domain of a simulation model. Exploiting temporal parallelism requires predicting future states of a simulation model. A poor prediction of future...
详细信息
ISBN:
(纸本)1565550552
Time-parallelsimulations exploit parallelism by partitioning the time domain of a simulation model. Exploiting temporal parallelism requires predicting future states of a simulation model. A poor prediction of future states may cause extensive recomputation so that a time-parallelsimulation requires more real time to execute than a corresponding sequential simulation. Recurrent states of a simulation model provide a way to predict future states. In this paper, we propose a time-parallelsimulation method which uses a pre-simulation to identify recurrent states. For simulation models in which recurrent states do not exist or can not support sufficient time parallelism, an approximation technique is suggested to extend the class of simulation models to which time-parallelsimulation can be applied. Several queueing network models are investigated with the proposed time-parallelsimulation. Experimental results suggest that the proposed approach can exploit massive parallelism while yielding accurate results.
Global virtual time (GVT) is used in distributedsimulations to reclaim memory, commit output, detect termination, and handle errors. It is a global function that is computed many times during the course of a simulati...
详细信息
ISBN:
(纸本)1565550552
Global virtual time (GVT) is used in distributedsimulations to reclaim memory, commit output, detect termination, and handle errors. It is a global function that is computed many times during the course of a simulation. A small GVT latency (delay between its occurrence and detection) allows for more efficient use of resources. We present an algorithm which minimizes the latency, and we prove its correctness. The algorithm is unique in that a target virtual time (TVT) is predetermined by an initiator who then detects when GVT≥TVT. This approach eliminates the avalanche effect because the collection phase is spread out over time, and it allows for regular and timely GVT updates. The algorithm does not require messages to be acknowledged, which significantly reduces the message overhead of the simulation. One possible application is with interactive simulators, where regular and timely updates would produce output that is up to date and appears smooth.
Time Warp and Breathing Time Buckets are two general-purpose optimistic synchronization strategies for supporting parallel discrete-event simulations. However, each one of these approaches has potential fatal shortcom...
详细信息
ISBN:
(纸本)1565550552
Time Warp and Breathing Time Buckets are two general-purpose optimistic synchronization strategies for supporting parallel discrete-event simulations. However, each one of these approaches has potential fatal shortcomings. Time Warp may exhibit rollback explosions that can cause an avalanche of antimessages. Breathing Time Buckets, on the other hand, may not be able to process enough events per synchronization cycle to remain efficient. A new strategy, called Breathing Time Warp, has been developed in the Synchronous parallel Environment for Emulation and Discrete-Event simulation (SPEEDES) operating system. This new strategy solves both of these problems by mixing the two algorithms together, resulting in the best of both methods. This paper describes the implementation of the Breathing Time Warp algorithm in SPEEDES, and then shows how this new approach sometimes improves the performance of parallel discrete-event simulations.
A hardware-based framework which supports a wide range of parallel discrete event synchronization protocols has been proposed in [Reyn92]. This framework offloads all synchronization activity from the host processors ...
详细信息
ISBN:
(纸本)1565550552
A hardware-based framework which supports a wide range of parallel discrete event synchronization protocols has been proposed in [Reyn92]. This framework offloads all synchronization activity from the host processors and host communication network in the system. The underlying hardware computes results of global, binary associative operations, or global reductions. In this paper we present results of simulations that strongly suggest the need for a next-generation reduction network which computes and disseminates results of target-specific reductions to support both aggressive and non-aggressive parallel discrete event simulations. Target-specific reductions allow a logical process to receive synchronization information only from those logical processes which may have a direct or indirect impact on its performance.
暂无评论