In this paper, we consider the problem of partitioning a conservative parallelsimulation for execution on a multi-computer. The synchronization protocol makes use of null messages [6]. We propose the use of a simulat...
ISBN:
(纸本)9781565550278
In this paper, we consider the problem of partitioning a conservative parallelsimulation for execution on a multi-computer. The synchronization protocol makes use of null messages [6]. We propose the use of a simulated annealing algorithm with an adaptive search schedule to find good (sub-optimal) partitions. The paper discusses the algorithm, its implementation and reports on the performance results of simulations of a partitioned FCFS queueing network model executed on iPSC/860 hypercube. The results obtained are compared with a random partitioning. They show that a partitioning which makes use of our simulated annealing algorithm results in a reduction of 25-35% of the running time of the simulations when compared to the running time of a random partition of the model.
Large concurrent or distributed systems have proven notoriously difficult to comprehend due to their combinatoric complexity. There is a clear need for thorough formal system modeling at all design stages. Ease of use...
详细信息
Large concurrent or distributed systems have proven notoriously difficult to comprehend due to their combinatoric complexity. There is a clear need for thorough formal system modeling at all design stages. Ease of use and validation capability are important criteria in method selection. It is important that the selection permit evaluation of design details with respect to performance and behavioral goals. Net models are a likely candidate to achieve the requisite needs of such a modeling environment. They provide both graphical and mathematical system modeling views which inherently enable both quantitative and qualitative analysis of the design. To demonstrate this claim, a colored Petri Net model and performance analysis of a particular scalable multiprocessor interconnect fabric is presented. The fundamental component of this fabric is an adaptive routing device, R2. The R2 fabric is based on the interconnection scheme used in the Mayfly parallel processing system.< >
This conference proceedings contains 22 papers on advances in the design and analysis of algorithms for parallel and distributedsimulation. Topics discussed include selecting the checkpoint interval in time warp para...
详细信息
ISBN:
(纸本)1565550552
This conference proceedings contains 22 papers on advances in the design and analysis of algorithms for parallel and distributedsimulation. Topics discussed include selecting the checkpoint interval in time warp parallelsimulation, parallel algorithms for simulating continuous time Markov chains, determining initial states for time-parallelsimulations, global synchronization for optimistic parallel discrete event simulation, an algorithm for minimally latent global virtual time, a parallel partitioning technique for use with conservative parallelsimulation, disseminating critical synchronization information in parallel discrete event simulations, shared variables in distributedsimulation, high performance parallel logic simulation on a network of workstations, corolla partitioning for distributed logic simulation of VLSI circuits, efficient implementation of event sets in time warp, an analytical comparison of periodic checkpointing and incremental state saving, parallelsimulation of communicating finite state machines, the effect of synchronization requirements on the performance of distributedsimulations, and time warp simulation in time-constrained systems.
Synchronization is a significant cost in many parallel programs, and can be a major bottleneck if it is handled in a centralized fashion using traditional shared-memory constructs such as barriers. In a parallel time-...
详细信息
ISBN:
(纸本)1565550552
Synchronization is a significant cost in many parallel programs, and can be a major bottleneck if it is handled in a centralized fashion using traditional shared-memory constructs such as barriers. In a parallel time-stepped simulation, the use of global synchronization primitives limits scalability, increases the sensitivity to load imbalance, and reduces the potential for exploiting locality to improve cache behavior. This paper presents the results of an initial one-application study quantifying the costs and performance benefits of distributed, nearest neighbors synchronization. The application studied, MP3D, is a particle-based wind tunnel simulation. Our results for this one application on current shared-memory multiprocessors show a significant decrease in synchronization time using these techniques. We prototyped an application-independent library that implements distributed synchronization. The library allows a variety of parallelsimulations to exploit these techniques without increasing the application programming beyond that of conventional approaches.
Recent experiments have shown that conservative methods can achieve good performance by exploiting the characteristics of the system being simulated. In this paper we focus on the interrelationship between run time an...
详细信息
ISBN:
(纸本)1565550552
Recent experiments have shown that conservative methods can achieve good performance by exploiting the characteristics of the system being simulated. In this paper we focus on the interrelationship between run time and synchronization requirements of a distributedsimulation. A metric that considers the effect of lookahead and the physical rate of transmission of messages, and an arrival approximation that models the effect of synchronization requirements on the run time are developed. It is shown that even when good lookahead is exploited in the system, poor run-time performance is achieved if an inefficient mapping of LPs to processors is used.
A number of optimistic synchronization schemes for parallelsimulation rely upon a global synchronization. The problem is to determine when every processor has completed all its work, and there are no messages in tran...
详细信息
ISBN:
(纸本)1565550552
A number of optimistic synchronization schemes for parallelsimulation rely upon a global synchronization. The problem is to determine when every processor has completed all its work, and there are no messages in transit in the system that will cause more work. Most previous solutions to the problem have used distributed termination algorithms, which are inherently serial;other parallel mechanisms may be inefficient. In this paper we describe an efficient parallel algorithm derived from a common `barrier' synchronization algorithm used in parallel processing. The algorithm's principle attraction is speed, and generality - it is designed to be used in contexts more general than parallel discrete-event simulation. To establish our claim to speed, we compare our algorithm's performance with the standard barrier algorithm, and find that its additional costs are not excessive. Our experiments are conducted using up to 256 processors on the Intel Touchstone Delta.
Several mathematical and algorithmic problems that have arisen in discrete event simulations of large systems are described. The simulated systems belong to the areas of computational physics, queueing networks, and e...
详细信息
ISBN:
(纸本)1565550552
Several mathematical and algorithmic problems that have arisen in discrete event simulations of large systems are described. The simulated systems belong to the areas of computational physics, queueing networks, and econometric models.
We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallelsimulation of continuous-time Markov chains. This paper reviews the basic method an...
详细信息
ISBN:
(纸本)1565550552
We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallelsimulation of continuous-time Markov chains. This paper reviews the basic method and compares four different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
The major goal of this work has been to develop an implementation of a parallel partitioning algorithm which is suitable for use in a conservatively synchronized parallel Discrete Event simulation (PDES) environment. ...
详细信息
ISBN:
(纸本)1565550552
The major goal of this work has been to develop an implementation of a parallel partitioning algorithm which is suitable for use in a conservatively synchronized parallel Discrete Event simulation (PDES) environment. Effective partitioning is essential for performance and capacity consideration, for any PDES problem. The performance of the partitioning algorithm is very important, to the overall simulation performance. There are two possible approaches to improve performance for the partitioning step: algorithm modifications;and parallelize the partitioning process. In this work, an efficient parallelized version of the iterative improvement based partitioning algorithm (Fiduccia and Mattheyses, 1982) is developed. The basic algorithm has been modified, first for parallel execution with a similar quality of final partition;and then further modified to increase the parallelism of the algorithm, at the expense of partition quality.
In this paper we consider the effect of using bus interconnection structures on the overheads present in conservative parallelsimulations of multicomputer programs. We use a modified version of the Poker Programming ...
详细信息
ISBN:
(纸本)1565550552
In this paper we consider the effect of using bus interconnection structures on the overheads present in conservative parallelsimulations of multicomputer programs. We use a modified version of the Poker Programming Environment to empirically measure the overhead in three parallel algorithms using buses. We discuss the sources of overhead and compare them with those found using point-to-point communication. Preliminary results indicate that the overheads encountered using a bus interconnection structure were not predicted by our previous results using point-to-point communications.
暂无评论