The proceedings contain 76 articles. Topics discussed include systems, networking, distributed simulation, queueing systems, multiprocessor architecture, modeling techniques, parallel systems, tools, processors, network and system simulation, optimizing parallel programs, Petri nets, neural networks and genetic algorithms, real-time systems, and systems modeling.
We simulate ballistic particle deposition, wherein a large number of spherical particles are 'dropped' vertically onto a planar horizontal surface. Upon first contact (with the surface or with a previously deposited particle), each particle stops. This model helps materials scientists study adsorption and sediment formation [1]. The model is sequential, with particles deposited one by one. We have found an equivalent formulation using a continuous-time random process, and we simulate the latter in parallel using a method similar to the one previously employed for simulating Ising spins [2]. We augment the parallel algorithm for simulating Ising spins with several techniques aimed at increasing the efficiency of producing the particle configuration and of collecting statistics. Some of these techniques are similar to those in [3], [4], and [5]. We implement the resulting algorithm on a 16K-PE MasPar MP-1 and a 4K-PE MasPar MP-2. On the MasPar computers, the parallel code runs two orders of magnitude faster than an optimized sequential code runs on a fast workstation.
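To make the sequential model concrete, here is a minimal sketch in Python, assuming a two-dimensional simplification (equal disks dropped onto a line segment instead of spheres onto a plane). It implements only the one-by-one deposition rule described above, not the continuous-time reformulation or the MasPar parallel algorithm; the function name `deposit` and its parameters are hypothetical.

```python
import math
import random


def deposit(n_particles, width, radius=0.5, seed=0):
    """Sequential ballistic deposition of equal disks onto the line y = 0.

    Each disk falls vertically at a uniformly random x and stops at first
    contact with the substrate or with an already deposited disk.
    Returns the list of (x, y) centres in deposition order.
    """
    rng = random.Random(seed)
    diameter = 2.0 * radius
    placed = []
    for _ in range(n_particles):
        x = rng.uniform(0.0, width)
        y = radius  # resting height if the disk reaches the substrate
        for px, py in placed:
            dx = abs(x - px)
            if dx < diameter:
                # Height at which the falling disk first touches this disk.
                contact = py + math.sqrt(diameter * diameter - dx * dx)
                y = max(y, contact)
        placed.append((x, y))
    return placed
```

Because a disk falling from above first touches whichever obstacle lies highest along its path, its resting height is simply the maximum of all candidate contact heights. The quadratic scan over earlier disks is the simplest correct choice; a production code would use a spatial grid.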
ISBN: 9781565550278 (print)
This paper examines the cost/performance of simulating a hypothetical target parallel computer using a commercial host parallel computer. We address the question of whether parallel simulation is simply faster than sequential simulation or whether it is also more cost-effective. To answer this, we develop a performance model of the Wisconsin Wind Tunnel (WWT), a system that simulates cache-coherent shared-memory machines on a message-passing Thinking Machines CM-5. The performance model uses Kruskal and Weiss's fork-join model to account for the effect of event processing time variability on WWT's conservative fixed-window simulation algorithm. A generalization of Thiebaut and Stone's footprint model accurately predicts the effect of cache interference on the CM-5. The model is calibrated using parameters extracted from a fully parallel simulation (p = N) and validated by measuring the speedup as the number of processors (p) ranges from one to the number of target nodes (N). Together with simple cost models, the performance model indicates that for target system sizes of 32 nodes and larger, parallel simulation is more cost-effective than sequential simulation. The key intuition behind this result is that large simulations require large memories, which dominate the cost of a uniprocessor; parallel computers allow multiple processors to access this large memory simultaneously.
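Why processing time variability matters for a fixed-window scheme can be illustrated with a small Monte Carlo experiment. Every processor must reach a barrier at the end of each window, so a window costs the maximum, not the mean, of the per-processor work. The sketch below is not Kruskal and Weiss's formula; it assumes normally distributed per-window processing times (the parameters `mu` and `sigma` and all function names are hypothetical) and compares the simulated mean window time with the standard upper bound mu + sigma * sqrt(2 ln p) on the expected maximum of p normal samples.

```python
import math
import random
import statistics


def window_time(p, mu, sigma, rng):
    """Cost of one fixed window: all p processors synchronize at a barrier,
    so the window lasts as long as the slowest processor."""
    return max(rng.gauss(mu, sigma) for _ in range(p))


def mean_window_time(p, mu=10.0, sigma=2.0, trials=5000, seed=0):
    """Monte Carlo estimate of the expected window time on p processors."""
    rng = random.Random(seed)
    return statistics.fmean(window_time(p, mu, sigma, rng) for _ in range(trials))


def gaussian_max_bound(p, mu=10.0, sigma=2.0):
    """Upper bound on E[max of p N(mu, sigma^2) samples]."""
    return mu + sigma * math.sqrt(2.0 * math.log(p))


if __name__ == "__main__":
    for p in (1, 4, 32):
        print(p, round(mean_window_time(p), 2), round(gaussian_max_bound(p), 2))
```

For normally distributed times the gap between the mean and the expected maximum grows roughly like sqrt(ln p), so variability erodes speedup as more host processors are added even when the average work per processor is unchanged.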
We demonstrate that, compared to highly optimized optimistic simulators that use a local event queue for each processor on a shared-memory computer, employing a single global event queue drastically reduces the number of rollbacks, lowers the storage requirements, and achieves superior load balance. On a bus-based Silicon Graphics multiprocessor, these advantages consistently translated into faster execution times and higher speedups on those synthetic networks of medium- to coarse-grained logical processes that suffered from rollbacks and load imbalance on local-queue-based simulators. A dynamic, randomization-based load distribution scheme for local-event-queue simulators is also shown to be an effective improvement.
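The data structure at the heart of the global-queue approach can be sketched as a single lock-protected min-heap shared by all workers, from which any idle worker takes the lowest-timestamp pending event. This is a minimal illustration under stated assumptions, not the authors' simulator: rollback, state saving, and GVT computation are omitted, the class and function names are hypothetical, and Python threads only show the structure (they give no real speedup).

```python
import heapq
import itertools
import threading


class GlobalEventQueue:
    """One lock-protected priority queue shared by all worker threads.

    Any idle worker takes the globally smallest-timestamp event, which keeps
    execution close to timestamp order and balances load because no event
    is bound to a particular processor.
    """

    def __init__(self):
        self._heap = []
        self._lock = threading.Lock()
        self._seq = itertools.count()  # tie-breaker so payloads are never compared

    def push(self, timestamp, lp_id, payload=None):
        with self._lock:
            heapq.heappush(self._heap, (timestamp, next(self._seq), lp_id, payload))

    def pop_min(self):
        """Return (timestamp, lp_id, payload), or None if the queue is empty."""
        with self._lock:
            if not self._heap:
                return None
            timestamp, _, lp_id, payload = heapq.heappop(self._heap)
            return timestamp, lp_id, payload


def run_workers(queue, n_workers, handle):
    """Drain the queue with n_workers threads; handle(event) processes one event.

    Workers stop as soon as they see an empty queue; a real simulator whose
    handlers schedule new events would need proper termination detection.
    """
    def worker():
        while (event := queue.pop_min()) is not None:
            handle(event)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The sequence counter in each heap entry keeps ordering stable for equal timestamps and avoids comparing arbitrary payload objects.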
Optimistic computation methods typically save copies of objects' state information so that they can recover from erroneous 'over-optimistic' computations. Such state saving is generally time- and space-consuming, and can be rather complicated both to implement and to use. I show how the data structure community's theory of persistence can be used not only to analyse and explain the treatment of state in optimistic systems, but also as a simple yet general mechanism for performing the necessary state saving with minimal impact on application code. Preliminary results based on a benchmark application and an existing optimistic simulator are presented, showing that providing support for fully general object states is a realistic and practical option. In addition, I show how some existing state saving techniques (including support for shared state) can be derived, and discuss a number of ways in which the model might be extended.
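One way to picture persistence-based state saving: the sketch below, an illustration under stated assumptions rather than the author's mechanism, stores an object's state in a persistent binary search tree built by path copying. Each write creates a new version that shares all untouched nodes with the old one, so saving state means remembering one root per timestamp and rollback means discarding newer roots. The names `Node`, `insert`, `lookup`, and `OptimisticObject` are hypothetical, and the tree is unbalanced for brevity (a balanced variant would bound the copying per write to O(log n) nodes).

```python
import bisect
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Node:
    key: Any
    value: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key, value):
    """Return a new root; only nodes on the search path are copied."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    return Node(key, value, root.left, root.right)


def lookup(root, key, default=None):
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return default


class OptimisticObject:
    """Object state kept as a persistent map; every old version stays valid."""

    def __init__(self):
        self._times = [float("-inf")]
        self._roots = [None]

    def write(self, time, key, value):
        if time < self._times[-1]:
            raise ValueError("writes must be in timestamp order; roll back first")
        self._times.append(time)
        self._roots.append(insert(self._roots[-1], key, value))

    def read(self, key, default=None):
        return lookup(self._roots[-1], key, default)

    def rollback(self, time):
        """Undo every write with a timestamp greater than time."""
        keep = bisect.bisect_right(self._times, time)
        del self._times[keep:]
        del self._roots[keep:]
```

Application code only calls `write` and `read`; no explicit copy of the state is ever made, which is the property the abstract attributes to the persistence-based approach.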
This paper describes an extension of the TNE algorithm whose objective is to increase its parallelism and to break the inter-processor deadlocks inherent in the use of TNE. The algorithm, which we call SGTNE (Semi-Global TNE), is executed over a cluster of processors, whereas TNE is executed over a cluster of processes assigned to a single processor. SGTNE helps to break inter-processor deadlocks by executing a shortest-path algorithm over a snapshot of the LPs in a cluster of processors. This paper discusses the algorithm and its implementation and reports performance results for simulations of a partitioned FCFS queueing network model executed on the Intel Paragon A4 multiprocessor. We also examine the impact of partitioning on the efficient implementation of the SGTNE algorithm. The results indicate that SGTNE yields good speedups and that a partitioning based on a strongly connected component algorithm reduces the running time of a simulation by 30% compared to simple partitioning strategies. The results also indicate that SGTNE outperforms TNE.
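The shortest-path step can be pictured as computing, for each LP, a lower bound on the earliest time any event can reach it. The sketch below is a generic multi-source Dijkstra in that spirit, not the published TNE or SGTNE code. It assumes each LP knows its earliest pending event time, each link carries a non-negative minimum delay (for example a minimum service time in an FCFS queueing model), and messages from outside the snapshot are bounded by an `external` value; the function name and data layout are hypothetical.

```python
import heapq
import math


def next_event_bounds(local_next, links, external=None):
    """Lower bounds on event times per LP via multi-source Dijkstra.

    local_next: {lp: timestamp of its earliest pending event, or math.inf}
    links:      {lp: [(successor_lp, min_delay), ...]} with min_delay >= 0
    external:   {lp: lower bound on messages from outside the snapshot}
    LP ids must be mutually comparable (they break ties in the heap).
    Returns (bound, incoming): bound[v] bounds any event time at v;
    incoming[v] bounds the timestamp of any message v may still receive.
    """
    external = external or {}
    nodes = set(local_next) | set(links) | set(external)
    for succs in links.values():
        nodes.update(v for v, _ in succs)
    # Virtual-source distances: an LP's own pending event or an outside message.
    bound = {v: min(local_next.get(v, math.inf), external.get(v, math.inf)) for v in nodes}
    heap = [(t, v) for v, t in bound.items() if t < math.inf]
    heapq.heapify(heap)
    done = set()
    while heap:
        t, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, delay in links.get(u, ()):
            if t + delay < bound[v]:
                bound[v] = t + delay
                heapq.heappush(heap, (bound[v], v))
    # A message into v must come from outside or from a predecessor's event.
    incoming = {v: external.get(v, math.inf) for v in nodes}
    for u, succs in links.items():
        for v, delay in succs:
            incoming[v] = min(incoming[v], bound[u] + delay)
    return bound, incoming
```

For a ring A→B→C→A with delays 1, 1, and 2 and pending events at A (time 5) and C (time 10), the incoming bounds are 9 for A and 7 for C: A can safely process its event at time 5, while C's event at time 10 must wait because a message may still arrive at time 7.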
An implementation of a conservative parallel simulator with deadlock avoidance is presented. Its performance when working with a realistic model of a message routing network is evaluated and contrasted against a seque...
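The abstract is truncated and does not say which deadlock-avoidance scheme is used. The classic one for conservative simulators is the Chandy-Misra-Bryant null-message protocol, and the sketch below shows only its input-side bookkeeping as a generic illustration, not this paper's implementation. It assumes FIFO input channels with non-decreasing timestamps and a positive lookahead; the class and method names are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class LogicalProcess:
    """Input-side bookkeeping for a conservative LP with null messages.

    channel_clock[c] holds the timestamp of the last message (real or null)
    received on input channel c. Channels are assumed FIFO with
    non-decreasing timestamps, so no later message on c can be earlier.
    """
    inputs: list
    lookahead: float
    channel_clock: dict = field(default_factory=dict)

    def __post_init__(self):
        for c in self.inputs:
            self.channel_clock.setdefault(c, 0.0)

    def receive(self, channel, timestamp):
        """Record a real or null message; both advance the channel clock."""
        if timestamp < self.channel_clock[channel]:
            raise ValueError("messages on a channel must arrive in timestamp order")
        self.channel_clock[channel] = timestamp

    def safe_time(self):
        """No future input can carry a timestamp earlier than this."""
        return min(self.channel_clock.values(), default=math.inf)

    def null_message_time(self, earliest_pending=math.inf):
        """Lower bound on any future output: the earliest time this LP can
        next process an event (pending or yet to arrive) plus its lookahead.
        Sending it to every successor as a null message lets them advance."""
        return min(self.safe_time(), earliest_pending) + self.lookahead
```

With a strictly positive lookahead, each round of null messages pushes the channel clocks forward, so a cycle of blocked LPs cannot stay blocked forever.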
The design of a specialized computer architecture for qualitative simulation is presented. Our interest focuses on the hardware design of an application-specific computer architecture which is composed of programmable...
In this article, we consider the iterated Runge-Kutta (IRK) method, which is an iteration method based on a predictor-corrector scheme for the solution of ordinary differential equations. The method uses embedded formu...
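The predictor-corrector structure can be sketched as follows, assuming the underlying implicit method is the two-stage Gauss-Legendre Runge-Kutta method (order 4), the predictor simply sets every stage value to y_n, and a fixed number of fixed-point corrections is applied. The abstract does not say which method, predictor, or embedded error estimate the authors use, so this is an illustration only: the embedded formulas for step-size control are omitted, the step size is fixed, the ODE is scalar, and `irk_step` and `integrate` are hypothetical names.

```python
import math

SQ3 = math.sqrt(3.0)
# Two-stage Gauss-Legendre Butcher tableau (order 4).
A = [[0.25, 0.25 - SQ3 / 6.0],
     [0.25 + SQ3 / 6.0, 0.25]]
B = [0.5, 0.5]
C = [0.5 - SQ3 / 6.0, 0.5 + SQ3 / 6.0]


def irk_step(f, t, y, h, corrections=4):
    """One step of an iterated RK method for the scalar ODE y' = f(t, y)."""
    stages = [y] * len(B)  # predictor: every stage value starts at y_n
    for _ in range(corrections):
        # Each evaluation uses only the previous iterate, so the stage
        # derivatives within one correction could run on separate processors.
        derivs = [f(t + c * h, s) for c, s in zip(C, stages)]
        stages = [y + h * sum(a * k for a, k in zip(row, derivs)) for row in A]
    derivs = [f(t + c * h, s) for c, s in zip(C, stages)]
    return y + h * sum(b * k for b, k in zip(B, derivs))


def integrate(f, t0, y0, t_end, n_steps, corrections=4):
    """Fixed-step integration from t0 to t_end."""
    h = (t_end - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = irk_step(f, t, y, h, corrections)
        t += h
    return y
```

With `corrections=0` the step reduces to an explicit first-order method (Euler's method for autonomous problems); each additional correction raises the attainable order by one, up to the order of the underlying Gauss method.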
In the past decade, the use of distributed algorithms to model simulations has increased considerably as a means of gaining speedup over traditional sequential simulations. Also, there has been much interest in using inexp...