ISBN: 0818649909 (print)
We took the design of an existing circuit simulator, SPICE, considered various parallelization techniques, and selected a relaxation-based algorithm, Iterated Timing Analysis, for further study. In this report, the implementation of this algorithm on the CM-5 is described. Simulation is performed by a collection of subcircuits communicating with one another by sending asynchronous event messages. Initial studies on two platforms, Split-C, a parallel extension of C, and the CMMD message passing library, indicate that this approach, with appropriate partitioning and scheduling taking granularity into account, has a great deal of potential for reducing the cost of circuit simulation. We also tried to exploit parallelism by scheduling the events optimistically. Trace-driven analysis shows that the optimistic simulation method exploits more parallelism than the conservative methods for circuits with feedback signal paths.
Device model evaluation, an essential part of a circuit simulator, is a compute-intensive task. A multiprocessor-based circuit simulator that ignores the parallelization of model equation formulation (LOAD) and parallelizes only the solution (SOLVE) of the equations will seriously degrade simulation performance. This paper describes methods of parallelizing the LOAD part of a circuit simulator on PACE (Parallel Architecture for Circuit Evaluation), a distributed memory multiprocessor designed at AT&T Bell Laboratories. This is integrated with the parallel SOLVE algorithms given in our earlier work. Load balancing and minimization of interprocessor communication are the primary objectives of the parallel LOAD heuristics studied. Performance results on benchmark circuits, obtained with the prototype PACE system, show the feasibility of our approach.
We took the design of an existing circuit simulator, SPICE, considered various parallelization techniques, and selected a relaxation-based algorithm, Iterated Timing Analysis, for further study. In this report, the implementation of this algorithm on the CM-5 is described. Simulation is performed by a collection of subcircuits communicating with one another by sending asynchronous event messages. Initial studies on two platforms, Split-C, a parallel extension of C, and the CMMD message passing library, indicate that this approach, with appropriate partitioning and scheduling taking granularity into account, has a great deal of potential for reducing the cost of circuit simulation. We also tried to exploit parallelism by scheduling the events optimistically. Trace-driven analysis shows that the optimistic simulation method exploits more parallelism than the conservative methods for circuits with feedback signal paths.
ISBN: 1565550552 (print)
Although users may want to employ shared variables when they program distributed simulation applications, almost none of the existing distributed simulation systems offer this facility. In this paper, we present new algorithms that provide the illusion of consistent shared variables in distributed simulation systems without physically shared memory.
ISBN: 1565550552 (print)
We describe, in this paper, a synchronization/deadlock resolution mechanism for a network of communicating finite state machines implemented on a parallel machine. As it is message-based, it is appropriate for distributed memory machines. The technique was inspired by a project at the Jet Propulsion Laboratories whose goal is the specification and verification of the software used to control the interplanetary spacecraft operated by the ***. The network of communicating finite state machines makes use of write messages to alter the value of the variables describing the finite state machines and read messages to determine the state of the variables. Since a blocking protocol is employed, it is possible for deadlocks to occur. Consequently, we describe deadlock resolution algorithms. The algorithms were implemented on an iPSC/2 hypercube, demonstrating good performance on a queueing network model.
ISBN: 1565550552 (print)
Several mathematical and algorithmic problems that have arisen in discrete event simulations of large systems are described. The simulated systems belong to the areas of computational physics, queueing networks, and econometric models.
ISBN: 1565550552 (print)
Optimistic methods of synchronizing parallel discrete event simulations can be risky: they send (positive) messages (events) before those messages have been committed. Risky methods often use anti-messages (negative messages) to cancel incorrectly sent positive messages. Riskfree methods are more conservative; they do not send messages until the messages are known to be correct. The Time Warp Operating System (TWOS) uses anti-messages. Riskfree TWOS is implemented and tested on the standard TWOS benchmarks. Performance of the riskfree TWOS depends on the amount of lookahead in the simulation. Good lookahead was required for even reasonable performance. Tracker, a simulation of the riskfree simulation, is used to give idealized best-case riskfree performance. BeRisky is an example simulation that has a speedup of n for Time Warp, but only a speedup of 2 for riskfree methods.
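The anti-message mechanism mentioned in this abstract can be illustrated with a minimal sketch. It is not TWOS code: the `Message` and `InputQueue` names are hypothetical, and the rollback that a real Time Warp system performs when the positive message has already been processed is deliberately left out. The sketch only shows annihilation, where an anti-message cancels its matching positive message in the receiver's input queue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical event message; sign is +1 (positive) or -1 (anti-message)."""
    sender: str
    recv_time: int
    payload: str
    sign: int = 1

    def anti(self):
        # The twin message: identical in every field except the sign.
        return Message(self.sender, self.recv_time, self.payload, -self.sign)


class InputQueue:
    """Unprocessed messages of one logical process (rollback is omitted)."""

    def __init__(self):
        self.pending = []

    def receive(self, msg):
        twin = msg.anti()
        if twin in self.pending:
            # A message and its twin annihilate each other.
            self.pending.remove(twin)
            return "annihilated"
        self.pending.append(msg)
        return "enqueued"


def demo():
    queue = InputQueue()
    arrive = Message("lp0", 10, "arrive")
    depart = Message("lp0", 12, "depart")
    queue.receive(arrive)
    queue.receive(depart)
    # The sender rolled back and retracts "arrive" with an anti-message.
    queue.receive(arrive.anti())
    return [m.payload for m in queue.pending]
```

After `demo()` runs, only the `depart` message remains pending. A riskfree method would avoid this traffic altogether by holding `arrive` back until it is known to be correct.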
ISBN: 1565550552 (print)
Recent experiments have shown that conservative methods can achieve good performance by exploiting the characteristics of the system being simulated. In this paper we focus on the interrelationship between run time and synchronization requirements of a distributed simulation. We develop a metric that considers the effect of lookahead and the physical rate of transmission of messages, and an arrival approximation that models the effect of synchronization requirements on the run time. It is shown that even when good lookahead is exploited in the system, poor run-time performance results if an inefficient mapping of LPs to processors is used.
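To make the role of lookahead concrete, here is a small sketch of the usual conservative safety bound. It does not reproduce the metric or the arrival approximation developed in the paper; the function names and the dictionary layout are assumptions made for illustration. Each upstream LP at local time T with lookahead L promises not to send anything stamped earlier than T + L, so the receiving LP may process any pending event stamped strictly below the smallest such promise.

```python
def safe_bound(neighbor_clocks, lookahead):
    """Earliest timestamp any upstream LP could still send.

    neighbor_clocks: {lp_name: current local time of that upstream LP}
    lookahead:       {lp_name: minimum delay that LP adds to outgoing events}
    (Both layouts are hypothetical, chosen for this sketch.)
    """
    return min(neighbor_clocks[lp] + lookahead[lp] for lp in neighbor_clocks)


def safe_events(pending_times, neighbor_clocks, lookahead):
    """Pending event timestamps that can be processed without risk."""
    bound = safe_bound(neighbor_clocks, lookahead)
    # Strictly below the bound: an event stamped exactly at the bound
    # could still be preceded or tied by a message not yet received.
    return sorted(t for t in pending_times if t < bound)


clocks = {"a": 10, "b": 7}
poor = safe_events([5, 8, 9], clocks, {"a": 2, "b": 1})
good = safe_events([5, 8, 9], clocks, {"a": 2, "b": 3})
```

With the smaller lookahead on `b` the bound is 8 and only the event at time 5 is safe; raising that lookahead to 3 moves the bound to 10 and releases all three events. The abstract's point is that this gain can still be lost if the LPs are mapped to processors inefficiently.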
ISBN: 1565550552 (print)
Time Warp has evolved into a common technique for distributed simulation. Speedup in Time Warp simulation systems mainly depends on two overhead factors: first, the load on the simulators has to be well balanced, and second, communication and rollbacks have to be kept to a minimum. Both of these factors are influenced by the partitioning of the simulated system. In this paper, we focus on various static partitioning schemes used to partition digital circuits for distributed simulation. A new hierarchical partitioning approach is presented and is compared with and rated against other partitioning schemes by evaluating benchmark circuits. Partitioning is done in two steps: a fine-grained clustering step based on corollas and a coarse-grained step forming partitions using the connectivity matrix. The corolla approach yields very good partitioning results even for a large number of partitions. The achieved speedups are almost linear (up to 12 partitions for larger circuits), as long as the partition sizes are large enough that communication between the simulators is not a bottleneck. The results reveal the great impact of partitioning on the acceleration of distributed logic simulation and show the effectiveness of the presented corolla partitioning scheme.
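The two factors named in this abstract, load balance and communication, can be measured for any candidate partition. The sketch below is not the corolla algorithm; it only scores a given assignment of gates to partitions, counting one unit of load per gate and one unit of communication per wire that crosses a partition boundary. The function name and data layout are assumptions made for illustration.

```python
def partition_quality(edges, assignment, num_parts):
    """Score a static partition of a circuit graph.

    edges:      iterable of (gate_u, gate_v) connections
    assignment: {gate: partition index in range(num_parts)}
    Returns (cut_size, imbalance), where imbalance is the largest
    partition's load divided by the ideal load (1.0 is a perfect balance).
    """
    loads = [0] * num_parts
    for part in assignment.values():
        loads[part] += 1
    # A cut edge means two simulators must exchange messages.
    cut_size = sum(1 for u, v in edges if assignment[u] != assignment[v])
    ideal = len(assignment) / num_parts
    return cut_size, max(loads) / ideal


# A chain of six gates: g0 - g1 - g2 - g3 - g4 - g5
chain = [(f"g{i}", f"g{i + 1}") for i in range(5)]
clustered = {f"g{i}": 0 if i < 3 else 1 for i in range(6)}
interleaved = {f"g{i}": i % 2 for i in range(6)}

clustered_score = partition_quality(chain, clustered, 2)
interleaved_score = partition_quality(chain, interleaved, 2)
```

Both assignments are perfectly balanced, but the clustered one cuts a single wire while the interleaved one cuts all five, which is the kind of difference a clustering step aims to exploit.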
ISBN: 1565550552 (print)
Synchronization is a significant cost in many parallel programs, and can be a major bottleneck if it is handled in a centralized fashion using traditional shared-memory constructs such as barriers. In a parallel time-stepped simulation, the use of global synchronization primitives limits scalability, increases the sensitivity to load imbalance, and reduces the potential for exploiting locality to improve cache behavior. This paper presents the results of an initial one-application study quantifying the costs and performance benefits of distributed, nearest-neighbor synchronization. The application studied, MP3D, is a particle-based wind tunnel simulation. Our results for this one application on current shared-memory multiprocessors show a significant decrease in synchronization time using these techniques. We prototyped an application-independent library that implements distributed synchronization. The library allows a variety of parallel simulations to exploit these techniques without increasing the application programming effort beyond that of conventional approaches.
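The dependency structure behind nearest-neighbor synchronization can be sketched as follows. This is neither MP3D nor the library described in the abstract: the function names, the one-dimensional averaging update, and the per-step history buffer are all assumptions made for illustration. Each worker waits only until its adjacent workers have finished the previous time step instead of waiting at a global barrier. For brevity the sketch guards the progress counters with one shared `threading.Condition`, so it shows which waits are required rather than the performance benefit.

```python
import threading


def _update(prev, i):
    """New value of cell i: mean of itself and its existing neighbors."""
    cells = [prev[j] for j in (i - 1, i + 1) if 0 <= j < len(prev)] + [prev[i]]
    return sum(cells) / len(cells)


def run_sequential(initial, steps):
    """Reference time-stepped loop with an implicit global barrier per step."""
    cur = list(initial)
    for _ in range(steps):
        cur = [_update(cur, i) for i in range(len(cur))]
    return cur


def run_neighbor_sync(initial, steps):
    """One thread per cell; each waits only on its adjacent cells."""
    n = len(initial)
    # history[s][i] holds cell i after step s, so old values are never overwritten.
    history = [list(initial)] + [[None] * n for _ in range(steps)]
    done = [0] * n  # done[i] = last step completed by worker i
    cond = threading.Condition()

    def worker(i):
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < n]
        for s in range(1, steps + 1):
            with cond:
                # Proceed once the neighbors have finished step s - 1.
                cond.wait_for(lambda: all(done[j] >= s - 1 for j in neighbors))
            history[s][i] = _update(history[s - 1], i)
            with cond:
                done[i] = s
                cond.notify_all()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return history[-1]
```

Because a worker never reads anything except its own and its neighbors' values from the previous step, `run_neighbor_sync` produces the same result as `run_sequential` even though distant workers may be several steps apart at any moment.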