The implementation of a distributed digital logic simulation algorithm on a network of workstations is presented. The simulation of digital circuits is done using a demand driven approach. The simulation is performed ...
Performance of Time Warp simulation systems is often measured on exclusively available parallel computing resources. In distributed systems, exclusive use is normally not feasible. Instead, because of multi-tasking operating systems, many users share the workstations, and their availability for parallel simulation purposes varies extensively. Time Warp has been found to be very sensitive to variations in available processing power. This paper presents two methods for a Time Warp VLSI simulation system to reduce the negative effect of a non-ideal environment on the execution of parallel simulations. A dynamic load balancing algorithm which adapts to changes in available processing power is presented. This mechanism, together with a multi-cluster partitioning technique, significantly improves the performance of Time Warp based simulation systems on heterogeneous computing resources.
The partitioning of systems for parallel simulation is a complex task, requiring consideration of both computational load requirements and communications activity. Typically, this information is not accurately known prior to execution. This paper investigates the use of historical information for the prediction of future requirements, both for computation and communications. In addition, for optimistic simulation algorithms, we present a novel technique (which we call predictive optimism) whereby binary prediction schemes can be used to increase the accuracy of optimistic assumptions, thereby decreasing rollbacks and potentially improving overall simulator performance.
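The abstract does not say which binary prediction schemes are used. As a rough illustration of what a binary prediction scheme driven by historical information can look like, here is a minimal sketch of a two-bit saturating counter, a scheme borrowed from branch prediction. The class name `TwoBitPredictor`, the helper `accuracy`, and the idea that each outcome records whether an optimistic assumption turned out to be safe are all assumptions made for illustration, not details taken from the paper.

```python
class TwoBitPredictor:
    """Two-bit saturating counter (illustrative, not the paper's scheme).

    States 0 and 1 predict False; states 2 and 3 predict True.
    """

    def __init__(self, initial_state=2):
        self.state = initial_state

    def predict(self):
        # Predict that the optimistic assumption will hold when state >= 2.
        return self.state >= 2

    def update(self, outcome):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if outcome:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes):
    """Fraction of outcomes predicted correctly by a fresh predictor."""
    predictor = TwoBitPredictor()
    correct = 0
    for outcome in outcomes:
        if predictor.predict() == outcome:
            correct += 1
        predictor.update(outcome)
    return correct / len(outcomes)
```

A single contrary outcome does not flip the prediction in this scheme, which is why such counters tolerate occasional noise in the history.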
This paper describes an approach to discrete event simulation modeling that appears to be effective for developing portable and efficient parallel execution of models of large distributed systems and communication networks. In this approach, the modeler develops sub-models using an existing sequential simulation modeling tool, using the full expressive power of the tool. A set of modeling language extensions permits automatically synchronized communication between sub-models; however, the automation requires that any such communication take a non-zero amount of simulation time. Within this modeling paradigm, a variety of conservative synchronization protocols can transparently support conservative execution of sub-models on potentially different processors. A specific implementation of this approach, U.P.S. (Utilitarian Parallel Simulator), is described, along with performance results on the Intel Paragon.
This paper reports on the performance of four parallel algorithms for simulating an associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are implemented on the MasPar MP-2. Another algorithm is a parallelization of an efficient serial algorithm on the Intel Paragon. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC92 benchmark programs.
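For readers unfamiliar with the underlying problem, the following is a minimal serial sketch of simulating a set-associative cache under LRU replacement over an address trace. It is not one of the four parallel algorithms the abstract refers to; the function name `simulate_lru` and its parameters (`num_sets`, `ways`, `line_size`) are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


def simulate_lru(trace, num_sets, ways, line_size=1):
    """Return (hits, misses) for a set-associative LRU cache.

    trace     -- iterable of integer addresses
    num_sets  -- number of cache sets
    ways      -- associativity (lines per set)
    line_size -- addresses per cache line
    """
    # One ordered dict per set; insertion order tracks recency of use.
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = misses = 0
    for addr in trace:
        block = addr // line_size
        cache_set = sets[block % num_sets]
        if block in cache_set:
            cache_set.move_to_end(block)  # now the most recently used
            hits += 1
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[block] = True
    return hits, misses
```

For example, `simulate_lru([0, 1, 0, 2, 1], num_sets=1, ways=2)` evicts block 1 when block 2 arrives, so the final reference to block 1 misses.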
This paper presents an approach for speculative parallel execution of rendezvous-synchronized simulations. Rendezvous-synchronized simulation is based on the notions of processes and gates and on the rendezvous mechanism defined in the basic process algebra of Lotos, a standard formal specification language for temporal ordering [2]. Time is introduced via a mechanism similar to the delay behaviour annotation provided by the Topo toolset [4-6]. The algorithm allows speculative gate activations. This increases the available parallelism while ensuring correct execution of the computation. The model is used to describe closed stochastic queueing network simulations. Analysis of their execution results suggests that the model makes available a promising degree of parallelism.
Distributing computation among multiple processors is one approach to reducing simulation time for large VLSI circuit designs. However, parallel simulation introduces the problem of how to partition the logic gates and system behaviors of the circuit among the available processors in order to obtain maximum speedup. A complicating factor that is often ignored is the effect of the time-synchronization protocol (conservative [1] or optimistic [2]). Inherent in the partitioning problem is the question of how to effectively measure the relative quality of a partition. This paper describes an objective cost function for measuring the relative quality of a task partition that includes a synchronization factor for a conservative NULL-message protocol. A graph-based partitioning tool based on this cost function is used to perform the static task allocation for parallel simulation of a structural VHDL circuit. Results for two 1000-4000 gate circuits demonstrate that the additional consideration of the synchronization protocol in the cost function generates partitions that exhibit improved speedup.
This paper analyzes three previous techniques for dynamically sizing checkpoint intervals and presents a new, heuristic algorithm for this purpose. All four techniques are implemented in a common application domain and a direct comparison between the algorithms is performed. The results show a significant difference in the performance of the implemented algorithms. However, in virtually all cases, the dynamic algorithms performed near or better than the best static value. Moreover, the best algorithms performed as much as 12% better than the best static value.