It is well known that the critical path provides an absolute lower bound on the execution time of a conservative parallel discrete event simulation. It stands to reason that optimal execution time can only be achieved by immediately executing each event on the critical path. However, dynamically identifying the critical event is difficult, if not impossible. In this paper, we examine several heuristics that might help to determine the critical event, and conduct a performance study to determine the effectiveness of using these heuristics for preferential scheduling.
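As a minimal sketch of the lower bound mentioned above, the critical path can be computed offline as the longest weighted path through the event dependency graph. The event names, costs, and the function `critical_path_length` are hypothetical illustrations, not taken from the paper, and the sketch assumes the full dependency graph is known in advance (which is exactly what a running simulator lacks).

```python
from collections import defaultdict, deque


def critical_path_length(cost, deps):
    """Return the longest weighted path through an event dependency DAG.

    cost: dict mapping event -> execution time (hypothetical values).
    deps: list of (before, after) pairs meaning `after` must wait for `before`.
    """
    succs = defaultdict(list)
    indeg = {e: 0 for e in cost}
    for before, after in deps:
        succs[before].append(after)
        indeg[after] += 1

    # Earliest finish time of each event; starts as its own cost.
    finish = {e: cost[e] for e in cost}
    ready = deque(e for e in cost if indeg[e] == 0)
    visited = 0
    while ready:
        e = ready.popleft()
        visited += 1
        for s in succs[e]:
            # An event cannot finish before its latest predecessor plus its own cost.
            finish[s] = max(finish[s], finish[e] + cost[s])
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)

    if visited != len(cost):
        raise ValueError("dependency graph has a cycle")
    return max(finish.values(), default=0)


# Hypothetical four-event example: a precedes b and c, which both precede d.
costs = {"a": 2, "b": 3, "c": 1, "d": 4}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
bound = critical_path_length(costs, edges)
```

No schedule on any number of processors can finish this example faster than `bound`, because the events along a, b, d must run one after another.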
Distributing computation among multiple processors is one approach to reducing simulation time for large VLSI circuit designs. However, parallel simulation introduces the problem of how to partition the logic gates and system behaviors of the circuit among the available processors in order to obtain maximum speedup. A complicating factor that is often ignored is the effect of the time-synchronization protocol (conservative [1] or optimistic [2]). Inherent in the partitioning problem is the question of how to effectively measure the relative quality of a partition. This paper describes an objective cost function for measuring the relative quality of a task partition that includes a synchronization factor for a conservative NULL-message protocol. A graph-based partitioning tool based on this cost function is used to perform the static task allocation for parallel simulation of a structural VHDL circuit. Results for two circuits of 1000 to 4000 gates demonstrate that the additional consideration of the synchronization protocol in the cost function generates partitions that exhibit improved speedup.
The spatial scale, runtime speed and behavioral detail of epidemic outbreak simulations together require the use of large-scale parallel processing. In this paper, an optimistic parallel discrete event execution of a ...
ISBN: 1565550552 (print)
A number of optimistic synchronization schemes for parallel simulation rely upon a global synchronization. The problem is to determine when every processor has completed all its work and there are no messages in transit in the system that will cause more work. Most previous solutions to the problem have used distributed termination algorithms, which are inherently serial; other parallel mechanisms may be inefficient. In this paper we describe an efficient parallel algorithm derived from a common "barrier" synchronization algorithm used in parallel processing. The algorithm's principal attractions are speed and generality: it is designed to be used in contexts more general than parallel discrete-event simulation. To establish our claim to speed, we compare our algorithm's performance with the standard barrier algorithm, and find that its additional costs are not excessive. Our experiments are conducted using up to 256 processors on the Intel Touchstone Delta.
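The condition being detected can be illustrated with a small sketch. This is not the paper's barrier-derived algorithm; it only shows the quiescence test itself, under the assumption that a consistent snapshot of per-processor counters is available. The field names `idle`, `sent`, and `received` and the function `is_globally_quiescent` are hypothetical.

```python
def is_globally_quiescent(processors):
    """Return True when no processor has work and no message is in transit.

    processors: list of dicts with hypothetical keys
      "idle"     - True if the processor has no pending work
      "sent"     - total messages this processor has sent
      "received" - total messages this processor has received
    Assumes the counters form a consistent snapshot.
    """
    all_idle = all(p["idle"] for p in processors)
    total_sent = sum(p["sent"] for p in processors)
    total_received = sum(p["received"] for p in processors)
    # Equal totals mean every sent message has been delivered.
    return all_idle and total_sent == total_received


# Hypothetical snapshots: in the second, one message is still in transit.
settled = [
    {"idle": True, "sent": 3, "received": 2},
    {"idle": True, "sent": 2, "received": 3},
]
in_flight = [
    {"idle": True, "sent": 3, "received": 2},
    {"idle": True, "sent": 2, "received": 2},
]
```

Here `is_globally_quiescent(settled)` holds while `is_globally_quiescent(in_flight)` does not, since the undelivered message may still create work on its recipient.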
ISBN: 0769516084 (print)
Of critical importance to any real-time system is the issue of predictability. We divide overall system predictability into two parts: algorithmic and systemic. Algorithmic predictability is concerned with ensuring that the parallel simulation engine and model are, from a complexity point of view, able to consistently yield results within a real-time deadline. Systemic predictability is concerned with ensuring that OS scheduling, interrupts and virtual memory overheads are consistent over a real-time period. To provide a framework for investigating systemic predictability, we define a new class of parallel simulation called Extreme Simulation, or XSim. An XSim is any analytic parallel simulation that is able to generate a statistically valid result by a real-time deadline. Typically, this deadline is between 10 and 100 milliseconds. XSims are expected to provide decision support to existing complex, real-time systems. As a new design and implementation methodology for realizing XSims, we embed a state-of-the-art optimistic simulator into the Linux operating system. In this operating environment, OS scheduling and interrupts are disabled. Given a 50 millisecond model completion deadline, we observe that the XSim has a systemic predictability measure of 98%, compared with only 56% for the same Time Warp system operating at user level.
ISBN: 1565550277 (print)
Performance of VHDL simulation is a critical issue in electronic circuit design and is hard to achieve due to the complexity of the language and the different abstraction levels. This paper presents a system for performance evaluation of distributed-time VHDL simulation based on the analysis of simulation traces. The system allows modeling of different architectures, interconnection topologies and simulation algorithms. The main tools are a VHDL analyzer to extract dependencies, and a trace-driven simulator to evaluate the execution time on a given architecture.
ISBN: 0769508375 (print)
In this paper we examine various modeling and simulation applications of cluster computing using a Beowulf cluster. Those applications are used to investigate the performance of our cluster in terms of computational speedup, scalability, and communications. The applications include solution of linear systems by Jacobi iteration, distributed image generation, and the finite difference time domain solution of Maxwell's equations. It is observed that the computational load for these applications must be large compared to the communication overhead to take advantage of the speedup obtained using parallel computing. For the applications reviewed here, this condition is increasingly satisfied as the problem size becomes larger or as higher resolution is required.
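For readers unfamiliar with the first application listed above, a minimal serial sketch of Jacobi iteration follows. It is not the paper's cluster implementation; the function name `jacobi`, the fixed iteration count, and the sample system are assumptions chosen for illustration, and convergence is assumed (the sample matrix is strictly diagonally dominant).

```python
def jacobi(a, b, iterations=50):
    """Approximate the solution of a*x = b by Jacobi iteration.

    a: square matrix as a list of rows, with nonzero diagonal entries.
    b: right-hand side vector.
    iterations: fixed sweep count (an assumption; real codes test a residual).
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # Every new component uses only the previous iterate, so the rows
        # can be updated independently of one another.
        x = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
    return x


# Hypothetical 2x2 system whose exact solution is x = [1, 2].
solution = jacobi([[4.0, 1.0], [2.0, 5.0]], [6.0, 12.0])
```

Because each row update reads only the previous iterate, rows can be split across cluster nodes, but every sweep then requires exchanging the updated vector. That exchange is the communication overhead the abstract weighs against computational load.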
ISBN: 0769523838 (print)
In this paper we introduce a new concept, network atomic operations (NAOs), to create a zero-cost consistent cut. Using NAOs, we define a wall-clock-time driven GVT algorithm called Seven O'Clock that is an extension of Fujimoto's shared memory GVT algorithm. Using this new GVT algorithm, we report good optimistic parallel performance on a cluster of state-of-the-art Itanium-II quad processor systems for both benchmark applications such as PHOLD and real-world applications such as a large-scale TCP/Internet model. In some cases, super-linear speedup is observed.