A recent paper by Bailey [1] contains a theorem stating that the idealized execution times of unit-delay, synchronous and conservative asynchronous simulations are equal, provided that an unlimited number of processors is available and every logic element has the same evaluation time. The paper further shows that these conditions yield a lower bound on the execution times of both synchronous and conservative asynchronous simulations. Bailey's conclusions are derived under the strict assumption that the inputs to a circuit remain fixed during the entire simulation. We remove this limitation and, by extending the analysis to multi-input, multi-output circuits with an arbitrary number of input events, show that conservative asynchronous simulation in general extracts more parallelism and executes faster than synchronous simulation. Our conclusions are supported by a comparison of the idealized execution times of synchronous and conservative asynchronous algorithms on ISCAS combinational and sequential benchmark circuits.
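As a rough illustration of the unit-delay setting with changing inputs, the following Python sketch runs an event-driven unit-delay simulation on a small made-up circuit and records the time steps in which at least one gate is evaluated. The netlist, the signal names, and the idea of reading the number of active steps as a proxy for idealized synchronous execution time (unlimited processors, equal evaluation times) are assumptions made here for illustration; they are not taken from Bailey's paper or from this abstract's analysis.

```python
from collections import defaultdict

# Hypothetical toy netlist: gate output name -> (function, input signal names).
GATES = {
    "n1": (lambda a, b: a & b, ("a", "b")),
    "n2": (lambda x: 1 - x, ("n1",)),
    "out": (lambda x, c: x | c, ("n2", "c")),
}

def simulate_unit_delay(gates, initial, input_events):
    """Event-driven simulation in which every gate has a delay of one time unit.

    initial:      consistent starting value for every signal.
    input_events: dict mapping time -> {primary input name: new value}.
    Returns (final signal values, time steps in which some gate was evaluated).
    """
    # Build the fanout map: signal -> gates that read it.
    fanout = defaultdict(list)
    for gate, (_, inputs) in gates.items():
        for sig in inputs:
            fanout[sig].append(gate)

    values = dict(initial)
    pending = defaultdict(dict)  # time -> {signal: value} changes to apply
    for t, changes in input_events.items():
        pending[t].update(changes)

    active_steps = []
    while pending:
        t = min(pending)
        changes = pending.pop(t)
        # Apply the signal changes scheduled for time t.
        to_eval = set()
        for sig, val in changes.items():
            if values[sig] != val:
                values[sig] = val
                to_eval.update(fanout[sig])
        if to_eval:
            active_steps.append(t)
        # Evaluate affected gates; output changes appear one time unit later.
        for gate in to_eval:
            fn, inputs = gates[gate]
            new = fn(*(values[s] for s in inputs))
            if new != values[gate]:
                pending[t + 1][gate] = new
    return values, active_steps

initial = {"a": 0, "b": 0, "c": 0, "n1": 0, "n2": 1, "out": 1}
events = {0: {"a": 1, "b": 1}, 5: {"c": 1}}  # two separate input events
final_values, active_steps = simulate_unit_delay(GATES, initial, events)
```

All gates evaluated within one active step are independent of each other, so under the idealization they could run concurrently. A conservative asynchronous simulator would not be forced to keep gates in lockstep across steps; that behavior is not modeled in this sketch.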
Verification has grown to dominate the cost of electronic system design, consuming about 60% of design effort. Among the available verification techniques, logic simulation remains the most widely used. Speeding up logic simulation therefore yields large savings and a shorter time-to-market. We parallelize logic simulation using Graphics Processing Units (GPUs). In the past, GPUs were special-purpose application accelerators, suitable only for conventional graphics applications. Newer generations of GPU architecture provide easier programmability and increased generality while maintaining the tremendous memory bandwidth and computational power of traditional GPUs. We develop a parallel cycle-based logic simulation algorithm that uses And-Inverter Graphs (AIGs) as design representations. AIGs have proven to be an effective representation for various design automation applications, and we obtain similar benefits for speeding up logic simulation. We develop two clustering algorithms that partition the gates in a design into independent blocks. Our algorithms exploit the massively parallel GPU architecture, which features thousands of concurrent threads, fast memory, and memory coalescing, for optimization. We demonstrate speedups of up to 5x and 21x on several benchmarks using our simulation system with the first and second clustering algorithms, respectively. Our work ultimately results in a significant reduction in the overall design cycle.
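To make the idea of cycle-based simulation over an AIG concrete, here is a minimal CPU-side Python sketch, not the paper's GPU implementation. It assumes an AIGER-style literal encoding (literal = 2 * variable + inversion bit, variable 0 is constant false), packs several input patterns into the bits of one integer, and groups AND nodes by logic level as a simple stand-in for the paper's two clustering algorithms, which the abstract does not describe. The function names `levelize` and `simulate_aig` are hypothetical.

```python
def levelize(num_inputs, ands):
    """Group AND nodes into levels; nodes in one level do not depend on each other.

    ands: list of (var, lit0, lit1) in topological order, where each literal
    is 2 * variable + inversion bit (an assumed AIGER-style encoding).
    """
    level = {v: 0 for v in range(num_inputs + 1)}  # constant and inputs: level 0
    for var, lit0, lit1 in ands:
        level[var] = 1 + max(level[lit0 >> 1], level[lit1 >> 1])
    blocks = {}
    for node in ands:
        blocks.setdefault(level[node[0]], []).append(node)
    return [blocks[lv] for lv in sorted(blocks)]

def simulate_aig(num_inputs, ands, outputs, input_words, width):
    """Evaluate one cycle for `width` patterns at once (one pattern per bit)."""
    mask = (1 << width) - 1
    val = {0: 0}  # variable 0 is constant false
    for var, word in enumerate(input_words, start=1):
        val[var] = word & mask

    def lit_value(lit):
        word = val[lit >> 1]
        return (~word & mask) if (lit & 1) else word

    for block in levelize(num_inputs, ands):
        # Every node in this block could be evaluated by a separate thread.
        for var, lit0, lit1 in block:
            val[var] = lit_value(lit0) & lit_value(lit1)
    return [lit_value(o) for o in outputs]

# XOR of inputs x1 (variable 1) and x2 (variable 2), built from three AND nodes.
XOR_ANDS = [
    (3, 2, 5),  # v3 = x1 AND NOT x2
    (4, 3, 4),  # v4 = NOT x1 AND x2
    (5, 7, 9),  # v5 = NOT v3 AND NOT v4
]
XOR_OUTPUTS = [11]  # output = NOT v5

result = simulate_aig(2, XOR_ANDS, XOR_OUTPUTS, [0b0101, 0b0011], width=4)
```

Because every node is a two-input AND with optional inversions, each level reduces to a uniform batch of identical operations, which is the property that makes the representation attractive for wide data-parallel hardware.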
ISBN: 9781509020881 (print)
In this paper, we propose a lock-free architecture to accelerate logic gate circuit simulation using SIMD multi-core machines. We evaluate its performance on different test circuits simulated on the Intel Xeon Phi and two other machines. We compare this software/hardware combination with the reported performance of GPU and other multi-core simulation platforms. We also compare the lock-free architecture with a leading commercial simulator running on the same Intel hardware.
Explicit multi-threading (XMT) is a parallel programming approach for exploiting on-chip parallelism. Its fine-grained single program multiple data (SPMD) programming model is suitable for many compute-intensive applications. In this paper, we present a parallel gate-level logic simulator implemented on an XMT platform and study its performance. Test results show the potential for more than a hundred-fold speedup over a serial implementation. This indicates an interesting possibility for a certain type of single-chip multicore architecture: using an existing easy-to-program API, such as VHDL or Verilog, to reduce application-software development time and to obtain better performance than serial performance-driven languages such as C.