ISBN: 0818677554 (print)
We propose a new approach to parallelizing fault simulation in which the test set is partitioned among the available processors. The approach can be used with any of the commonly used sequential-circuit fault simulation algorithms, and it can be implemented on a variety of parallel architectures. For the first time, this approach overcomes the limitations of serial logic simulation. In addition, the excessive redundant computation required by the traditional fault-partitioning approach is considerably reduced. Significant improvements in speedup were observed compared with previous approaches: test set partitioning achieved an average speedup of 5.7 on 10 processors for the benchmark circuits studied. Although pessimistic fault coverage may be reported in some cases, the proposed approach was found to be very accurate for the circuits studied.
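The partitioning step described above can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the contiguous, near-equal split and the function names `partition_test_set` and `merge_detected` are assumptions, and the per-segment fault simulator itself is left out.

```python
from typing import List, Sequence, Set


def partition_test_set(vectors: Sequence[str], n_procs: int) -> List[List[str]]:
    """Split a test sequence into n_procs contiguous, nearly equal segments.

    Assumption: segments are contiguous; the abstract does not say how
    the test set is divided.
    """
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    base, extra = divmod(len(vectors), n_procs)
    segments, start = [], 0
    for p in range(n_procs):
        # The first `extra` processors take one additional vector.
        size = base + (1 if p < extra else 0)
        segments.append(list(vectors[start:start + size]))
        start += size
    return segments


def merge_detected(per_segment: List[Set[str]]) -> Set[str]:
    """Combine the faults detected on each processor into one set."""
    return set().union(*per_segment)
```

Each processor would run any serial fault simulator on its own segment, and the overall coverage is the union of the per-segment detections.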
The proceedings contain 80 papers from the Fourth International Conference on High Performance Computing. Topics discussed include: database management systems (DBMS); data migration and caching; algorithms; programming and languages; load balancing and scheduling; reconfigurable custom computing; routing; instruction level parallelism (ILP) architectures and compiler issues; parallel input/output and multithreaded systems; virtual channels; and image processing.
ISBN: 0780341376 (print)
This paper presents a complete methodology for the automatic synthesis of VLSI architectures used in digital signal processing. Most signal processing algorithms have the form of an n-dimensional nested loop with unit uniform loop-carried dependencies. We model such algorithms with generalized UET grids. We calculate the optimal makespan for the generalized UET grids and then establish the minimum number of systolic cells required to achieve that makespan. We present a complete methodology, based on VHDL, for the hardware synthesis of the resulting architecture. This methodology automatically detects all necessary computation and communication elements and produces optimal layouts. The complexity of our proposed scheduling policy is completely independent of the size of the nested loop and depends only on its dimension, which makes it the most efficient (in terms of complexity) known to us. All these methods were implemented and incorporated in an integrated software package that provides the designer with a powerful parallel design environment, from high-level signal processing algorithmic specifications to low-level optimal implementations (i.e., actual layouts). The evaluation was performed using well-known algorithms from signal processing.
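As a rough illustration of the makespan idea, the sketch below handles only the plain case: a rectangular n-dimensional grid of unit-time tasks in which each point depends on its predecessor along every axis. The generalized UET grids of the paper are not reproduced here, and the name `wavefront_schedule` is hypothetical. In this plain grid a point can start at time equal to the sum of its indices, so the makespan is the longest such sum plus one, and the widest wavefront gives the number of cells busy at once.

```python
from collections import Counter
from itertools import product
from typing import Sequence, Tuple


def wavefront_schedule(sizes: Sequence[int]) -> Tuple[int, int]:
    """Return (makespan, widest wavefront) for a plain unit-dependence grid.

    sizes[k] is the loop bound along axis k. By brute force, each point
    is assigned the time step sum(point); this enumerates the whole grid,
    unlike the size-independent policy described in the abstract.
    """
    if not sizes or any(n < 1 for n in sizes):
        raise ValueError("every loop bound must be at least 1")
    widths = Counter(sum(point) for point in product(*(range(n) for n in sizes)))
    makespan = max(widths) + 1           # time steps 0 .. sum(n - 1)
    return makespan, max(widths.values())
```

For a 3 x 3 grid this gives five time steps with at most three points active together.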
In order to generate local addresses for an array section A(l:h:s) with block-cyclic distribution, an efficient compiling method is required. In this paper, two local address generation methods for the block-cyclic di...
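To make the problem concrete, here is a brute-force sketch of what local address generation has to compute. It is not either of the paper's methods, which are not given in this excerpt. It assumes 0-based global indices, an inclusive upper bound `h`, a positive stride `s`, block size `k`, and `p` processors; the name `local_addresses` is hypothetical.

```python
from typing import Dict, List


def local_addresses(l: int, h: int, s: int, k: int, p: int) -> Dict[int, List[int]]:
    """Map each element of A(l:h:s) to (owner, local address) under cyclic(k).

    Global index g lies in block g // k; blocks are dealt round-robin to
    the p processors, and each processor stores its blocks contiguously.
    """
    if s < 1 or k < 1 or p < 1:
        raise ValueError("s, k and p must be positive")
    result: Dict[int, List[int]] = {q: [] for q in range(p)}
    for g in range(l, h + 1, s):
        block, offset = divmod(g, k)
        owner = block % p
        result[owner].append((block // p) * k + offset)
    return result
```

An efficient compiler method would produce the same address sequences without visiting every element of the section one at a time.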
In this paper we present a new approach to fault tolerance in VLSI processor architectures. The reconfiguration technique is general in the sense that it can be applied to any architecture with any number of spares, each of which may be connected to an arbitrary number of processing elements. The technique consists of two stages: local and global reconfiguration. In the local reconfiguration stage, faulty cells are maximally mapped onto adjacent spares. In the global stage, the shortest path from a faulty cell to a spare is found, and the spare is "propagated" to the faulty site by the logical displacement of processing elements along that path.
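The global stage can be sketched with a breadth-first search, assuming the architecture is given as an adjacency dictionary with unweighted links. The function names and the graph encoding are illustrative assumptions, not the paper's data structures.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Optional, Set


def path_to_nearest_spare(
    adj: Dict[Hashable, Iterable[Hashable]],
    faulty: Hashable,
    spares: Set[Hashable],
) -> Optional[List[Hashable]]:
    """Breadth-first search from the faulty cell to the closest spare."""
    parent = {faulty: None}
    queue = deque([faulty])
    while queue:
        node = queue.popleft()
        if node in spares:
            # Walk the parent links back to the faulty cell.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no spare is reachable


def displace(path: List[Hashable]) -> Dict[Hashable, Hashable]:
    """Logical displacement: each cell's role moves one step toward the spare."""
    return {path[i]: path[i + 1] for i in range(len(path) - 1)}
```

Applying `displace` to the returned path shifts every logical role one hop along the path, so the spare at the far end absorbs the displaced work.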
This contribution describes a new class of arithmetic architectures for Galois fields GF(2^k). The main applications of the architecture are public-key systems which are based on the discrete logarithm problem for elli...
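For readers unfamiliar with GF(2^k) arithmetic, the following is a generic shift-and-add multiplication in polynomial basis. It shows the operation that such architectures accelerate, not the architecture class proposed in the contribution. The function name and the convention that `modulus` includes the x^k term are assumptions.

```python
def gf2k_mul(a: int, b: int, modulus: int, k: int) -> int:
    """Multiply a and b in GF(2^k), with elements encoded as bit vectors.

    `modulus` is the irreducible polynomial including its x^k term,
    e.g. 0x11B for x^8 + x^4 + x^3 + x + 1. Both inputs must be < 2**k.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^k) is XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if (a >> k) & 1:
            a ^= modulus         # reduce when the degree reaches k
    return result
```

With k = 8 and modulus 0x11B, `gf2k_mul(0x57, 0x83, 0x11B, 8)` returns 0xC1.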
We present two algorithms to minimize the amount of synchronization added when parallelizing a loop with loop-carried dependences. In contrast to existing schemes, our algorithms add less synchronization while preserving the parallelism that can be extracted from the loop. Our first algorithm uses an interval graph representation of the dependence "overlap" to find a synchronization placement in time almost linear in the number of dependences. Although this solution may be suboptimal, it is still better than that obtained using existing methods, which first eliminate redundant dependences and then synchronize the remaining ones. Determining the optimal synchronization is an NP-complete problem. Our second algorithm therefore uses integer programming to determine the optimal solution. We first use a polynomial-time algorithm to find a minimal search space that must contain the optimal solution. Then, we formulate the problem of choosing the minimal synchronization from the search space as a set-cover problem and solve it exactly using 0-1 integer programming. We show the performance impact of our algorithms by synchronizing a set of synthetic loops on an 8-processor Convex Exemplar. The greedily synchronized loops ran between 7% and 22% faster than those synchronized by the best existing algorithm. Relative to the same base, the optimally synchronized loops ran between 10% and 22% faster.
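The set-cover formulation can be illustrated on a toy instance. The sketch below replaces the 0-1 integer program with plain exhaustive enumeration, which is exact but exponential and only suitable for tiny inputs. The input encoding (each candidate synchronization mapped to the set of dependences it enforces) and the name `min_sync_cover` are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Set


def min_sync_cover(
    dependences: Iterable[str],
    candidates: Dict[str, Set[str]],
) -> Optional[List[str]]:
    """Return a smallest set of candidate synchronizations covering all dependences.

    Subsets are tried in order of increasing size, so the first cover
    found is minimum. This is brute force, not integer programming.
    """
    universe = set(dependences)
    names = sorted(candidates)
    for size in range(len(names) + 1):
        for chosen in combinations(names, size):
            covered = set().union(*(candidates[c] for c in chosen))
            if universe <= covered:
                return list(chosen)
    return None  # some dependence is covered by no candidate
```

A 0-1 formulation would use one binary variable per candidate and one covering constraint per dependence, minimizing the number of variables set to 1.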
In this paper, a parallel version of the hybrid method of moments/Green's function technique is presented for the analysis of portable hand-held transceivers radiating close to the human head. Compared with other numerical techniques, this formulation leads to a drastic reduction in memory requirements and, because of the parallel implementation, also gives acceptable execution times. After a brief description of the theory and the parallel processing, an example is presented that demonstrates very efficient parallelization using the Message Passing Interface standard, even on an inhomogeneous cluster of workstations.
Object dataflow is a popular approach used in parallel rendering. The data representing the 3D scene is statically distributed among processors, and objects are fetched and cached only on demand. Most previous object dataflow methods were implemented on shared-memory architectures and exploited spatial coherence to reduce hardware cache misses. In this paper, we propose an efficient model for object dataflow parallel volume rendering on message-passing machines. The algorithm is introduced, and its ray storage mechanism is used to support latency hiding by postponing computation on inactive rays. Memory usage is optimized by letting objects migrate and replicate at different processors instead of using the common static assignment. Our cache-only-memory approach uses a distributed-directory scheme to trace the location of objects at other nodes. A mechanism that minimizes network congestion and optimizes channel utilization was implemented. Unlike previous methods, our approach can benefit from temporal coherence and effectively minimizes communication costs during animation in limited-bandwidth multiprocessing environments. We report results of the algorithm's implementation on several platforms, including the Cray T3D, the Convex SPP, and a DEC Alpha cluster of workstations (COW), on which it achieved higher efficiency and scalability than existing algorithms.
Recently, the application of optical or photonic technology to signal processing and beam formation in microwave array antennas has been studied. So far, however, only a few practical systems have been reported for multibeam applications. In this paper, an optical signal processing array antenna for both multibeam transmission and reception is proposed, based on the parallel optical processing principle and optical heterodyne techniques. The structure and working principle of the proposed optical processing array antenna are given. The receive mode of this antenna, which uses plural local signals generated by the optical processor, is shown. To support the antenna design, an experimental setup for a 2-beam array antenna at Ku-band frequencies is demonstrated. The measured amplitude and phase distributions of the optical excitation agree very well with the calculated data.