The use of array processors in molecular graphics is a relatively recent phenomenon, and the utility of these devices is not yet widely appreciated. The paper describes the architecture of array processors and the kinds of scientific projects for which they are appropriate and beneficial. There are two major areas of molecular graphics where a programmable array processor either considerably increases the speed of a particular operation or makes a hitherto uneconomical calculation feasible on a reasonable timescale: the real-time transformation of images on a graphics screen, and large-scale molecular mechanics, molecular dynamics, or molecular orbital calculations. The problems associated with using array processors in these applications are discussed at length.
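The real-time image transformation mentioned above amounts to applying one small matrix to every atom coordinate on each frame, the kind of uniform, dependence-free loop that an array processor pipelines well. The following is a minimal sequential sketch, assuming atoms are stored as (x, y, z) tuples and the view change is a rotation about the z axis; `rotation_z` and `transform_atoms` are hypothetical names, not part of any software described in the paper.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

def rotation_z(theta: float) -> List[List[float]]:
    """3x3 rotation matrix about the z axis (theta in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def transform_atoms(matrix: List[List[float]], coords: Sequence[Vec3]) -> List[Vec3]:
    """Apply the same 3x3 matrix to every atom position.

    Each atom needs the same nine multiply-adds and no atom depends on another,
    so an array processor can stream or distribute these dot products.
    """
    return [tuple(sum(matrix[r][k] * atom[k] for k in range(3)) for r in range(3))
            for atom in coords]

# Rotate a three-atom fragment by 90 degrees about z.
atoms = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
rotated = transform_atoms(rotation_z(math.pi / 2), atoms)
```

A full display pipeline would also apply translation and projection per frame, but the cost structure (identical arithmetic on every atom) is the same.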
The algebraic path problem (APP) is a general framework that unifies solution procedures for a number of well-known matrix and graph problems. In this paper, we present a new three-dimensional (3-D) orbital algebraic path algorithm and corresponding 2-D toroidal array processors that solve the n × n APP in the theoretically minimal number of 3n time steps. The coordinated time-space scheduling of computation and data movement in this 3-D algorithm is based on the modular function, which preserves the main technological advantages of systolic processing: simplicity, regularity, locality of communication, pipelining, etc. The design of the 2-D systolic array processors is based on a classical 3-D → 2-D space transformation. We also show how data manipulation (copying and alignment) can be implemented efficiently in these array processors, in a massively parallel fashion, by using a matrix-matrix multiply-add operation.
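For reference, the computation that such an algorithm schedules can be written sequentially as Kleene's elimination over a closed semiring: one choice of (plus, times, star) gives all-pairs shortest paths (the Floyd-Warshall recurrence), another gives transitive closure. This is a minimal sketch assuming nonnegative edge weights in the shortest-path case; it is not the 3-D orbital systolic algorithm itself, and `Semiring`, `algebraic_path`, `SHORTEST` and `REACH` are illustrative names.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # semiring "addition", e.g. min or logical or
    times: Callable[[Any, Any], Any]  # semiring "multiplication", e.g. + or logical and
    star: Callable[[Any], Any]        # closure a* = one + a + a^2 + ...
    zero: Any                         # identity of plus
    one: Any                          # identity of times

def algebraic_path(a: List[List[Any]], sr: Semiring) -> List[List[Any]]:
    """Return A* = I + A + A^2 + ... using one elimination step per pivot k."""
    n = len(a)
    cur = [row[:] for row in a]
    for k in range(n):
        s = sr.star(cur[k][k])
        # Out-of-place update: step k reads only the results of step k-1.
        cur = [[sr.plus(cur[i][j], sr.times(sr.times(cur[i][k], s), cur[k][j]))
                for j in range(n)] for i in range(n)]
    # Adding the identity accounts for the empty (length-0) path.
    return [[sr.plus(sr.one if i == j else sr.zero, cur[i][j]) for j in range(n)]
            for i in range(n)]

INF = float("inf")
# (min, +) semiring: star(x) = 0 is valid only when there are no negative cycles.
SHORTEST = Semiring(plus=min, times=lambda x, y: x + y,
                    star=lambda x: 0, zero=INF, one=0)
# (or, and) semiring: transitive closure of a Boolean adjacency matrix.
REACH = Semiring(plus=lambda x, y: x or y, times=lambda x, y: x and y,
                 star=lambda x: True, zero=False, one=True)

weights = [[INF, 3, INF, 7], [8, INF, 2, INF], [5, INF, INF, 1], [2, INF, INF, INF]]
dist = algebraic_path(weights, SHORTEST)
```

Each of the n pivot steps updates all n² entries independently, so the work per step maps naturally onto a 2-D array of cells; pipelining successive pivots across such an array is what lets systolic designs finish in a number of steps proportional to n.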
Reconfigurability of processor arrays is important for two reasons: (1) to execute different algorithms efficiently and (2) to isolate faulty processors. This paper envisages an array processor that the user can reconfigure any number of times, either to obtain a different topology or to isolate faults. The system has a host or controller that broadcasts a command instructing the interconnect to configure itself in a particular way. The interconnect uses static-RAM programming technology and can be programmed into different configurations by sending a different set of bits to the configuration random access memory (RAM) in the interconnect. We present three designs reconfigurable into array, ring, mesh, or Illiac mesh topologies. The first design provides no redundancy or fault tolerance. The second design is capable of graceful degradation by bypassing faulty elements. The third design is capable of graceful degradation by rerouting. The details of the interconnect and the configuration RAM contents for typical configurations are illustrated. The results show that a reconfigurable interconnect yields a highly reconfigurable, or polymorphic, computer.
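The second design's idea, graceful degradation by bypassing, can be illustrated with a toy model in which the host writes one bypass bit per PE and the interconnect chains the remaining PEs. The one-bit-per-PE layout, the function names and the restriction to array and ring topologies are assumptions for illustration; the paper's actual configuration RAM contents are not reproduced here.

```python
from typing import List, Set, Tuple

def config_bits(n_pes: int, faulty: Set[int]) -> List[int]:
    """Hypothetical configuration-RAM image: one bypass bit per PE (1 = bypass)."""
    return [1 if pe in faulty else 0 for pe in range(n_pes)]

def links(config: List[int], topology: str) -> List[Tuple[int, int]]:
    """Point-to-point links implied by the bypass bits for a linear array or ring.

    Healthy PEs are chained in index order and bypassed PEs are skipped, so the
    topology shrinks to the remaining PEs instead of failing outright.
    """
    if topology not in ("array", "ring"):
        raise ValueError(f"unsupported topology: {topology}")
    healthy = [pe for pe, bypass in enumerate(config) if bypass == 0]
    pairs = list(zip(healthy, healthy[1:]))
    if topology == "ring" and len(healthy) > 2:
        pairs.append((healthy[-1], healthy[0]))  # wrap-around link closes the ring
    return pairs

bits = config_bits(6, faulty={2, 4})
print(bits)                 # [0, 0, 1, 0, 1, 0]
print(links(bits, "ring"))  # [(0, 1), (1, 3), (3, 5), (5, 0)]
```

Changing topology or fault set only changes the bit image written by the host; the link-derivation rule stays fixed, which mirrors the abstract's point that one interconnect serves many configurations.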
We have proposed a modification of the orthogonal Faddeev method [6] for solving various systems of linear algebraic equations (SLAE) and for the inversion and pseudoinversion of matrices. The proposed version of the method relies on the Householder and Jordan-Gauss methods, and its computational complexity is approximately half that of [6]. This method, combined with the matrix-graph method [9] for formalized SPPC structure design, has been applied to synthesize a number of array processor (AP) architectures that implement the proposed method efficiently. Goal-directed isomorphic and homeomorphic transformations of the LFG of the original algorithm (5) lead to a one-dimensional (linear) AP of fixed size with minimal hardware and time costs and minimal input-output channel width.
The proposed algorithm (5) has been implemented on a 4-processor AP with Motorola DSP96002 processors as PEs (Fig. 7). Applying algorithm (5) on this AP to an SLAE with a coefficient matrix A with M = N = 100 and one right-hand side gave a load factor η = 0.82; for inversion of a matrix A of the same size, we achieved η = 0.77.
The sequence of transformations and the partitioning of the trapezoidal planar LFG described in this article have been generalized to other linear algebra (LA) algorithms that are described by triangular planar LFGs and executed on linear APs. The AP structures synthesized in this study are shown to execute all of these algorithms at least as efficiently as the modified Faddeev algorithm, provided their PEs are initially tuned to execute the corresponding operators.
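The Faddeev approach computes D + C A^-1 B by eliminating the lower-left block of the compound matrix [[A, B], [-C, D]]; choosing C = I, D = 0 and B = b solves A x = b, and choosing B = I as well gives A^-1. The sketch below uses the classic Gaussian form of the method with partial pivoting, not the authors' orthogonal Householder/Jordan-Gauss modification or algorithm (5); `faddeev` and `identity` are hypothetical helper names.

```python
from typing import List

Matrix = List[List[float]]

def faddeev(a: Matrix, b: Matrix, c: Matrix, d: Matrix) -> Matrix:
    """Return D + C A^-1 B by eliminating the lower-left block of [[A, B], [-C, D]].

    Classic Gaussian Faddeev scheme (not the orthogonal variant). Pivoting only
    swaps rows of [A | B], which leaves C A^-1 B unchanged.
    """
    n = len(a)
    m = [[float(x) for x in a[i]] + [float(x) for x in b[i]] for i in range(n)]
    m += [[-float(x) for x in c[i]] + [float(x) for x in d[i]] for i in range(len(c))]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        if m[p][k] == 0.0:
            raise ValueError("A is singular")
        m[k], m[p] = m[p], m[k]
        # Eliminate column k from every lower row, including the [-C | D] block.
        for r in range(k + 1, len(m)):
            f = m[r][k] / m[k][k]
            if f != 0.0:
                m[r] = [x - f * y for x, y in zip(m[r], m[k])]
    # After n steps the lower-right block holds the Schur complement D + C A^-1 B.
    return [row[n:] for row in m[n:]]

def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

A = [[2.0, 1.0], [1.0, 3.0]]
x = faddeev(A, [[3.0], [5.0]], identity(2), [[0.0], [0.0]])         # SLAE solution
a_inv = faddeev(A, identity(2), identity(2), [[0.0, 0.0], [0.0, 0.0]])  # inverse
```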
High-speed thermal imaging is necessary in many applications. However, the traditional column-wise readout implementations reduce the achievable frame rate. Also, analog integration for each individual pixel is not possible without sacrificing pixel area. In this brief, we present an implementation of a fast and low-power pixel-wise readout circuit scheme with a digital integration method using 65-nm standard CMOS technology. The power consumption of the readout circuit is up to 15 μW, and the layout area is 100 μm × 100 μm. Furthermore, we analyze our design for non-idealities, such as noise and process mismatches, using a circuit simulator.
To overcome the low testability caused by circuit complexity and the pin limitations of VLSI devices, built-in self-test and diagnosis are used to locate failures in tree array processors. Each cell (processing element) generates pseudorandom test patterns and compresses its test responses into a signature. By comparing signatures, the signature of a fault-free processor is identified and used to locate faulty processors. For arrays with distributed faults, the tree array is partitioned into subtrees, to which the diagnosis algorithm is applied in parallel. The time complexity of the diagnosis algorithm is derived.
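The self-test scheme above can be sketched with two standard building blocks: a linear-feedback shift register (LFSR) that generates pseudorandom patterns, and a multiple-input signature register (MISR) that compresses responses. Everything specific here is an assumption: the 8-bit PE function, the tap choices, the stuck-at-0 fault model, the 64-pattern test length, and the rule that the most common signature is the fault-free one (which requires fewer than half the PEs to be faulty).

```python
from collections import Counter
from typing import Dict, List, Optional

def lfsr8(state: int) -> int:
    """One step of an 8-bit Fibonacci LFSR, feedback from bits 0, 2, 3 and 4."""
    feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
    return (state >> 1) | (feedback << 7)

def misr16(sig: int, data: int) -> int:
    """16-bit multiple-input signature register: Galois LFSR step, then XOR in data."""
    lsb = sig & 1
    sig >>= 1
    if lsb:
        sig ^= 0xB400
    return (sig ^ data) & 0xFFFF

def pe_output(x: int, stuck_bit: Optional[int]) -> int:
    """Hypothetical 8-bit PE datapath; stuck_bit models a stuck-at-0 output line."""
    y = (3 * x + 1) & 0xFF
    if stuck_bit is not None:
        y &= ~(1 << stuck_bit) & 0xFF
    return y

def signature(stuck_bit: Optional[int], patterns: int, seed: int = 0x5A) -> int:
    """Apply pseudorandom patterns to one PE and compress its responses."""
    state, sig = seed, 0
    for _ in range(patterns):
        sig = misr16(sig, pe_output(state, stuck_bit))
        state = lfsr8(state)
    return sig

def locate_faulty(pes: Dict[int, Optional[int]], patterns: int = 64) -> List[int]:
    """Take the most common signature as fault-free and report PEs that differ."""
    sigs = {pe: signature(fault, patterns) for pe, fault in pes.items()}
    good, _ = Counter(sigs.values()).most_common(1)[0]
    return sorted(pe for pe, s in sigs.items() if s != good)

# PE index -> injected stuck bit (None = healthy).
pes = {0: None, 1: None, 2: 3, 3: None, 4: 0}
suspects = locate_faulty(pes)
```

Signature compression can alias, that is, a faulty PE may by chance produce the fault-free signature, so register width and test length are normally chosen to make that probability small.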
Computing with large-die graphics processors, which involve huge arrays of identical structures, is fraught with challenges in the late CMOS era because of spatial non-idealities arising from chip-to-chip and within-chip variation of MOSFET threshold voltage. In this paper, we propose a machine-learning software framework for in-situ prediction and correction of computations corrupted by transistor threshold-voltage variation. A fully connected cascade feed-forward (FCCFF) neural network (NN) is trained in a semi-supervised manner. The trained FCCFF-NN then builds an accurate spatial map of faulty processing elements (PEs), which are avoided during computation. Besides correcting spatial faults, the framework also tracks and corrects transient errors (such as single-event upsets) if the number of affected PEs is large enough to cause noticeable computing errors. For experimental validation, we consider a 256 × 256 PE array. Each PE consists of an add-accumulate-multiply (AAM) block with three 8-bit registers (two for inputs and a third for storing the computed result). One thousand instances of this processor array are created, and the PEs in each instance are randomly perturbed with threshold-voltage variation. Common image-processing operations such as low-pass filtering and edge enhancement are performed on each of these 1,000 instances. About 10% of the resulting images are used to train the NN for spatial non-idealities. Based on this training, the NN accurately predicts the spatial extremities in 95% of the remaining 90% of cases. The proposed NN-based error tolerance produces processed images of superior quality, whose degradation is no longer visually perceptible.
This paper presents the linear array processors with multiple access modes memory system (LAPMAMM), an efficient one-dimensional parallel architecture for real-time image processing. The architecture is composed of n processors and n² memory modules. These memory modules support multiple access modes: RAM, FIFO, normal CAM, and interactive CAM. They are associated with a linear array of VLIW processors interconnected by a simple tree network that ensures an O(log n) data propagation time. The practical operation of the architecture is explained using the example of a labeling algorithm developed for LAPMAMM. A hardware simulation of a LAPMAMM prototype has been carried out to test its performance in low- and intermediate-level image processing, and the simulation results of the VHDL model are presented. © 2003 Elsevier Ltd. All rights reserved.
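The O(log n) propagation time of a tree network can be seen in a reduction where each tree level combines pairs of partial results, for example to find the smallest label held by any processor during a labeling pass. This is a minimal sketch assuming a binary tree and an associative combining operation; `tree_reduce` is a hypothetical name, and the sketch does not model LAPMAMM's memory modes or its labeling algorithm.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")

def tree_reduce(values: List[T], op: Callable[[T, T], T]) -> Tuple[T, int]:
    """Combine one value per processor level by level over a binary tree.

    Returns (result, steps). Each level roughly halves the number of partial
    results, so steps equals ceil(log2(n)) for n processors.
    """
    if not values:
        raise ValueError("need at least one processor value")
    level = list(values)
    steps = 0
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired node forwards its value unchanged
        level = nxt
        steps += 1
    return level[0], steps

# Eight processors, each holding one candidate label; the tree finds the minimum.
label, steps = tree_reduce([7, 3, 9, 3, 5, 8, 2, 6], min)
```

A broadcast of the result back down the same tree takes the same number of levels, so a combine-and-distribute round stays logarithmic in n.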
Algorithm-based fault tolerance (ABFT) is used to provide low-cost error protection for VLSI processor arrays used in real-time digital signal processing. The main objective of incorporating an ABFT technique into a processor array is to improve its reliability. However, previous ABFT approaches have been evaluated only in terms of their error detection and correction capabilities; the resulting reliability improvement has never been addressed. In this paper, we develop a stochastic model for an array processor incorporating ABFT that takes into account the behavior of transient/intermittent failures and the hardware overhead. This model is then used to evaluate the reliability and the reliability improvement of several existing ABFT techniques that tolerate single faults. A user can therefore evaluate a number of ABFT techniques and trade off reliability against cost before implementation. Moreover, extensive simulation experiments validate the proposed model.
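As a concrete example of an ABFT technique that tolerates a single fault, the classic checksum-encoded matrix multiplication of Huang and Abraham appends a column-sum row to A and a row-sum column to B; a single corrupted entry of the product then shows up as exactly one inconsistent row and one inconsistent column. The sketch uses integer matrices to avoid rounding tolerances, the function names are illustrative, and it does not implement the paper's stochastic reliability model.

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def column_checksum(a: Matrix) -> Matrix:
    """Append a row holding each column's sum."""
    return a + [[sum(col) for col in zip(*a)]]

def row_checksum(b: Matrix) -> Matrix:
    """Append a column holding each row's sum."""
    return [row + [sum(row)] for row in b]

def check_and_correct(c: Matrix) -> Optional[Tuple[int, int]]:
    """Check a full-checksum product in place; fix and report one bad data entry."""
    n_rows, n_cols = len(c) - 1, len(c[0]) - 1
    bad_rows = [i for i in range(n_rows) if sum(c[i][:n_cols]) != c[i][n_cols]]
    bad_cols = [j for j in range(n_cols)
                if sum(c[i][j] for i in range(n_rows)) != c[n_rows][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        c[i][j] += c[i][n_cols] - sum(c[i][:n_cols])  # restore from the row checksum
        return (i, j)
    # Errors in checksum entries or in several data entries are only detected here.
    raise ValueError("error pattern not correctable by single-error correction")

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul(column_checksum(A), row_checksum(B))  # full-checksum product
C[1][0] += 5                                     # inject a transient error
located = check_and_correct(C)                   # locates and repairs entry (1, 0)
```

The extra checksum row and column are the hardware and time overhead that a reliability model of this kind must weigh against the error coverage gained.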
Methods of implementing fast radix-2 transforms on array processors are considered. The complexity of the arithmetic and data-routing operations is analyzed for each method. It is shown that all the methods give an O(P) speed-up in the arithmetic operations on a P-processor array. However, the methods incur an overhead in data organization. Theorems are presented proving that one method is superior in minimizing this overhead for transforms of length N > P.
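The split between arithmetic and data organization can be seen in a sequential iterative radix-2 FFT: the (N/2)·log2 N butterflies are the arithmetic that divides across P processors, while the bit-reversal permutation and the changing butterfly stride are the data movement that, on an array processor, becomes routing overhead. This sketch is a plain decimation-in-time FFT with a butterfly counter, not any of the specific methods compared in the paper; the function names are illustrative.

```python
import cmath
from typing import List, Sequence, Tuple

def bit_reverse_permute(x: Sequence[complex]) -> List[complex]:
    """Reorder input by bit-reversed index (the data-organization step)."""
    n = len(x)
    bits = n.bit_length() - 1
    if bits == 0:
        return list(x)
    return [x[int(format(i, f"0{bits}b")[::-1], 2)] for i in range(n)]

def fft_radix2(x: Sequence[complex]) -> Tuple[List[complex], int]:
    """Iterative decimation-in-time radix-2 FFT; also returns the butterfly count."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = bit_reverse_permute(x)
    butterflies = 0
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                # Butterfly pairs are `half` apart: the stride changes every stage.
                u = a[start + k]
                v = w * a[start + k + half]
                a[start + k] = u + v
                a[start + k + half] = u - v
                w *= w_step
                butterflies += 1
        size *= 2
    return a, butterflies

spectrum, count = fft_radix2([1, 2, 3, 4])  # count is 4 = (N/2) * log2(N)
```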