A CMOS image sensor/processor chip fabricated in a 0.35 µm CMOS technology is presented. The chip contains a general-purpose software-programmable SIMD array of 128 × 128 processing elements. It executes over 20 GOPS while dissipating 240 mW of power and achieves a pixel-processor density of 410 cells/mm². Performance and accuracy measurement results are given.
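The headline figures in this abstract imply a few derived quantities. The sketch below works them out under the assumption that "over 20 GOPS" is taken as exactly 20 GOPS, so the results are bounds rather than measured values; the function and key names are illustrative only.

```python
# Figures quoted in the abstract. "Over 20 GOPS" is treated as exactly
# 20 GOPS here (an assumption), so derived values are bounds, not measurements.
ARRAY_SIDE = 128          # processing elements per side
THROUGHPUT_OPS = 20e9     # operations per second (lower bound)
POWER_W = 0.240           # power dissipation in watts


def derived_figures(side=ARRAY_SIDE, ops=THROUGHPUT_OPS, power=POWER_W):
    """Return quantities implied by the quoted throughput and power."""
    pes = side * side
    return {
        "processing_elements": pes,
        "ops_per_pe_per_s": ops / pes,           # average rate per element
        "energy_per_op_pj": power / ops * 1e12,  # picojoules per operation
        "gops_per_watt": ops / power / 1e9,
    }


figures = derived_figures()
```

Under that assumption the array has 16,384 processing elements, each averaging roughly 1.2 million operations per second, at no more than about 12 pJ per operation (at least about 83 GOPS/W).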
Shorter total interconnect and fewer switches in a processor array definitely lead to less capacitance, power dissipation and dynamic communication cost between the processing elements. This paper presents an algorithm to find a maximum logical array (MLA) that has shorter interconnect and fewer switches in a reconfigurable VLSI array with hard/soft faults. The proposed algorithm initially generates the middle (|k/2|th) logical column and then makes it nearly straight for the MLA with k logical columns. A dynamic programming approach is presented to compact other logical columns toward the middle logical column, resulting in a tightly-coupled MLA. In addition, the lower bound of the interconnect length of the MLA is proposed. Experimental results show that the resultant logical array is nearly optimal for the host array with large fault size, according to the proposed lower bound.
We first relate the architecture of systolic arrays to the technological and economic design forces acting on architects of special-purpose systems some 20 years ago. We then observe that those same design forces are now bearing down on the architects of contemporary general-purpose processors, who consequently are producing general-purpose processors whose architectural features are increasingly similar to those of systolic arrays. We then describe some economic and technological forces that are changing the landscape of architectural research. At base, they are the increasing complexity of technology and applications, the fragmenting of the general-purpose processor market, and the judicious use of hardware configurability. We describe a 2D architectural taxonomy, identifying what we believe to be a "sweet spot" for architectural research.
ISBN: 9780889866386 (print)
Methods for an efficient mapping of algorithms to parallel architectures are of utmost importance because many state-of-the-art embedded digital systems deploy parallelism to increase their computational power. This paper deals with the mapping of loop programs onto processor arrays implemented in an FPGA or available as (reconfigurable) coarse-grained processor architectures. Most existing work is closely related to approaches from the DSP domain and is not able to exploit the full parallelism of a given algorithm and the computational potential of a typical 2-dimensional array. In contrast, we present a mapping methodology which incorporates many important parameters of the target architecture in one approach. These are: the number of processing elements, the resources of the data path and memory within a processing element, and the interconnection within the processor array. Based on these parameters, we formulate an optimization problem whose solution specifies an efficient mapping of an algorithm to the target architecture. We can optimize for speed of the algorithm and/or hardware cost caused by the communication and computation resources of the architecture.
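To make the idea of mapping a loop program onto a 2-dimensional processor array concrete, here is a minimal sketch of plain block partitioning: each processing element (PE) receives a contiguous tile of the iteration space and executes it sequentially. This is not the optimization problem formulated in the paper, which the abstract does not spell out; the function names and the tiling rule are assumptions chosen only to show how the number of PEs affects the schedule length.

```python
from collections import defaultdict
from math import ceil


def partition_iterations(n_rows, n_cols, pe_rows, pe_cols):
    """Assign each iteration (i, j) of an n_rows x n_cols loop nest to a PE.

    Hypothetical block partitioning: the iteration space is cut into tiles
    of ceil(n_rows / pe_rows) x ceil(n_cols / pe_cols) iterations, and the
    PE at grid position (p, q) runs its tile sequentially.
    """
    tile_h = ceil(n_rows / pe_rows)
    tile_w = ceil(n_cols / pe_cols)
    schedule = defaultdict(list)
    for i in range(n_rows):
        for j in range(n_cols):
            schedule[(i // tile_h, j // tile_w)].append((i, j))
    return dict(schedule)


def makespan(schedule):
    """Iterations executed by the busiest PE (ignores communication cost)."""
    return max(len(iterations) for iterations in schedule.values())


# An 8 x 8 loop nest on a 2 x 2 array: four tiles of 16 iterations each.
example = partition_iterations(8, 8, 2, 2)
```

A real mapping would also have to account for data-path resources, local memory and interconnect, which are exactly the parameters the abstract lists and which this sketch leaves out.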
ISBN: 9780889866386 (print)
Fine-grained parallel architectures such as processor arrays (PAs) play an important role in the acceleration of applications which demand high processing capabilities. Methods for the mapping of compute-intensive algorithms to PAs often neglect an efficient routing of input and output (I/O). We formulate an integer linear program (ILP) whose objective is to minimize the cost of the channels and registers within the PA that are required to route the I/O. This optimization problem is integrated into our framework for the design of PAs, in which we use partitioning to map algorithms to PAs. I/O arises from the algorithm itself and, under partitioning, from intermediate data produced because the partitions execute sequentially. The ILP for the routing of the I/O can be combined with an optimization problem which considers the efficient routing of the data dependencies of the algorithm within the PA. We demonstrate the combined approach on the edge detection algorithm.
ISBN: 9780889866386 (print)
As new genes are sequenced, it is common for molecular biologists to compare the new gene's DNA to known sequences. One simple form of DNA sequence comparison is done by solving the Longest Common Subsequence (LCS) problem. In this paper, we propose a parallel algorithm and specialized FPGA-based processor (the associative ASC processor with reconfigurable 2D mesh) to solve the exact and approximate match LCS problems. This solution uses inexpensive hardware and can be reconfigured as new analysis techniques are developed, making it particularly attractive for processing biosequences.
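For readers unfamiliar with the LCS problem, the sketch below shows the textbook sequential dynamic-programming solution for the exact-match case. It is not the parallel associative algorithm proposed in the paper, which the abstract does not detail; the function name is illustrative.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Textbook dynamic programming, keeping only the previous row of the
    table, so it needs O(len(b)) memory and O(len(a) * len(b)) time.
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Matching symbols extend the diagonal predecessor.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of "up" and "left".
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

Each table cell depends only on its upper, left and upper-left neighbours, so all cells on one anti-diagonal are independent of each other. That property is what makes the problem a natural fit for mesh-connected hardware in general.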
ISBN: 9780889866386 (print)
For search-intensive applications such as data mining and bioinformatics, a SIMD processor array on a chip may be an effective architecture, and if the application is control-intensive, a Multiple SIMD (MSIMD) architecture may further increase processor utilization. In this paper, we describe the implementation of an associative MSIMD architecture on the MASC processor. The MASC processor, implemented using FPGAs, is easily scalable and dynamically assigns tasks to Processing Elements as the program executes.
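The associative search style that such an array supports can be pictured as follows: a key is broadcast to every processing element (PE), each PE compares it with a field in its local memory, and the PEs that match become "responders". The sketch below simulates this sequentially. The record layout and function names are hypothetical and do not reflect the MASC instruction set, which the abstract does not describe.

```python
def associative_search(pe_memory, field, key):
    """Broadcast `key` and return one responder flag per PE.

    On real SIMD hardware every comparison would happen in the same step;
    this list comprehension only simulates that behaviour sequentially.
    """
    return [record[field] == key for record in pe_memory]


def pick_one(responders):
    """Return the index of the first responding PE, or None if none respond."""
    for index, responded in enumerate(responders):
        if responded:
            return index
    return None


# Hypothetical local memories: one record per PE.
pe_memory = [
    {"gene": "A1", "organism": "yeast"},
    {"gene": "B7", "organism": "mouse"},
    {"gene": "C3", "organism": "yeast"},
]
responders = associative_search(pe_memory, "organism", "yeast")
first = pick_one(responders)
```

The search cost in this model does not grow with the number of records as long as each record has its own PE, which is why the abstract singles out search-intensive workloads.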
This paper considers a reconfiguration problem on a processor array model based on single-and-half-track switches, which has been proposed as a fault tolerance technique applied at fabrication time. The focus of this paper is to achieve optimal reconfigurability, which means that whenever a solution for successful reconfiguration exists, the designed method can find it. The paper consists of two parts. In the first part, we identify two essential constraints that have been assumed in most previous studies, and define four reconfiguration classes that differ in which of these constraints they assume. Then, we present some inclusion relations among the four reconfiguration classes. As a result, it becomes clear that the most restrictive class, which includes most of the previous methods, never achieves truly optimal reconfigurability. In the second part, we present a reconfiguration method based on sequential routing (RMSR). Although the worst-case time complexity of the RMSR is exponential in the number of processing elements, the reconfigurability of the RMSR is optimal within the most restrictive reconfiguration class. The effectiveness of the RMSR is shown by a computer simulation.
ISBN: 0769517307; 0769517315 (print)
The paper deals with the problem of analyzing the fault susceptibility of a parallel algorithm designed for a multiprocessor array (MIMD structure). This algorithm realizes a quite complex communication protocol in the system. We present an original methodology of analysis based on the use of a software-implemented fault injector. The considered algorithm is modeled as a multithreaded application. The experimental set-up and results are presented and commented on. The performed experiments proved the relatively high natural robustness of the analyzed algorithm and showed further possibilities for its improvement.