ISBN: 0769522262 (print)
In several digital signal processing algorithms, computational nodes are organized in consecutive stages and data is reordered between these stages. Parallel computation of such algorithms with a reduced number of processing elements implies that several computational nodes are assigned to each element. As a drawback, permutations become more complex and require data storage. In this paper, a systematic design methodology for stride permutation networks is derived. These permutations are represented with Boolean matrices, which are decomposed and mapped directly onto register-based networks. The resulting networks are regular and scalable, and they support any power-of-two stride. In addition, the networks reach the lower bound on the number of registers, which indicates area efficiency. Since the proposed methodology is systematic, it can be exploited in automated design generation.
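As a rough illustration of what a stride permutation does, the sketch below reorders a list in software. It is not the register-based network from the paper. The function names are hypothetical, and the index-bit rotation is the textbook view of a power-of-two stride permutation, assumed here to correspond to the Boolean-matrix representation the abstract mentions.

```python
def stride_permutation(x, stride):
    """Reorder x so that every stride-th element is read out consecutively."""
    n = len(x)
    if n % stride != 0:
        raise ValueError("stride must divide the vector length")
    # Output position i reads input position (i*stride mod n) + floor(i*stride / n).
    return [x[(i * stride) % n + (i * stride) // n] for i in range(n)]


def stride_permutation_bits(x, stride):
    """Same permutation for power-of-two sizes, written as an index-bit rotation."""
    n = len(x)
    bits = n.bit_length() - 1
    s = stride.bit_length() - 1
    if (1 << bits) != n or (1 << s) != stride or stride > n:
        raise ValueError("length and stride must be powers of two, stride <= length")
    mask = n - 1

    def rotate_left(i):
        # Rotate the 'bits'-wide index left by s positions.
        return ((i << s) | (i >> (bits - s))) & mask

    return [x[rotate_left(i)] for i in range(n)]


if __name__ == "__main__":
    print(stride_permutation(list(range(8)), 2))  # even-indexed items, then odd-indexed
```

Both functions give the same ordering for power-of-two sizes. The second one shows that the permutation only moves index bits around, which is why a matrix over index bits can describe it.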
Point mutation of amino acids is a means used by biotechnologists to improve the performance of proteins. To study a point-mutated polypeptide, one requires its global minimum energy conformation. This conformation can be determined by molecular dynamics via Langevin's equations of motion. Molecular dynamics simulations are among the most difficult problems to parallelize in a scalable manner. We provide a method for defining a special-purpose 3D array processor architecture for the molecular dynamics simulation of point-mutated polypeptides. The architecture is derived from a spatial decomposition of a known conformation of the point-mutated polypeptide or the native conformation of the given protein. By using an approximation scheme for the deterministic forces, the interprocessor communication can be kept local. The architecture affords a simple distributed load balancer and is scalable. The computational workload of the array processor architecture when performing molecular dynamics simulations under realistic conditions is addressed. An example architecture is given for point-mutated penicillin amidase.
This paper presents a novel architecture for array processor, called LEAP, which is a set of simple processing *** targeted programs are perfect innermost *** using the technique called if-conversion, the control dependence can be converted to data dependence to prediction *** an innermost loop can be represented by a data dependence graph, where the vertex supports the expression statements of high-level languages. By mapping the data dependence graph to fixed PEs, each PE steps the loop iteration automatically and independently at the *** execution forms multiple pipelining *** simulation of four loops of LFK shows the effectiveness of the LEAP architecture, compared with traditional CISC and RISC architectures.
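To make the idea of if-conversion concrete, here is a minimal software sketch, assuming a simple clipping loop as the example workload. The function names and the loop itself are hypothetical and are not taken from the LEAP paper. The branch in the first version becomes a predicate value in the second, so the loop body is straight-line code whose result depends only on data.

```python
def clip_branching(values, limit):
    """Loop body with a control dependence: which statement runs depends on a branch."""
    out = []
    for v in values:
        if v > limit:
            out.append(limit)
        else:
            out.append(v)
    return out


def clip_if_converted(values, limit):
    """Same loop after if-conversion: the condition becomes a predicate operand."""
    out = []
    for v in values:
        p = v > limit  # predicate computed like any other data value
        # Both arms are evaluated; the predicate selects the result arithmetically.
        out.append(p * limit + (not p) * v)
    return out


if __name__ == "__main__":
    sample = [3, 9, 5, 12, 7]
    print(clip_branching(sample, 8))
    print(clip_if_converted(sample, 8))
```

With no branch left in the body, each operation can be drawn as a vertex of a data dependence graph, which is the form the abstract says is mapped onto PEs.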
This paper describes a VLSI array processor system designed and built for classification problems based on the k-nearest-neighbors approach. This architecture is suitable for different pattern recognition applications and is very efficient for high-dimensional databases. The architecture scales with the size of the recognition problem, making the system effectively applicable to computationally intensive applications such as on-line pattern recognition. A system prototype composed of a board with two processors, the software driver and a test application has been built and evaluated. For a handwritten character recognition task, the complete system shows a speedup of 260 times over a sequential algorithm running on a Sun SPARC20 workstation. (C) 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
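For readers unfamiliar with the k-nearest-neighbors approach, the following is a minimal sequential sketch. The squared Euclidean distance and the majority vote are assumptions made for illustration, and `knn_classify` is a hypothetical name. The abstract does not state which distance measure or voting rule the hardware uses.

```python
from collections import Counter


def knn_classify(train, labels, query, k):
    """Return the majority label among the k training vectors closest to query."""
    # One squared Euclidean distance per training vector; each is independent of the others.
    distances = [
        (sum((a - b) ** 2 for a, b in zip(point, query)), index)
        for index, point in enumerate(train)
    ]
    distances.sort()
    votes = Counter(labels[index] for _, index in distances[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(knn_classify(train, labels, (0.2, 0.1), 3))
```

The distance computations dominate the cost and do not depend on one another, which is the kind of workload an array of processors can share.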
ISBN: 5742202601 (print)
In modern computers, the problem of processor synchronization arises. In array processors the clock skew may be significant, which may cause parallel algorithms to work incorrectly. The problem of clock skew in high-speed systems is so important that modern VLSI chips are often equipped with several phase-locked loops placed on one chip. In this case, the phase-locked loops can be used to create a distributed system of generators. Here, discrete phase-locked loops are considered.
A novel architecture named Window-Memory Sharing Processor Array is proposed, which targets window operations in image processing. The architecture can be used not only for conventional image filtering, but also for practical window operations such as motion vector search in MPEG2. The derived architecture is flexible enough to satisfy the user's requirements for either area or speed.
The two-dimensional (2-D) sliding discrete Fourier transform (DFT) algorithm can realize sliding spectrum analysis and real-time signal processing. In this paper, its fixed-point error analysis is carried out to form a theoretical basis for hardware implementation. The analysis models the error as additive white noise and arrives at the signal-to-noise ratio (SNR). Then, a simplified method for the 2-D sliding DFT based on the vector radix (VR) algorithm is introduced. With this approach the fixed-point error can be reduced to the same scale as that of the 2-D FFT. As an example, the architecture and error analysis of an 8×8 2-D sliding DFT array processor based on the VR-4×4 algorithm are presented. The idea can be extended to larger DFT sizes. Finally, some comparisons are derived. (C) 1999 Elsevier Science B.V. All rights reserved.
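The sliding recurrence is easiest to see in one dimension. The sketch below is a 1-D floating-point illustration only, not the paper's 2-D vector-radix method or its fixed-point model, and the function names are hypothetical. It updates each bin with X_k(n+1) = (X_k(n) - x[n] + x[n+N]) · e^{j2πk/N} instead of recomputing the transform for every window position.

```python
import cmath


def dft(window):
    """Direct N-point DFT, used to start the recurrence and as a reference."""
    n = len(window)
    return [
        sum(window[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def sliding_dft(signal, n):
    """Yield the n-point spectrum of every length-n window of signal."""
    spectrum = dft(signal[:n])
    twiddles = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]
    yield list(spectrum)
    for start in range(len(signal) - n):
        oldest = signal[start]      # sample leaving the window
        newest = signal[start + n]  # sample entering the window
        # One subtract, one add and one complex multiply per bin.
        spectrum = [(spectrum[k] - oldest + newest) * twiddles[k] for k in range(n)]
        yield list(spectrum)


if __name__ == "__main__":
    for spec in sliding_dft([1, 2, 0, -1, 3, 5, -2, 4], 4):
        print([round(abs(v), 3) for v in spec])
```

Because each spectrum is built from the previous one, rounding errors carry over from window to window, which is one reason an error analysis matters before committing such a recurrence to fixed-point hardware.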
In this paper, we present a novel architecture named Window-MSPA, which targets window operations in image processing. We have previously developed a Memory Sharing Processor Array (MSPA) for fast array processing with regular iterative algorithms. Window-MSPA tries to optimize the data I/O ports and the number of processing elements so as to reduce hardware cost. The input scheme for image data is restricted to row-by-row input, which simplifies the I/O architecture. Under this practical I/O restriction, the fastest processing is achieved. In this paper, we present the general Window-MSPA design methodology for a wide variety of applications. As a practical application, we have already reported the design of an MP@HL MPEG2 Motion Estimator LSI [13]. Design formulas for the Window-MSPA architecture are given for various sizes of window operations in image processing. Thus, the derived architecture is flexible enough to satisfy the user's requirements for either area or speed.
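The following sketch shows, in software, what a window operation under row-by-row input looks like. It is not the Window-MSPA design: the function name, the choice of a plain window sum, and the use of a line buffer holding the last w rows are all assumptions made for illustration.

```python
from collections import deque


def window_sums_row_by_row(rows, w):
    """Yield rows of w-by-w window sums while image rows arrive one at a time."""
    line_buffer = deque(maxlen=w)  # keeps only the w most recent rows
    for row in rows:
        line_buffer.append(row)
        if len(line_buffer) < w:
            continue  # not enough rows yet to fill one window
        width = len(row)
        yield [
            sum(line_buffer[r][c + dc] for r in range(w) for dc in range(w))
            for c in range(width - w + 1)
        ]


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    for out_row in window_sums_row_by_row(image, 2):
        print(out_row)
```

Replacing the sum with a sum of absolute differences against a reference block would turn the same loop structure into a block-matching cost, the kind of operation used in motion vector search.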
ISBN: 0819439843 (print)
A digital programmable artificial retina (PAR) is a functional extension of a CMOS imager, in which every pixel is fitted with a local ADC and a tiny digital programmable processor. From an architectural viewpoint, a PAR is an SIMD array processor with local optical input. A PAR is aimed at processing images on-site (where they are sensed) until they can be output from the array in concentrated form. The overall goal is to obtain compact, fast and inexpensive vision systems, in particular for robotics applications. A 256x256 PAR with up to a few tens of bits of local memory per pixel (allowing image sequence processing) is now within reach at reasonable cost. However, whereas the local memory size benefits quadratically from the decrease in feature size, wiring density improvement can only be linear, at best. Control is therefore expected to become more complex, with the danger of a growing proportion of the digital pixel area being devoted to instruction or address decoding. We propose efficient scalable solutions to this problem at the architectural, circuit and topological levels, which attempt to minimise both silicon area and power consumption.
ISBN: 0819439916 (print)
The processor architecture has a great effect on the performance of the whole processor array. In order to improve the performance of the SIMD array architecture, we modified the structure of the BAP (bit-serial array processor) processing element, based on the BAP128 processor. The resulting array processor chip, the modified bit-serial array processor (MBAP), is designed in 0.35 μm CMOS technology for an embedded image understanding system. This paper presents the MBAP architecture and describes the architectural features of the design. The performance of BAP and MBAP is compared on basic macro instructions and low-level processing algorithms for image understanding. The results show that MBAP offers a substantial performance improvement over BAP, at the cost of a 5% increase in chip resources.