In this paper, a boundary postprocessing technique is proposed to compute the discrete wavelet transform (DWT) near block boundaries. The basic idea is to take advantage of available lifting filterbank factorizations to model the DWT as a finite state machine (FSM). The proposed technique can reduce the size of auxiliary buffers in block-based DWT implementations and reduce the communication overhead between adjacent blocks. Two new DWT system architectures, Overlap-State sequential and Split-and-Merge parallel, are presented using this technique. Experimental results show that, for the popular (9, 7) filters, the size of auxiliary buffers can be reduced by 42% and that the parallel algorithm is 30% faster than existing approaches.
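As a rough illustration of what a lifting factorization looks like, the sketch below implements one level of the integer LeGall (5, 3) transform, which is swapped in here because it is shorter than the (9, 7) filters the abstract evaluates. It is not the paper's boundary postprocessing technique. The function names and the symmetric boundary extension are assumptions made for this example. Each lifting step only reads nearest neighbours, which suggests why partially updated samples near a block edge can be treated as state.

```python
def lifting_53_forward(x):
    """One level of the integer (5, 3) DWT via lifting (even-length input).

    Hypothetical helper for illustration; uses symmetric extension at the edges.
    """
    n = len(x)
    assert n >= 2 and n % 2 == 0, "this sketch assumes an even-length signal"
    even, odd = x[0::2], x[1::2]
    half = n // 2

    # Predict step: detail = odd sample minus the floor-average of its even neighbours.
    d = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]  # symmetric extension
        d.append(odd[i] - ((even[i] + right) >> 1))

    # Update step: approximation = even sample plus a rounded quarter of neighbouring details.
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[i]  # symmetric extension
        s.append(even[i] + ((left + d[i] + 2) >> 2))
    return s, d


def lifting_53_inverse(s, d):
    """Undo lifting_53_forward by reversing the two steps in opposite order."""
    half = len(s)

    # Undo the update step to recover the even samples.
    even = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[i]
        even.append(s[i] - ((left + d[i] + 2) >> 2))

    # Undo the predict step to recover the odd samples, then interleave.
    x = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        x.extend([even[i], d[i] + ((even[i] + right) >> 1)])
    return x
```

Calling `lifting_53_inverse(*lifting_53_forward(x))` on any even-length list of integers returns `x` unchanged, because each lifting step is inverted exactly.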
This paper presents two methods for solving a second-order partial differential equation, with application to the well-known Poisson equation. These methods are aimed at building a high-speed hardware solver. The solutions presented will be part of a hardware device simulator called "Virtual Device". We present simulation results to compare the two methods for solving this equation. We start with an iterative method (the Gauss-Seidel method) and end with a direct method (the LU method).
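To make the iterative versus direct contrast concrete, here is a minimal software sketch that assumes a 1D Poisson problem, -u'' = f on (0, 1) with zero boundary values, discretized by central differences. The problem setup, function names, and tolerance are assumptions for illustration and are not taken from the paper, which targets a hardware solver.

```python
def poisson_rhs(n, f):
    """Right-hand side h^2 * f(x_i) for n interior grid points (hypothetical helper)."""
    h = 1.0 / (n + 1)
    return [h * h * f((i + 1) * h) for i in range(n)]


def solve_gauss_seidel(b, tol=1e-10, max_sweeps=100000):
    """Iterative solve of tridiag(-1, 2, -1) u = b; returns (u, sweeps used)."""
    n = len(b)
    u = [0.0] * n
    for sweep in range(1, max_sweeps + 1):
        change = 0.0
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0        # already updated this sweep
            right = u[i + 1] if i < n - 1 else 0.0   # still from the previous sweep
            new = 0.5 * (b[i] + left + right)
            change = max(change, abs(new - u[i]))
            u[i] = new
        if change < tol:
            return u, sweep
    return u, max_sweeps


def solve_lu_tridiagonal(b):
    """Direct solve of tridiag(-1, 2, -1) u = b by LU factorization without pivoting."""
    n = len(b)
    d = [0.0] * n  # diagonal of U (its superdiagonal is -1)
    y = [0.0] * n  # solution of L y = b
    d[0], y[0] = 2.0, b[0]
    for i in range(1, n):
        l = -1.0 / d[i - 1]      # subdiagonal entry of L
        d[i] = 2.0 + l           # from l * (-1) + d[i] = 2
        y[i] = b[i] - l * y[i - 1]
    u = [0.0] * n
    u[n - 1] = y[n - 1] / d[n - 1]
    for i in range(n - 2, -1, -1):
        u[i] = (y[i] + u[i + 1]) / d[i]  # back substitution with U
    return u
```

For a constant source term, for example `poisson_rhs(15, lambda x: 1.0)`, both solvers should agree closely; the direct solve finishes in one pass, while Gauss-Seidel needs many sweeps to reach the same tolerance.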
Image processing is often considered a good candidate for the application of parallel processing because of the large volumes of data and the complex algorithms commonly encountered. This paper presents a tutorial introduction to the field of parallel image processing. After introducing the classes of parallel processing, a brief review of architectures for parallel image processing is presented. Software design for low-level image processing and parallelism in high-level image processing are discussed, and an application of parallel processing to handwritten postcode recognition is described. The paper concludes with a look at future technology and market trends.
ISBN: 3540643591 (print)
We describe a family of reconfigurable parallel architectures for logic emulation. They are intended to be used like conventional FPGAs, while covering a larger range of circuit sizes and clock frequencies. In order to evaluate the performance of such programmable designs, we also need software methods for code generation from circuit descriptions. We propose a combination of scheduling and routing algorithms for embedding calculations into the target architecture.
ISBN: 3540643591 (print)
This work deals with the scheduling problem of a directed acyclic graph with interprocessor communication delays. The objective is to minimize the makespan, taking into account the contention in the network induced by the message routing. We propose two heuristics for solving the scheduling and routing problems on arbitrary networks, taking into consideration the access conflicts on links during task scheduling. Both heuristics significantly improve on the performance of algorithms that do not consider contention in the network. The heuristics are compared on problems with different granularity levels with regard to execution times and the number of processors needed.
ISBN: 0780338790 (print)
The paper presents a fixed-structure systolic array for solving a set of computationally demanding, real-time problems that frequently arise in the area of image processing. The fixed-structure systolic array implements the Faddeev algorithm, which can be interpreted as generalised Gauss elimination. A modification of the algorithm is considered for improved stability and accuracy. The computations of the modified algorithm are presented in the form of an SFG. This is followed by possible applications and additional extensions of the proposed systolic array structure. Enormous computational and real-time requirements for signal and image processing problems support the development of fast application-specific structures capable of improving performance by several orders of magnitude compared to general-purpose computer architectures [1]. Parallel computing is gaining increasing importance with the evolution of algorithms for image and signal processing, advances in VLSI technology, and an ever-broader range of applications. In high-end computing, various parallel structures, from powerful supercomputers to processor arrays, are employed to meet the demands of real-time problems. From a sequential computation point of view, a fast algorithm is one that uses a reduced number of operations, which is an important measure of speed in a sequential context. Many DSP algorithms can serve as examples; the DFT compared with the several versions of the FFT is probably the most evident. In parallel processing, the level of concurrency can become more important than the actual number of operations. The ability to parallelise an algorithm and the proportion of sequential code can severely impact the performance of the parallel structure.
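The "generalised Gauss elimination" reading of the Faddeev algorithm can be sketched in software as follows: eliminating the lower-left block of the compound matrix [[A, B], [-C, D]] using rows of [A, B] leaves D + C A^{-1} B in the lower-right block. This is the textbook, unmodified form written sequentially with exact rational arithmetic; it does not reproduce the paper's stability modification or its systolic mapping, and the function name `faddeev` is an assumption for this example.

```python
from fractions import Fraction


def faddeev(A, B, C, D):
    """Return D + C * A^{-1} * B by elimination on the compound matrix [[A, B], [-C, D]].

    A is n x n, B is n x p, C is m x n, D is m x p (lists of rows).
    """
    n, m = len(A), len(C)
    # Build the compound matrix with exact rational entries.
    M = [[Fraction(v) for v in A[i]] + [Fraction(v) for v in B[i]] for i in range(n)]
    M += [[-Fraction(v) for v in C[i]] + [Fraction(v) for v in D[i]] for i in range(m)]

    for k in range(n):
        # Swap only within the [A, B] rows when a pivot is exactly zero.
        if M[k][k] == 0:
            swap = next((r for r in range(k + 1, n) if M[r][k] != 0), None)
            if swap is None:
                raise ValueError("A is singular")
            M[k], M[swap] = M[swap], M[k]
        # Annihilate column k below the pivot, including the -C rows.
        for r in range(k + 1, n + m):
            factor = M[r][k] / M[k][k]
            if factor != 0:
                M[r] = [a - factor * b for a, b in zip(M[r], M[k])]

    # The lower-right m x p block now holds D + C * A^{-1} * B.
    return [row[n:] for row in M[n:]]
```

Choosing the blocks selects the operation: with C as the identity and D as zero, the result is the solution of A x = B; with B and C both the identity and D zero, it is the inverse of A.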
ISBN: 3540643591 (print)
In many of the various layers of software supporting reconfigurable architectures such as compilers, operating systems, synthesis tools, and so forth, a primary objective is to deliver the performance, power, cost, and other advantages of reconfigurable architectures to a target application. Inherent to these tools are various estimation procedures for such performance metrics as throughput time, power, reliability, cost, and so on. Analysis of Reconfigurable Computers (ARC) is a comprehensive analysis and modeling tool we are developing that can be used to calculate these and other performance metrics.
QRS is the dominant complex in the electrocardiogram (ECG). Its accurate detection is of fundamental importance to reliable ECG interpretation and hence to all systems that analyze the ECG signal (e.g. heart monitoring). Syntactic methods are a very powerful tool for QRS detection, since they can easily describe complex patterns, but their high computational cost prevents their use in real-time applications. In this paper, we present a VLSI architecture for ECG signal processing, automatically derived using a nested loop parallelization method. This architecture detects the QRS complex by parsing the input string corresponding to the ECG signal, based on an attribute grammar that describes it.
ISBN: 3540643591 (print)
We derive cost formulae for three different parallelisation techniques for training supervised networks. These formulae are parameterised by properties of the target computer architecture. It is therefore possible to decide the best match between parallel computer and training technique. One technique, exemplar parallelism, is far superior for almost all parallel computer architectures. The formulae also take into account optimal batch learning as the overall training approach.
ISBN: 3540643591 (print)
This paper describes a new parallel algorithm for Minimum Cost Path computation on the Polymorphic Processor Array, a massively parallel architecture based on a reconfigurable mesh interconnection network. The proposed algorithm has been implemented using the Polymorphic Parallel C language and has been validated through simulation. On the Polymorphic Processor Array, the proposed algorithm delivers the same performance, in terms of computational complexity, as the hypercube interconnection network of the Connection Machine and as the Gated Connection Network.