ISBN: 0780356322 (print)
This paper presents the second version of the multimedia fixed-point DSP (MDSP) chip for portable multimedia services and its chip implementation. MDSP employs parallel processing techniques such as SIMD, vector processing, and DSP schemes. MDSP can handle 8-, 16-, 32-, or 40-bit data and can perform four MAC operations in parallel. In addition, MDSP can complete various vector operations in a single cycle. With these features, MDSP can handle both 2-D video signal processing and 1-D signal processing. The MDSP chip has 73,095 gates, has been fabricated, and runs at 30 MHz.
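To make the "four MAC operations in parallel" idea concrete, the following is a minimal software sketch of a four-lane SIMD multiply-accumulate. It is not the MDSP datapath: the 8-bit operand lanes, the 40-bit saturating accumulators, and all function names (`pack`, `unpack`, `saturate`, `simd_mac`) are assumptions chosen for illustration.

```python
LANES = 4       # four MACs per step, as stated in the abstract
LANE_BITS = 8   # assumed operand width for this sketch
ACC_BITS = 40   # assumed accumulator width for this sketch


def pack(values, bits=LANE_BITS):
    """Pack signed lane values into one integer word (lane 0 in the low bits)."""
    mask = (1 << bits) - 1
    word = 0
    for k, v in enumerate(values):
        word |= (v & mask) << (k * bits)
    return word


def unpack(word, lanes=LANES, bits=LANE_BITS):
    """Split a packed word back into signed two's-complement lane values."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    out = []
    for k in range(lanes):
        v = (word >> (k * bits)) & mask
        out.append(v - (1 << bits) if v & sign else v)
    return out


def saturate(value, bits=ACC_BITS):
    """Clamp to the signed range of an accumulator (saturation is an assumption)."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def simd_mac(acc, a_word, b_word):
    """One step: acc[k] += a[k] * b[k] for every lane k, conceptually in parallel."""
    a = unpack(a_word)
    b = unpack(b_word)
    return [saturate(s + x * y) for s, x, y in zip(acc, a, b)]


acc = simd_mac([0] * LANES, pack([1, -2, 3, -4]), pack([5, 6, -7, 8]))
```

In hardware the four lane products would be formed by separate multipliers in the same cycle; the Python loop only models the arithmetic result.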
This paper presents two methods for solving a second-order partial differential equation, with application to the well-known Poisson equation. These methods are aimed at building a high-speed hardware solver. The solutions presented will form part of a hardware device simulator called "Virtual Device". We present simulation results comparing the two methods for solving this equation. We start with an iterative method (the Gauss-Seidel method) and end with a direct method (the LU method).
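The contrast between the iterative and the direct approach can be sketched in software on a small model problem. The example below assumes the 1-D Poisson equation -u'' = f on (0, 1) with zero boundary values and a standard second-order finite-difference grid; the problem setup and the function names are assumptions for illustration, not the formulation used in the paper.

```python
def gauss_seidel_poisson(f, h, max_sweeps=5000, tol=1e-10):
    """Iterative solve of (-u[i-1] + 2u[i] - u[i+1]) / h^2 = f[i], u = 0 at both ends."""
    n = len(f)
    u = [0.0] * (n + 2)  # u[0] and u[n+1] hold the zero boundary values
    for _ in range(max_sweeps):
        max_change = 0.0
        for i in range(1, n + 1):
            # Gauss-Seidel uses the already updated left neighbour u[i-1]
            new = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i - 1])
            max_change = max(max_change, abs(new - u[i]))
            u[i] = new
        if max_change < tol:
            break
    return u[1:-1]


def lu_poisson(f, h):
    """Direct solve: LU factorisation of the tridiagonal matrix (2 on the diagonal, -1 beside it)."""
    n = len(f)
    d = [0.0] * n  # diagonal of U
    l = [0.0] * n  # sub-diagonal multipliers of L (unit diagonal)
    d[0] = 2.0
    for i in range(1, n):
        l[i] = -1.0 / d[i - 1]
        d[i] = 2.0 + l[i]
    # Forward substitution: L y = h^2 f
    y = [0.0] * n
    y[0] = h * h * f[0]
    for i in range(1, n):
        y[i] = h * h * f[i] - l[i] * y[i - 1]
    # Back substitution: U u = y (super-diagonal of U is -1)
    u = [0.0] * n
    u[n - 1] = y[n - 1] / d[n - 1]
    for i in range(n - 2, -1, -1):
        u[i] = (y[i] + u[i + 1]) / d[i]
    return u


n = 15
h = 1.0 / (n + 1)
f = [2.0] * n  # with f = 2 the exact solution is u(x) = x (1 - x)
exact = [(i * h) * (1.0 - i * h) for i in range(1, n + 1)]
u_iter = gauss_seidel_poisson(f, h)
u_direct = lu_poisson(f, h)
```

The direct solver finishes in a fixed number of operations, while the iterative one needs a number of sweeps that grows with the grid size; that trade-off is the kind of comparison the abstract refers to.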
Image processing is often considered a good candidate for the application of parallel processing because of the large volumes of data and the complex algorithms commonly encountered. This paper presents a tutorial introduction to the field of parallel image processing. After introducing the classes of parallel processing, a brief review of architectures for parallel image processing is presented. Software design for low-level image processing and parallelism in high-level image processing are discussed, and an application of parallel processing to handwritten postcode recognition is described. The paper concludes with a look at future technology and market trends.
ISBN: 0818685727 (print)
A concept for a future integer arithmetic unit is presented, together with a first implementation of the arithmetic unit's core as a smart pixel detector chip. This architecture is well suited to realization with 3-D optoelectronic very large scale integrated (VLSI) circuits. Because the optical interconnections run perpendicular to the circuit's surface, there is no pin limitation. This allows massive parallelism and a higher throughput than all-electronic solutions. To exploit the potential of optical interconnections in VLSI systems efficiently, well-adapted low-level algorithms and architectures have to be developed. This is demonstrated for a pipelined arithmetic unit using a redundant number representation. A gate layout for the optoelectronic circuits is given, as well as a specification for the necessary optical interconnection scheme linking the circuits with free-space optics. It is shown that, with state-of-the-art optical and optoelectronic technology, the throughput can be increased by a factor of 10 to 50 compared to current all-electronic processors.
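The abstract does not say which redundant number representation the arithmetic unit uses. As a hedged illustration of why redundancy helps pipelining, the sketch below uses carry-save form, one well-known redundant representation, in which a value is held as a (sum, carry) pair so that each addition needs no carry propagation. The function names are hypothetical.

```python
def carry_save_add(s, c, x):
    """Add x to the value s + c, keeping the result as a redundant (sum, carry) pair.

    Each bit position is handled independently (a full adder per bit),
    so no carry ripples across the word during this step.
    """
    new_s = s ^ c ^ x                            # bitwise sum without carries
    new_c = ((s & c) | (s & x) | (c & x)) << 1   # carries, shifted to their weight
    return new_s, new_c


def resolve(s, c):
    """Convert the redundant pair to an ordinary integer (the only carry-propagating step)."""
    return s + c


def accumulate(values):
    """Sum non-negative integers, staying in redundant form until the very end."""
    s, c = 0, 0
    for x in values:
        s, c = carry_save_add(s, c, x)
    return resolve(s, c)


total = accumulate([13, 7, 250, 1024, 99])
```

Because every intermediate step has constant depth regardless of word length, such a representation suits a deeply pipelined unit; the single carry-propagating conversion is deferred to the end.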
ISBN: 3540649522 (print)
High Performance Fortran (HPF) is the de facto standard language for writing data parallel programs. For applications that use indirect addressing on distributed arrays, HPF compilers have limited capabilities for optimizing such codes on distributed memory architectures, especially for optimizing communication and reusing communication schedules across subroutine boundaries. This paper describes a dynamic approach for optimizing unstructured communication in codes with indirect addressing. The basic idea is that runtime data reflecting the communication patterns will be reused if possible. The user only has to specify which data in the program has to be traced for modifications. The experiments and results show the effectiveness of the chosen approach.
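The idea of reusing runtime communication data as long as the traced data is unmodified can be sketched in a single process. The example below is a simplified model, not the paper's runtime system: `TracedArray`, `build_schedule`, `ScheduleCache`, and the block distribution by `block_size` are all hypothetical names and assumptions.

```python
class TracedArray:
    """Indirection array whose modifications are tracked with a version counter."""

    def __init__(self, values):
        self._values = list(values)
        self.version = 0

    def __setitem__(self, i, v):
        self._values[i] = v
        self.version += 1  # any write invalidates schedules built earlier

    def __getitem__(self, i):
        return self._values[i]

    def __len__(self):
        return len(self._values)

    def __iter__(self):
        return iter(self._values)


def build_schedule(indices, block_size):
    """Inspector phase: group the requested global indices by owning processor."""
    schedule = {}
    for pos, g in enumerate(indices):
        schedule.setdefault(g // block_size, []).append((pos, g))
    return schedule


class ScheduleCache:
    """Rebuilds the schedule only when the traced indirection array has changed."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.builds = 0
        self._schedule = None
        self._version = None

    def get(self, traced):
        if self._schedule is None or self._version != traced.version:
            self._schedule = build_schedule(traced, self.block_size)
            self._version = traced.version
            self.builds += 1
        return self._schedule


def gather(data, traced, cache):
    """Executor phase: fetch data[traced[i]] for all i using the cached schedule."""
    out = [None] * len(traced)
    for _owner, pairs in cache.get(traced).items():
        for pos, g in pairs:  # in a real system: one message per owner
            out[pos] = data[g]
    return out


data = list(range(100, 112))
idx = TracedArray([3, 0, 11, 7])
cache = ScheduleCache(block_size=4)
first = gather(data, idx, cache)
second = gather(data, idx, cache)  # schedule reused, no rebuild
idx[1] = 5                         # traced modification forces a rebuild
third = gather(data, idx, cache)
```

A single cache object of this kind can be passed between subroutines, which is one way the schedule could outlive the call that first built it.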
ISBN: 3540649522 (print)
This paper discusses the main achievements of the EPIC project, whose aim was to design a high level programming environment with an associated implementation for portable parallel image processing. The project was funded as part of the EPSRC Portable Software Tools for Parallel Architectures (PSTPA) programme. The paper summarises new portable programming abstractions for image processing, and outlines the automatically optimising implementation which achieves portability of application code and efficiency of implementation on a closely coupled distributed memory parallel system. The paper includes timings for optimised and unoptimised versions of typical image processing algorithms. Its main conclusion is that, for a specific application, portability can be achieved together with efficiency by adopting a high level algebraic programming model combined with a transformation-based optimiser that reclaims the loss of efficiency which an algebraic approach traditionally entails.
ISBN: 0780338790 (print)
The paper presents a fixed structure systolic array for performing a set of computationally and real-time demanding problems that frequently arise in the area of image processing. The fixed structure systolic array implements the Faddeev algorithm, which can be interpreted as generalised Gauss elimination. A modification of the algorithm is considered for improved stability and accuracy. The computations of the modified algorithm are presented in the form of a signal flow graph (SFG). This is followed by possible applications and additional extensions of the proposed systolic array structure. The enormous computational and real-time requirements of signal and image processing problems support the development of fast application specific structures capable of improving performance by several orders of magnitude compared to general-purpose computer architectures [1]. Parallel computing is gaining increasing importance with the evolution of algorithms for image and signal processing, advances in VLSI technology, and an ever-broader range of applications. In high-end computing, various parallel structures, from powerful supercomputers to processor arrays, are employed to meet the demands of real-time problems. From a sequential computation point of view, a fast algorithm is one that uses a reduced number of operations, which is an important measure of speed in a sequential context. Many DSP algorithms can serve as examples; the DFT compared with the several versions of the FFT is probably the most evident. In parallel processing, the level of concurrency can become more important than the actual number of operations. The ability to parallelise an algorithm and the proportion of sequential code can severely impact the performance of the parallel structure.
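The Faddeev algorithm computes D + C A^{-1} B by applying Gaussian elimination to the compound matrix [[A, B], [-C, D]]: once the -C block has been eliminated using the rows of [A, B], the lower-right block holds the result. The sketch below is a plain sequential version with exact rational arithmetic; it is not the paper's systolic array or its stability modification. The row swap used to avoid a zero pivot and the function name `faddeev` are assumptions for illustration.

```python
from fractions import Fraction


def faddeev(A, B, C, D):
    """Return D + C * A^{-1} * B for a nonsingular n x n matrix A.

    A is n x n, B is n x p, C is m x n, D is m x p (lists of rows).
    """
    n, m = len(A), len(C)
    # Build the compound matrix [[A, B], [-C, D]] with exact fractions.
    M = [[Fraction(x) for x in ra] + [Fraction(x) for x in rb]
         for ra, rb in zip(A, B)]
    M += [[-Fraction(x) for x in rc] + [Fraction(x) for x in rd]
          for rc, rd in zip(C, D)]
    for k in range(n):
        # Swap only among the top n rows; this leaves C * A^{-1} * B unchanged.
        p = next(r for r in range(k, n) if M[r][k] != 0)
        M[k], M[p] = M[p], M[k]
        # Eliminate column k from every row below, including the [-C, D] rows.
        for r in range(k + 1, n + m):
            factor = M[r][k] / M[k][k]
            if factor != 0:
                M[r] = [x - factor * y for x, y in zip(M[r], M[k])]
    # The lower-right m x p block now holds D + C * A^{-1} * B.
    return [row[n:] for row in M[n:]]


A = [[2, 1], [1, 3]]
identity = [[1, 0], [0, 1]]
zeros_col = [[0], [0]]
zeros_sq = [[0, 0], [0, 0]]

# With C = I and D = 0 the result is A^{-1} B, i.e. the solution of A x = b.
x = faddeev(A, [[1], [2]], identity, zeros_col)
# With B = I, C = I, D = 0 the result is the inverse of A.
A_inv = faddeev(A, identity, identity, zeros_sq)
```

Choosing different B, C, and D blocks turns the same elimination into a linear solve, a matrix inversion, or a multiply-add, which is why a single fixed array structure can cover a set of problems.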
The performance aspects of three decentralized instruction level parallel (ILP) execution models are investigated, namely execution unit dependence based decentralization (EDD), control dependence based decentralization (CDD), and data dependence based decentralization (DDD). Using a suite of important benchmarks and realistic system parameters, the performance differences resulting from the type of partitioning, as well as from specific implementation issues such as the type of processing element interconnect, are analyzed.
This paper presents algorithms and architectures for implementing digital nonrecursive filters, from 1-D to multidimensional (M-D). These architectures are very regular and support single chip implementation in VLSI, as ...
ISBN: 3540649484 (print)
The proceedings contain 68 papers. The special focus in this conference is on Design Methods and General Aspects. The topics include: new CAD framework extends simulation of dynamically reconfigurable logic; a language for parametrised and reconfigurable hardware design; integrated development environment for logic synthesis based on dynamically reconfigurable FPGAs; designing for Xilinx XC6200 FPGAs; perspectives of reconfigurable computing in research, industry and education; catalyst for new computing paradigms; run-time management of dynamically reconfigurable designs; acceleration of satisfiability algorithms by reconfigurable hardware; an optimized design flow for fast FPGA-based rapid prototyping; a knowledge-based system for prototyping on FPGAs; a rapid prototyping system based on Java and FPGAs; prototyping new ILP architectures using FPGAs; fast floorplanning for FPGAs; a fault model for the configurable logic modules; reconfigurable hardware as shared resource in multipurpose computers; the bridge between high speed sensors and low speed computing; a reconfigurable engine for real-time video processing; an FPGA implementation of a magnetic bearing controller for mechatronic applications; exploiting contemporary memory techniques in reconfigurable accelerators; a platform for tractable virtual circuitry and reactive environment for runtime reconfiguration.