ISBN: 0780335295 (print)
A new generation of high-performance programmable digital signal processors (DSPs) has a highly integrated parallel architecture, incorporating special-purpose hardware features, on-chip memory and multiple processors into a single chip. For such single-chip multiprocessor DSPs, however, a sophisticated performance monitoring tool is essential to achieve the maximum performance. In this paper, we discuss the requirements and functionality of performance monitoring tools suitable for single-chip multiprocessor DSPs. As a specific example, we describe a performance monitoring tool developed for Texas Instruments' TMS320C80 (MVP), the MVP Performance Monitor (MPM), which satisfies these requirements and provides this functionality. The effectiveness of the MPM is demonstrated using an 8x8 block-based discrete cosine transform (DCT) implementation. An overall speed-up of 4.67 was achieved by using the MPM.
Author:
Lenke, M. (LRR-TUM)
Lehrstuhl für Rechnertechnik und Rechnerorganisation, Institut für Informatik, Technische Universität München, 80290 München, Germany
Typical applications of the so-called Grand Challenges need massively parallel computer system architectures. Tools like parallel debuggers, performance analysers and visualizers help the code designer to develop efficient parallel algorithms. Such tools merely support the development cycle. But technical and scientific engineers who make use of parallel high-performance computing applications, e.g. numerical simulation algorithms in computational fluid dynamics (CFD), must be supported in their engineering work by another kind of tool. A tool for the application cycle is required because old, conventional suggestions regarding the arrangement for the application cycle rely on strictly sequential procedures. They are due to the heritage of traditional work on former vector computers. That formative influence is still felt in today's arrangements for the application cycle, prevents more efficient engineering work and, therefore, must be overcome. New tool conceptions have to be introduced to enable on-line interaction between the technical and scientific engineers and their running parallel simulation. VIPER stands for VIsualization of parallel numerical simulation algorithms for Extended Research and offers physical parameters of the mathematical model and parameters of the numerical method as objects of a graphical user tool interface for online observation and online modification. A special client-server-client process architecture implementation enables technical and scientific engineers who are sitting at their graphic workstation to interact with their parallel simulation algorithms running on a remote parallel computer system. The VIPER prototype is applied to ParNsflex, which is a parallel Navier-Stokes solver for real-world aerodynamic problems. A Paragon XP/S was selected as the test parallel computer system. A first evaluation indicates the superiority of the VIPER conception over conventional procedures. Copyright (C) 1996 Published by Elsevier Science L
ISBN: 0780336399 (print)
The problem of testability analysis for data-processing oriented architectures is considered. In particular, this paper concentrates on the analysis of pipelined architectures containing registers which act as data storage. A testability analyzer is proposed which accepts an RTL description of a complex device and automatically identifies the possible critical areas, i.e. those areas that seem the most difficult to test. The proposed testability analysis allows a significant reduction of the area overhead and the test cost required for such devices.
ISBN: 078033650X (print)
Recent advances in FPGA technology offer a suitable environment for massively parallel, fine-grain array architectures. The paper gives geometric criteria for an optimal "jigsaw tessellated" processor cell, and a cost function for cell placement. The paper demonstrates the use of FPGA-based processor arrays through implementation results for cellular image processing algorithms. The outlined concepts are being implemented in a placement-routing tool.
A high-speed two-dimensional surface photovoltage (SPV) sensing system based on digital data processing was developed and applied to in-situ monitoring of chemical images. The SPV signal generated by a scanning light spot was stored directly in a computer, and signal integration of all measurement points was carried out in parallel in allocated memories by numerical calculation, which made it possible in principle to reduce the measurement time to as little as the scanning time of the light spot. For the formation of an image of 6400 data points, the proposed system needs about 30 min, which is at least one order of magnitude faster than a conventional analog SPV system.
We study the scalability of 2-D discrete wavelet transform algorithms on fine-grained parallel architectures. The principal operation in the 2-D DWT is the filtering operation used to implement the filter banks of the...
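As a rough illustration of the filtering operation mentioned in this abstract, the sketch below applies one level of a separable 2-D DWT using the Haar filter pair. The choice of Haar filters, the averaging normalization, and the function names `haar_1d` and `haar_2d` are assumptions made for this example; the abstract does not say which filter bank the paper uses.

```python
def haar_1d(signal):
    """One level of a 1-D Haar filter bank on an even-length sequence.

    Returns the low-pass (pairwise average) half followed by the
    high-pass (pairwise half-difference) half.
    """
    half = len(signal) // 2
    low = [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in range(half)]
    high = [(signal[2 * i] - signal[2 * i + 1]) / 2 for i in range(half)]
    return low + high


def haar_2d(image):
    """One level of a separable 2-D Haar DWT: filter rows, then columns."""
    # Row pass: every row is filtered independently of the others.
    rows = [haar_1d(row) for row in image]
    # Column pass: transpose, filter each column, transpose back.
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    print(haar_2d([[1, 2], [3, 4]]))
```

Each row pass and each column pass is independent of the others, which is the property a fine-grained parallel mapping would typically exploit.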
In this paper we compare three routing algorithms for massively parallel architectures, each offering an increasing degree of adaptivity: a deterministic algorithm, a minimal adaptive based on Duato's methodology ...
To efficiently exploit the potential of future massively parallel and fine-grained optoelectronic processors, well-adapted low-level algorithms have to be developed. So-called bit and CORDIC algorithms are well suited for that purpose. We present a concept for an optoelectronic 3D processor based on this particular algorithm class. This processor allows a hard-wired execution of 8 complex functions like logarithm, exponential function, sine, cosine, arc tangent, square root, multiplication and division without using sophisticated multiplication units. The strength of the 3D processor is based on a large number of off-chip interconnections, as is aspired to in smart pixel systems using optical I/O arrays. We compared different smart pixel architectures based on bit-serial and bit-parallel approaches as well as a redundant number representation. All approaches showed nearly the same throughput, whereas the redundant approach offers the best latency. Furthermore, the requirements for the electronic logic and the optical interconnection scheme are specified.
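To make the idea of multiplier-free function evaluation concrete, here is a minimal sketch of the classic rotation-mode CORDIC iteration for sine and cosine, using only shifts, additions and a small arctangent table. It is a textbook software version, not the optoelectronic design described in the abstract; the 16-bit fixed-point format, the iteration count and the name `cordic_sin_cos` are assumptions chosen for illustration.

```python
import math

FRAC_BITS = 16           # assumed fixed-point format: 16 fractional bits
ITERATIONS = 16          # assumed number of CORDIC micro-rotations
ONE = 1 << FRAC_BITS

# Table of arctan(2^-i) in fixed point; in hardware this would be a small ROM.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# The micro-rotations scale the vector by a constant gain; starting from the
# inverse gain K compensates for it without any run-time multiplication.
K = 1.0
for i in range(ITERATIONS):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))
K_FIXED = round(K * ONE)


def cordic_sin_cos(angle):
    """Return (sin, cos) of angle in radians, for angle in [-pi/2, pi/2].

    The loop body uses only shifts, additions and table lookups.
    """
    x, y = K_FIXED, 0
    z = round(angle * ONE)          # residual angle still to be rotated
    for i in range(ITERATIONS):
        if z >= 0:
            x, y = x - (y >> i), y + (x >> i)
            z -= ATAN_TABLE[i]
        else:
            x, y = x + (y >> i), y - (x >> i)
            z += ATAN_TABLE[i]
    return y / ONE, x / ONE


if __name__ == "__main__":
    s, c = cordic_sin_cos(0.5)
    print(s, c, math.sin(0.5), math.cos(0.5))
```

The same shift-and-add structure, run in vectoring or hyperbolic mode, is what typically yields arc tangent, square root, logarithm and exponential, which is why a single CORDIC datapath can cover a function set like the one listed above.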
This paper presents the design of a dedicated parallel architecture for connected component analysis. Categorized in one-dimensional array processors, for an image of n×n pixels, the proposed architecture ha...