A high-speed, video-rate 8 × 8 inverse discrete cosine transform (IDCT) processor using a distributed arithmetic architecture is presented. 64-point one-dimensional IDCT processing units are simultaneously operated with 8 clock cycles. With these 64 fully parallel units and a three-stage pipeline structure for each unit, the latency of the proposed architecture is only 37 cycles. ROM banks containing IDCT coefficients are minimized to 12 by applying two pixel bits per clock. This VLSI is fabricated in a 1.0-μm double-metal CMOS process with a 120 mm² die area. It consumes approximately 3 watts at a 5 volt operating voltage and a 50 MHz master clock frequency. Since the critical path delay is 15.5 ns, the proposed chip has enough speed for digital HDTV applications.
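The distributed arithmetic idea named in the abstract above can be sketched in software. This is a minimal illustration, not the chip's design: it assumes an orthonormal 8-point 1-D IDCT, a 12-bit two's-complement input width, and one input bit per step (the abstract reports two bits per clock, which is not modeled here). The function names `idct_coeffs`, `build_rom`, and `da_inner_product` are hypothetical.

```python
import math


def idct_coeffs(n, size=8):
    """Coefficients c[k] such that x[n] = sum_k c[k] * X[k] (orthonormal 1-D IDCT)."""
    return [
        (math.sqrt(1 / size) if k == 0 else math.sqrt(2 / size))
        * math.cos((2 * n + 1) * k * math.pi / (2 * size))
        for k in range(size)
    ]


def build_rom(coeffs):
    """ROM[addr] holds the sum of coeffs[k] over every bit k that is set in addr."""
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << len(coeffs))
    ]


def da_inner_product(rom, xs, bits):
    """Inner product of the ROM's coefficients with xs, using lookups and shifts only.

    xs are integers that fit in `bits`-bit two's complement. At step b, bit b of
    every input forms the ROM address; the looked-up partial sum is weighted by
    2**b, and the sign-bit step is subtracted instead of added.
    """
    acc = 0.0
    for b in range(bits):
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k
        term = rom[addr] * (1 << b)
        acc += -term if b == bits - 1 else term
    return acc


if __name__ == "__main__":
    coeffs = idct_coeffs(n=3)
    rom = build_rom(coeffs)
    X = [100, -37, 12, 0, -5, 7, -128, 127]
    print(da_inner_product(rom, X, bits=12))
    print(sum(c * x for c, x in zip(coeffs, X)))  # direct multiply-accumulate
```

The sketch replaces the eight multiplications per output sample with one table lookup per input bit, which is why the ROM contents and the number of bits consumed per clock dominate the hardware cost in such designs.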
ISBN: 9783540558958 (print)
The proceedings contain 130 papers. The special focus in this conference is on vector and parallel processing. The topics include: tolerating faults in synchronization networks; on incomplete hypercubes; reducing network hardware quantity by employing multi-processor cluster structure in distributed memory parallel processors; connection machine results for pyramid embedding algorithms; interconnection networks based on block designs; partitioning and mapping communication graphs on a modular reconfigurable parallel architecture; generalized shuffle-exchange networks; a mechanism for integrating a visualization tool with a symbolic debugger; the software-monitor DELTA-T and its use for performance measurements of some farming variants on the multi-transputer system DAMP; visualization of message passing parallel programs; parallel physical optimization algorithms for data mapping; profiling on a massively parallel computer; a multiprocessor multiwindow visualization subsystem; data race detection based on execution replay for parallel applications; the C_NET programming environment: an overview; P++, a C++ virtual shared grids based programming environment for architecture-independent development of structured grid applications; detection of concurrency-related errors in Joyce; invariance properties in distributed systems; synchronization of parallel processes in distributed systems; statistical probabilistic clock synchronization algorithm; a SIMD architecture for medical imaging; computing the inner product on reconfigurable buses with shift switching; a novel sorting array processor; the time-parallel solution of parabolic partial differential equations using the frequency-filtering method; comparing the DAP, Meiko and SUPRENUM with a fluid dynamic benchmark; and parallel detection algorithm of radar signals.
Management of the communications among a set of concurrent processes arises in many applications and is a central concern in parallel computing and distributed computing. In this paper we introduce MANIFOLD: a coordin...
Dynamic load balancing is an important technique when developing applications with unpredictable load distribution on distributed memory multicomputers. A tool, Dynamo, that can be used to utilize dynamic load balanci...
We present analytical performance models for the numerical factorization phase of the multifrontal method for sparse matrices. Using a concise characterization of parallel architectures, we provide upper-bound estimat...
The major problem in three-dimensional geometric applications such as the computation of arbitrary polyhedra intersections is the great diversity of possible configurations of geometric objects. Thus, even seemingly s...
We describe the implementation and performance of a three dimensional particle simulation distributed between a Thinking Machines CM-2 and a Cray Y-MP. These are connected by a combination of two high-speed networks; a...
P++ is an innovative parallel array class library for structured grid applications on distributed memory multiprocessor architectures. It is implemented in standard C++ using a serial array class library and a portabl...
The authors describe the implementation and performance of a three dimensional particle simulation distributed between a Thinking Machines CM-2 and a Cray Y-MP. These are connected by a combination of two high-speed networks: a high performance parallel interface (HIPPI) and an optical network (Ultra Net). This is the first application to use this configuration at NASA Ames Research Center. The authors describe their experience implementing and using the application and report the results of several timing measurements. They show that the distribution of applications across disparate supercomputing platforms is feasible and has reasonable performance. In addition, several practical aspects of the computing environment are discussed.
Digital simulators are widely used in the design of digital integrated circuits. The complexity of these circuits is increasing, leading to longer simulation times. One strategy to increase simulation speed is parallel processing. In this paper the implementation of the DACAPO-III simulation system is described in two steps. In the first step this system was implemented on one transputer and, in a second step, on multiple transputers. For synchronization between the processors the 'time warp' method was chosen. Problems and results in using parallel processing for distributed simulation are discussed.
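The 'time warp' method mentioned above is an optimistic synchronization scheme: each processor executes events as they arrive and rolls back when a message with an earlier timestamp shows up. The following is a minimal single-process sketch of that rollback step only, assuming a toy state and integer timestamps; it is not the DACAPO-III implementation, and anti-messages and global virtual time are deliberately omitted. The class name `LogicalProcess` and its fields are hypothetical.

```python
class LogicalProcess:
    """Toy Time Warp process: optimistic execution with state-saving rollback."""

    def __init__(self):
        self.lvt = 0         # local virtual time (timestamp of last executed event)
        self.state = 0       # toy state; the update below is order-sensitive
        self.processed = []  # executed events as (timestamp, payload), sorted by time
        self.snapshots = []  # snapshots[i] is the state just before processed[i] ran
        self.rollbacks = 0

    def _execute(self, ts, payload):
        self.snapshots.append(self.state)
        self.processed.append((ts, payload))
        self.state = self.state * 2 + payload
        self.lvt = ts

    def receive(self, ts, payload):
        if ts >= self.lvt:
            # In-order message: execute immediately without waiting for other processes.
            self._execute(ts, payload)
            return
        # Straggler: undo every event with a later timestamp, then replay in order.
        self.rollbacks += 1
        keep = sum(1 for t, _ in self.processed if t <= ts)
        redo = self.processed[keep:]
        self.state = self.snapshots[keep]
        del self.processed[keep:]
        del self.snapshots[keep:]
        self.lvt = self.processed[-1][0] if self.processed else 0
        self._execute(ts, payload)
        for t, p in redo:
            self._execute(t, p)


if __name__ == "__main__":
    lp = LogicalProcess()
    for ts, payload in [(10, 1), (30, 3), (20, 2)]:  # the event at time 20 arrives late
        lp.receive(ts, payload)
    print(lp.state, lp.rollbacks)
```

After the late event is handled, the state matches what strictly in-order execution would have produced, at the cost of one rollback. In a full Time Warp system the rolled-back process would also cancel messages it had sent during the undone work.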