Field Programmable Gate Array (FPGA) architectures have emerged as an alternative means of implementing complex logic circuits, providing rapid manufacturing turnaround time and low prototyping costs. This paper presents a new FPGA architecture suitable for application-specific signal processing algorithms and Wafer-Scale Integration (WSI) technology. The architecture must be designed for versatility, flexibility, high speed, improved logic density, and defect tolerance. The proposed FPGA architecture consists of a two-dimensional array of programmable logic elements based on look-up tables, interconnection resources, and input/output (I/O) blocks. The architectural style is similar to the one used in the XILINX FPGA architecture. A key variation from commonly used FPGAs is the dual switching scheme employed in the proposed architecture. The design methodology, the design tools, and the results obtained by using a Segmented Channel Routing algorithm to map a 16-bit parallel multiplier onto the architecture are presented.
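To make the idea of a look-up-table-based logic element concrete, the following is a minimal sketch in Python, not the architecture proposed in the paper. It assumes a generic k-input LUT whose input bits form an address into a programmed truth table; the helper names `make_lut` and `lut_eval` are hypothetical. The full adder is chosen only because it is a common building block of parallel multipliers.

```python
from itertools import product


def make_lut(func, k):
    """Program a k-input LUT: store func's output for all 2**k input patterns."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]


def lut_eval(table, inputs):
    """Evaluate a LUT: the input bits (first input = most significant) address the table."""
    addr = 0
    for bit in inputs:
        addr = (addr << 1) | (bit & 1)
    return table[addr]


# Two 3-input LUTs programmed as the sum and carry outputs of a full adder.
sum_lut = make_lut(lambda a, b, c: a ^ b ^ c, 3)
carry_lut = make_lut(lambda a, b, c: (a & b) | (a & c) | (b & c), 3)
```

Reprogramming the logic element amounts to loading a different table; the evaluation path stays the same.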
The proceedings contain 80 papers from the Fourth International Conference on High Performance Computing. Topics discussed include: database management systems (DBMS); data migration and caching; algorithms; programming and languages; load balancing and scheduling; reconfigurable custom computing; routing; instruction level parallelism (ILP) architectures and compiler issues; parallel input/output and multithreaded systems; virtual channels; and image processing.
ISBN: 0780341376 (print)
This paper presents a complete methodology for the automatic synthesis of VLSI architectures used in digital signal processing. Most signal processing algorithms have the form of an n-dimensional nested loop with unit uniform loop-carried dependencies. We model such algorithms with generalized UET grids. We calculate the optimal makespan for the generalized UET grids and then establish the minimum number of systolic cells required to achieve the optimal makespan. We present a complete methodology for the hardware synthesis of the resulting architecture, based on VHDL. This methodology automatically detects all necessary computation and communication elements and produces optimal layouts. The complexity of our proposed scheduling policy is completely independent of the size of the nested loop and depends only on its dimension, making it the most efficient (in terms of complexity) known to us. All these methods were implemented and incorporated in an integrated software package which provides the designer with a powerful parallel design environment, from high-level signal processing algorithmic specifications to low-level (i.e., actual layout) optimal implementations. The evaluation was performed using well-known algorithms from signal processing.
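As a rough illustration of what a makespan means for a unit-execution-time (UET) grid, the sketch below computes an as-soon-as-possible schedule by brute force. It is not the paper's scheduling policy: it assumes the simplest case, where each grid point depends only on its predecessor along every axis, and the function names are hypothetical. Under that assumption the makespan equals the length of the longest dependence chain, and the widest time step gives only an upper bound on the number of cells needed, not the minimum the paper derives.

```python
from itertools import product


def asap_schedule(dims):
    """Earliest start step of each point in a grid with unit axis dependencies.

    Point p depends on p - e_i for every axis i where that predecessor exists,
    and every task takes exactly one time unit.
    """
    start = {}
    # Lexicographic order guarantees predecessors are scheduled first.
    for p in product(*(range(n) for n in dims)):
        preds = [p[:i] + (p[i] - 1,) + p[i + 1:] for i in range(len(p)) if p[i] > 0]
        start[p] = 1 + max((start[q] for q in preds), default=-1)
    return start


def makespan(dims):
    """Number of time steps used by the ASAP schedule."""
    return max(asap_schedule(dims).values()) + 1


def max_parallelism(dims):
    """Largest number of points sharing one time step in the ASAP schedule."""
    counts = {}
    for t in asap_schedule(dims).values():
        counts[t] = counts.get(t, 0) + 1
    return max(counts.values())
```

Unlike this enumeration, whose cost grows with the number of grid points, the abstract states that the proposed policy's complexity depends only on the loop dimension.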
In this paper we review the architectures designed for wavelet transforms, with the purpose of highlighting their suitability for inclusion in codec systems. Indeed, common VLSI cost functions (such as AT^2) are insuffici...
ISBN: 0818682043 (print)
Since natural language parsing is a computationally intensive task, the parallel parsing of natural language seems a promising choice. This work describes both the Eu-PAGE meta-compiler for the Eurotra formalism, a tool that automatically generates parallel natural language parsers, and the Dialogos parser, a parallel parser for the Greek language generated by Eu-PAGE. Parallel parsers generated by Eu-PAGE are based on finite state machines, employ coarse-grained parallelism, and are portably implemented on top of two parallel software platforms: PVM and Orchid. Orchid uses lightweight processes as the basic unit of parallelism, enhanced with advanced operating system facilities. The experimental results collected so far demonstrate satisfactory speed-ups of the parallel implementations compared to the sequential one.
ISBN: 3540635084 (print)
The proceedings contain 98 papers. The special focus in this conference is on Image Databases. The topics include: content-centric computing in visual systems; holographic image representations; image databases are not databases with images; customizing MPEG video compression algorithms to specific application; a new lossless image compression algorithm based on arithmetic coding; analysis of a two step MPEG video system; dedicated hardware processors for a real-time image data pre-processing implemented in FPGA structure; wavelet transform architectures; lossless compression of pre-press images using a novel colour decorrelation technique; real time hardware architecture for visual robot navigation; speeding up fractal encoding of images using a block indexing technique; adding associative meshes to the PACCO I.P. environment; smoothing of MPEG multi-program video coding for packet networks; audio-visual processing for scene change detection; weighted walkthroughs in retrieval by content of pictorial data; a new approach to computation of curvature scale space image for shape similarity retrieval; optimal keys for image database indexing; the terminological image retrieval model; novel block truncation coding of image sequences for limited-color display; image registration with shape mixtures; image retrieval by color regions; interactive model-based matching retrieval; histogram families for color-based retrieval in image databases; image retrieval by multidimensional elastic matching; optimization methods in multilayer classifier networks for automatic control of lamellibranch larva growth; and neural networks for region detection.
ISBN: 3540630910 (print)
The proceedings contain 44 papers. The special focus in this conference is on Automatic Data Distribution and Locality Enhancement. The topics include: cross-loop reuse analysis and its application to cache optimizations; locality analysis for distributed shared-memory multiprocessors; data distribution and loop parallelization for shared-memory multiprocessors; data localization using loop aligned decomposition for macro-dataflow processing; exploiting monotone convergence functions in parallel programs; exact versus approximate array region analyses; context-sensitive interprocedural analysis in the presence of dynamic aliasing; initial results for glacial variable analysis; compiler algorithms on if-conversion, speculative predicate assignment and predicated code optimizations; determining asynchronous pipeline execution times; compiler techniques for concurrent multithreading with hardware speculation support; resource-directed loop pipelining; integrating program optimizations and transformations with the scheduling of instruction level parallelism; parametric computation of margins and of minimum cumulative register lifetime dates; global register allocation based on graph fusion; automatic parallelization for non-cache coherent multiprocessors; eliminating lock overhead in automatically parallelized object-based programs; optimal reordering and mapping of a class of nested-loops for parallel execution; communication-minimal tiling of uniform dependence loops; communication-minimal partitioning of parallel loops and data arrays for cache-coherent distributed-memory multiprocessors; and resource-based communication placement analysis.
This contribution describes a new class of arithmetic architectures for Galois fields GF(2^k). The main applications of the architecture are public-key systems which are based on the discrete logarithm problem for elli...
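For readers unfamiliar with arithmetic in GF(2^k), the sketch below shows textbook shift-and-add multiplication in a polynomial basis. It is a generic software illustration, not the architecture the contribution describes; the function name `gf2k_mul` is hypothetical, and the field GF(2^8) with reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) is used in the check only because it is a widely known instance.

```python
def gf2k_mul(a, b, k, modulus):
    """Multiply a and b in GF(2^k) using a polynomial basis.

    Field elements are integers below 2**k whose bits are polynomial
    coefficients; modulus is the irreducible polynomial, including its x^k term.
    """
    result = 0
    for _ in range(k):
        if b & 1:
            result ^= a          # addition in GF(2^k) is bitwise XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if (a >> k) & 1:
            a ^= modulus         # reduce when the degree reaches k
    return result
```

Because addition is carry-free XOR, each step maps naturally to simple gates, which is one reason such fields are attractive for hardware.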
Object dataflow is a popular approach used in parallel rendering. The data representing the 3D scene is statically distributed among processors, and objects are fetched and cached only on demand. Most previous object dataflow methods were implemented on shared memory architectures and exploited spatial coherency to reduce hardware cache misses. In this paper, we propose an efficient model for object dataflow parallel volume rendering on message passing machines. The algorithm is introduced, and its ray storage mechanism is used to support latency hiding by postponing computation on inactive rays. Memory usage is optimized by letting objects migrate and replicate at different processors rather than relying on the common static assignments. Our cache-only-memory approach uses a distributed-directory scheme to trace the location of objects at other nodes. A mechanism to minimize network congestion was implemented which optimizes channel utilization. Unlike previous methods, our approach can benefit from temporal coherence and effectively minimizes communication costs during animation on limited-bandwidth multiprocessing environments. We report results of the algorithm's implementation on several platforms, including the Cray T3D, the Convex SPP, and a DEC Alpha cluster of workstations (COW), where it achieved higher efficiency and scalability than existing algorithms.
ISBN: 9780897919012 (print)
DataScalar architectures improve memory system performance by running computation redundantly across multiple processors, each of which is tightly coupled with an associated memory. The program data set (and/or text) is distributed across these memories. In this execution model, each processor broadcasts operands it loads from its local memory to all other units. In this paper, we describe the benefits, costs, and problems associated with the DataScalar model. We also present simulation results of one possible implementation of a DataScalar system. In our simulated implementation, six unmodified SPEC95 binaries ran from 7% slower to 50% faster on two nodes, and from 9% to 100% faster on four nodes, than on a system with a comparable, more traditional memory system. Our intuition and results show that DataScalar architectures work best with codes for which traditional parallelization techniques fail. We conclude with a discussion of how DataScalar systems may accommodate traditional parallel processing, thus improving performance over a much wider range of applications than is currently possible with either model.
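The execution model described above can be pictured with a toy sketch. This is only an illustration of the owner-broadcasts idea, not the simulator used in the paper: it ignores timing entirely, the function `run_datascalar` is hypothetical, and the address-interleaved ownership in the usage lines is an assumption made for the example.

```python
def run_datascalar(memory, owner_of, load_trace, num_nodes):
    """Replay one load trace on every node; the owning node broadcasts each operand.

    memory     -- dict mapping address to value (the distributed data set)
    owner_of   -- function mapping an address to the node that holds it locally
    load_trace -- sequence of addresses loaded by the (redundantly run) program
    """
    received = [[] for _ in range(num_nodes)]  # operand stream seen by each node
    broadcasts = [0] * num_nodes               # broadcasts issued per node
    for addr in load_trace:
        owner = owner_of(addr)
        value = memory[addr]                   # read from the owner's local memory
        broadcasts[owner] += 1
        for node in range(num_nodes):
            # The owner consumes the value directly; the others receive the broadcast.
            received[node].append(value)
    return received, broadcasts


# Hypothetical setup: 8 words interleaved across 2 nodes by address.
memory = {addr: addr * 10 for addr in range(8)}
trace = [0, 1, 2, 3, 3, 5]
streams, sent = run_datascalar(memory, lambda a: a % 2, trace, num_nodes=2)
```

Every node ends up with the same operand stream, and no node ever sends a request to remote memory; data moves only through the owners' broadcasts.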