ISBN: 9783642104848 (print)
There is a clear turning point in the development history of reconfigurable architectures. Larger execution units (EUs) used to be adopted in domain-specific applications to improve the cost-performance of programmable architectures. However, once the granularity of EUs reached the level of the arithmetic logic unit (ALU) and the multiply-accumulate unit (MAC), the trend almost stopped. At present, a great number of reconfigurable architectures use simple von Neumann processing elements (PEs) built around EUs such as ALUs and MACs. Yet today's application algorithms differ greatly from their predecessors after decades of development, and larger operation units can be extracted from common application algorithms. Without a corresponding enhancement of EUs, it is difficult for reconfigurable architectures to replace the application-specific integrated circuits (ASICs) used in most current high-throughput applications. To further improve the performance/cost ratio, this paper presents a novel architecture with very-coarse-grained EUs and a fully data-driven mechanism.
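The idea of a fully data-driven mechanism can be illustrated with a toy token-firing simulator: a node, standing in for a very-coarse-grained EU, executes only when all of its operands have arrived, with no program counter deciding the order. This is a minimal sketch under our own assumptions; the function `run_dataflow`, the graph encoding, and the choice of an FFT butterfly as the coarse-grained operation are hypothetical and are not taken from the paper.

```python
from collections import deque

def run_dataflow(graph, inputs):
    """Fire each node as soon as all of its operand tokens are available.

    graph:  node name -> (function, [names of operand producers])
    inputs: external token name -> value
    Returns (values, firing order); readiness alone decides what runs next.
    """
    values = dict(inputs)
    missing = {name: len(srcs) for name, (_, srcs) in graph.items()}
    consumers = {}
    for name, (_, srcs) in graph.items():
        for s in srcs:
            consumers.setdefault(s, []).append(name)

    ready = deque(name for name, n in missing.items() if n == 0)
    fired = []

    def deliver(token):
        # a new token decrements the pending-operand count of each consumer
        for c in consumers.get(token, []):
            missing[c] -= 1
            if missing[c] == 0:
                ready.append(c)

    for token in inputs:
        deliver(token)
    while ready:
        name = ready.popleft()
        fn, srcs = graph[name]
        values[name] = fn(*(values[s] for s in srcs))
        fired.append(name)
        deliver(name)
    return values, fired


# Hypothetical coarse-grained EUs: one radix-2 FFT butterfly expressed as
# a twiddle multiply followed by an add/subtract pair.
butterfly = {
    "twiddle": (lambda x, w: x * w, ["x", "w"]),
    "sum": (lambda y, t: y + t, ["y", "twiddle"]),
    "diff": (lambda y, t: y - t, ["y", "twiddle"]),
}
values, order = run_dataflow(butterfly, {"x": 1 + 1j, "w": 1j, "y": 2})
```

Here `order` is `["twiddle", "sum", "diff"]`: the two output nodes become ready together the moment the twiddle product is produced, which is the kind of parallelism a data-driven fabric can exploit.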
ISBN: 9783642033476 (print)
Multilingual corpora are becoming an essential resource for work in multilingual natural language processing. The aim of this paper is to investigate the effects of applying a clustering technique to parallel multilingual texts. It is interesting to look at the differences in the cluster mappings and in the tree structures of the clusters. The effect of reducing the set of terms considered in clustering parallel corpora is also studied. After that, a genetic-based algorithm is applied to optimize the weights of the terms considered in clustering the texts, in order to classify unseen examples of documents. Specifically, the aim of this work is to introduce the tools necessary for this task and to present a set of experimental results and the issues that have become apparent.
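The weight-optimization step can be sketched as a small genetic algorithm over per-term weights, whose fitness is the accuracy of nearest-centroid classification under a weighted cosine similarity. The abstract does not give the paper's GA operators, fitness function, or clustering method, so everything below is an assumption: the helper names (`evolve_weights` and friends), truncation selection with elitism, one-point crossover, and clipped Gaussian mutation are illustrative choices only.

```python
import math
import random

def weighted_cosine(u, v, w):
    # cosine similarity in which term i contributes with weight w[i]
    num = sum(wi * a * b for wi, a, b in zip(w, u, v))
    nu = math.sqrt(sum(wi * a * a for wi, a in zip(w, u)))
    nv = math.sqrt(sum(wi * b * b for wi, b in zip(w, v)))
    return num / (nu * nv) if nu and nv else 0.0

def classify(doc, centroids, w):
    # assign the label of the most similar cluster centroid
    return max(centroids, key=lambda lab: weighted_cosine(doc, centroids[lab], w))

def accuracy(w, docs, labels, centroids):
    hits = sum(classify(d, centroids, w) == y for d, y in zip(docs, labels))
    return hits / len(docs)

def evolve_weights(docs, labels, centroids, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    n = len(docs[0])
    fitness = lambda w: accuracy(w, docs, labels, centroids)
    # include uniform weights so the result never falls below the unweighted baseline
    pop = [[1.0] * n] + [[rng.random() for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]                      # truncation selection, elitist
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n) if n > 1 else 0     # one-point crossover
            child = [
                min(1.0, max(0.0, g + rng.gauss(0, 0.2))) if rng.random() < 0.3 else g
                for g in a[:cut] + b[cut:]
            ]                                             # Gaussian mutation clipped to [0, 1]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

# Toy term space (hypothetical): two topical terms and one stop word.
docs = [[3, 0, 5], [2, 1, 5], [0, 3, 5], [1, 2, 5]]
labels = ["A", "A", "B", "B"]
centroids = {"A": [2.5, 0.5, 5], "B": [0.5, 2.5, 5]}
w = evolve_weights(docs, labels, centroids)
```

A weight driven to 0 effectively removes a term, so in this framing weight optimization also subsumes the term-set reduction the abstract mentions.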
Parallel bit stream algorithms exploit the SWAR (SIMD within a register) capabilities of commodity processors in high-performance text processing applications such as UTF-8 to UTF-16 transcoding, XML parsing, string search and regular expression matching. Direct architectural support for these algorithms in future SWAR instruction sets could further increase performance as well as simplify the programming task. A set of simple SWAR instruction set extensions is proposed for this purpose, based on the principle of systematic support for inductive doubling as an algorithmic technique. These extensions are shown to significantly reduce instruction count in core parallel bit stream algorithms, often providing a 3X or better improvement. The extensions are also shown to be useful for SWAR programming in other application areas, including a systematic treatment of horizontal operations. An implementation model for these extensions involves relatively simple circuitry added to the operand fetch components of a pipelined processor.
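Two ideas from this abstract can be shown in software: transposing bytes into parallel bit streams, and inductive doubling, where a computation proceeds through field widths 1, 2, 4, ..., 64. The sketch below uses Python integers as stand-ins for 64-bit SWAR registers; the bit numbering (`bit j = (byte >> j) & 1`), the helper names, and the UTF-8 character-count application are our own illustrative assumptions, and the proposed hardware instruction extensions themselves are not modeled.

```python
MASK64 = (1 << 64) - 1

# (mask, field width) pairs for the inductive-doubling steps 1 -> 2 -> 4 -> ... -> 64
DOUBLING_STEPS = [
    (0x5555555555555555, 1),
    (0x3333333333333333, 2),
    (0x0F0F0F0F0F0F0F0F, 4),
    (0x00FF00FF00FF00FF, 8),
    (0x0000FFFF0000FFFF, 16),
    (0x00000000FFFFFFFF, 32),
]

def popcount64(x):
    x &= MASK64
    for mask, width in DOUBLING_STEPS:
        # sum each pair of adjacent width-bit fields into one 2*width-bit field
        x = (x & mask) + ((x >> width) & mask)
    return x

def basis_bit_streams(data):
    # stream j holds bit j of every byte; bit i of each stream belongs to byte i
    streams = [0] * 8
    for i, byte in enumerate(data):
        for j in range(8):
            if (byte >> j) & 1:
                streams[j] |= 1 << i
    return streams

def utf8_char_count(data):
    # one block of at most 64 bytes, so every stream fits a 64-bit register
    assert len(data) <= 64
    s = basis_bit_streams(data)
    continuation = s[7] & ~s[6]   # bytes of the form 10xxxxxx, all positions at once
    return len(data) - popcount64(continuation)
```

For example, `utf8_char_count("héllo".encode("utf-8"))` returns 5: the six bytes contain one continuation byte, found with a single bitwise expression over whole streams rather than a per-byte loop.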
ISBN: 9783540695004 (print)
The proceedings contain 33 papers. The topics discussed include: smart content delivery on the Internet; parallel query processing in databases on multicore architectures; evaluation of a novel load-balancing algorithm with variable granularity; a static multiprocessor scheduling algorithm for arbitrary directed task graphs in uncertain environments; architecture aware partitioning algorithms; a simple and efficient fault-tolerant adaptive routing algorithm for meshes; fault tolerance in the biswapped network; a general approach to predict the performance order of TSP family problems; examining the feasibility of reconfigurable models for molecular dynamics simulation; parallel simulated annealing for materialized view selection in data warehousing environments; an optimization service for DSP multicomputers; and a non-blocking multithreaded architecture with support for speculative threads.
ISBN: 9781424434282 (print)
The proceedings contain 85 papers. The topics discussed include: seamless image stitching algorithm using radiometric lens calibration for high resolution optical microscopy; ANFIS supported question classification in computer adaptive testing (CAT); weighted majority voting for face recognition from low resolution video sequences; an auction based mathematical model and heuristics for resource co-allocation problem in grids and clouds; fault classification in gears using support vector machines (SVMs) and signal processing; constructing robot's model of external environment on basis of linguistic relations and generalized constraints; musical harmonization with words: realizability, potential issues and challenges; principal component based classification for text-independent speaker identification; a new parallel programming language fortress: features and applications; economic order quantity model with backorders using trapezoidal fuzzy numbers; and EOG controlled mobile robot using radial basis function networks.
A versatile family of interconnection networks called Metacubes has been proposed as an alternative to hypercubes for building extremely large-scale multiprocessor systems with a small number of links per node. A Metacube $MC(k, m)$ connects $2^{2^k m + k}$ nodes with only $k + m$ links per node, so it can be used to build very large-scale parallel computing systems with few links per node. In this paper, we propose a new presentation of Metacube for algorithmic design. Based on the new presentation, we give efficient algorithms for parallel prefix computation and parallel sorting on Metacubes. The algorithm for prefix computation runs in $2^k m(k + 1) + k$ communication steps and $2^{k+1} m + 2k$ computation steps on $MC(k, m)$. The sort algorithm runs in $O((2^k m + k)^2)$ computation steps and $O((2^k m(2k + 1) + k)^2)$ communication steps on $MC(k, m)$.
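The node count and degree quoted above can be checked on small instances with a neighbor function. This sketch follows our reading of the Metacube definition, which should be treated as an assumption to verify against the original Metacube papers: a node address consists of a $k$-bit class field $c$ followed by $2^k$ fields of $m$ bits, $k$ cross links flip one bit of $c$, and $m$ cube links flip one bit of field number $c$. The integer encoding and the helper names are hypothetical.

```python
def metacube_size(k, m):
    # MC(k, m) has 2^(2^k * m + k) nodes
    return 2 ** (2 ** k * m + k)

def metacube_neighbors(node, k, m):
    """Return the k + m neighbours of `node` in MC(k, m).

    Assumed encoding: the top k bits form the class field c, and the low
    2^k * m bits form 2^k fields of m bits each (field i = bits i*m .. i*m+m-1).
    """
    width = (2 ** k) * m
    c = node >> width
    cross = [node ^ (1 << (width + i)) for i in range(k)]   # flip one class bit
    cube = [node ^ (1 << (c * m + j)) for j in range(m)]    # flip one bit of field c
    return cross + cube

def is_connected(k, m):
    # depth-first search from node 0 over the whole network
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in metacube_neighbors(u, k, m):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == metacube_size(k, m)
```

For instance, $MC(2, 3)$ has $2^{14} = 16384$ nodes but only 5 links per node; with $k = 0$ the encoding degenerates to an ordinary $m$-cube, and $MC(1, 1)$ is an 8-node ring.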
A novel VLSI architecture for a digital multimedia post-processing system is presented. First, the sequence of the post-processing de-blocking filter is modified, and the filter algorithm is optimized. Second, the frame...
This paper presents a parallel ATPG (automatic test pattern generation) tool to speed up the test pattern generation process. The ATPG adopts a master-slave architecture to reduce inter-process communication. Also, a smart fault list broadcast and fa...
A node ranking of a graph $G = (V, E)$ is a proper node coloring $C: V \to \mathbb{N}$ such that any path in $G$ with end nodes $x, y$ fulfilling $C(x) = C(y)$ contains an internal node $z$ with $C(z) > C(x)$. In the on-line version of the ...
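The ranking condition can be checked directly through an equivalent formulation: for every rank $r$, each connected component of the subgraph induced by nodes of rank at most $r$ may contain at most one node of rank $r$ (two such nodes would be joined by a path whose internal nodes all have rank $\le r$). Adjacent nodes with equal rank violate this too, so properness of the coloring is covered. The helper `is_node_ranking` below is a hypothetical sketch that verifies a given ranking; it does not implement the on-line ranking algorithm the abstract goes on to describe.

```python
def is_node_ranking(adj, rank):
    """adj: node -> iterable of neighbours (undirected); rank: node -> positive int."""
    for r in set(rank.values()):
        allowed = {v for v in adj if rank[v] <= r}
        seen = set()
        for start in allowed:
            if start in seen:
                continue
            # explore one component of the subgraph induced by ranks <= r
            top = 0
            stack = [start]
            seen.add(start)
            while stack:
                u = stack.pop()
                if rank[u] == r:
                    top += 1
                    if top > 1:
                        return False   # two rank-r nodes joined without a higher rank between
                for v in adj[u]:
                    if v in allowed and v not in seen:
                        seen.add(v)
                        stack.append(v)
    return True

# Path a - b - c - d
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
```

On this path, ranks `1, 2, 1, 3` form a valid ranking, whereas `1, 2, 1, 2` do not: the two nodes of rank 2 are connected through a node of rank 1.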
ISBN: 9783540929895 (print)
Low-Density Parity-Check (LDPC) codes are among the best error-correcting codes known and have recently been adopted by data transmission standards such as the second-generation Digital Video Broadcasting standard for satellite (DVB-S2) and WiMAX. LDPC codes are based on sparse parity-check matrices and use message-passing algorithms, also known as belief propagation, which demand very intensive computation. For that reason, dedicated VLSI architectures have been proposed in the past few years to achieve real-time processing. This paper proposes a new flexible and programmable approach for LDPC decoding on the heterogeneous multicore Cell Broadband Engine (Cell/B.E.) architecture. Very compact data structures were developed to represent the bipartite graph for both regular and irregular LDPC codes. They are used to map the irregular behavior of the Sum-Product Algorithm (SPA) used in LDPC decoding onto a computing model that expresses parallelism and data locality by decoupling computation and memory accesses. This model can be used in general for exploiting the capabilities of modern multicore architectures. For the Cell/B.E. in particular, stream-based programs were developed for simultaneous multicodeword LDPC decoding using SIMD features and a low-latency DMA-based data communication mechanism between processors. Experimental results show significant throughputs that compare well with state-of-the-art VLSI-based solutions.
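To make the decoding workload concrete, here is a plain sequential sketch of log-domain SPA on a flat edge-list representation of the bipartite (Tanner) graph, with one message slot per edge in each direction. The edge-list layout is only our guess at what a compact data structure of this kind could look like; the paper's Cell/B.E. SIMD kernels and DMA scheme are not reproduced, and `build_graph`/`spa_decode` are hypothetical names. The LLR convention assumed here is that a positive value favors bit 0.

```python
import math

def build_graph(H):
    """Compact Tanner-graph layout: one flat edge list plus per-node edge indices."""
    edges = [(c, v) for c, row in enumerate(H) for v, bit in enumerate(row) if bit]
    check_edges = [[] for _ in H]
    var_edges = [[] for _ in H[0]]
    for e, (c, v) in enumerate(edges):
        check_edges[c].append(e)
        var_edges[v].append(e)
    return edges, check_edges, var_edges

def spa_decode(H, llr, max_iter=20):
    """Log-domain Sum-Product decoding; llr > 0 favours bit 0."""
    edges, check_edges, var_edges = build_graph(H)
    q = [llr[v] for _, v in edges]   # variable-to-check messages, one per edge
    r = [0.0] * len(edges)           # check-to-variable messages, one per edge
    bits = [0 if x >= 0 else 1 for x in llr]
    for _ in range(max_iter):
        # check-node update: tanh rule over all other edges of the same check
        for es in check_edges:
            t = [math.tanh(q[e] / 2) for e in es]
            for i, e in enumerate(es):
                prod = 1.0
                for j, tj in enumerate(t):
                    if j != i:
                        prod *= tj
                prod = max(-0.999999999999, min(0.999999999999, prod))  # keep atanh finite
                r[e] = 2 * math.atanh(prod)
        # variable-node update and tentative hard decision
        post = [llr[v] + sum(r[e] for e in es) for v, es in enumerate(var_edges)]
        for v, es in enumerate(var_edges):
            for e in es:
                q[e] = post[v] - r[e]
        bits = [0 if p >= 0 else 1 for p in post]
        # stop early once every parity check is satisfied
        if all(sum(bits[edges[e][1]] for e in es) % 2 == 0 for es in check_edges):
            break
    return bits

# Hamming(7,4) parity-check matrix, used here only because it is tiny.
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]
```

With the all-zero codeword received as LLRs `[-0.5, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]` (bit 0 weakly in error), one iteration of `spa_decode(H, ...)` already recovers all zeros. The matrix is not an LDPC code from DVB-S2 or WiMAX; real codes are far larger and sparser, which is what makes the per-edge memory layout matter.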