ISBN: 9781424403127 (print)
This paper describes an architecture dedicated to the real-time processing of census correlation, in the context of building passive stereovision sensors. Although DSP circuits have dramatically improved in clock frequency (about 600 MHz today), DSP cores (several multiplier-accumulators) and pipelines (Super Harvard architectures, for example), FPGA circuits remain the best way to design massively parallel architectures when ultra-fast computation is needed, as is the case in real-time vision systems for collision avoidance.
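As a rough illustration of what census correlation computes, the sketch below builds a 3x3 census signature for a pixel and matches it along a scanline by Hamming distance. The window size, the function names (`census_transform`, `hamming`, `best_disparity`) and the single-pixel matching cost are assumptions made for illustration; the abstract does not specify them, and the FPGA architecture it describes would evaluate many such comparisons in parallel rather than in a loop.

```python
def census_transform(img, y, x):
    """Return the 8-bit census signature of the 3x3 window centred at (y, x).

    Each bit records whether a neighbour is darker than the centre pixel.
    The caller must keep the window inside the image.
    """
    center = img[y][x]
    signature = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # the centre pixel is not compared with itself
            bit = 1 if img[y + dy][x + dx] < center else 0
            signature = (signature << 1) | bit
    return signature


def hamming(a, b):
    """Number of differing bits between two census signatures."""
    return bin(a ^ b).count("1")


def best_disparity(left, right, y, x, max_disp):
    """Return the disparity d in [0, max_disp] with the lowest Hamming cost.

    The left pixel (y, x) is compared with right pixels (y, x - d).
    Ties are resolved in favour of the smallest disparity.
    """
    reference = census_transform(left, y, x)
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        if x - d < 1:  # the 3x3 window would leave the image
            break
        cost = hamming(reference, census_transform(right, y, x - d))
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Because the signature depends only on intensity ordering, not on absolute values, the comparison reduces to XOR and bit counting, which is one reason the method maps well onto FPGA logic.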
ISBN: 0769526365 (print)
We present the first parallel algorithm for building a Hausdorff Voronoi diagram (HVD). Our algorithm targets cluster computing architectures and computes the Hausdorff Voronoi diagram for non-crossing objects in time O((n log^4 n)/p) for input size n and p processors. In addition, our parallel algorithm implies a new sequential HVD algorithm that constructs HVDs for non-crossing objects in time O(n log^4 n). This improves on previous sequential results and solves an open problem posed by Papadopoulou and Lee [18].
ISBN: 0769526365 (print)
This paper discusses fast parallel algorithms for evaluating several centrality indices frequently used in complex network analysis. These algorithms have been optimized to exploit properties typically observed in real-world large-scale networks, such as low average distance, high local density, and heavy-tailed power-law degree distributions. We test our implementations on real datasets such as the web graph, protein-interaction networks, and movie-actor and citation networks, and report impressive parallel performance for the evaluation of the computationally intensive centrality metrics (betweenness and closeness centrality) on high-end shared-memory symmetric multiprocessor and multithreaded architectures. To our knowledge, these are the first parallel implementations of these widely used social network analysis metrics. We demonstrate that it is possible to rigorously analyze networks three orders of magnitude larger than the instances that can be handled by existing social network analysis (SNA) software packages. For instance, we compute the exact betweenness centrality value for each vertex in a large US patent citation network (3 million patents, 16 million citations) in 42 minutes on 16 processors, using 20 GB of RAM on the IBM p5 570. Current SNA packages, on the other hand, cannot handle graphs with more than a hundred thousand edges.
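For readers unfamiliar with the metric, the following is a minimal sequential sketch of exact betweenness centrality on an unweighted, undirected graph, using the well-known Brandes accumulation scheme. It is not the parallel implementation reported in the abstract; the adjacency-dictionary input format and the function name `betweenness_centrality` are assumptions chosen for illustration.

```python
from collections import deque


def betweenness_centrality(adj):
    """Exact betweenness for an unweighted, undirected graph.

    `adj` maps each vertex to a list of its neighbours (hypothetical format).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths and recording predecessors.
        sigma = {v: 0 for v in adj}    # number of shortest s-v paths
        dist = {v: -1 for v in adj}    # BFS distance from s (-1 = unvisited)
        preds = {v: [] for v in adj}   # predecessors on shortest paths
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in bc.items()}
```

The outer loop over source vertices is independent per source, which is the natural place to introduce coarse-grained parallelism; each source needs its own `sigma`, `dist`, `preds` and `delta` arrays, so memory grows with the number of concurrent traversals.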
ISBN: 9780889866386 (print)
Methods for efficiently mapping algorithms to parallel architectures are of utmost importance because many state-of-the-art embedded digital systems use parallelism to increase their computational power. This paper deals with the mapping of loop programs onto processor arrays implemented in an FPGA or available as (reconfigurable) coarse-grained processor architectures. Most existing work is closely related to approaches from the DSP domain and cannot exploit the full parallelism of a given algorithm or the computational potential of a typical 2-dimensional array. In contrast, we present a mapping methodology that incorporates many important parameters of the target architecture in one approach. These are: the number of processing elements, the data path and memory resources within a processing element, and the interconnection within the processor array. Based on these parameters, we formulate an optimization problem whose solution specifies an efficient mapping of an algorithm to the target architecture. We can optimize for the speed of the algorithm and/or the hardware cost incurred by the communication and computation resources of the architecture.
ISBN: 9781424403431 (print)
A language for semi-structured documents, XML has emerged as the core of the web services architecture and plays crucial roles in messaging systems, databases, and document processing. However, the processing of XML documents has a reputation for poor performance, and a number of optimizations have been developed to address this problem from different perspectives, none of which has been entirely satisfactory. In this paper, we present a seemingly quixotic but novel approach: parallel XML parsing. Parallel XML parsing leverages the growing prevalence of multicore architectures in all sectors of the computer market and yields significant performance improvements. This paper presents our design and implementation of parallel XML parsing. Our design consists of an initial preparsing phase to determine the structure of the XML document, followed by a full, parallel parse. The results of the preparsing phase are used to help partition the XML document for data-parallel processing. Our parallel parsing phase is a modification of the libxml2 [1] XML parser, which shows that our approach applies to real-world, production-quality parsers. Our empirical study shows that our parallel XML parsing algorithm can improve XML parsing performance significantly and scales well.
ISBN: 0780395840 (print)
Based on Horner's rule and the Baugh-Wooley algorithm, this paper presents two novel bit-level parallel array algorithms for 2's complement multiplication; the algorithms have been mapped to systolic arrays using linear mapping techniques. We propose two efficient systolic arrays for the multiply-and-accumulate (MAC) operation, and we also describe how vector-vector and matrix-matrix multiplication can be implemented efficiently using the MAC arrays. The two systolic arrays have high performance (low time complexity, space complexity and latency) and consume less gate area than other architectures. Their regularity and modularity make them suitable for VLSI implementation.
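The Baugh-Wooley idea mentioned above can be sketched in software as follows: the partial products that involve exactly one sign bit are complemented, and two constant correction bits are added, so that every term in the array is a non-negative bit and no subtraction is needed. This sketch shows only the standard Baugh-Wooley rearrangement; it does not reproduce the Horner's-rule formulation or the systolic mapping of the paper, and the function name `baugh_wooley_multiply` is an assumption.

```python
def baugh_wooley_multiply(a, b, n):
    """Multiply two n-bit two's-complement integers via Baugh-Wooley.

    Only unsigned partial-product bits and two constants are summed;
    the 2n-bit sum is then reinterpreted as a signed value.
    """
    mask = (1 << n) - 1
    a_bits = [((a & mask) >> i) & 1 for i in range(n)]
    b_bits = [((b & mask) >> j) & 1 for j in range(n)]

    total = 0
    for i in range(n):
        for j in range(n):
            bit = a_bits[i] & b_bits[j]
            # Terms with exactly one sign bit carry negative weight in the
            # signed product; Baugh-Wooley complements them instead.
            if (i == n - 1) != (j == n - 1):
                bit ^= 1
            total += bit << (i + j)

    # Correction constants that compensate for the complemented terms.
    total += (1 << n) + (1 << (2 * n - 1))
    total &= (1 << (2 * n)) - 1  # keep the low 2n bits

    # Reinterpret the 2n-bit pattern as a signed integer.
    if total >> (2 * n - 1):
        total -= 1 << (2 * n)
    return total
```

Because every row of the array is built from identical AND (or NAND) cells and adders, the structure is regular, which is the property that makes it attractive for systolic and VLSI implementations.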
ISBN: 9806560701 (print)
This paper presents novel architectures for an embedded block coder and decoder that use three concurrent passes to speed up processing by about three times compared with conventional sequential processing of the three passes. These are some of the core modules in a JPEG 2000 codec, which performs compression and decompression of still images. The embedded block codec comprises a bit plane coder and three binary arithmetic coders at the encoder end, and three binary arithmetic decoders and a bit plane decoder at the decoder end. The new architectures for the embedded block encoder and decoder have been coded in Verilog, and FPGA implementations have been realized on Xilinx Virtex 4 devices. The concurrent three-pass embedded block encoder and decoder each require about 26,000 FPGA gates and can process an image of 512×512 pixels in about 123 ms. The embedded block codec can be reconfigured to process different cell sizes in 400 microseconds.
ISBN: 076952558X (print)
The emergence of networked eBusiness and the wave of service-oriented computing facilities create new challenges for automating inter-enterprise business process management and eContracting. This development brings strategic benefits for agile enterprises, but also new challenges for enterprise system architectures and platforms. This paper discusses techniques for introducing trust-related decisions into eContracting, and their effects. This work enhances the web-Pilarcos project results on B2B interoperability middleware; the supported architectural model comprises autonomous business services forming loosely coupled, eContract-governed eCommunities.
ISBN: 0387334025 (print)
This paper addresses the use of low-power techniques applied to dedicated datapath architectures for FIR filters and the FFT. New low-power arithmetic operators are used as basic modules. In FIR filter and FFT algorithms, 2's complement is the most common encoding for signed operands. We use a new architecture for signed multiplication that maintains the pure form of an array multiplier. This architecture uses radix-2^m encoding, which reduces the number of partial lines. Each group of m bits uses the Gray code, thus potentially further reducing the switching activity both internally and at the inputs. The multiplier architecture is applied to the DSP architectures and compared with the state of the art. Because the FIR filter and FFT algorithms involve multiplications of input data with appropriate coefficients, the best ordering of these operations to minimize power consumption in the implemented architectures is also investigated. As will be shown, the use of the low-power operators with an appropriate choice of coefficients can contribute to reducing the power consumption of the FIR and FFT architectures. Additionally, a new algorithm for the partitioning and ordering of the coefficients is presented. This technique is evaluated in a semi-parallel architecture, which enables speed-up transformation techniques.
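To make the link between Gray coding and switching activity concrete, the sketch below converts between binary and the standard reflected Gray code and counts bit toggles across a sequence of words. It illustrates only the general property that consecutive Gray codewords differ in one bit; how the paper applies the code within each group of m bits of its radix-2^m multiplier is not shown, and the helper names (`binary_to_gray`, `gray_to_binary`, `toggles`) are assumptions.

```python
def binary_to_gray(value):
    """Reflected binary Gray code of a non-negative integer."""
    return value ^ (value >> 1)


def gray_to_binary(gray):
    """Inverse of binary_to_gray: XOR together all right shifts of the codeword."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def toggles(sequence):
    """Total number of bit flips between consecutive words in a sequence.

    A simple proxy for switching activity on a bus or operand input.
    """
    words = list(sequence)
    return sum(bin(x ^ y).count("1") for x, y in zip(words, words[1:]))
```

For a counting sequence the Gray-coded words toggle exactly one bit per step, whereas plain binary toggles several bits on carries. Real operand streams are not consecutive, so the saving depends on the data statistics, which is consistent with the abstract's wording that the reduction is only potential.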