ISBN: 9780769532875 (print)
A fast scheme for the Discrete Wavelet Transform (DWT), known as the lifting scheme, was recently introduced [4, 10]. It offers many advantages over the convolution-based approach [10, 11]; for instance, it is well suited to parallelization. In this paper we present two new FPGA-based parallel implementations of the lifting-based DWT. The first implementation uses pipelining, parallel processing, and data reuse to increase the speedup of the algorithm. The second architecture introduces a controller that dynamically deploys a suitable number of clones according to the hardware resources available in the targeted environment. Both architectures can process large incoming images or multi-frame images in real time. Simulations on a Xilinx Virtex-5 FPGA demonstrate the practical efficiency of our contribution: the first architecture reached an operating frequency of 289 MHz, and the second demonstrated the controller's ability to determine the resources actually available for deploying independent clones on a targeted FPGA and processing the task in parallel.
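To make the lifting idea concrete, here is a minimal software sketch of one transform level. It assumes the integer Haar wavelet purely for brevity; the abstract does not say which wavelet filter the FPGA designs implement, and the function names are illustrative.

```python
def lifting_forward(signal):
    """One level of an integer Haar DWT via lifting: split, predict, update."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    even = signal[0::2]
    odd = signal[1::2]
    # Predict: each odd sample is predicted by its even neighbour;
    # the prediction error becomes the detail coefficient.
    detail = [o - e for o, e in zip(odd, even)]
    # Update: lift the even samples so they hold the floored pair average.
    approx = [e + (d >> 1) for e, d in zip(even, detail)]
    return approx, detail


def lifting_inverse(approx, detail):
    """Undo the lifting steps in reverse order; reconstruction is exact."""
    even = [a - (d >> 1) for a, d in zip(approx, detail)]
    odd = [d + e for d, e in zip(detail, even)]
    out = []
    for e, o in zip(even, odd):
        out.extend((e, o))
    return out
```

Each output pair depends only on its own input pair, which is one reason lifting steps map naturally onto pipelined or replicated hardware units.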
Dynamically reconfigurable hardware has already been deployed to accelerate computationally demanding applications. Some of these hardware architectures allow run-time reconfiguration, but this usually incurs a large reconfiguration overhead. The advantage of run-time reconfiguration is that it enables new algorithmic solutions for many applications. To study the potential of frequent run-time reconfiguration, it is useful to investigate its costs and benefits from an abstract point of view and to develop new architectural concepts. Multi-level reconfigurable architectures are one such concept, introducing several levels of reconfiguration. This paper deals with new types of multi-level reconfigurable architectures. The corresponding problem of finding the best granularity for the different reconfiguration levels is formulated and investigated. Although this problem is shown to be NP-complete, an interesting restricted subcase is solved optimally in polynomial time. For the general case, a heuristic based on solutions for the restricted case is proposed. Results on three example applications show that the reconfiguration cost can be reduced with the new architectures. Based on a proposed measure of relative efficiency, it is also shown that the new architectures are more efficient, in that they obtain a larger reconfiguration cost reduction with less additional hardware.
ISBN: 9781424422050 (print)
This paper proposes a parallel architecture for a fast three step search (FTSS) algorithm, which is used in motion estimation. The FTSS algorithm evaluates fewer search points and is thus less computationally expensive than the standard three step search (TSS) algorithm. When applied to several standard images, FTSS has been shown to degrade performance only insignificantly relative to standard TSS. The proposed architecture uses only three processing elements, together with intelligent data arrangement and memory configuration. A technique for reducing external memory accesses has also been developed. The proposed FTSS architecture provides an efficient solution for applications requiring real-time motion estimation, because it requires less area and power than a TSS implementation would. It is suited to low bit-rate video applications such as video telephony and teleconferencing.
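For orientation, the following is a minimal software sketch of the standard TSS baseline, not the FTSS variant, whose reduced search pattern the abstract does not describe. The sum-of-absolute-differences cost, the 4 x 4 block size, and all function names are assumptions made for illustration.

```python
def sad(cur, ref, bx, by, mx, my, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by the vector (mx, my)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + my + y][bx + mx + x])
    return total


def three_step_search(cur, ref, bx, by, n=4, step=4):
    """Classic TSS: test the 3 x 3 grid around the current best vector,
    then halve the step (4, 2, 1) and repeat around the new best."""
    h, w = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    while step >= 1:
        cx, cy = best
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mx, my = cx + dx, cy + dy
                # Skip candidates whose block would leave the reference frame.
                if bx + mx < 0 or by + my < 0:
                    continue
                if bx + mx + n > w or by + my + n > h:
                    continue
                cost = sad(cur, ref, bx, by, mx, my, n)
                if cost < best_cost:
                    best, best_cost = (mx, my), cost
        step //= 2
    return best, best_cost
```

TSS is a greedy heuristic: a misleading coarse step can steer it away from the true motion vector, so it is not guaranteed to match an exhaustive search.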
ISBN: 9789537138127 (print)
An analysis of a parallel solution of the N²-1 Puzzle using clusters is presented. This problem is interesting because of its complexity and related applications, particularly in the field of robotics. A variation of the classic heuristics for estimating the work remaining to reach a solution is analyzed, and it is shown that its use significantly improves the running time of the sequential A* algorithm. Then, a parallel solution on a distributed architecture is presented, and speedup is analyzed in terms of the number of processors, efficiency, and the possible superlinearity when scaling the problem.
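As a point of reference, here is a minimal sketch of sequential A* on a sliding-tile puzzle, assuming the usual n x n board with the blank written as 0. It uses the plain Manhattan-distance heuristic; the heuristic variation analyzed in the paper is not specified in the abstract and is not reproduced here. All names are illustrative.

```python
import heapq


def manhattan(state, n):
    """Sum of each tile's row and column distance from its goal cell."""
    total = 0
    for idx, tile in enumerate(state):
        if tile:  # the blank (0) is not counted
            goal = tile - 1
            total += abs(idx // n - goal // n) + abs(idx % n - goal % n)
    return total


def neighbours(state, n):
    """Yield every state reachable by sliding one tile into the blank."""
    blank = state.index(0)
    r, c = divmod(blank, n)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < n and 0 <= nc < n:
            swap = nr * n + nc
            nxt = list(state)
            nxt[blank], nxt[swap] = nxt[swap], nxt[blank]
            yield tuple(nxt)


def astar(start, n):
    """Return the length of a shortest solution, or None if unsolvable."""
    goal = tuple(list(range(1, n * n)) + [0])
    frontier = [(manhattan(start, n), 0, start)]
    best_g = {start: 0}
    while frontier:
        _, g, state = heapq.heappop(frontier)
        if state == goal:
            return g
        if g > best_g[state]:
            continue  # stale queue entry
        for nxt in neighbours(state, n):
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + manhattan(nxt, n), ng, nxt))
    return None
```

A tighter admissible heuristic expands fewer states, which is the effect the abstract attributes to its heuristic variation.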
Parallelization of operations is of utmost importance for the efficient implementation of public key cryptography algorithms. Starting with a classification of parallelization methods at different abstraction levels of public key algorithms, we propose a novel memory architecture for elliptic curve implementations with multiple modular multiplier units. This architecture is well suited to implementing different point addition and doubling algorithms over GF(p) on FPGAs. It allows the execution time to scale with the number of modular multipliers and exhibits almost no overhead beyond the runtime of the multipliers themselves. The advantages of this distributed memory architecture are demonstrated by means of two different point addition and doubling algorithms.
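For readers unfamiliar with the operations being scheduled, here is a minimal sketch of point addition and doubling over GF(p) in affine coordinates. The abstract does not name the two algorithms the paper evaluates, so this textbook formulation and the function names are assumptions. It requires Python 3.8 or later for `pow(x, -1, p)`.

```python
def ec_add(p1, p2, a, p):
    """Add two affine points on y^2 = x^3 + a*x + b over GF(p).
    None stands for the point at infinity; b is not needed by the formulas."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P + (-P), including doubling a point with y = 0
    if p1 == p2:
        # Doubling: slope of the tangent line.
        lam = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p
    else:
        # Addition: slope of the chord through both points.
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)


def ec_mul(k, point, a, p):
    """Scalar multiplication by right-to-left double-and-add."""
    result = None
    addend = point
    while k:
        if k & 1:
            result = ec_add(result, addend, a, p)
        addend = ec_add(addend, addend, a, p)
        k >>= 1
    return result
```

Each addition or doubling reduces to a handful of modular multiplications, some of which are mutually independent; those are the operations that multiple multiplier units can execute side by side.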
ISBN: 9780769533599 (print)
Visualization is one of the most important applications of computer graphics. Building a parallel infrastructure for visualization requires a number of enabling technologies. We identify the state-of-the-art technologies that have paved the way for such an infrastructure and examine a collection of applications that would benefit from it. We consider a broad range of scientific and technological advances in visualization that are relevant to visual supercomputing. For the most part, we present the original abstracts of the cited papers.
ISBN: 9783540695004 (print)
To facilitate efficient query processing, the information contained in data warehouses is typically stored as a set of materialized views. Deciding which views to materialize so as to minimize view maintenance and query processing costs is a challenge. Some existing approaches are applicable only to small problems, which are far from realistic. In this paper we introduce a new approach to materialized view selection using Parallel Simulated Annealing (PSA) that selects views from an input Multiple View Processing Plan (MVPP). With PSA, we are able to perform view selection on MVPPs having hundreds of queries and thousands of views. In our experimental study we also show that our method significantly improves the quality of the obtained set of materialized views over existing heuristic and sequential simulated annealing algorithms.
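The search loop underlying this approach can be sketched as plain sequential simulated annealing over a set of yes/no materialization decisions. This is not the paper's PSA and does not model an MVPP: the cost function below (pay a maintenance cost for each materialized view, otherwise a query penalty) is a hypothetical stand-in, as are the parameter values and names.

```python
import math
import random


def total_cost(selected, maintenance, query_penalty):
    """Hypothetical cost model: a materialized view costs its maintenance,
    a non-materialized view costs its query-time penalty."""
    return sum(m if s else q
               for s, m, q in zip(selected, maintenance, query_penalty))


def anneal(maintenance, query_penalty, steps=2000, t0=10.0, alpha=0.995, seed=0):
    """Sequential simulated annealing; returns the best selection seen."""
    rng = random.Random(seed)
    current = [False] * len(maintenance)
    cur_cost = total_cost(current, maintenance, query_penalty)
    best, best_cost = current[:], cur_cost
    temp = t0
    for _ in range(steps):
        i = rng.randrange(len(current))
        current[i] = not current[i]  # propose: flip one view's decision
        new_cost = total_cost(current, maintenance, query_penalty)
        delta = new_cost - cur_cost
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = current[:], cur_cost
        else:
            current[i] = not current[i]  # reject: undo the flip
        temp *= alpha  # geometric cooling schedule
    return best, best_cost
```

A parallel variant would typically run several such chains or evaluate several candidate moves concurrently; how the paper parallelizes the search is not stated in the abstract.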
ISBN: 9783540695004 (print)
The appearance of multicore processors brings high performance computing to the desktop and opens the door of mainstream computing to parallel computing. This paradigm shift leads to the integration of parallel programming standards for high-end shared-memory machine architectures into desktop programming environments. In this paper we present a performance study of these new systems. We evaluate the performance of the OpenMP shared-memory programming model as integrated into the Microsoft Visual Studio C++ 2005 and Intel C++ compilers on a multicore processor. We benchmarked using the NAS OpenMP high-level application benchmarks and the EPCC OpenMP low-level benchmarks. We report the basic timings, scalability, and run-time profiles of each benchmark and analyze the results.
ISBN: 9783540885634 (print)
As current reasoning techniques are not designed for massive parallelisation, the use of parallel computation techniques in reasoning constitutes a major research problem. I will propose two possibilities for applying parallel computation techniques to ontology reasoning: parallel processing of independent ontological modules, and tailoring the reasoning algorithms to parallel architectures.
ISBN: 9780769530895 (print)
This paper examines the scalable parallel implementation of the QR factorization of a general matrix, targeting SMP and multi-core architectures. Two implementations of algorithms-by-blocks are presented. Each implementation views a block of a matrix as the fundamental unit of data and, likewise, operations over these blocks as the primary unit of computation. The first is a conventional blocked algorithm similar to those included in libFLAME and LAPACK, but expressed in a way that allows operations on the so-called critical path of execution to be computed as soon as their dependencies are satisfied. The second algorithm captures a higher degree of parallelism with an approach based on Givens rotations while preserving the performance benefits of algorithms based on blocked Householder transformations. We show that the implementation effort is greatly simplified by expressing the algorithms in code with the FLAME/FLASH API, which allows matrices stored by blocks to be viewed and managed as matrices of matrix blocks. The SuperMatrix run-time system uses FLASH to assemble and represent matrices, and also provides out-of-order scheduling of operations that is transparent to the programmer. Scalability of the solution is demonstrated on a ccNUMA platform with 16 processors and an SMP architecture with 16 cores.
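The rotation-based idea can be illustrated with a minimal scalar sketch of QR factorization by Givens rotations. This is the element-wise textbook method on plain Python lists, not the paper's algorithm-by-blocks and not the FLAME/FLASH API; the function name and elimination order are assumptions.

```python
import math


def givens_qr(a):
    """Factor an m x n matrix (list of rows) as a = q r, with q orthogonal
    and r upper triangular, by zeroing subdiagonal entries one at a time."""
    m, n = len(a), len(a[0])
    r = [[float(v) for v in row] for row in a]
    # qt accumulates the product of all rotations, i.e. the transpose of q.
    qt = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(min(n, m - 1)):
        for i in range(j + 1, m):
            if r[i][j] == 0.0:
                continue  # entry is already zero
            hyp = math.hypot(r[j][j], r[i][j])
            c, s = r[j][j] / hyp, r[i][j] / hyp
            # Rotate rows j and i of both r and qt in the same plane.
            for mat in (r, qt):
                row_j, row_i = mat[j], mat[i]
                mat[j] = [c * x + s * y for x, y in zip(row_j, row_i)]
                mat[i] = [-s * x + c * y for x, y in zip(row_j, row_i)]
            r[i][j] = 0.0  # clear rounding residue in the eliminated entry
    q = [list(col) for col in zip(*qt)]
    return q, r
```

Each rotation touches only two rows, so rotations acting on disjoint row pairs are independent of one another; that independence is the source of the extra parallelism the abstract mentions.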