ISBN: 9783540693833 (print)
Due to their increasing computational power, modern graphics processing architectures are becoming increasingly popular for general-purpose applications with high performance demands. This is the case for quantum computer simulation, a problem with high computational requirements in both memory and processing power. For such simulations, multiprocessor architectures are an almost obligatory tool. In this paper we explore the use of the new graphics processor architecture NVIDIA CUDA in the simulation of some basic quantum computing operations. This new architecture is oriented towards a more general exploitation of the graphics platform, allowing it to be used as a parallel SIMD multiprocessor. In this direction, some implementation strategies are proposed, showing that the effectiveness of the codes depends on proper exploitation of the underlying memory hierarchy.
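As an illustration of the kind of basic operation such a simulator performs, the following is a minimal sketch of applying a single-qubit gate to a state vector. It is not the paper's CUDA code: the function name, the list-based state representation, and the choice of the Hadamard gate are assumptions made for this example.

```python
import math

# Hypothetical helper for illustration only; not taken from the paper.
def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a state vector of length 2**n."""
    stride = 1 << target          # distance between the paired amplitudes
    out = list(state)
    for base in range(0, len(state), 2 * stride):
        for offset in range(stride):
            i0 = base + offset    # index where the target bit is 0
            i1 = i0 + stride      # same index with the target bit set to 1
            a0, a1 = state[i0], state[i1]
            out[i0] = gate[0][0] * a0 + gate[0][1] * a1
            out[i1] = gate[1][0] * a0 + gate[1][1] * a1
    return out

S = 1 / math.sqrt(2)
HADAMARD = [[S, S], [S, -S]]

# Two-qubit register starting in |00>, Hadamard applied to qubit 0.
print(apply_single_qubit_gate([1, 0, 0, 0], HADAMARD, 0))
```

Each amplitude pair (i0, i1) is updated independently of every other pair, which is why this operation maps naturally onto a SIMD device. How the pairs are laid out in memory is the sort of choice the abstract refers to when it mentions the memory hierarchy.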
ISBN: 9780769533520 (print)
Clusters built from single-core systems are cost-effective in terms of performance improvement and availability. However, hardware constraints limit the performance of single-core systems. Hence, it is difficult to meet the increasingly high performance requirements of diversified applications at different levels of general-purpose computing. A promising and feasible solution is novel multi-core systems, which extend parallelism to the CPU level by integrating multiple processing units on a single die. This paper uses the Finite-Difference Time-Domain (FDTD) algorithm as a case study, designing suitable parallel FDTD algorithms for three architectures: distributed-memory machines with single-core processors, shared-memory machines with dual-core processors, and the Cell Broadband Engine (Cell/B.E.) processor with nine heterogeneous cores. The experimental results show that the Cell/B.E. processor using 8 SPEs achieves significant speedups at the processor level: 7.05 times faster than an AMD single-core Opteron processor and 3.37 times faster than an AMD dual-core Opteron processor.
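For readers unfamiliar with FDTD, the following is a minimal serial sketch of a one-dimensional update loop. It is not the paper's implementation: the function name, the normalized units, the Courant factor of 0.5, and the Gaussian source parameters are all assumptions chosen for illustration.

```python
import math

# Hypothetical 1D example in normalized units; not the paper's code.
def fdtd_1d(num_cells, num_steps, source_cell):
    """Advance a 1D electric/magnetic field pair with leapfrog updates."""
    ez = [0.0] * num_cells
    hy = [0.0] * num_cells
    for step in range(num_steps):
        # Magnetic field update from the spatial difference of ez.
        for k in range(num_cells - 1):
            hy[k] += 0.5 * (ez[k + 1] - ez[k])
        # Electric field update from the spatial difference of hy.
        for k in range(1, num_cells):
            ez[k] += 0.5 * (hy[k] - hy[k - 1])
        # Soft Gaussian source (assumed centre 30 and width 10 time steps).
        ez[source_cell] += math.exp(-(((step - 30.0) / 10.0) ** 2))
    return ez, hy

ez, hy = fdtd_1d(num_cells=100, num_steps=50, source_cell=50)
print(max(abs(v) for v in ez))
```

Each cell update reads only its immediate neighbours, so the grid can be split into contiguous blocks with only the boundary values exchanged between workers. That locality is what makes the algorithm a natural case study across distributed-memory, shared-memory, and Cell/B.E. targets.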
ISBN: 9783540858560 (print)
This paper describes the application of a parallel Grammatical Evolution (PGE) algorithm to combinatorial logic circuit generation. The grammar and algorithms used are described. To increase the efficiency of Grammatical Evolution (GE), the backward processing algorithm was used. Different approaches to creating multiobjective fitness functions are described and tested. Specifically, the fitness functions are defined as a set of rules incorporating different comparison methods in each stage of the computation. The algorithm is internally parallel and consists of three different interconnected populations.
ISBN: 9780769532875 (print)
A novel fast scheme for the Discrete Wavelet Transform (DWT) was recently introduced under the name of the lifting scheme [4, 10]. This new scheme presents many advantages over the convolution-based approach [10, 11]. For instance, it is very suitable for parallelization. In this paper we present two new FPGA-based parallel implementations of the lifting-based DWT scheme. The first implementation uses pipelining, parallel processing and data reuse to increase the speedup of the algorithm. In the second architecture, a controller is introduced to dynamically deploy a suitable number of clones according to the hardware resources available on a targeted environment. These two architectures are capable of processing large incoming images or multi-frame images in real time. Simulations run on a Xilinx Virtex-5 FPGA environment have proven the practical efficiency of our contribution. In fact, the first architecture achieved an operating frequency of 289 MHz, and the second demonstrated the controller's ability to determine the resources actually available for successfully deploying independent clones on a targeted FPGA environment and processing the task in parallel.
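The split, predict, and update steps of a lifting scheme can be sketched as follows. This example uses the Haar wavelet for brevity, which is an assumption: the abstract does not say which wavelet the FPGA designs implement. The function names are hypothetical, and the input is assumed to have even length.

```python
# Hypothetical Haar lifting example; the paper's wavelet is not stated.
def lifting_forward(signal):
    """One lifting level: split, predict, update (even-length input)."""
    even = signal[0::2]
    odd = signal[1::2]
    # Predict: each odd sample is estimated by its even neighbour.
    detail = [o - e for o, e in zip(odd, even)]
    # Update: adjust the even samples so they carry the pairwise mean.
    approx = [e + d / 2 for e, d in zip(even, detail)]
    return approx, detail

def lifting_inverse(approx, detail):
    """Undo the lifting steps in reverse order."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [d + e for d, e in zip(detail, even)]
    merged = []
    for e, o in zip(even, odd):
        merged.extend([e, o])
    return merged

approx, detail = lifting_forward([2, 4, 6, 8])
print(approx, detail)
print(lifting_inverse(approx, detail))
```

Every output pair depends only on one input pair, so the steps can be computed in place and in parallel. This illustrates the property that makes lifting attractive for pipelined hardware.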
ISBN: 9783540858560 (print)
In the quest to design extremely fault-tolerant computing systems, drawing inspiration from nature is one avenue worth exploring. Embryonics (embryonic electronics) is a research project that attempts to implement features otherwise available in the world of biology to design robust, massively parallel arrays of processors. This paper elaborates on some of the design approaches undertaken to ensure a high level of fault tolerance, as well as on how to partition the array to make optimal use of spare resources.
ISBN: 9780769534435 (print)
The development of numerical simulation software tools for the solution of real-world problems usually calls for domain experts in modeling. The GraPA framework, as an abstraction layer on top of hardware characteristics, supports modelers in two respects: one is the built-in support for co-processing of multiple models, and the other is the generically delivered high performance achieved by implementing concurrency features of multicore and distributed-memory architectures. Technically, GraPA is designed as a C++ template framework, where the modeler's data structures and algorithms instantiate the framework. Using this approach, we handle parallel processing of lock-free data structures and message passing transparently to the modelers. In this paper, we report on the status of the implementation of GraPA and on its performance characteristics.
ISBN: 9783540681052 (print)
Sequence alignment is one of the most important techniques in Bioinformatics. Although efficient dynamic programming algorithms exist for this problem, the alignment of very long DNA sequences still requires significant time on traditional computer architectures. In this paper, we present a scalable and efficient mapping of DNA sequence alignment onto the Cell BE multi-core architecture. Our mapping uses two types of parallelization techniques: (i) SIMD vectorization within a processor and (ii) wavefront parallelization between processors.
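The wavefront idea can be sketched with a global alignment score computed one anti-diagonal at a time. This is an illustrative serial example, not the paper's Cell BE mapping: the function name and the scoring values (match 1, mismatch -1, gap -1) are assumptions.

```python
# Hypothetical example; scoring parameters are assumed, not from the paper.
def alignment_score_wavefront(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score, filling the DP table by anti-diagonals."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        score[i][0] = i * gap
    for j in range(cols):
        score[0][j] = j * gap
    # Cells with the same i + j form one wavefront; they depend only on
    # earlier wavefronts, so they could be computed concurrently.
    for diag in range(2, rows + cols - 1):
        i_lo = max(1, diag - (cols - 1))
        i_hi = min(rows - 1, diag - 1)
        for i in range(i_lo, i_hi + 1):
            j = diag - i
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    return score[rows - 1][cols - 1]

print(alignment_score_wavefront("GATTACA", "GATTACA"))
```

The inner loop over `i` is the part that a multi-core mapping would distribute between processors. The per-cell maximum is the kind of operation that lends itself to SIMD vectorization within a processor.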
ISBN: 9788086943169 (print)
In this paper, we present how 3D split-and-merge segmentation using topological and geometrical structuring with an Oriented Boundary Graph may be optimized by parallel algorithms. This structuring allows split and merge operations to be implemented efficiently, but since these treatments often have to be applied to large images, we have studied how to improve performance by parallelizing the process. After a short description of the structuring model and its construction, we describe algorithms for parallelizing the construction of the structuring and explain how the model can be maintained while using parallel processes. We explain how data are partitioned for use with multiprocessor systems, and we describe an extension for use with NUMA architectures and graphics processing units (GPUs). Examples on two medical images of different sizes are presented, together with execution times.
Programmers of embedded digital signal processors often have to deal with the devices of the platform or with low-level hardware abstraction layers in order to get the best performance from a given algorithm. This...
ISBN: 9781424416936 (print)
Matrix decomposition applications that involve large matrix operations can take advantage of the flexibility and adaptability of reconfigurable computing systems to improve performance. The benefits come from replication, which includes vertical replication and horizontal replication. Viewed on a space-time chart, vertical replication allows multiple computations to be executed in parallel, and horizontal replication renders multiple functions on the same piece of hardware. In this paper, a reconfigurable architecture that supports replication for matrix decomposition applications on reconfigurable computing systems is described. Issues addressed include the comparison of algorithms on the system and data movement between the internal computation cores and the external memory subsystem. A prototype of such a system is implemented as a proof of concept. It is expected to improve the performance and scalability of matrix decomposition involving large matrices.