ISBN: 9781467323703; 9781467323727 (print)
An efficient parallel priority queue is at the core of efforts to parallelize important non-numeric irregular computations such as discrete event simulation scheduling and branch-and-bound algorithms. GPGPUs can provide a powerful computing platform for such non-numeric computations if an efficient parallel priority queue implementation is available. In this paper, aiming at fine-grained applications, we develop an efficient parallel heap system employing CUDA. To our knowledge, this is the first parallel priority queue implementation on many-core architectures, and thus represents a breakthrough. By allowing wide heap nodes to enable thousands of simultaneous deletions of highest-priority items and insertions of new items, and by taking full advantage of CUDA's data-parallel SIMT architecture, we demonstrate up to 30-fold absolute speedup for relatively fine-grained compute loads compared to an optimized sequential priority queue implementation on fast multicores. By comparison, our optimized multicore parallelization of the parallel heap yields only a 2-3 fold speedup for such fine-grained loads. This parallelization of a tree-based data structure on GPGPUs provides a roadmap for future parallelizations of other such data structures.
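The wide-node idea can be illustrated with a small sequential sketch. This is not the paper's CUDA implementation: the class name `WideHeap`, the node width parameter, and the padding of short batches with `math.inf` are all assumptions made for illustration. Each node holds a sorted batch of `width` items, and every item in a node is no larger than any item in its children, so a single deletion returns a whole batch of highest-priority (smallest) items. On a GPU, the per-node merges would be the part executed in a data-parallel fashion.

```python
import math


class WideHeap:
    """Min-heap whose nodes each hold a sorted batch of `width` items (sketch)."""

    def __init__(self, width):
        self.r = width
        self.nodes = []  # array-embedded binary tree of sorted lists

    def insert_batch(self, items):
        """Insert up to `width` items as one new node, then sift it up."""
        assert 0 < len(items) <= self.r
        # Assumption: short batches are padded with +inf sentinels.
        self.nodes.append(sorted(items) + [math.inf] * (self.r - len(items)))
        child = len(self.nodes) - 1
        while child > 0:
            parent = (child - 1) // 2
            if self.nodes[parent][-1] <= self.nodes[child][0]:
                break  # heap property already holds
            merged = sorted(self.nodes[parent] + self.nodes[child])
            # Parent keeps the smallest items, child keeps the largest.
            self.nodes[parent], self.nodes[child] = merged[:self.r], merged[self.r:]
            child = parent

    def delete_batch(self):
        """Remove and return the smallest batch (the root node's real items)."""
        if not self.nodes:
            return []
        best = [x for x in self.nodes[0] if x != math.inf]
        last = self.nodes.pop()
        if self.nodes:
            self.nodes[0] = last
            self._sift_down(0)
        return best

    def _sift_down(self, i):
        r, nodes = self.r, self.nodes
        while True:
            kids = [k for k in (2 * i + 1, 2 * i + 2) if k < len(nodes)]
            if not kids or nodes[i][-1] <= min(nodes[k][0] for k in kids):
                return
            merged = sorted(nodes[i] + [x for k in kids for x in nodes[k]])
            nodes[i] = merged[:r]
            if len(kids) == 1:
                nodes[kids[0]] = merged[r:]  # a single child is always a leaf
                return
            kids.sort(key=lambda k: nodes[k][-1])  # kids[0] has the smaller max
            nodes[kids[1]] = merged[r:2 * r]  # middle items: still valid here
            nodes[kids[0]] = merged[2 * r:]   # largest items: keep sifting down
            i = kids[0]
```

Repeatedly calling `delete_batch()` yields the stored items in ascending order, one batch at a time. The sentinel padding keeps every node full, which simplifies the merges at the cost of some wasted space.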
The "one-architecture-fits-all" design philosophy is inadequate for catering to the diverse characteristics of applications running on manyCore architectures. After evaluating various configurations of manyC...
ISBN: 9781467323703; 9781467323727 (print)
Next Generation Sequencing (NGS) platforms typically produce short reads of 50-150 base pairs (bp). The number of such short reads can be up to 6 billion per run. Aligning these short reads to a large genome is a computationally challenging problem. In this paper, we address this problem by considering the design and optimization of parallel sequence alignment on GPU-based hybrid architectures. Even though the sequence alignment algorithm is inherently data-parallel, issues such as (a) space-time trade-offs in the indexing schema, (b) the need for fast candidate location search (CAL) on the GPU, and (c) maintaining low divergence along with low space for the dynamic-programming-based local alignment make this a very challenging problem. We present the design of our novel parallel algorithm, Graphics processor Accelerated BFAST (GrABFAST), for large-scale read alignment, which overcomes these challenges and demonstrates superior performance compared to Intel multicore architectures. Using 5 large genomes, including those of human, maize, horse, dog and bacteria, we demonstrate a speedup of around 6x using Fermi Tesla C2070 GPUs versus the BFAST algorithm on a 16-core Intel Xeon 5570 architecture.
ISBN: 9781479914920 (print)
LDPC codes have been used intensively in various wireless communication applications because of their improved BER performance. The present paper summarizes state-of-the-art applications of short-length LDPC codes and proposes FPGA-based application-specific hardware architectures for short-length LDPC decoders. The decoding algorithms considered for implementation are belief propagation and the min-sum algorithm. Because of its better BER performance, the belief propagation algorithm is implemented in the proposed architecture, which makes use of the parallel computation capabilities offered by FPGA technology. In spite of the iterative nature and increased computational complexity of the LDPC decoding algorithm, the proposed architecture achieves the high throughput that is mandatory in real-time applications and data transmission. The architecture for the belief-propagation-based LDPC decoder relies on an approximation of the inverse hyperbolic tangent function, used for the check node update.
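The two check node update rules mentioned above can be compared with a short sketch. It uses the textbook log-likelihood-ratio (LLR) formulations: the belief propagation rule based on tanh and its inverse, and the min-sum rule that replaces it with a sign product and a minimum. The function names and the clipping constant are illustrative assumptions, and the sketch uses exact floating-point functions rather than the approximation a hardware decoder such as the one described would use.

```python
import math


def check_node_bp(llrs):
    """Belief propagation check node update (tanh rule).

    For each edge i, combine the incoming LLRs of all *other* edges:
    out_i = 2 * atanh( prod_{j != i} tanh(llr_j / 2) ).
    """
    out = []
    for i in range(len(llrs)):
        prod = math.prod(math.tanh(l / 2) for j, l in enumerate(llrs) if j != i)
        # Clip so atanh stays finite for very reliable inputs (assumed guard).
        prod = max(min(prod, 1 - 1e-12), -1 + 1e-12)
        out.append(2 * math.atanh(prod))
    return out


def check_node_min_sum(llrs):
    """Min-sum check node update: sign product times smallest magnitude."""
    out = []
    for i in range(len(llrs)):
        others = [l for j, l in enumerate(llrs) if j != i]
        sign = -1.0 if sum(l < 0 for l in others) % 2 else 1.0
        out.append(sign * min(abs(l) for l in others))
    return out


if __name__ == "__main__":
    incoming = [1.2, -0.8, 2.5]
    print(check_node_bp(incoming))
    print(check_node_min_sum(incoming))
```

Both rules always agree on the sign of each outgoing message, while min-sum never reports a smaller magnitude than belief propagation. That overestimate is the usual explanation for the BER gap between the two algorithms.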
Real-valued black-box optimization of badly behaved and not well understood functions is a wide topic in many scientific areas. Possible applications range from maximizing portfolio profits in financial mathematics ov...
ISBN: 9781479909735 (print)
Sequence alignment is widely used in computational biology. To obtain optimal alignment results, many algorithms adopt dynamic programming. The Smith-Waterman algorithm is a well-known sequence alignment approach of this kind. However, such dynamic programming algorithms are computationally expensive, which makes it impractical to use them to compare a query sequence with a sequence database such as GenBank or PDB. Recently, GPU computing has been applied to many sequence alignment algorithms to enhance performance. In this paper, we propose a GPU-based Smith-Waterman algorithm that combines CPU and GPU computing capabilities to accelerate alignments against a sequence database. In the proposed algorithm, a filtration mechanism using frequency distance is used to decrease the number of compared sequences. We implemented the Smith-Waterman alignments in CUDA on the NVIDIA Tesla C2050. The experimental results show that the highest speedup ratio is about 80 to 90 times over the CPU-based Smith-Waterman algorithm.
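A filter-then-align search of this kind can be sketched on the CPU as follows. The scoring values, the linear gap penalty, the threshold `max_fd`, and the exact form of the frequency distance (the larger of the total surplus and total deficit in per-character counts) are assumptions for illustration; the abstract does not specify the paper's parameters or its GPU kernel design.

```python
from collections import Counter


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b (linear gap penalty, two rows)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: scores never drop below zero.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best


def frequency_distance(a, b):
    """Distance between character-count vectors (assumed definition)."""
    fa, fb = Counter(a), Counter(b)
    chars = fa.keys() | fb.keys()
    surplus = sum(max(fa[c] - fb[c], 0) for c in chars)
    deficit = sum(max(fb[c] - fa[c], 0) for c in chars)
    return max(surplus, deficit)


def search(query, database, max_fd):
    """Align only the database sequences that pass the cheap frequency filter."""
    hits = [(s, smith_waterman(query, s))
            for s in database if frequency_distance(query, s) <= max_fd]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    db = ["ACGTACGT", "TTTTTTTT", "ACGTTCGT"]
    print(search("ACGTACGT", db, max_fd=2))
```

The filter costs time linear in the sequence lengths, whereas the alignment is quadratic, so discarding dissimilar sequences early reduces the dynamic programming work that remains to be accelerated.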
In this paper we evaluate two life science algorithms, namely Needleman-Wunsch sequence alignment and Direct Coulomb Summation, for GPUs. Whereas for Needleman-Wunsch it is difficult to get good performance numbers, D...
In this paper we study two parallelization strategies (loop-level parallelism and domain decomposition), and we investigate their impact in terms of performance and scalability on two different parallel architectures....
ISBN: 9781467324120 (print)
This paper presents a parallel implementation approach for a selective harmonic compensator for active power filters. The approach uses field-programmable gate arrays (FPGAs) in order to reduce the compensator's computation time. To compensate for even a small number of harmonics, digital filters require multiple calculation instructions involving multiplications and additions. Thus, to improve the performance of the computer system, it is proposed to implement the digital compensator using parallel structures in FPGA devices. Experimental results are presented to compare the speedup of the proposed parallel approach with the sequential DSP execution time conventionally seen in active power filter applications.
ISBN: 9783642281440 (print)
Problems of identifying material parameters (mostly parameters appearing in constitutive relations) have applications in many fields of engineering, including the investigation of processes in a rock mass. This paper outlines the structure of parameter identification problems and methods for their solution, and describes an identification (calibration) problem from geotechnics, which will serve as a realistic benchmark problem for illustrating the behaviour of selected parameter identification methods.