ISBN: (print) 9781665438605
Writing and optimizing application software for heterogeneous platforms that include GPUs is a difficult task: designers must invest effort and resources and weigh several key elements to obtain good performance. Dataflow programming has proven to be a good approach to this task because dataflow programs are portable and a dataflow network can be partitioned arbitrarily across the units of a heterogeneous platform. However, the design methodology alone is not sufficient to obtain good performance. The paper describes methodological steps for improving the performance of dataflow programs written in RVC-CAL and synthesized to execute on heterogeneous CPU/GPU co-processing platforms. The steps include optimizing the communication tasks between processing elements, a strategy for efficiently scheduling independent GPU partitions, and the introduction of dynamic programming to leverage the SIMD nature of GPU platforms. The approach is validated qualitatively and quantitatively using example dataflow applications executed under several partitioning configurations.
ISBN: (print) 9783030166694; 9783030166700
Using 512-bit Advanced Vector Extensions, previous development history, and Intel documentation, BNF-grammar-based genetic improvement (GI) automatically ports RNAfold to AVX, giving up to a 1.77-fold speedup. The evolved code pull request is an accepted GI software maintenance update to the bioinformatics package ViennaRNA.
More sensitive than heuristic methods for searching biological databases, the Smith-Waterman algorithm is widely used but has the drawback of a quadratic running time. The faster approach, Smith-Waterman using Associative Massive Parallelism (SWAMP+), extends Smith-Waterman for three different parallel architectures: ASsociative Computing (ASC), the ClearSpeed coprocessor, and the Convey Computer FPGA coprocessor. We show that parallel versions of Smith-Waterman can be successfully modified to produce multiple BLAST-like sub-alignments while maintaining the original precision. SWAMP+ combines parallelism with the novel extension that produces multiple sub-alignments for pairwise comparisons. Two parallel SWAMP+ implementations, for the ASC model and the ClearSpeed CSX-620, use a wavefront approach. Both perform a full traceback in parallel memory, returning multiple sub-alignments. Results show a linear speedup for the 96 processing elements (PEs) on a single ClearSpeed chip. The third SWAMP+ adaptation uses the non-associative Convey Computer FPGA coprocessor. The hybrid system has a Smith-Waterman algorithm suite designed to produce high-speed, high-throughput alignments, optimized for large databases. The Convey Computer Smith-Waterman algorithm suite was extended to produce the additional SWAMP+ sub-alignments efficiently. The parallel sequence alignment algorithms were designed for three different computer systems, and all three include the extension that produces multiple, additional sub-alignments. This work achieves a speedup while providing a deeper exploration of the matched query sequences than was previously available. (C) 2013 Elsevier B.V. All rights reserved.
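The wavefront idea mentioned in the abstract above can be illustrated with a minimal sequential sketch. It is not the SWAMP+ implementation: it only computes the best local alignment score, omits traceback and sub-alignments, and the function name, scoring values, and linear gap penalty are assumptions chosen for illustration. The point it shows is that every cell on one anti-diagonal depends only on the two preceding anti-diagonals, so those cells could be computed in parallel.

```python
def smith_waterman_wavefront(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score, filling the matrix one anti-diagonal at a time.

    Hypothetical illustration: the scoring parameters and the linear gap
    penalty are assumptions, not values taken from the abstract.
    """
    n, m = len(a), len(b)
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    # Anti-diagonal d holds every cell (i, j) with i + j == d.
    for d in range(2, n + m + 1):
        i_lo = max(1, d - m)
        i_hi = min(n, d - 1)
        # Cells in this inner loop read only diagonals d-1 and d-2, so they
        # are mutually independent; a parallel machine could assign one
        # processing element per cell.
        for i in range(i_lo, i_hi + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # match or mismatch
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            best = max(best, H[i][j])
    return best
```

The work per anti-diagonal is bounded by the shorter sequence length, which is why a fixed number of processing elements can keep the wavefront busy on long sequences.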