Suffix arrays are fundamental full-text index data structures of importance to a broad spectrum of applications in fields such as bioinformatics, Burrows-Wheeler transform-based lossless data compression, and information retrieval. In this work, we propose and implement two massively parallel approaches on the graphics processing unit (GPU) based on two classes of suffix array construction algorithms. The first, parallel skew, makes algorithmic improvements to the previous work of Deo and Keely to achieve a speedup of 1.45x over their work. The second, a hybrid skew and prefix-doubling implementation, is the first of its kind on the GPU and achieves a speedup of 2.3-4.4x over Osipov's prefix-doubling and 2.4-7.9x over our skew implementation on large datasets. Our implementations rely on two efficient parallel primitives, a merge and a segmented sort. We theoretically analyze the two formulations of suffix array construction algorithms and show performance comparisons on a large variety of practical inputs. We conclude that, with the novel use of our efficient segmented sort, prefix-doubling is more competitive than skew on the GPU. We also demonstrate the effectiveness of our methods in our implementations of the Burrows-Wheeler transform and in a parallel full-text minute-space index for pattern searching. Copyright (C) 2016 John Wiley & Sons, Ltd.
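To make the prefix-doubling idea concrete, here is a minimal sequential sketch (not the paper's GPU implementation): suffixes are ranked by their first k characters, and each round doubles k by pairing a suffix's rank with the rank of the suffix k positions later, until all ranks are distinct. Function and variable names are illustrative.

```python
def suffix_array(s):
    """Prefix-doubling suffix array construction (Manber-Myers style sketch)."""
    n = len(s)
    # Initial ranks: order suffixes by their first character.
    rank = [ord(c) for c in s]
    sa = list(range(n))
    k = 1
    while True:
        # Sort key: (rank of first k chars, rank of the next k chars, or -1 past the end).
        key = lambda i: (rank[i], rank[i + k] if i + k < n else -1)
        sa.sort(key=key)
        # Re-rank suffixes in sorted order; equal keys share a rank.
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: array is final
            return sa
        k *= 2

# Suffixes of "banana" in sorted order: a, ana, anana, banana, na, nana
print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

On the GPU, each doubling round becomes a parallel (segmented) sort over the rank pairs, which is why an efficient segmented sort primitive is central to the hybrid approach described above.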
ISBN (print): 9781665411288
The evolution of data science and machine learning has increased the applicability of the sparse matrix-matrix multiplication (SpGEMM) kernel. Unlike better-known operations such as SpMV, in SpGEMM the nonzero pattern of the result is determined by the interaction between the nonzero patterns of the inputs, which imposes serious challenges on the development of high-performance implementations for accelerators. Recent efforts in this area aim to mitigate this irregularity through the use of block-based sparse storage formats, obtaining promising results on accelerators such as GPUs. In this work we study the format bmSparse [1] and propose optimizations to attack the principal bottlenecks of the original SpGEMM implementation for Nvidia GPUs. We evaluate the proposal using nine sparse matrices of different sizes, showing remarkable speedups with respect to cuSPARSE's CSR variant.
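The irregularity mentioned above is easiest to see in a plain row-wise SpGEMM (Gustavson's algorithm) over CSR inputs: the result's nonzeros only emerge as rows of B are accumulated, so neither the output size nor its pattern is known in advance. This is a generic sequential sketch for illustration, not the bmSparse or cuSPARSE implementation; all names are illustrative.

```python
def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val, n_rows):
    """Row-wise SpGEMM on CSR matrices: C = A * B (Gustavson's algorithm)."""
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(n_rows):
        acc = {}  # sparse accumulator: column index -> partial value
        # For each nonzero A[i, k], accumulate a_ik * (row k of B).
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        # Emit row i of C with columns in ascending order.
        for j in sorted(acc):
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val

# A = [[1, 0], [0, 2]], B = [[0, 3], [4, 0]]  =>  C = [[0, 3], [8, 0]]
print(spgemm_csr([0, 1, 2], [0, 1], [1.0, 2.0],
                 [0, 1, 2], [1, 0], [3.0, 4.0], 2))
```

The per-row accumulator is exactly what makes GPU implementations hard: rows of the output have unpredictable, data-dependent sizes, which block-based formats such as bmSparse try to regularize.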