Search Results - Inner Mongolia University Library

A parallel space saving algorithm for frequent items and the Hurwitz zeta distribution

INFORMATION SCIENCES, 2016, vol. 329, pp. 1-19

Authors: Cafaro, Massimo; Pulimeno, Marco; Tempesta, Piergiulio
Affiliations: Univ Salento, Lecce, Italy; Univ Complutense, Fac Fis, Dept Fis Teor 2, E-28040 Madrid, Spain; Inst Ciencias Matemat, Madrid 28049, Spain

We present a message-passing based parallel version of the space saving algorithm designed to solve the k-majority problem. The algorithm determines frequent items in parallel, i.e., those whose frequency is greater than a given threshold, and is therefore useful for iceberg queries and many other contexts. We apply our algorithm to the detection of frequent items in both real and synthetic datasets whose probability distribution functions are a Hurwitz and a Zipf distribution respectively. We also compare its parallel performance and accuracy against a parallel algorithm recently proposed for merging summaries derived by the space saving or Frequent algorithms. (C) 2015 Elsevier Inc. All rights reserved.

Keywords: Frequent items; space saving algorithm; Message-passing
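
For readers unfamiliar with the technique named in this record, the following is a minimal sketch of the sequential space saving algorithm, not the message-passing parallel version the paper presents. The function names `space_saving` and `frequent_candidates` are illustrative assumptions, and the summary is kept in a plain dictionary for brevity rather than in the specialised structure an efficient implementation would use.

```python
def space_saving(stream, k):
    """Summarise a stream with at most k counters (sequential sketch)."""
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already monitored: just count it.
            counters[item] += 1
        elif len(counters) < k:
            # A free counter is available: start monitoring the item.
            counters[item] = 1
        else:
            # Evict an item with the minimum count; the newcomer
            # inherits that count plus one (a possible overestimate).
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters


def frequent_candidates(counters, n, k):
    """Return monitored items whose estimated count exceeds n / k."""
    threshold = n / k
    return {item: c for item, c in counters.items() if c > threshold}


if __name__ == "__main__":
    stream = list("abacabadae")
    summary = space_saving(stream, k=3)
    print(frequent_candidates(summary, n=len(stream), k=3))
```

Counts in the summary are upper bounds on true frequencies, so the returned items are candidates: every item occurring more than n / k times is included, but a second pass over the data would be needed to rule out false positives.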


TopKmer: Parallel High Frequency K-mer Counting on Distributed Memory


19th International Conference on Network and Parallel Computing (NPC)

Authors: Li Mocheng; Chen Zhiguang; Xiao Nong; Liu Yang; Luo Xi; Chen Tao
Affiliations: Natl Univ Def Technol, Coll Comp, Inst Quantum Informat, Changsha, Peoples R China; Natl Univ Def Technol, Coll Comp, State Key Lab High Performance Comp, Changsha, Peoples R China; Sun Yat Sen Univ, Sch Comp, Guangzhou, Peoples R China; Shenzhen 2 Vocat & Tech Sch, Automot Dept, Shenzhen, Peoples R China; Beijing Inst Life, Beijing Proteome Res Ctr, State Key Lab Prote, Natl Ctr Prot Sci Beijing, Beijing, Peoples R China

ISBN: (print) 9783031213946; 9783031213953

High-throughput DNA sequencing is a crucial technology for genomics research. As genetic data grows to hundreds of gigabytes or even terabytes that traditional devices cannot support, high-performance computing plays an important role. However, current high-performance approaches to extracting k-mers incur a large memory footprint due to the high error rate of short-read sequences. This paper proposes Top-Kmer, a parallel k-mer counting workflow that indexes high-frequency k-mers within a tiny counting structure. On the 2048 cores of Tianhe-2, we construct k-mer index tables in 18 s for 174 GB fastq files and complete queries in 1 s for 1 billion k-mers, with a scaling efficiency of 95%. Compared with the state of the art, the counting table's memory usage is reduced by 50% with no performance degradation.

Keywords: k-mer counting; Distributed hash table; space saving algorithm; High scalability; Hybrid parallel
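
As a point of reference for the term "k-mer counting" in this record, the sketch below counts every length-k substring of a set of reads exactly and keeps those at or above a count threshold. It is a single-process illustration only and does not reproduce the distributed workflow or the compact counting structure described in the abstract; the names `count_kmers`, `high_frequency`, and `min_count` are assumptions made for the example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        # A read of length L contains L - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def high_frequency(counts, min_count):
    """Keep only k-mers seen at least min_count times."""
    return {kmer: c for kmer, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    reads = ["ACGTACGT", "CGTA"]
    print(high_frequency(count_kmers(reads, k=4), min_count=2))
```

Exact counting like this stores one entry per distinct k-mer, which is the memory cost that motivates keeping only high-frequency k-mers in a bounded summary.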
