Sorting any m target suffixes of an input string X of n characters over a constant alphabet is a key task in building the sparse suffix array SSA(X) for index construction. A number of probabilistic and deterministic algorithms have been proposed for sorting sparse suffixes with varying time and space complexities, but only limited experimental results are available for evaluating their performance. We design a divide-and-conquer algorithm called sSAIS for computing SSA(X) in O(n log m log(n/m)) time and O(m) workspace using the induced sorting principle, and conduct an experimental performance study on real and artificial datasets. This work reveals that designing an efficient deterministic algorithm for sorting sparse suffixes is a tough challenge, and that the density of target suffixes should be treated as a critical design parameter.
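As a point of reference for what SSA(X) contains, here is a minimal sketch that sorts a chosen set of suffix start positions by naive comparison; the `sparse_suffix_array` name and the comparison sort are illustrative only, not the induced-sorting sSAIS algorithm described above:

```python
def sparse_suffix_array(X, positions):
    """Return the given suffix start positions of X in lexicographic
    order of their suffixes.

    Naive illustrative version: each comparison may inspect O(n)
    characters, unlike the O(n log m log(n/m))-time sSAIS algorithm.
    """
    return sorted(positions, key=lambda i: X[i:])

# Suffixes of "banana" at 0, 2, 4: "banana", "nana", "na"
print(sparse_suffix_array("banana", [0, 2, 4]))  # → [0, 4, 2]
```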
Sampling (evenly) the suffixes from the suffix array is an old idea that trades pattern search time for reduced index space. A few years ago, Claude et al. presented an alphabet sampling scheme that allows more efficient pattern searches than the sparse suffix array for sufficiently long patterns. A drawback of their approach is the requirement that sought patterns contain at least one character from the chosen subalphabet. In this work, we propose an alternative suffix sampling approach whose only requirement is a minimum pattern length, which is more convenient in practice. Experiments show that our algorithm (in a few variants) achieves competitive time-space tradeoffs on most standard benchmark data. Copyright (C) 2017 John Wiley & Sons, Ltd.
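The evenly sampled scheme that serves as the baseline here can be sketched as follows; `build_sparse_sa` and `search` are hypothetical names, and the linear scan stands in for the binary search a real index would use. The key invariant is that any pattern of length at least k must cover a sampled position at one of k offsets:

```python
def build_sparse_sa(text, k):
    """Index only every k-th suffix, sorted lexicographically."""
    return sorted(range(0, len(text), k), key=lambda i: text[i:])

def search(text, ssa, k, pattern):
    """Find all occurrences of a pattern of length >= k.

    A match starting at position p covers some sampled position p + j
    with 0 <= j < k, so we look up each of the k pattern tails and
    verify the j preceding characters.
    """
    assert len(pattern) >= k  # the minimum pattern length requirement
    hits = set()
    for j in range(k):
        tail = pattern[j:]
        for i in ssa:  # a real index would binary-search here
            if text[i:i + len(tail)] == tail:
                start = i - j
                if start >= 0 and text[start:start + len(pattern)] == pattern:
                    hits.add(start)
    return sorted(hits)

text = "abracadabra"
ssa = build_sparse_sa(text, 2)
print(search(text, ssa, 2, "abra"))  # → [0, 7]
```

Space drops by a factor of k, at the price of k lookups per query — the time-space tradeoff the abstract refers to.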
ISBN: 9781424483075 (print)
We concentrate on indexing DNA sequences via sparse suffix arrays (SSAs) and propose a new short read aligner named PSI-RA (parallel sparse index read aligner). The motivation for using SSAs is the ability to trade memory against time: the space consumption of the index can be tuned to the available memory of the machine and the minimum length of the arriving pattern queries. Although SSAs have been studied before for exact matching of short reads, an elegant way to support approximate matching was missing. We provide this by defining the right-most mismatch criterion, which prioritizes errors towards the end of the reads, since errors are known to be more probable in that region. PSI-RA supports any number of mismatches in aligning reads. We give comparisons with some well-known short read aligners, and show that indexing the genome with an SSA is a good alternative to Burrows-Wheeler transform or seed-based solutions.
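A right-most mismatch preference can be illustrated as below; the `align` function and its tie-breaking key are assumptions for the sketch, not PSI-RA's actual implementation. Among alignments with equally few mismatches, it prefers the one whose errors sit closest to the read's end:

```python
def align(read, ref, max_mm):
    """Return the best alignment position of read in ref, or None.

    Candidates with at most max_mm mismatches are ranked by
    (mismatch count, distances of errors from the read's 3' end),
    so later errors are preferred -- a sketch of a right-most
    mismatch criterion, not PSI-RA's actual scoring.
    """
    best = None
    for pos in range(len(ref) - len(read) + 1):
        mm = [i for i, c in enumerate(read) if ref[pos + i] != c]
        if len(mm) <= max_mm:
            # rightmost error first; smaller distance-from-end is better
            key = (len(mm), tuple(len(read) - 1 - i for i in reversed(mm)))
            if best is None or key < best[0]:
                best = (key, pos)
    return None if best is None else best[1]

# Both candidate alignments of "AAAA" in "TAAAT" have one mismatch;
# position 1 wins because its error falls on the read's last base.
print(align("AAAA", "TAAAT", 1))  # → 1
```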
ISBN: 9781728118673 (print)
Clustering is a fundamental yet time-consuming operation in biological sequence analysis. New sequencing technologies such as NGS and 3GS have dramatically increased both dataset sizes and the length of a single read sequence. However, existing tools lack the scalability to handle large-scale datasets as well as long sequences. A feasible solution to this problem is to use parallel and distributed systems, but deploying them efficiently requires high parallelism in both the software implementation and the algorithmic optimizations. In this paper, we propose DGCF, a Distributed Greedy Clustering Framework capable of handling large-scale datasets and long sequences. Our framework adopts a greedy clustering strategy that overlaps communication with computation among many distributed computing nodes. We also design and implement a sparse suffix array (SSA)-based alignment algorithm that supports long sequences. Experiments show that our framework achieves near-linear speedups on a distributed memory cluster.
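Greedy sequence clustering in this style can be sketched as follows; the `greedy_cluster` signature and the identity-fraction similarity are illustrative assumptions. DGCF itself distributes this loop across computing nodes and uses its SSA-based alignment for the similarity test:

```python
def greedy_cluster(seqs, threshold, similarity):
    """Greedy incremental clustering sketch.

    Process sequences longest first; each joins the first cluster whose
    representative is at least `threshold` similar, otherwise it founds
    a new cluster with itself as representative.
    """
    reps, clusters = [], []
    for s in sorted(seqs, key=len, reverse=True):
        for ci, r in enumerate(reps):
            if similarity(s, r) >= threshold:
                clusters[ci].append(s)
                break
        else:
            reps.append(s)
            clusters.append([s])
    return clusters

def identity(a, b):
    """Toy similarity: fraction of matching positions (no alignment)."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

print(greedy_cluster(["AAAA", "AAAT", "GGGG"], 0.7, identity))
# → [['AAAA', 'AAAT'], ['GGGG']]
```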
In this work, we present efficient algorithms for constructing sparse suffix trees, sparse suffix arrays, and sparse position heaps for b arbitrary positions of a text T of length n while using only O(b) words of space during construction. Attempts at breaking the naive bound of Omega(nb) time for constructing sparse suffix trees in O(b) space can be traced back to the origins of string indexing in 1968. The first results were not obtained until 1996, and only for the case in which the b suffixes were evenly spaced in T. In this article, there is no constraint on the locations of the suffixes. Our main contribution is to show that the sparse suffix tree (and array) can be constructed in O(n log^2 b) time. To achieve this, we develop a technique for efficiently answering b longest common prefix queries on suffixes of T using only O(b) space. We expect that this technique will prove useful in many other applications in which space usage is a concern. Our first solution is Monte Carlo and outputs the correct tree with high probability. We then give a Las Vegas algorithm, which also uses O(b) space and runs in the same time bounds with high probability when b = O(sqrt(n)). Additional trade-offs between space usage and construction time are given for the Monte Carlo algorithm. Finally, we show that, at the expense of slower pattern queries, sparse position heaps can be constructed in O(n + b log b) time and O(b) space.
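The flavor of such a Monte Carlo LCP technique can be shown with Karp-Rabin fingerprints: prefix hashes give constant-time substring fingerprints, and a binary search then finds the longest common prefix of two suffixes in O(log n) comparisons, correct with high probability. This is a minimal single-query sketch under assumed parameters (`lcp_oracle` is a hypothetical name; the actual construction batches the b queries in O(b) space):

```python
def lcp_oracle(text, base=256, mod=(1 << 61) - 1):
    """Return an lcp(i, j) function for suffixes of text.

    Karp-Rabin prefix fingerprints allow comparing any two substrings
    in O(1); binary search over the prefix length then yields the LCP
    in O(log n) time. Monte Carlo: a hash collision (improbable for
    mod ~ 2^61) could overstate the LCP.
    """
    n = len(text)
    h = [0] * (n + 1)  # h[i] = fingerprint of text[:i]
    p = [1] * (n + 1)  # p[i] = base**i mod mod
    for i, c in enumerate(text):
        h[i + 1] = (h[i] * base + ord(c)) % mod
        p[i + 1] = p[i] * base % mod

    def fp(i, j):  # fingerprint of text[i:j]
        return (h[j] - h[i] * p[j - i]) % mod

    def lcp(i, j):  # largest L with text[i:i+L] == text[j:j+L] (whp)
        lo, hi = 0, min(n - i, n - j)
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if fp(i, i + mid) == fp(j, j + mid):
                lo = mid
            else:
                hi = mid - 1
        return lo

    return lcp

lcp = lcp_oracle("banana")
print(lcp(1, 3))  # "anana" vs "ana" → 3
```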