文献详情 >Locality-sensitive bucketing f... 收藏

arXiv

Locality-sensitive bucketing functions

作者机构：Department of Computer Science and Engineering School of Electronic Engineering and Computer Science The Pennsylvania State University United States Huck Institutes of the Life Sciences The Pennsylvania State University United States

出版物：《arXiv》 (arXiv)

年卷期：2022年

核心收录：

主　　题：Data handling

摘要：Many bioinformatics applications involve bucketing a set of sequences where each sequence is allowed to be assigned into multiple buckets. To achieve both high sensitivity and precision, bucketing methods are desired to assign similar sequences into the same buckets while assigning dissimilar sequences into distinct buckets. Existing k-mer-based bucketing methods have been efficient in processing sequencing data with low-error rate, but encounter much reduced sensitivity on data with high-error rate. Locality-sensitive hashing (LSH) schemes are able to mitigate this issue through tolerating the edits in similar sequences, but state-of-the-art methods still have large gap. Here we generalize the LSH function by allowing it to hash one sequence into multiple buckets. Formally, a bucketing function, which maps a sequence (of fixed length) into a subset of buckets, is defined to be (d1, d2)-sensitive if any two sequences within an edit distance of d1 are mapped into at least one shared bucket, and any two sequences with distance at least d2 are mapped into disjoint subsets of buckets. We construct LSB functions with a variety of values of (d1, d2) and analyze their efficiency with respect to the total number of buckets needed as well as the number of buckets that a specific sequence is mapped to. We also prove lower bounds of these two parameters in different settings and show that some of our constructed LSB functions are optimal. These results provide theoretical foundations for their practical use in analyzing sequencing data with high error rate while also providing insights for the hardness of designing ungapped LSH functions. Copyright © 2022, The Authors. All rights reserved.

本地馆藏 | 借阅须知 | 我要预约

已订购，未入库

sda

目录详情 | 试阅读 |

读者评论与其他读者分享你的观点

学校读者

用户名:未登录

我的评分

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

看过本文的还看了

相关文献

该作者的其他文献

CADAL相关文献

Locality-sensitive bucketing functions

读者评论与其他读者分享你的观点

请选择收藏分类：

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

看过本文的还看了

相关文献

该作者的其他文献

CADAL相关文献

Locality-sensitive bucketing functions

读者评论 与其他读者分享你的观点

请选择收藏分类： 新增自定义分类 确定 取消

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

读者评论与其他读者分享你的观点

请选择收藏分类：