Author affiliations: Diego Portales Univ, Santiago, Chile; CEBIB, Santiago, Chile; Univ Piemonte Orientale, Alessandria, Italy; CNR IIT, Pisa, Italy; Univ Helsinki, Helsinki, Finland
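Publication: Mathematics in Computer Science
Year, volume, issue: 2017, Vol. 11, No. 2
Pages: 151-157
Subject classification: 08 [Engineering]; 0701 [Science - Mathematics]; 0812 [Engineering - Computer Science and Technology (engineering or science degrees may be awarded)]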
Funding: Academy of Finland [258308, 250345]
Topics: Compressed data structures; Similarity search; Spaced seeds; Spaced suffix arrays; Relative compression
Abstract: As a first step in designing relatively compressed data structures (i.e., structures for which storing an instance for one dataset helps us store instances for similar datasets), we consider how to compress spaced suffix arrays relative to normal suffix arrays while still supporting fast access to them. This problem is of practical interest when performing similarity search with spaced seeds, because using several seeds in parallel significantly improves performance, yet existing approaches keep a separate linear-space hash table or spaced suffix array for each seed. We first prove a theoretical upper bound on the space needed to store a spaced suffix array when we already have the suffix array. We then present experiments indicating that our approach works even better in practice.
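To make the two objects named in the abstract concrete, here is a minimal Python sketch that builds a plain suffix array and a spaced suffix array for a toy string. It is only an illustration under stated assumptions: the seed "101", the function names, the tie-breaking by text position, and the handling of seed offsets that run past the end of the text are all choices made here, not details taken from the paper. It does not attempt the relative compression the paper describes.

```python
def suffix_array(text):
    """Naive suffix array: start positions sorted by the suffix beginning there."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def spaced_suffix_array(text, seed):
    """Positions sorted by the characters selected by the seed's '1' offsets.

    Assumptions of this sketch (not from the paper): the seed is a string of
    '0'/'1', ties are broken by text position, and offsets past the end of
    the text are simply dropped.
    """
    care = [k for k, bit in enumerate(seed) if bit == "1"]

    def masked(i):
        # Keep only the characters at the seed's care offsets from position i.
        return "".join(text[i + k] for k in care if i + k < len(text))

    return sorted(range(len(text)), key=lambda i: (masked(i), i))


text = "abracadabra"
sa = suffix_array(text)
ssa = spaced_suffix_array(text, "101")

# Both arrays are permutations of the same positions, which is what makes it
# plausible to describe one relative to the other.
print("SA :", sa)
print("SSA:", ssa)
```

Both constructions are quadratic-time toys, which is fine for inspecting small inputs. The point is only that the spaced suffix array is a reordering of the same positions as the suffix array, so each additional seed adds another permutation of the text positions.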