Search results - Inner Mongolia University Library

The string B-tree: A new data structure for string search in external memory and its applications

JOURNAL OF THE ACM, 1999, vol. 46, no. 2, pp. 236-280

Authors: Ferragina, P; Grossi, R. Affiliations: Univ Pisa, Dipartimento Informat, I-56125 Pisa, Italy; Univ Florence, Florence, Italy

We introduce a new text-indexing data structure, the String B-Tree, that can be seen as a link between some traditional external-memory and string-matching data structures. In a short phrase, it is a combination of B-trees and Patricia tries for internal-node indices that is made more effective by adding extra pointers to speed up search and update operations. Consequently, the String B-Tree overcomes the theoretical limitations of inverted files, B-trees, prefix B-trees, suffix arrays, compacted tries and suffix trees. String B-trees have the same worst-case performance as B-trees but they manage unbounded-length strings and perform much more powerful search operations such as the ones supported by suffix trees. String B-trees are also effective in main memory (RAM model) because they improve the online suffix tree search on a dynamic set of strings. They also can be successfully applied to database indexing and software duplication.

Keywords: B-tree; external-memory data structure; Patricia trie; prefix and range search; string searching and sorting; suffix array; suffix tree; text index
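The abstract above mentions search operations "such as the ones supported by suffix trees". As a rough illustration of that kind of query only, the following minimal sketch finds all occurrences of a pattern in a set of strings by binary search over sorted suffixes. It is an in-memory toy, not the String B-Tree itself: it uses no disk pages, no Patricia tries, and no extra pointers, and the names `build_suffix_index` and `find_occurrences` are hypothetical.

```python
from bisect import bisect_left


def build_suffix_index(strings):
    """Sort every suffix of every string; keep where each suffix came from.

    Toy construction: it materializes all suffixes, so it is only meant
    for small inputs.
    """
    entries = sorted(
        (s[off:], sid, off)
        for sid, s in enumerate(strings)
        for off in range(len(s))
    )
    keys = [e[0] for e in entries]               # the suffixes, in sorted order
    locations = [(e[1], e[2]) for e in entries]  # (string id, offset) per suffix
    return keys, locations


def find_occurrences(index, pattern):
    """Return sorted (string id, offset) pairs where pattern occurs."""
    keys, locations = index
    if not pattern:
        return []
    # Suffixes that start with the pattern are contiguous in sorted order,
    # and the first of them is the first key >= pattern.
    i = bisect_left(keys, pattern)
    hits = []
    while i < len(keys) and keys[i].startswith(pattern):
        hits.append(locations[i])
        i += 1
    return sorted(hits)


index = build_suffix_index(["banana", "bandana"])
print(find_occurrences(index, "ana"))
```

Every substring of a string is a prefix of one of its suffixes, which is why a prefix search over sorted suffixes answers substring queries.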

A data structure for a sequence of string accesses in external memory

ACM TRANSACTIONS ON ALGORITHMS, 2007, vol. 3, no. 1, pp. 1-23

Authors: Ciriani, Valentina; Ferragina, Paolo; Luccio, Fabrizio; Muthukrishnan, S. Affiliations: Univ Milan, Dept Informat Technol, I-26013 Crema, Italy; Univ Pisa, Dept Comp Sci, Largo Pontecorvo 3, I-56127 Pisa, Italy; Rutgers State Univ, Dept Comp Sci, Piscataway, NJ 08854, USA

We introduce a new paradigm for querying strings in external memory, suited to the execution of sequences of operations. Formally, given a dictionary of $n$ strings $S_1, \ldots, S_n$, we aim at supporting a search sequence for $m$ not necessarily distinct strings $T_1, T_2, \ldots, T_m$, as well as inserting and deleting individual strings. The dictionary is stored on disk, where each access to a disk page fetches $B$ items, the cost of an operation is the number of pages accessed (I/Os), and efficiency must be attained on entire sequences of string operations rather than on individual ones. Our approach relies on a novel and conceptually simple self-adjusting data structure (SASL) based on skip lists, that is also interesting per se. The search for the whole sequence $T_1, T_2, \ldots, T_m$ can be done in an expected number of I/Os: $O\left(\sum_{j=1}^{m} |T_j|/B + \sum_{i=1}^{n} n_i \log_B (m/n_i)\right)$, where each $T_j$ may or may not be present in the dictionary, and $n_i$ is the number of times $S_i$ is queried (i.e., the number of $T_j$'s equal to $S_i$). Moreover, inserting or deleting a string $S_i$ takes an expected amortized number $O(|S_i|/B + \log_B n)$ of I/Os. The term $\sum_{j=1}^{m} |T_j|/B$ in the search formula is a lower bound for reading the input, and the term $\sum_{i=1}^{n} n_i \log_B (m/n_i)$ (entropy of the query sequence) is a standard information-theoretic lower bound. We regard this result as the static optimality theorem for external-memory string access, as compared to Sleator and Tarjan's classical theorem for numerical dictionaries [Sleator and Tarjan 1985]. Finally, we reformulate the search bound if a cache is available, taking advantage of common prefixes among the strings examined in the search.

Keywords: skip list; external-memory data structure; sequence of string searches and updates; caching
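To make the two terms of the search bound in the abstract above concrete, the following sketch evaluates them for a small query sequence. It only does the arithmetic of the stated formula; it does not implement the self-adjusting skip list, and it ignores the constants hidden by the O-notation. The function name `search_bound_terms` is hypothetical, and dictionary strings that are never queried are assumed to contribute zero to the entropy term.

```python
import math
from collections import Counter


def search_bound_terms(queries, dictionary, B):
    """Return (read_term, entropy_term) for a query sequence.

    read_term    = sum over queries T_j of |T_j| / B
    entropy_term = sum over dictionary strings S_i of n_i * log_B(m / n_i),
                   where n_i counts the queries equal to S_i and m is the
                   number of queries. Strings with n_i = 0 are skipped.
    """
    m = len(queries)
    in_dictionary = set(dictionary)
    read_term = sum(len(t) for t in queries) / B
    # n_i is counted only for queries that are actually in the dictionary.
    counts = Counter(t for t in queries if t in in_dictionary)
    entropy_term = sum(n_i * math.log(m / n_i, B) for n_i in counts.values())
    return read_term, entropy_term


read_term, entropy_term = search_bound_terms(
    queries=["ab", "ab", "cd", "zz"],
    dictionary=["ab", "cd", "ef"],
    B=2,
)
print(read_term, entropy_term)
```

In this example "ab" is queried twice and "cd" once out of $m = 4$ queries, so the entropy term is $2 \log_2(4/2) + 1 \log_2(4/1) = 4$, while "zz" (not in the dictionary) only adds to the input-reading term.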
