版权所有:内蒙古大学图书馆 技术提供:维普资讯• 智图
内蒙古自治区呼和浩特市赛罕区大学西街235号 邮编: 010021
作者机构:NIT Hamirpur Dept Comp Sci & Engn Hamirpur India Ajay Kumar Garg Engn Coll Dept Comp Sci & Engn Ghaziabad UP India
出 版 物:《MULTIMEDIA TOOLS AND APPLICATIONS》 (多媒体工具和应用)
年 卷 期:2023年第82卷第9期
页 面:14037-14053页
核心收录:
学科分类:0808[工学-电气工程] 08[工学] 0835[工学-软件工程] 0812[工学-计算机科学与技术(可授工学、理学学位)]
基 金:UK Research and Innovation UKRI (105141)
主 题:Wavelet tree Inverted index Map-reduce Search time Parallel algorithms Dictionary searching
摘 要:Indexing is one of the key components of any search tool to be optimized for searching documents. Among the existing indexing techniques, inverted indexing is one of the best methods used at a larger scale for various applications. Under this method, the index is designed using a signature file, hash tree and B-tree to retrieve the required document in efficient time. B-tree is popular due to its searching efficiency, but its performance degrades with increasing data set size. The wavelet tree has become a popular and versatile data structure in the last decade, used in various domains such as sequences, indexing, compression, and grid-point with surprising results. This study proposes a parallel wavelet tree algorithm with hybridization of the Map-Reduce concept to construct an index for textual search. The proposed algorithm reduces the index construction time considerably. Experiments show that the proposed algorithm takes a reasonable trade-off with existing indexing approaches. For large data sets, index construction time has been reduced with respect to other existing state-of-art schemes. Also, results show that the algorithm performs well when the data-set scales up to up-to-the full utilization of available cores. It is possible due to the use of multiple threads working in parallel. Our experiment demonstrated consistent performance with 2-core, 4-core, 8-core, 12-core and results of 16-core show increase in index construction time due to parallel overhead when the data-set in not sufficiently large.