Search Results - Inner Mongolia University Library

Scalable thread based index construction using wavelet tree

MULTIMEDIA TOOLS AND APPLICATIONS, 2023, Vol. 82, No. 9, pp. 14037-14053

Authors: Yadav, Arun Kumar; Yadav, Divakar; Verma, Akhilesh; Akbar, Mohd; Tewari, Kartikey

Affiliations: NIT Hamirpur, Dept Comp Sci & Engn, Hamirpur, India; Ajay Kumar Garg Engn Coll, Dept Comp Sci & Engn, Ghaziabad, UP, India

Indexing is one of the key components of any search tool to be optimized for searching documents. Among the existing indexing techniques, inverted indexing is one of the best methods used at a larger scale for various applications. Under this method, the index is designed using a signature file, hash tree, or B-tree to retrieve the required document efficiently. The B-tree is popular due to its searching efficiency, but its performance degrades with increasing data set size. The wavelet tree has become a popular and versatile data structure in the last decade, used in various domains such as sequences, indexing, compression, and grid points, with surprising results. This study proposes a parallel wavelet tree algorithm hybridized with the Map-Reduce concept to construct an index for textual search. The proposed algorithm reduces the index construction time considerably. Experiments show that the proposed algorithm offers a reasonable trade-off compared with existing indexing approaches. For large data sets, index construction time is reduced relative to other existing state-of-the-art schemes. The results also show that the algorithm performs well as the data set scales up, up to full utilization of the available cores. This is possible due to the use of multiple threads working in parallel. Our experiments demonstrated consistent performance with 2, 4, 8, and 12 cores, while the 16-core results show an increase in index construction time due to parallel overhead when the data set is not sufficiently large.

Keywords: Wavelet tree; Inverted index; Map-reduce; Search time; Parallel algorithms; Dictionary searching
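
For readers unfamiliar with the data structure named in this abstract, the following is a minimal sequential sketch of a pointer-based wavelet tree with a rank query. It is not the parallel Map-Reduce construction the paper proposes; the class name `WaveletTree`, its fields, and the use of prefix counts in place of packed bit vectors are all assumptions made for illustration.

```python
class WaveletTree:
    """Illustrative pointer-based wavelet tree (sequential, not the paper's algorithm)."""

    def __init__(self, seq, alphabet=None):
        # Sorted list of distinct symbols handled by this node.
        self.alphabet = sorted(set(seq)) if alphabet is None else alphabet
        self.left = self.right = None
        # prefix[i] = how many of the first i symbols were routed to the right child.
        self.prefix = [0]
        if len(self.alphabet) <= 1:
            return  # leaf: every symbol stored here is identical
        mid = len(self.alphabet) // 2
        self.right_symbols = set(self.alphabet[mid:])
        left_seq, right_seq = [], []
        for s in seq:
            goes_right = s in self.right_symbols
            self.prefix.append(self.prefix[-1] + int(goes_right))
            (right_seq if goes_right else left_seq).append(s)
        self.left = WaveletTree(left_seq, self.alphabet[:mid])
        self.right = WaveletTree(right_seq, self.alphabet[mid:])

    def rank(self, symbol, i):
        """Return the number of occurrences of symbol among the first i positions."""
        if len(self.alphabet) <= 1:
            return i if self.alphabet and self.alphabet[0] == symbol else 0
        if symbol in self.right_symbols:
            return self.right.rank(symbol, self.prefix[i])
        return self.left.rank(symbol, i - self.prefix[i])


wt = WaveletTree(list("abracadabra"))
print(wt.rank("a", 11), wt.rank("b", 4))
```

Each level halves the alphabet, so a rank query touches one node per level. A production implementation would store one bit per symbol per level with a succinct rank structure rather than a Python list of prefix counts.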

Fast, Parallel, and Cache-Friendly Suffix Array Construction

23rd International Workshop on Algorithms in Bioinformatics, WABI 2023

Authors: Khan, Jamshed; Rubel, Tobias; Dhulipala, Laxman; Molloy, Erin; Patro, Rob

Affiliation: University of Maryland, College Park, MD, United States

ISBN: (print) 9783959772945

String indexes such as the suffix array (SA) and the closely related longest common prefix (LCP) array are fundamental objects in bioinformatics and have a wide variety of applications. Despite their importance in practice, few scalable parallel algorithms for constructing these are known, and the existing algorithms can be highly non-trivial to implement and parallelize. In this paper we present CaPS-SA, a simple and scalable parallel algorithm for constructing these string indexes inspired by samplesort. Due to its design, CaPS-SA has excellent memory-locality and thus incurs fewer cache misses and achieves strong performance on modern multicore systems with deep cache hierarchies. We show that despite its simple design, CaPS-SA outperforms existing state-of-the-art parallel SA and LCP-array construction algorithms on modern hardware. Finally, motivated by applications in modern aligners where the query strings have bounded lengths, we introduce the notion of a bounded-context SA and show that CaPS-SA can easily be extended to exploit this structure to obtain further speedups. © Jamshed Khan, Tobias Rubel, Laxman Dhulipala, Erin Molloy, and Rob Patro;licensed under Creative Commons License CC-BY 4.0.

Keywords: parallel algorithms
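
To make the two objects in this abstract concrete, here is a deliberately naive sketch that builds a suffix array and its LCP array by direct sorting and comparison. It is not CaPS-SA and has none of its parallelism or cache behavior; the function names `suffix_array` and `lcp_array` are assumptions chosen for this example.

```python
def suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order of the suffixes."""
    # Naive: comparing whole suffixes makes this quadratic or worse in the text length.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """lcp[k] = length of the longest common prefix of suffixes sa[k-1] and sa[k]; lcp[0] = 0."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = sa[k - 1], sa[k]
        n = 0
        while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
            n += 1
        lcp[k] = n
    return lcp


sa = suffix_array("banana")
print(sa)                       # [5, 3, 1, 0, 4, 2]
print(lcp_array("banana", sa))  # [0, 1, 3, 0, 0, 2]
```

For "banana" the sorted suffixes are "a", "ana", "anana", "banana", "na", "nana", which gives the positions and common-prefix lengths printed above.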

Accelerating massive queries of approximate nearest neighbor search on high-dimensional data

KNOWLEDGE AND INFORMATION SYSTEMS, 2023, Vol. 65, No. 10, pp. 4185-4212

Authors: Liu, Yingfan; Song, Chaowei; Cheng, Hong; Xia, Xiaofang; Cui, Jiangtao

Affiliations: Xidian Univ, Sch Comp Sci & Technol, Xian, Shaanxi, Peoples R China; Chinese Univ Hong Kong, Hong Kong, Peoples R China

Approximate nearest neighbor (ANN) search on high-dimensional data is a fundamental operation in many applications. In this paper, we study massive queries of ANN (MQ-ANN) search, which deals with a large number of queries simultaneously. To improve the throughput, we combine the parallel capacity of multi-core CPUs and the filtering power of the state-of-the-art index methods, i.e., proximity graphs. However, there are no solutions that exploit proximity graphs to handle MQ-ANN in parallel, except the one called query view, which simply assigns each query to a hardware thread but suffers from numerous cache misses. As the first attempt, we design efficient methods for MQ-ANN with proximity graphs and propose a novel scheduling mechanism called bridge view, which shares the same data access across multiple queries in order to reduce cache misses. Moreover, we extend our method to deal with MQ-ANN on large-scale data sets (e.g., 10^8 points). Finally, we conduct extensive experiments on real data sets to demonstrate the advantages of our method. According to our experimental results, bridge view significantly outperforms query view in various settings. In particular, bridge view with 8 hardware threads even outperforms query view with 24 hardware threads.

Keywords: Massive queries; Approximate nearest neighbor search; High-dimensional data; Proximity graphs; Parallel algorithms

Time and Space Optimal Massively Parallel Algorithm for the 2-Ruling Set Problem

37th International Symposium on Distributed Computing, DISC 2023

Authors: Cambus, Mélanie; Kuhn, Fabian; Pai, Shreyas; Uitto, Jara

Affiliations: Aalto University, Finland; University of Freiburg, Germany

ISBN: (print) 9783959773010

In this work, we present a constant-round algorithm for the 2-ruling set problem in the Congested Clique model. As a direct consequence, we obtain a constant round algorithm in the MPC model with linear space-per-machine and optimal total space. Our results improve on the O(log log log n)-round algorithm by [HPS, DISC'14] and the O(log log ∆)-round algorithm by [GGKMR, PODC'18]. Our techniques can also be applied to the semi-streaming model to obtain an O(1)-pass algorithm. Our main technical contribution is a novel sampling procedure that returns a small subgraph such that almost all nodes in the input graph are adjacent to the sampled subgraph. An MIS on the sampled subgraph provides a 2-ruling set for a large fraction of the input graph. As a technical challenge, we must handle the remaining part of the graph, which might still be relatively large. We overcome this challenge by showing useful structural properties of the remaining graph and show that running our process twice yields a 2-ruling set of the original input graph with high probability. © Mélanie Cambus, Fabian Kuhn, Shreyas Pai, and Jara Uitto;licensed under Creative Commons License CC-BY 4.0.

Keywords: parallel algorithms
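
The observation in this abstract that an MIS on a sampled subgraph rules every node within two hops of it can be illustrated with a small sequential sketch. This is not the paper's constant-round Congested Clique or MPC algorithm: the sample is passed in by hand rather than drawn by the paper's sampling procedure, and the function names `greedy_mis` and `covered_within_two` are assumptions made for this example.

```python
def greedy_mis(adj, nodes):
    """Greedy maximal independent set of the subgraph induced by nodes."""
    chosen = set()
    for v in sorted(nodes):
        if not (adj[v] & chosen):  # no already-chosen neighbour, so v can join
            chosen.add(v)
    return chosen


def covered_within_two(adj, ruling):
    """All nodes at distance at most 2 from some node in ruling."""
    covered = set(ruling)
    for _ in range(2):  # expand the covered set by one hop, twice
        covered |= {u for v in covered for u in adj[v]}
    return covered


# Path graph 0 - 1 - 2 - 3 - 4 - 5 - 6, stored as adjacency sets.
adj = {i: set() for i in range(7)}
for i in range(6):
    adj[i].add(i + 1)
    adj[i + 1].add(i)

sample = {2, 3}                      # hypothetical sample, chosen by hand
ruling = greedy_mis(adj, sample)     # {2}
remaining = set(adj) - covered_within_two(adj, ruling)
print(ruling, remaining)             # nodes 5 and 6 are left for a second pass
```

Every sampled node is in the MIS or adjacent to it, so every neighbour of the sample is within distance 2 of the MIS. The nodes left over correspond to the "remaining part of the graph" that the abstract says must be handled separately.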

Theoretically and Practically Efficient Parallel Nucleus Decomposition (Abstract)

2023 ACM Workshop on Highlights of Parallel Computing, HOPC 2023

Authors: Shi, Jessica; Dhulipala, Laxman; Shun, Julian

Affiliations: Massachusetts Institute of Technology, Cambridge, MA, United States; University of Maryland, College Park, MD, United States

A Heterogeneous KBA Parallel Algorithm for the Cartesian Discrete Ordinates for Multizone Heterogeneous System

8th International Conference on Computer and Communication Systems, ICCCS 2023

Authors: Li, Runhua; Liu, Jie

Affiliation: National University of Defense Technology, Science and Technology on Parallel and Distributed Processing Laboratory, Changsha, China

ISBN: (print) 9781665456128

Innovations in powerful high-performance computing (HPC) architectures are enabling high-fidelity whole-core neutron transport simulations in reasonable time. In particular, the currently popular heterogeneous architectures keep the cost of such simulations very low. The neutron distribution of a reactor core is governed by the Boltzmann neutron transport equation (BTE), the first viable solutions of which need tremendous computer resources. Among the high-fidelity numerical methods, the discrete ordinates method (SN) is becoming popular in the reactor design community because it strikes a good balance between computational cost and accuracy. Recently, MT-3000, a multizone heterogeneous architecture with a peak double-precision performance of 11.6 TFLOPS, was proposed. In this work, the BTE is solved by the SN method with heterogeneous Koch-Baker-Alcouffe (KBA) parallel algorithms based on the MT-3000 architecture. A communication mechanism has been established to efficiently transmit data among the acceleration cores and the CPU cores. The kernel computation procedure is largely accelerated by vectorization and instruction pipelining techniques. Numerical experiments show that our formulation can achieve 1.37 TFLOPS on a single MT-3000, that is, 11.8% of its peak performance. © 2023 IEEE.

Keywords: parallel algorithms

Fast and Robust Parallel Simplification Algorithm for Triangular Mesh

3rd International Conference on Computer Graphics, Image, and Virtualization, ICCGIV 2023

Authors: Yan, Zhenqi; Yang, Li; Wang, Song

Affiliations: College of Information Engineering, China Jiliang University, Hangzhou, China; VoxelDance Company, Shanghai, China

ISBN: (print) 9781510671720

Mesh simplification is a fundamental problem in geometry processing. Since general simplification algorithms are difficult to parallelize, the main challenge is to process meshes with tens of millions of faces quickly and with low memory consumption while maintaining high-quality output. In this paper, we propose a multi-threaded algorithmic framework for mesh simplification. First, we design a robust and fast serial simplification model based on edge collapsing with low memory consumption. We implement a simplification algorithm based on Probabilistic QEM, and we take strict measures to protect the mesh topology and use a greedy strategy to speed up the algorithm. Then we design a parallel simplification framework based on the idea of divide-and-conquer followed by global optimization. This method executes much faster than the serial method with the same memory consumption and maintains high-quality output. Experiments show that our parallel algorithm outperforms current open-source software in terms of speed and memory consumption, and maintains good output for all models tested. © 2023 SPIE.

Keywords: parallel algorithms

Parallel Strong Connectivity Based on Faster Reachability (Abstract)

2023 ACM Workshop on Highlights of Parallel Computing, HOPC 2023

Authors: Wang, Letong; Dong, Xiaojun; Gu, Yan; Sun, Yihan

Affiliation: University of California, Riverside, Riverside, CA, United States

In this paper, we propose a parallel strongly connected components (SCC) implementation that is efficient on a wide range of graphs. Our speedup comes from two novel techniques: vertical granularity control (VGC) and ...

ISBN: (print) 9798400702181

Keywords: graph algorithms; graph analytics; parallel algorithms; reachability; strong connectivity

Fast Parallel Algorithms for Edge-Switching to Achieve a Target Visit Rate in Heterogeneous Graphs

43rd Annual International Conference on Parallel Processing (ICPP)

Authors: Bhuiyan, Hasanuzzaman; Chen, Jiangzhuo; Khan, Maleq; Marathe, Madhav V.

Affiliations: Virginia Tech, Dept Comp Sci, Blacksburg, VA 24061, USA; Virginia Tech, Virginia Bioinformat Inst, Network Dynam & Simulat Sci Lab, Blacksburg, VA 24061, USA

ISBN: (print) 9781479956180

An edge switch is an operation on a network (graph) where two edges are selected randomly and one end vertex of each is swapped with that of the other. Usually, a sequence of these operations is performed to generate network perturbations having the same degree sequence as the original network. Edge switch operations have important applications in graph theory and network analysis, such as generating random networks with a given degree sequence, modeling and analyzing dynamic networks (e.g., peer-to-peer networks), and studying various dynamic phenomena over a network (e.g., disease dynamics over a social contact network). The growth of real-world networks motivates the need to develop efficient parallel algorithms for performing a large sequence of edge switch operations. The dependencies among successive edge switch operations and the requirement of keeping the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors. In this paper, we present a distributed memory parallel algorithm for switching edges in massive networks (networks with billions of edges) and achieve a speedup factor of 85 with 1024 processors. One of the steps in our edge switch algorithm requires the computation of multinomial random variables in parallel. The paper presents the first non-trivial parallel algorithm for the problem. The algorithm achieves a speedup of 925 using 1024 processors.

Keywords: edge switch; massive networks; multinomial distribution; parallel algorithms
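
The edge switch operation defined in this abstract, including the requirement that the graph stay simple, can be sketched sequentially as follows. This is not the paper's distributed-memory algorithm; the function names `try_edge_switch` and `degree_counts`, the rejection rule for invalid switches, and the random orientation step are assumptions made for illustration.

```python
import random
from collections import Counter


def try_edge_switch(edges, rng):
    """Attempt one edge switch in place; return True if it was applied."""
    i, j = rng.sample(range(len(edges)), 2)
    (u1, v1), (u2, v2) = edges[i], edges[j]
    if rng.random() < 0.5:
        u2, v2 = v2, u2  # edges are undirected, so pick an orientation at random
    new1, new2 = (u1, v2), (u2, v1)
    if u1 == v2 or u2 == v1:
        return False  # the switch would create a self-loop
    existing = {frozenset(e) for e in edges}
    if frozenset(new1) in existing or frozenset(new2) in existing:
        return False  # the switch would create a parallel edge
    edges[i], edges[j] = new1, new2
    return True


def degree_counts(edges):
    """Degree of every vertex, as a Counter."""
    return Counter(v for e in edges for v in e)


rng = random.Random(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
before = degree_counts(edges)
applied = sum(try_edge_switch(edges, rng) for _ in range(200))
print(applied, degree_counts(edges) == before)
```

Each accepted switch replaces (u1, v1) and (u2, v2) with (u1, v2) and (u2, v1), so every vertex keeps its degree. Rebuilding the edge set on every attempt is linear in the number of edges and is done here only for brevity; the check against the current edge set is also what makes successive switches depend on one another.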

High-performance parallel implementations of flow accumulation algorithms for multicore architectures

COMPUTERS & GEOSCIENCES, 2021, Vol. 151, p. 104741

Authors: Kotyra, Bartlomiej; Chabudzinski, Lukasz; Stpiczynski, Przemyslaw

Affiliations: Marie Curie Sklodowska Univ, Inst Comp Sci, Ul Akad 9, PL-20031 Lublin, Poland; Marie Curie Sklodowska Univ, Inst Earth & Environm Sci, Al Krasnicka 2d, PL-20718 Lublin, Poland

The calculation of flow accumulation is one of the tasks in digital terrain analysis that is not easy to parallelize. The aim of this work was to develop new, faster ways to calculate flow accumulation and achieve shorter execution times than popular software tools for this purpose. We prepared six implementations of algorithms based on both top-down and bottom-up approaches and compared their performance using 118 different data sets (including 59 subcatchments and 59 full frames) of various sizes but the same area and resolution. Our results clearly show that the parallel top-down algorithm (without the use of OpenMP tasks) is the most suitable implementation for flow accumulation calculations of all we have tested. The mean and median execution times of this algorithm are the shortest in all cases studied. The implementation is characterized by high speedups. The execution times of the parallel top-down implementation are two orders of magnitude shorter compared to the Flow Accumulation tool from ArcGIS Desktop. This is important, considering the performance of popular GIS platforms, where it takes hours to perform the same kind of operations with the use of similar equipment.

Keywords: Flow accumulation; Parallel algorithms; OpenMP; Multicore processors; Manycore architectures; GIS
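
As a rough illustration of what a top-down flow accumulation pass can look like, the sketch below starts from cells with no inflow and pushes counts downstream. It is sequential and is not one of the six implementations the paper evaluates. Several things are assumptions made for this example: the reading of "top-down" as starting from source cells, the single-receiver flow model, the convention that each cell counts itself, and the name `flow_accumulation`.

```python
from collections import deque


def flow_accumulation(receiver):
    """receiver[c] is the cell that c drains into, or None for an outlet.

    Returns acc, where acc[c] counts c itself plus every cell upstream of it.
    """
    n = len(receiver)
    inflow = [0] * n  # number of cells draining directly into each cell
    for r in receiver:
        if r is not None:
            inflow[r] += 1
    acc = [1] * n
    # Begin with source cells (no inflow) and work downstream.
    queue = deque(c for c in range(n) if inflow[c] == 0)
    while queue:
        c = queue.popleft()
        r = receiver[c]
        if r is None:
            continue
        acc[r] += acc[c]
        inflow[r] -= 1
        if inflow[r] == 0:  # all upstream contributions have arrived
            queue.append(r)
    return acc


# Cells 0 and 1 drain into 2; cells 2 and 4 drain into 3; cell 3 is the outlet.
print(flow_accumulation([2, 2, 3, None, 4 and 3]))  # [1, 1, 3, 5, 1]
```

A cell is processed only after all of its upstream neighbours have contributed, which is the ordering constraint any parallel version has to respect.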
