There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest publicly available real-world graph (the Hyperlink Web graph with over 3.5 billion vertices and 128 billion edges) can fit in the memory of a single commodity multicore server. Nevertheless, most experimental work in the literature reports results on much smaller graphs, and the work that does use the Hyperlink graph relies on distributed or external memory. Therefore, it is natural to ask whether we can efficiently solve a broad class of graph problems on this graph in memory. This paper shows that theoretically-efficient parallel graph algorithms can scale to the largest publicly available graphs using a single machine with a terabyte of RAM, processing them in minutes. We give implementations of theoretically-efficient parallel algorithms for 20 important graph problems. We also present the interfaces, optimizations, and graph processing techniques that we used in our implementations, which were crucial in enabling us to process these large graphs quickly. We show that the running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs. For many of the problems that we consider, this is the first time they have been solved on graphs at this scale. We have made the implementations developed in this work publicly available as the Graph Based Benchmark Suite (GBBS).
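To make the programming model concrete, below is a minimal, sequential Python sketch of a frontier-based (vertexSubset/edgeMap-style) interface of the kind used by shared-memory graph processing systems such as GBBS. The function names and the tiny example graph are illustrative assumptions; a real implementation parallelizes the edge mapping and switches between sparse and dense frontier representations.

```python
# Minimal, sequential sketch of a frontier-based (vertexSubset / edgeMap style)
# interface; GBBS-like systems parallelize edge_map over the frontier and switch
# between sparse and dense traversal. Names here are illustrative only.

def edge_map(graph, frontier, update, cond):
    """Apply update(u, v) over edges (u, v) with u in the frontier and cond(v) True;
    return the set of destinations for which update succeeded (the next frontier)."""
    out = set()
    for u in frontier:
        for v in graph[u]:
            if cond(v) and update(u, v):
                out.add(v)
    return out

def bfs(graph, src):
    """Breadth-first search expressed with edge_map; returns a parent mapping."""
    parents = {v: None for v in graph}
    parents[src] = src

    def update(u, v):
        parents[v] = u
        return True

    frontier = {src}
    while frontier:
        frontier = edge_map(graph, frontier, update, cond=lambda v: parents[v] is None)
    return parents

# Tiny undirected example graph given as adjacency lists.
g = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(bfs(g, 0))
```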
ISBN:
(Print) 9781450357999
There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest publicly available real-world graph (the Hyperlink Web graph with over 3.5 billion vertices and 128 billion edges) can fit in the memory of a single commodity multicore server. Nevertheless, most experimental work in the literature reports results on much smaller graphs, and the work that does use the Hyperlink graph relies on distributed or external memory. Therefore, it is natural to ask whether we can efficiently solve a broad class of graph problems on this graph in memory. This paper shows that theoretically-efficient parallel graph algorithms can scale to the largest publicly available graphs using a single machine with a terabyte of RAM, processing them in minutes. We give implementations of theoretically-efficient parallel algorithms for 13 important graph problems. We also present the optimizations and techniques that we used in our implementations, which were crucial in enabling us to process these large graphs quickly. We show that the running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs. For many of the problems that we consider, this is the first time they have been solved on graphs at this scale. We provide a publicly available benchmark suite containing our implementations.
Graph coloring is widely used to parallelize scientific applications by identifying subsets of independent tasks that can be executed simultaneously. Graph coloring assigns colors to the vertices of a graph such that no adjacent vertices share the same color. The number of colors used corresponds to the number of parallel steps in a real-world end-application, so the total runtime of the graph coloring kernel adds to the overall parallel overhead of that end-application, while the number of vertices in each color class determines the number of independent concurrent tasks in each parallel step, and thus affects the amount of parallelism and hardware resource utilization during execution. In this work, we propose a high-performance graph coloring algorithm, named ColorTM, that leverages Hardware Transactional Memory (HTM) to detect coloring inconsistencies between adjacent vertices. ColorTM detects and resolves coloring inconsistencies between adjacent vertices with an eager approach to minimize data access costs, and implements a speculative synchronization scheme to minimize synchronization costs and increase parallelism. We extend our proposed algorithmic design into a balanced graph coloring algorithm, named BalColorTM, in which all color classes include almost the same number of vertices, achieving high parallelism and resource utilization in the execution of real-world end-applications. We evaluate ColorTM and BalColorTM using a wide variety of large real-world graphs with diverse characteristics. ColorTM and BalColorTM improve performance by 12.98x and 1.78x on average using 56 parallel threads compared to prior state-of-the-art approaches. Moreover, we study the impact of our proposed graph coloring algorithmic designs on a popular end-application, i.e., Community Detection, and demonstrate that ColorTM and BalColorTM can provide high performance improvements in real-world end-applications.
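The abstract above describes a detect-and-resolve coloring scheme. As a rough, sequential illustration of that general pattern (without HTM and without ColorTM's eager resolution), here is a Python sketch of speculative greedy coloring followed by a conflict-resolution pass; all names and the example graph are illustrative, not the paper's implementation.

```python
# Illustrative sketch of speculative greedy coloring with conflict detection and
# resolution (the pattern ColorTM accelerates with hardware transactional memory).
# This is a plain iterative rendering, not the HTM-based implementation.

def smallest_free_color(v, graph, color):
    used = {color[u] for u in graph[v] if color[u] is not None}
    c = 0
    while c in used:
        c += 1
    return c

def speculative_coloring(graph):
    color = {v: None for v in graph}
    worklist = list(graph)          # vertices still needing a final color
    while worklist:
        # "Speculative" phase: color every worklist vertex against a snapshot,
        # mimicking threads that read possibly stale neighbor colors in parallel.
        snapshot = dict(color)
        for v in worklist:
            color[v] = smallest_free_color(v, graph, snapshot)
        # Detect-and-resolve phase: for each conflicting edge, keep the smaller
        # endpoint's color and re-insert the larger endpoint for re-coloring.
        conflicts = set()
        for v in worklist:
            for u in graph[v]:
                if color[u] == color[v] and u < v:
                    conflicts.add(v)
        worklist = list(conflicts)
    return color

g = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
print(speculative_coloring(g))
```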
Influence maximization, the problem of identifying a subset of k influential seeds (vertices) in a network, is a classical problem in network science with numerous applications. The problem is NP-hard, but there exist efficient polynomial-time approximations. However, scaling these algorithms remains a daunting task due to the complexities associated with steps involving stochastic sampling and large-scale aggregations. In this paper, we present a new parallel distributed approximation algorithm for influence maximization with provable approximation guarantees. Our approach, which we call GreediRIS, leverages the RANDGREEDI framework, a state-of-the-art approach for distributed submodular optimization, to solve the step that computes a maximum k-cover. GreediRIS combines distributed and streaming models of computation, along with pruning techniques, to effectively address the communication bottlenecks of the algorithm. Experimental results on up to 512 nodes (32K cores) of the NERSC Perlmutter supercomputer show that GreediRIS achieves good strong scaling performance, preserves solution quality, and significantly outperforms other state-of-the-art distributed implementations. For instance, on 512 nodes, the most performant variant of GreediRIS achieves geometric mean speedups of 28.99x and 36.35x for two different diffusion models over a state-of-the-art parallel implementation. We also present a communication-optimized version of GreediRIS that further improves the speedups by two orders of magnitude.
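For context, the sketch below shows, in plain sequential Python, the two stages that GreediRIS distributes: sampling reverse-reachable (RR) sets and greedily solving the max k-cover over them. The independent-cascade sampler, the parameter values, and the tiny graph are assumptions made for illustration, not the paper's implementation.

```python
import random

# Sequential sketch of the RIS (reverse-influence-sampling) pipeline whose max
# k-cover step GreediRIS distributes via RandGreedi. Illustrative only.

def sample_rr_set(graph, p):
    """One RR set under an independent-cascade-style model on an undirected graph:
    a reverse BFS from a random vertex where each edge is kept with probability p."""
    start = random.choice(list(graph))
    visited = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for u in graph[v]:
            if u not in visited and random.random() < p:
                visited.add(u)
                stack.append(u)
    return visited

def greedy_max_cover(rr_sets, k):
    """Pick k seeds covering the most RR sets (the aggregation step discussed above)."""
    uncovered = set(range(len(rr_sets)))
    seeds = []
    for _ in range(k):
        counts = {}
        for i in uncovered:
            for v in rr_sets[i]:
                counts[v] = counts.get(v, 0) + 1
        if not counts:
            break
        best = max(counts, key=counts.get)
        seeds.append(best)
        uncovered = {i for i in uncovered if best not in rr_sets[i]}
    return seeds

g = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
rr = [sample_rr_set(g, p=0.3) for _ in range(200)]
print(greedy_max_cover(rr, k=2))
```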
We study the problem of approximating the distances in an undirected weighted graph G by the distances in trees, based on the notion of stretch. Focusing on decentralized models of computation such as the CONGEST, PRAM, and semi-streaming models, our main results are as follows: (1) We develop a simple randomized algorithm that constructs a spanning tree such that the expected stretch of every edge is O(log^3 n), where n is the number of nodes in G. If G is unweighted, then this algorithm can be implemented to run in O(hop(G)) rounds in the CONGEST model, where hop(G) is the hop-diameter of G; thus our algorithm is asymptotically optimal in this case. In the weighted case, the running time of the algorithm matches the currently best known bound for exact single-source shortest path (SSSP) computations, which despite recent progress is still separated from the lower bound of Ω(√n + hop(G)) by polynomial factors. A naive attempt to replace exact SSSP computations with approximate ones in order to improve the complexity in the weighted case encounters a fundamental challenge, as the underlying decomposition technique fails to work under distance approximation. (2) We overcome this obstacle by developing a technique termed blurry ball growing. This technique, in combination with a clever algorithmic idea of Miller, Peng, and Xu (SPAA 2013), allows us to obtain low-diameter graph decompositions with small edge-cutting probabilities based solely on approximate SSSP computations. (3) Using these decompositions, we in turn obtain metric tree embedding algorithms in the vein of the celebrated work of Bartal (FOCS 1996), whose computational complexity is optimal up to polylogarithmic factors not only in the CONGEST model but also in the PRAM and semi-streaming models. Our embeddings have the additional useful property that the tree can be mapped back to the original graph such that each edge is "used" only logarithmically many times.
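To illustrate the style of decomposition underlying these results, here is a small sequential Python sketch of an exponential-shift clustering in the spirit of Miller, Peng, and Xu, using exact Dijkstra; the blurry-ball-growing refinement that tolerates approximate SSSP is not shown, and all names, parameters, and the example graph are illustrative.

```python
import heapq
import math
import random

# Sequential sketch of exponential-shift clustering (Miller-Peng-Xu style):
# every vertex v draws a shift delta_v ~ Exp(beta), and each vertex joins the
# cluster of the center u minimizing dist(v, u) - delta_u. Illustrative only.

def mpx_decomposition(graph, beta, seed=0):
    rng = random.Random(seed)
    shift = {v: rng.expovariate(beta) for v in graph}
    cluster = {v: None for v in graph}
    pq = []
    for v in graph:
        # Starting vertex v at "time" -shift[v] simulates the shifted ball growing.
        heapq.heappush(pq, (-shift[v], v, v))
    while pq:
        d, v, center = heapq.heappop(pq)     # multi-source Dijkstra with shifted starts
        if cluster[v] is not None:
            continue
        cluster[v] = center
        for u, w in graph[v]:
            if cluster[u] is None:
                heapq.heappush(pq, (d + w, u, center))
    return cluster

# Weighted undirected graph as {vertex: [(neighbor, weight), ...]}.
g = {0: [(1, 1.0), (2, 4.0)], 1: [(0, 1.0), (2, 1.0)], 2: [(0, 4.0), (1, 1.0)], 3: []}
print(mpx_decomposition(g, beta=0.5))
```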
ISBN:
(Print) 9783031785405; 9783031785412
Community detection is the problem of finding naturally forming clusters in networks. It is an important problem in mining and analyzing social and other complex networks. Community detection can be used to analyze complex systems in the real world and has applications in many areas, including network science, data mining, and computational biology. Label propagation is a community detection method that is simpler and faster than other methods such as Louvain, InfoMap, and spectral-based approaches. Some real-world networks can be very large and have billions of nodes and edges. Sequential algorithms might not be suitable for dealing with such large networks. This paper presents distributed-memory and hybrid parallel community detection algorithms based on the label propagation method. We incorporated novel optimizations and communication schemes, leading to very efficient and scalable algorithms. We also discuss various load-balancing schemes and present their comparative performances. These algorithms have been implemented and evaluated using large high-performance computing systems. Our hybrid algorithm is scalable to thousands of processors and has the capability to process massive networks. This algorithm was able to detect communities in the Metaclust50 network, a massive network with 282 million nodes and 42 billion edges, in 654 s using 4096 processors.
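As a reference point for the kernel being parallelized, the following is a minimal sequential Python sketch of label propagation; the tie-breaking rule, iteration cap, and tiny example graph are illustrative choices, not the paper's distributed-memory implementation.

```python
import random
from collections import Counter

# Sequential sketch of label propagation: each vertex repeatedly adopts the most
# frequent label among its neighbors; the current label is kept on ties.

def label_propagation(graph, max_iters=100, seed=0):
    rng = random.Random(seed)
    labels = {v: v for v in graph}          # every vertex starts in its own community
    order = list(graph)
    for _ in range(max_iters):
        rng.shuffle(order)
        changed = False
        for v in order:
            if not graph[v]:
                continue
            counts = Counter(labels[u] for u in graph[v])
            best_count = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best_count]
            if labels[v] not in candidates:
                labels[v] = rng.choice(candidates)
                changed = True
        if not changed:                      # converged: no label moved this sweep
            break
    return labels

g = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4], 4: [3]}
print(label_propagation(g))
```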
Cycles are one of the fundamental subgraph patterns, and being able to enumerate them in graphs enables important applications in a wide variety of fields, including finance, biology, chemistry, and network science. However, to enable cycle enumeration in real-world applications, efficient parallel algorithms are required. In this work, we propose scalable parallelisation of state-of-the-art sequential algorithms for enumerating simple, temporal, and hop-constrained cycles. First, we focus on the simple cycle enumeration problem and parallelise the algorithms by Johnson and by Read and Tarjan in a fine-grained manner. We theoretically show that our resulting fine-grained parallel algorithms are scalable, with the fine-grained parallel Read-Tarjan algorithm being strongly scalable. In contrast, we show that straightforward coarse-grained parallel versions of these simple cycle enumeration algorithms that exploit edge- or vertex-level parallelism are not scalable. Next, we adapt our fine-grained approach to enable the enumeration of cycles under time-window, temporal, and hop constraints. Our evaluation on a cluster with 256 CPU cores that can execute up to 1,024 simultaneous threads demonstrates near-linear scalability of our fine-grained parallel algorithms when enumerating cycles under the aforementioned constraints. On the same cluster, our fine-grained parallel algorithms achieve, on average, one order of magnitude speedup compared to the respective coarse-grained parallel versions of the state-of-the-art algorithms for cycle enumeration. The performance gap between the fine-grained and the coarse-grained parallel algorithms increases as we use more CPU cores.
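For orientation, the sketch below enumerates the simple cycles of a small directed graph with an anchored DFS so that each cycle is reported exactly once from its smallest vertex. Johnson's and Read-Tarjan's algorithms, the ones parallelized in the paper, add the blocking and pruning machinery that makes enumeration output-sensitive; that machinery is omitted here, and the example graph is illustrative.

```python
# Simple-cycle enumeration by anchored DFS: every cycle is reported exactly once,
# starting from its smallest vertex. Not output-sensitive like Johnson/Read-Tarjan.

def simple_cycles(graph):
    """Enumerate simple cycles of a directed graph given as {v: [successors]}."""
    cycles = []

    def dfs(start, v, path, on_path):
        for u in graph.get(v, []):
            if u == start:
                cycles.append(path + [start])       # closed a cycle back to the anchor
            elif u > start and u not in on_path:    # only visit vertices above the anchor
                on_path.add(u)
                dfs(start, u, path + [u], on_path)
                on_path.remove(u)

    for start in sorted(graph):
        dfs(start, start, [start], {start})
    return cycles

g = {0: [1], 1: [2], 2: [0, 3], 3: [1]}
print(simple_cycles(g))   # [[0, 1, 2, 0], [1, 2, 3, 1]]
```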
ISBN:
(Print) 9798400706103
Probabilistic breadth-first traversals (BPTs) are used in many network science and graph machine learning applications. In this paper, we are motivated by the application of BPTs in stochastic diffusion-based graph problems such as influence maximization. These applications heavily rely on BPTs to implement a Monte-Carlo sampling step for their approximations. Given the large sampling complexity, the stochasticity of the diffusion process, and the inherent irregularity in real-world graph topologies, efficiently parallelizing these BPTs remains significantly challenging. In this paper, we present a new algorithm to fuse a massive number of concurrently executing BPTs with random starts on the input graph. Our algorithm is designed to fuse BPTs by combining separate probabilistic traversals into a unified frontier. To show the general applicability of the fused BPT technique, we have incorporated it into two state-of-the-art influence maximization parallel implementations (gIM and Ripples). Our experiments on up to 4K nodes of the OLCF Frontier supercomputer (32,768 GPUs and 196K CPU cores) show strong scaling behavior, and that fused BPTs can improve the performance of these implementations by up to 182.13x (avg. 75.15x) and 359.86x (avg. 135.17x) for gIM and Ripples, respectively.
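The following Python sketch illustrates the fusion idea described above: many probabilistic traversals share a single frontier, with a per-vertex bitmask recording which samples have reached it. The per-edge coin-flip diffusion, all names, and the tiny graph are assumptions for illustration; the GPU data layout used by gIM and Ripples is not reproduced.

```python
import random

# Fuse several probabilistic BFS traversals (BPTs) into one unified frontier by
# tracking, per vertex, a bitmask of which samples have reached it. Illustrative only.

def fused_bpts(graph, sources, p, seed=0):
    """sources[i] is the start vertex of sample i; returns reached[v] as a bitmask."""
    rng = random.Random(seed)
    reached = {v: 0 for v in graph}
    frontier = {}
    for i, s in enumerate(sources):
        reached[s] |= 1 << i
        frontier[s] = frontier.get(s, 0) | (1 << i)
    while frontier:
        next_frontier = {}
        for v, mask in frontier.items():
            for u in graph[v]:
                # Flip one coin per (edge, sample); keep only samples that are new at u.
                survive = 0
                for i in range(len(sources)):
                    if (mask >> i) & 1 and rng.random() < p:
                        survive |= 1 << i
                new = survive & ~reached[u]
                if new:
                    reached[u] |= new
                    next_frontier[u] = next_frontier.get(u, 0) | new
        frontier = next_frontier
    return reached

g = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(fused_bpts(g, sources=[0, 3, 1], p=0.5))
```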
ISBN:
(Print) 9798400704161
Dynamic trees are a well-studied and fundamental building block of dynamic graph algorithms dating back to the seminal work of Sleator and Tarjan [STOC'81, (1981), pp. 114-122]. The problem is to maintain a tree subject to online edge insertions and deletions while answering queries about the tree, such as the heaviest weight on a path. In the parallel batch-dynamic setting, the goal is to process batches of edge updates work-efficiently in low (polylog n) span. Two work-efficient algorithms are known: batch-parallel Euler Tour Trees by Tseng et al. [ALENEX'19, (2019), pp. 92-106] and parallel Rake-Compress (RC) Trees by Acar et al. [ESA'20, (2020), pp. 2:1-2:23]. Both, however, are randomized and work-efficient only in expectation. Several downstream results that use these data structures (and indeed, to the best of our knowledge, all known work-efficient parallel batch-dynamic graph algorithms) are therefore also randomized. In this work, we give the first deterministic work-efficient solution to the problem. Our algorithm maintains a parallel RC-Tree on n vertices subject to batches of k edge updates deterministically in worst-case O(k log(1 + n/k)) work and O(log n log log k) span on the Common-CRCW PRAM. We also show how to improve the span of the randomized algorithm from O(log n log* n) to O(log n). Lastly, as a result of our new deterministic algorithm, we also derandomize several downstream results that make use of parallel batch-dynamic trees, for which previously the only efficient solutions were randomized.
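To pin down the interface being maintained, here is a deliberately naive single-threaded Python reference for a batch-dynamic forest (batched links and cuts plus connectivity queries). It recomputes connectivity by graph search, so it is neither work-efficient nor low-span; it stands in only for the API that RC-trees or Euler tour trees implement efficiently, and all names are illustrative.

```python
# Naive reference for the batch-dynamic trees interface: batches of edge links and
# cuts on a forest, plus connectivity queries. Not work-efficient; illustration only.

class NaiveBatchDynamicForest:
    def __init__(self, n):
        self.adj = {v: set() for v in range(n)}

    def batch_link(self, edges):
        for u, v in edges:
            self.adj[u].add(v)
            self.adj[v].add(u)

    def batch_cut(self, edges):
        for u, v in edges:
            self.adj[u].discard(v)
            self.adj[v].discard(u)

    def connected(self, u, v):
        # DFS from u; a real dynamic-trees structure answers this in O(log n).
        seen, stack = {u}, [u]
        while stack:
            x = stack.pop()
            if x == v:
                return True
            for y in self.adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return False

f = NaiveBatchDynamicForest(5)
f.batch_link([(0, 1), (1, 2), (3, 4)])
f.batch_cut([(1, 2)])
print(f.connected(0, 2), f.connected(3, 4))   # False True
```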
We develop a distributed-memory parallel algorithm for performing batch updates on streaming graphs, where vertices and edges are continuously added or removed. Our algorithm leverages distributed sparse matrices as t...
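Although the abstract above is truncated, the batch-as-sparse-matrix formulation it refers to can be sketched on a single node with SciPy: edges live in a sparse adjacency matrix and a whole batch of insertions and deletions is applied as one sparse addition. The distributed partitioning of the matrix across ranks is not shown, and the example data is illustrative.

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

# Single-node sketch of applying a batch of streaming edge updates to a graph stored
# as a sparse adjacency matrix; the paper distributes this across memory nodes.

n = 5
A = csr_matrix((np.ones(3), ([0, 1, 2], [1, 2, 3])), shape=(n, n))  # existing edges

# One batch of updates: +1 inserts an edge, -1 deletes one.
rows, cols, vals = [0, 3, 0], [4, 4, 1], [1, 1, -1]
delta = coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()

A = A + delta          # one sparse addition applies the whole batch
A.eliminate_zeros()    # drop deleted edges from the stored structure
print(sorted(zip(*A.nonzero())))
```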