Search Results - Inner Mongolia University Library

Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

ACM TRANSACTIONS ON PARALLEL COMPUTING, 2021, Vol. 8, No. 1, pp. 1–70

Authors: Dhulipala, Laxman; Blelloch, Guy E.; Shun, Julian. Affiliations: MIT CSAIL, 32 Vassar St, Cambridge, MA 02139, USA; Carnegie Mellon Univ, Comp Sci Dept, Pittsburgh, PA 15213, USA

There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest publicly-available real-world graph (the Hyperlink Web graph with over 3.5 billion vertices and 128 billion edges) can fit in the memory of a single commodity multicore server. Nevertheless, most experimental work in the literature reports results on much smaller graphs, and the ones for the Hyperlink graph use distributed or external memory. Therefore, it is natural to ask whether we can efficiently solve a broad class of graph problems on this graph in memory. This paper shows that theoretically-efficient parallel graph algorithms can scale to the largest publicly available graphs using a single machine with a terabyte of RAM, processing them in minutes. We give implementations of theoretically-efficient parallel algorithms for 20 important graph problems. We also present the interfaces, optimizations, and graph processing techniques that we used in our implementations, which were crucial in enabling us to process these large graphs quickly. We show that the running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs. For many of the problems that we consider, this is the first time they have been solved on graphs at this scale. We have made the implementations developed in this work publicly-available as the Graph Based Benchmark Suite (GBBS).

Keywords: parallel graph algorithms; parallel graph processing

Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

30th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

Authors: Dhulipala, Laxman; Blelloch, Guy E.; Shun, Julian. Affiliations: Carnegie Mellon Univ, Pittsburgh, PA 15213, USA; MIT CSAIL, Cambridge, MA, USA

ISBN: (print) 9781450357999

There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest publicly-available real-world graph (the Hyperlink Web graph with over 3.5 billion vertices and 128 billion edges) can fit in the memory of a single commodity multicore server. Nevertheless, most experimental work in the literature reports results on much smaller graphs, and the ones for the Hyperlink graph use distributed or external memory. Therefore, it is natural to ask whether we can efficiently solve a broad class of graph problems on this graph in memory. This paper shows that theoretically-efficient parallel graph algorithms can scale to the largest publicly-available graphs using a single machine with a terabyte of RAM, processing them in minutes. We give implementations of theoretically-efficient parallel algorithms for 13 important graph problems. We also present the optimizations and techniques that we used in our implementations, which were crucial in enabling us to process these large graphs quickly. We show that the running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs. For many of the problems that we consider, this is the first time they have been solved on graphs at this scale. We provide a publicly-available benchmark suite containing our implementations.

Keywords: work-efficiency multicore graph processing shared-memory parallel graph algorithms

Efficient Parallel Algorithms for Some Graph Theory Problems

Journal of Computer Science & Technology, 1993, Vol. 8, No. 4, pp. 362-366

Authors: Ma Jun (马军); Ma Shaohan (马绍汉). Affiliation: Dept. of Computer Science, Shandong University, Jinan 250100

In this paper, a sequential algorithm computing the all vertex pair distance matrix D and the path matrix P is *** a PRAM EREW model with p, 1≤p≤n^2, processors, a parallel version of the sequential algorithm is *** method can also be used to get a parallel algorithm to compute transitive closure array of an undirected *** time complexity of the parallel algorithm is O(n^3/p). If D, P and are known, it is shown that the problems to find all connected components, to compute the diameter of an undirected graph, to determine the center of a directed graph and to search for a directed cycle with the minimum (maximum) length in a directed graph can all be solved in O(n^2/p + log p) time.

Keywords: parallel graph algorithms; shortest paths; transitive closure; connected components; diameter of graph; center of graph; directed cycle with the minimum (maximum) length; parallel random access machines (PRAMs)
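
The abstract above elides the authors' own algorithm, so the following is only a rough sequential sketch of the outputs it describes: a distance matrix D, a path matrix P, and quantities derived from them. It uses the standard Floyd-Warshall recurrence as a stand-in, not the paper's method, and every function name here (`all_pairs`, `path`, `connected_components`, `diameter`) is an illustrative assumption.

```python
import math


def all_pairs(n, edges):
    """Floyd-Warshall on an undirected weighted graph with vertices 0..n-1.

    Returns (D, P): D[i][j] is the shortest distance from i to j, and
    P[i][j] is the vertex that follows i on a shortest i->j path
    (None when j is unreachable from i).
    """
    D = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    P = [[j if i == j else None for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        if w < D[u][v]:  # keep the lightest of any parallel edges
            D[u][v] = D[v][u] = w
            P[u][v], P[v][u] = v, u
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
                    P[i][j] = P[i][k]
    return D, P


def path(P, i, j):
    """Recover a shortest i->j path from the path matrix ([] if none)."""
    if P[i][j] is None:
        return []
    out = [i]
    while i != j:
        i = P[i][j]
        out.append(i)
    return out


def connected_components(D):
    """Label each vertex with the smallest vertex it can reach."""
    n = len(D)
    return [min(j for j in range(n) if D[i][j] < math.inf) for i in range(n)]


def diameter(D):
    """Largest finite shortest-path distance in D."""
    return max(d for row in D for d in row if d < math.inf)
```

Once D is known, each derived quantity in this sketch is a single pass over its n^2 entries. That is the kind of reduction that can be split into n^2/p entries per processor followed by a log p combining step, which is consistent with the O(n^2/p + log p) bound quoted in the abstract.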

Lifting sequential graph algorithms for distributed-memory parallel computation

20th Conference on Object-Oriented Programming, Systems, Languages and Applications

Authors: Gregor, D; Lumsdaine, A. Affiliation: Indiana Univ, Open Syst Lab, Bloomington, IN 47405, USA

This paper describes the process used to extend the Boost Graph Library (BGL) for parallel operation with distributed memory. The BGL consists of a rich set of generic graph algorithms and supporting data structures, but it was not originally designed with parallelism in mind. In this paper, we revisit the abstractions comprising the BGL in the context of distributed-memory parallelism, lifting away the implicit requirements of sequential execution and a single shared address space. We illustrate our approach by describing the process as applied to one of the core algorithms in the BGL, breadth-first search. The result is a generic algorithm that is unchanged from the sequential algorithm, requiring only the introduction of external (distributed) data structures for parallel execution. More importantly, the generic implementation retains its interface and semantics, such that other distributed algorithms can be built upon it, just as algorithms are layered in the sequential case. By characterizing these extensions as well as the extension process, we develop general principles and patterns for using (and reusing) generic, object-oriented parallel software libraries. We demonstrate that the resulting algorithm implementations are both efficient and scalable with performance results for several algorithms.

Keywords: algorithms; design; generic programming; parallel graph algorithms; distributed computing

Scalable High-Performance Community Detection Using Label Propagation in Massive Networks

16th International Conference on Social Networks Analysis and Mining

Authors: Boddu, Sharon; Khan, Maleq. Affiliation: Texas A&M Univ, Dept Elect Engn & Comp Sci, Kingsville, TX 78363, USA

ISBN: (print) 9783031785405; 9783031785412

Community detection is the problem of finding naturally forming clusters in networks. It is an important problem in mining and analyzing social and other complex networks. Community detection can be used to analyze complex systems in the real world and has applications in many areas, including network science, data mining, and computational biology. Label propagation is a community detection method that is simpler and faster than other methods such as Louvain, InfoMap, and spectral-based approaches. Some real-world networks can be very large and have billions of nodes and edges. Sequential algorithms might not be suitable for dealing with such large networks. This paper presents distributed-memory and hybrid parallel community detection algorithms based on the label propagation method. We incorporated novel optimizations and communication schemes, leading to very efficient and scalable algorithms. We also discuss various load-balancing schemes and present their comparative performances. These algorithms have been implemented and evaluated using large high-performance computing systems. Our hybrid algorithm is scalable to thousands of processors and has the capability to process massive networks. This algorithm was able to detect communities in the Metaclust50 network, a massive network with 282 million nodes and 42 billion edges, in 654 s using 4096 processors.

Keywords: community detection; parallel graph algorithms; network analysis; graph mining

Lifting sequential graph algorithms for distributed-memory parallel computation

ACM SIGPLAN NOTICES, 2005, Vol. 40, No. 10, pp. 423-437

Authors: Gregor, D; Lumsdaine, A. Affiliation: Indiana Univ, Open Syst Lab, Bloomington, IN 47405, USA

Keywords: algorithms; design; generic programming; parallel graph algorithms; distributed computing

What structural features make graph problems to have efficient parallel algorithms? Using outerplanar graphs, trapezoid graphs and in-tournament graphs as examples

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2000, Vol. E83D, No. 3, pp. 541-549

Authors: Masuyama, S; Nakayama, S. Affiliations: Toyohashi Univ Technol, Dept Knowledge Based Informat Engn, Toyohashi, Aichi 4418580, Japan; Univ Tokushima, Fac Integral Arts & Sci, Dept Math Sci, Tokushima 7708502, Japan

This paper analyzes what structural features of graph problems allow efficient parallel algorithms. We survey some parallel algorithms for typical problems on three kinds of graphs, outerplanar graphs, trapezoid graphs and in-tournament graphs. Our results on the shortest path problem, the longest path problem and the maximum flow problem on outerplanar graphs, the minimum-weight connected dominating set problem and the coloring problem on trapezoid graphs and Hamiltonian path and Hamiltonian cycle problem on in-tournament graphs are adopted as working examples.

Keywords: parallel graph algorithms; structure and complexity; outerplanar graph; trapezoid graph; in-tournament graph

Lifting sequential graph algorithms for distributed-memory parallel computation

Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications

Authors: Douglas Gregor; Andrew Lumsdaine. Affiliation: Indiana University, Bloomington, IN

ISBN: (print) 9781595930316

Keywords: distributed computing; generic programming; parallel graph algorithms

GreediRIS: Scalable influence maximization using distributed streaming maximum cover

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2025, Vol. 198

Authors: Barik, Reet; Cappa, Wade; Ferdous, S. M.; Minutoli, Marco; Halappanavar, Mahantesh; Kalyanaraman, Ananth. Affiliations: Washington State Univ, Pullman, WA 99164, USA; Pacific Northwest Natl Lab, Richland, WA 99354, USA

Influence maximization, the problem of identifying a subset of k influential seeds (vertices) in a network, is a classical problem in network science with numerous applications. The problem is NP-hard, but there exist efficient polynomial time approximations. However, scaling these algorithms still remains a daunting task due to the complexities associated with steps involving stochastic sampling and large-scale aggregations. In this paper, we present a new parallel distributed approximation algorithm for influence maximization with provable approximation guarantees. Our approach, which we call GreediRIS, leverages the RANDGREEDI framework, a state-of-the-art approach for distributed submodular optimization, for solving a step that computes a maximum k cover. GreediRIS combines distributed and streaming models of computations, along with pruning techniques, to effectively address the communication bottlenecks of the algorithm. Experimental results on up to 512 nodes (32K cores) of the NERSC Perlmutter supercomputer show that GreediRIS can achieve good strong scaling performance, preserve quality, and significantly outperform the other state-of-the-art distributed implementations. For instance, on 512 nodes, the most performant variant of GreediRIS achieves geometric mean speedups of 28.99x and 36.35x for two different diffusion models, over a state-of-the-art parallel implementation. We also present a communication-optimized version of GreediRIS that further improves the speedups by two orders of magnitude.

Keywords: distributed influence maximization; distributed submodular maximization; streaming maximum k-cover; parallel graph algorithms

High-performance and balanced parallel graph coloring on multicore platforms

JOURNAL OF SUPERCOMPUTING, 2023, Vol. 79, No. 6, pp. 6373-6421

Authors: Giannoula, Christina; Peppas, Athanasios; Goumas, Georgios; Koziris, Nectarios. Affiliation: Natl Tech Univ Athens, Sch Elect & Comp Engn, Athens, Greece

Graph coloring is widely used to parallelize scientific applications by identifying subsets of independent tasks that can be executed simultaneously. Graph coloring assigns colors to the vertices of a graph, such that no adjacent vertices have the same color. The number of colors used corresponds to the number of parallel steps in a real-world end-application. Therefore, the total runtime of the graph coloring kernel adds to the overall parallel overhead of the real-world end-application, whereas the number of the vertices of each color class determines the number of the independent concurrent tasks of each parallel step, thus affecting the amount of parallelism and hardware resource utilization in the execution of the real-world end-application. In this work, we propose a high-performance graph coloring algorithm, named ColorTM, that leverages Hardware Transactional Memory (HTM) to detect coloring inconsistencies between adjacent vertices. ColorTM detects and resolves coloring inconsistencies between adjacent vertices with an eager approach to minimize data access costs, and implements a speculative synchronization scheme to minimize synchronization costs and increase parallelism. We extend our proposed algorithmic design to propose a balanced graph coloring algorithm, named BalColorTM, with which all color classes include almost the same number of vertices to achieve high parallelism and resource utilization in the execution of the real-world end-applications. We evaluate ColorTM and BalColorTM using a wide variety of large real-world graphs with diverse characteristics. ColorTM and BalColorTM improve performance by 12.98x and 1.78x on average using 56 parallel threads compared to prior state-of-the-art approaches. Moreover, we study the impact of our proposed graph coloring algorithmic designs on a popular end-application, i.e., Community Detection, and demonstrate that ColorTM and BalColorTM can provide high-performance improvements in real-world end-applications acr

Keywords: parallel graph coloring; balanced graph coloring; distance-1 coloring; PageRank; community detection; multicore architectures; shared memory architectures; parallel graph algorithms; high-performance computing; HPC
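
To make the definitions in the abstract above concrete, here is a minimal sequential sketch of distance-1 coloring. It is a plain greedy first-fit pass, not ColorTM or BalColorTM (which rely on Hardware Transactional Memory), and the names `greedy_coloring`, `is_valid` and `class_sizes` are illustrative assumptions. The size of each color class is the quantity the abstract relates to parallelism and balance.

```python
from collections import Counter


def greedy_coloring(adj):
    """First-fit coloring: give each vertex the smallest color that none
    of its already-colored neighbors uses. `adj` maps vertex -> neighbors."""
    color = {}
    for v in adj:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


def is_valid(adj, color):
    """True when no edge joins two vertices of the same color."""
    return all(color[u] != color[v] for u in adj for v in adj[u])


def class_sizes(color):
    """Number of vertices per color, i.e. tasks per parallel step."""
    return Counter(color.values())


# A triangle (0, 1, 2) with a pendant vertex 3 attached to vertex 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
coloring = greedy_coloring(adj)
```

In this small example the triangle forces three colors, and the pendant vertex reuses color 0, so one color class holds two vertices while the others hold one each. A balanced scheme in the sense described in the abstract would aim to even out such class sizes.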
