Search Results - Inner Mongolia University Library

22nd Design, Automation and Test in Europe Conference and Exhibition (DATE)

Authors: Kalyanaraman, Ananth; Pande, Partha Pratim (Washington State Univ, Sch EECS, Pullman, WA 99164, USA)

ISBN: (print) 9783981926323

The notion of networks is inherent in the structure, function and behavior of the natural and engineered world that surrounds us. Consequently, graph models and methods have assumed a prominent role in this modern era of Big Data, and are taking center stage in the discovery pipelines of various data-driven scientific domains. In this paper, we present a brief review of the state-of-the-art in parallel graph analytics, particularly focusing on iterative graph algorithms and their implementation on modern-day multicore/manycore architectures. The class of iterative graph algorithms covers a broad range of graph operations of varying complexity, from simpler routines such as Breadth-First Search (BFS), to polynomially-solvable problems such as shortest path computations, to NP-Hard problems such as community detection and graph coloring. We cover a set of common algorithmic abstractions used in implementing such iterative graph algorithms, state the challenges around parallelization on contemporary parallel platforms (including commodity multicores and emerging manycore platforms), and describe a set of approaches that have led to efficient implementations. We also report on advances in manycore architectural frameworks that have found application in parallel graph analytics. We conclude the paper by identifying potential research directions, opportunities, and challenges that lie ahead on the path toward enabling graph analytics at exascale.

Keywords: parallel graph algorithms; parallel architectures; irregular applications; extreme-scale computing
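
The abstract above names Breadth-First Search as one of the simpler iterative graph algorithms. As a rough illustration of the frontier-at-a-time pattern such algorithms share, here is a minimal serial sketch in Python. The function name `bfs_levels` and the dictionary-based adjacency list are assumptions made for this example and do not come from the paper.

```python
def bfs_levels(adj, source):
    """Return {vertex: BFS level}, processing one frontier per iteration."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # In a parallel implementation, this loop over the current frontier
        # is the part that would typically be divided among threads.
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level


# Small undirected example graph (hypothetical data).
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
levels = bfs_levels(graph, 0)
print(levels)
```

Each pass of the `while` loop corresponds to one iteration of the algorithm; the irregular size of each frontier is one source of the load-balancing difficulty that work on parallel graph analytics has to address.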

Parallel Heuristics for Scalable Community Detection

28th IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Authors: Lu, Hao; Kalyanaraman, Ananth; Halappanavar, Mahantesh; Choudhury, Sutanay (Washington State Univ, Sch Elect Engn & Comp Sci, Pullman, WA 99164, USA; Pacific Northwest Natl Lab, Computat Sci & Math Div, Richland, WA 99352, USA)

ISBN: (print) 9781479941162

Community detection has become a fundamental operation in numerous graph-theoretic applications. It is used to reveal natural divisions that exist within real-world networks without imposing prior size or cardinality constraints on the set of communities. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is an iterative heuristic for modularity optimization. Originally developed by Blondel et al. in 2008, the method has become increasingly popular owing to its ability to detect high modularity community partitions in a fast and memory-efficient manner. However, the method is also inherently sequential, thereby limiting its scalability. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose heuristics that are designed to break the sequential barrier. For evaluation purposes, we implemented our heuristics using OpenMP multithreading, and tested them over real-world graphs derived from multiple application domains (e.g., internet, citation, biological). Compared to the serial Louvain implementation, our parallel implementation is able to produce community outputs with a higher modularity for most of the inputs tested, in a comparable number of iterations, while providing real speedups of up to 8x using 32 threads. In addition, our parallel implementation was able to exhibit weak scaling properties on up to 32 threads.

Keywords: community detection; graph coloring; Louvain method; parallel graph algorithms; parallel heuristics
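
The abstract above describes the Louvain method as a heuristic for modularity optimization. To make the optimized quantity concrete, the sketch below computes the standard modularity score of a given partition for an undirected, unweighted graph. It is not the paper's parallel heuristic; the function name `modularity` and the edge-list input format are assumptions for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Q = sum over communities c of [ in_c / m - (deg_c / (2m))^2 ].

    edges: list of (u, v) pairs of an undirected, unweighted graph.
    community: dict mapping each vertex to its community label.
    """
    m = len(edges)
    intra = defaultdict(int)   # edges with both endpoints in community c
    degree = defaultdict(int)  # sum of vertex degrees in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (hypothetical data).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "a" for v in range(6)}
print(modularity(edges, two_groups), modularity(edges, one_group))
```

Splitting the graph along the bridge scores higher than placing every vertex in one community, which is the kind of comparison a modularity-maximizing heuristic makes repeatedly when it decides whether to move a vertex.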

AM++: A Generalized Active Message Framework

19th International Conference on Parallel Architectures and Compilation Techniques

Authors: Willcock, Jeremiah J.; Hoefler, Torsten; Edmonds, Nicholas G.; Lumsdaine, Andrew (Indiana Univ, Bloomington, IN 47401, USA)

ISBN: (print) 9781450301787

Active messages have proven to be an effective approach for certain communication problems in high performance computing. Many MPI implementations, as well as runtimes for Partitioned Global Address Space languages, use active messages in their low-level transport layers. However, most active message frameworks have low-level programming interfaces that require significant programming effort to use directly in applications and that also prevent optimization opportunities. In this paper we present AM++, a new user-level library for active messages based on generic programming techniques. Our library allows message handlers to be run in an explicit loop that can be optimized and vectorized by the compiler and that can also be executed in parallel on multicore architectures. Runtime optimizations, such as message combining and filtering, are also provided by the library, removing the need to implement that functionality at the application level. Evaluation of AM++ with distributed-memory graph algorithms shows the usability benefits provided by these library features, as well as their performance advantages.

Keywords: Active Messages; parallel graph algorithms; Parallel Programming Interfaces

Scaling Graph Community Detection on the Tilera Many-core Architecture

21st International Conference on High Performance Computing (HiPC)

Authors: Chavarria-Miranda, Daniel; Halappanavar, Mahantesh; Kalyanaraman, Ananth (Pacific Northwest Natl Lab, High Performance Comp, Richland, WA 99352, USA; Washington State Univ, Sch Elect Engn & Comp Sci, Pullman, WA 99164, USA)

ISBN: (print) 9781479959754

In an era when power constraints and data movement are proving to be significant barriers for the application of high-end computing, the Tilera many-core architecture offers a low-power platform exhibiting many important characteristics of future systems, including a large number of simple cores, a sophisticated network-on-chip, and fine-grained control over memory and caching policies. While this emerging architecture has been previously studied for structured compute-intensive kernels, benchmarking the platform for data-bound, irregular applications presents significant challenges that have remained unexplored. Community detection is an advanced prototypical graph-theoretic operation with applications in numerous scientific domains including life sciences, cyber security, and power systems. In this work, we explore multiple design strategies toward developing a scalable tool for community detection on the Tilera platform. Using several memory layout and work scheduling techniques we demonstrate speedups of up to 47x on 36 cores of the Tilera TileGX36 platform over the best serial implementation, and also show results that have comparable quality and performance to mainstream x86 platforms. To the best of our knowledge this is the first work addressing graph algorithms on the Tilera platform. This study demonstrates that through careful design space exploration, low-power many-core platforms like Tilera can be effectively exploited for graph algorithms that embody all the essential characteristics of an irregular application.

Keywords: Tilera; community detection; many-core; parallel graph algorithms

Towards a GraphBLAS Library in Chapel

31st IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPS)

Authors: Azad, Ariful; Buluc, Aydin (Lawrence Berkeley Natl Lab, Computat Res Div, Berkeley, CA 94720, USA)

ISBN: (print) 9780769561493

The adoption of a programming language is positively influenced by the breadth of its software libraries. Chapel is a modern and relatively young parallel programming language. Consequently, not many domain-specific software libraries exist that are written for Chapel. Graph processing is an important domain with many applications in cyber security, energy, social networking, and health. Implementing graph algorithms in the language of linear algebra enables many advantages including rapid development, flexibility, high performance, and scalability. The GraphBLAS initiative aims to standardize an interface for linear-algebraic primitives for graph computations. This paper presents initial experiences and findings of implementing a subset of important GraphBLAS operations in Chapel. We analyzed the bottlenecks in both shared and distributed memory. We also provided alternative implementations whenever the default implementation lacked performance or scaling.

Keywords: Chapel; GraphBLAS; parallel graph algorithms; PGAS
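
The abstract above refers to expressing graph algorithms in the language of linear algebra. One commonly cited instance of that idea is single-source shortest paths written as repeated vector-matrix products over the (min, +) semiring. The sketch below shows this in plain Python rather than Chapel and does not use any GraphBLAS API; the names `min_plus_vxm` and `sssp` and the dictionary-of-edges matrix format are assumptions for illustration, and non-negative-cycle inputs are assumed.

```python
import math


def min_plus_vxm(dist, weights):
    """One (min, +) product: new[v] = min(dist[v], min over u of dist[u] + w(u, v))."""
    new = list(dist)
    for (u, v), w in weights.items():
        if dist[u] + w < new[v]:
            new[v] = dist[u] + w
    return new


def sssp(n, weights, source):
    """Shortest distances from source, assuming no negative-weight cycles."""
    dist = [math.inf] * n
    dist[source] = 0.0
    for _ in range(n - 1):          # at most n - 1 products are needed
        nxt = min_plus_vxm(dist, weights)
        if nxt == dist:             # fixed point reached early
            break
        dist = nxt
    return dist


# Sparse weighted adjacency matrix stored as {(row, col): weight} (hypothetical data).
weights = {(0, 1): 4, (0, 2): 1, (2, 1): 2, (1, 3): 5}
distances = sssp(4, weights, 0)
print(distances)
```

Replacing (min, +) with a different pair of operations turns the same loop into a different graph algorithm, which is the flexibility the linear-algebraic formulation is meant to offer.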

NetworKit: A tool suite for large-scale complex network analysis

Network Science, 2016, Vol. 4, No. 4, pp. 508-530

Authors: Staudt, Christian L.; Sazonovs, Aleksejs; Meyerhenke, Henning (Karlsruhe Inst Technol, Inst Theoret Informat, D-76131 Karlsruhe, Germany; Wellcome Trust Sanger Inst, Wellcome Genome Campus, Cambridge CB10 1SA, England)

We introduce NetworKit, an open-source software package for analyzing the structure of large complex networks. Appropriate algorithmic solutions are required to handle increasingly common large graph data sets containing up to billions of connections. We describe the methodology applied to develop scalable solutions to network analysis problems, including techniques like parallelization, heuristics for computationally expensive problems, efficient data structures, and modular software architecture. Our goal for the software is to package results of our algorithm engineering efforts and put them into the hands of domain experts. NetworKit is implemented as a hybrid combining the kernels written in C++ with a Python frontend, enabling integration into the Python ecosystem of tested tools for data analysis and scientific computing. The package provides a wide range of functionality (including common and novel analytics algorithms and graph generators) and does so via a convenient interface. In an experimental comparison with related software, NetworKit shows the best performance on a range of typical analysis tasks.

Keywords: complex networks; network analysis; network science; parallel graph algorithms; data analysis software

Scalable Fine-Grained Parallel Cycle Enumeration Algorithms

34th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

Authors: Blanusa, Jovan; Ienne, Paolo; Atasu, Kubilay (IBM Res Europe, Zurich, Switzerland; Ecole Polytechn Federale Lausanne EPFL, Sch Comp & Commun Sci, CH-1015 Lausanne, Switzerland)

ISBN: (print) 9781450391467

Enumerating simple cycles has important applications in computational biology, network science, and financial crime analysis. In this work, we focus on parallelising the state-of-the-art simple cycle enumeration algorithms by Johnson and Read-Tarjan along with their applications to temporal graphs. To our knowledge, we are the first ones to parallelise these two algorithms in a fine-grained manner. We are also the first to demonstrate experimentally a linear performance scaling. Such a scaling is made possible by our decomposition of long sequential searches into fine-grained tasks, which are then dynamically scheduled across CPU cores, enabling an optimal load balancing. Furthermore, we show that coarse-grained parallel versions of the Johnson and the Read-Tarjan algorithms that exploit edge- or vertex-level parallelism are not scalable. On a cluster of four multi-core CPUs with 256 physical cores, our fine-grained parallel algorithms are, on average, an order of magnitude faster than their coarse-grained parallel counterparts. The performance gap between the fine-grained and the coarse-grained parallel algorithms widens as we use more CPU cores. When using all 256 CPU cores, our parallel algorithms enumerate temporal cycles, on average, 260x faster than the serial algorithm of Kumar and Calders. Code repository: https://***/IBM/parallel-cycle-enumeration

Keywords: cycle enumeration; parallel graph algorithms; graph mining

Log(Graph): A Near-Optimal High-Performance Graph Representation

27th IEEE/ACM/IFIP International Conference on Parallel Architectures and Compilation Techniques (PACT)

Authors: Besta, Maciej; Stanojevic, Dimitri; Zivic, Tijana; Singh, Jagpreet; Hoerold, Maurice; Hoefler, Torsten (Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland)

ISBN: (print) 9781450359863

Today's graphs used in domains such as machine learning or social network analysis may contain hundreds of billions of edges. Yet, they are not necessarily stored efficiently, and standard graph representations such as adjacency lists waste a significant number of bits while graph compression schemes such as WebGraph often require time-consuming decompression. To address this, we propose Log(Graph): a graph representation that combines high compression ratios with very low-overhead decompression to enable cheaper and faster graph processing. The key idea is to encode a graph so that the parts of the representation approach or match the respective storage lower bounds. We call our approach "graph logarithmization" because these bounds are usually logarithmic. Our high-performance Log(Graph) implementation based on modern bitwise operations and state-of-the-art succinct data structures achieves high compression ratios as well as performance. For example, compared to the tuned Graph Algorithm Processing Benchmark Suite (GAPBS), it reduces graph sizes by 20-35% while matching GAPBS' performance or even delivering speedups by reducing the amount of transferred data. It approaches the compression ratio of the established WebGraph compression library while enabling speedups of up to more than 2x. Log(Graph) can improve the design of various graph processing engines or libraries on single NUMA nodes as well as distributed-memory systems.

Keywords: graph compression; graph representation; graph layout; parallel graph algorithms; ILP; succinct data structures
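
The abstract above states that storage lower bounds are usually logarithmic and that the implementation relies on bitwise operations. One simple way to picture this is storing each vertex ID in ceil(log2 n) bits instead of a full machine word, with decoding done by a shift and a mask. The sketch below illustrates only that single idea under this assumption; it is not the Log(Graph) implementation, and `pack_ids` and `unpack_id` are hypothetical names.

```python
def pack_ids(ids, n):
    """Pack vertex IDs drawn from range(n) into one integer, fixed width per ID."""
    width = max(1, (n - 1).bit_length())  # ceil(log2 n) bits, at least 1
    packed = 0
    for i, v in enumerate(ids):
        packed |= v << (i * width)
    return packed, width


def unpack_id(packed, width, i):
    """Read the i-th ID back with one shift and one mask."""
    return (packed >> (i * width)) & ((1 << width) - 1)


# Neighbor list of one vertex in a graph with 1000 vertices (hypothetical data).
neighbors = [3, 17, 999, 0]
packed, width = pack_ids(neighbors, 1000)
decoded = [unpack_id(packed, width, i) for i in range(len(neighbors))]
print(width, decoded)
```

With 1000 vertices each ID needs 10 bits, so the four neighbors occupy 40 bits rather than the 128 bits that four 32-bit integers would take, and random access to any entry stays a constant-time operation.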

Accelerating CUDA Graph Algorithms at Maximum Warp

16th ACM Symposium on Principles and Practice of Parallel Programming

Authors: Hong, Sungpack; Kim, Sang Kyun; Oguntebi, Tayo; Olukotun, Kunle (Stanford Univ, Comp Syst Lab, Stanford, CA 94305, USA)

ISBN: (print) 9781450301190

Graphs are powerful data representations favored in many computational domains. Modern GPUs have recently shown promising results in accelerating computationally challenging graph problems but their performance suffers heavily when the graph structure is highly irregular, as most real-world graphs tend to be. In this study, we first observe that the poor performance is caused by work imbalance and is an artifact of a discrepancy between the GPU programming model and the underlying GPU architecture. We then propose a novel virtual warp-centric programming method that exposes the traits of underlying GPU architectures to users. Our method significantly improves the performance of applications with heavily imbalanced workloads, and enables trade-offs between workload imbalance and ALU underutilization for fine-tuning the performance. Our evaluation reveals that our method exhibits up to 9x speedup over previous GPU algorithms and 12x over single-threaded CPU execution on irregular graphs. When properly configured, it also yields up to 30% improvement over previous GPU algorithms on regular graphs. In addition to performance gains on graph algorithms, our programming method achieves 1.3x to 15.1x speedup on a set of GPU benchmark applications. Our study also confirms that the performance gap between GPUs and other multi-threaded CPU graph implementations is primarily due to the large difference in memory bandwidth.

Keywords: Algorithms; Performance; parallel graph algorithms; CUDA; GPGPU

Scalable Parallel Minimum Spanning Forest Computation

17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Nobari, Sadegh; Cao, Thanh-Tung; Karras, Panagiotis; Bressan, Stephane (Natl Univ Singapore, Singapore, Singapore; Rutgers State Univ, Piscataway, NJ 08855, USA)

The proliferation of data in graph form calls for the development of scalable graph algorithms that exploit parallel processing environments. One such problem is the computation of a graph's minimum spanning forest (MSF). Past research has proposed several parallel algorithms for this problem, yet none of them scales to large, high-density graphs. In this paper we propose a novel, scalable, parallel MSF algorithm for undirected weighted graphs. Our algorithm leverages Prim's algorithm in a parallel fashion, concurrently expanding several subsets of the computed MSF. Our effort focuses on minimizing the communication among different processors without constraining the local growth of a processor's computed subtree. In effect, we achieve a scalability that previous approaches lacked. We implement our algorithm in CUDA, running on a GPU, and study its performance using real and synthetic, sparse as well as dense, structured and unstructured graph data. Our experimental study demonstrates that our algorithm outperforms the previous state-of-the-art GPU-based MSF algorithm, while being several orders of magnitude faster than sequential CPU-based algorithms.

Keywords: Algorithms; Experimentation; Performance; parallel graph algorithms; Minimum Spanning Forest; GPU
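
The abstract above builds on Prim's algorithm, growing several subtrees concurrently. For reference, the sketch below is a plain serial version that runs Prim's algorithm once per connected component to obtain a minimum spanning forest. It does not reproduce the paper's parallel CUDA algorithm; the function name `minimum_spanning_forest` and the `(u, v, weight)` edge format are assumptions for illustration.

```python
import heapq


def minimum_spanning_forest(n, edges):
    """Return (total_weight, forest_edges) for vertices 0..n-1.

    edges: list of (u, v, weight) tuples of an undirected graph.
    """
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((w, u, v))
        adj[v].append((w, v, u))
    visited = [False] * n
    total, forest = 0, []
    for root in range(n):              # one Prim run per connected component
        if visited[root]:
            continue
        visited[root] = True
        heap = list(adj[root])
        heapq.heapify(heap)
        while heap:
            w, u, v = heapq.heappop(heap)   # lightest edge leaving the subtree
            if visited[v]:
                continue
            visited[v] = True
            total += w
            forest.append((u, v, w))
            for item in adj[v]:
                if not visited[item[2]]:
                    heapq.heappush(heap, item)
    return total, forest


# Two disconnected triangles (hypothetical data).
edges = [(0, 1, 1), (1, 2, 2), (0, 2, 3), (3, 4, 4), (4, 5, 1), (3, 5, 5)]
total, forest = minimum_spanning_forest(6, edges)
print(total, forest)
```

In this serial form each subtree is grown to completion before the next one starts; the abstract's approach instead expands several subtrees at the same time, which is where the communication between processors arises.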
