Search Results - Inner Mongolia University Library

12th International Conference on Parallel Processing and Applied Mathematics (PPAM)

Authors: Ryczkowska, Magdalena; Nowicki, Marek. Affiliations: Univ Warsaw, Interdisciplinary Ctr Math & Computat Modeling, Pawinskiego 5a, PL-02106 Warsaw, Poland; Nicolaus Copernicus Univ, Fac Math & Comp Sci, Chopina 12-18, PL-87100 Torun, Poland

ISBN: (print) 9783319780542; 9783319780535

Computations based on graphs are very common, but their complexity, the increasing size of analyzed graphs, and a huge amount of communication make this analysis a challenging task. In this paper, we present a comparison of two parallel BFS (Breadth-First Search) implementations: MapReduce run on Hadoop infrastructure and an implementation in the PGAS (Partitioned Global Address Space) model. The latter implementation has been developed with the help of PCJ (Parallel Computations in Java), a library for parallel and distributed computations in Java. Both implementations realize the level-synchronous strategy: the Hadoop algorithm uses iterative MapReduce jobs, whereas PCJ uses explicit synchronization after each level. The scalability of both solutions is similar. However, the PCJ implementation is much faster (about 100 times) than the MapReduce Hadoop solution.

Keywords: High performance computing; Hadoop; MapReduce; PGAS; Parallel and distributed computation; Performance evaluation; Parallel graph algorithms; Java
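The level-synchronous strategy named in this abstract can be illustrated with a minimal sequential sketch. This is not the authors' Hadoop or PCJ code; the function name `level_synchronous_bfs` and the adjacency-dictionary input are assumptions made for illustration. The point it shows is that the whole frontier of one level is processed before any vertex of the next level, which is where a parallel version would place its barrier or start its next MapReduce job.

```python
def level_synchronous_bfs(adjacency, source):
    """Return {vertex: level}, expanding one complete frontier per step."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # In a parallel version, workers would each scan a share of the frontier.
        for u in frontier:
            for v in adjacency.get(u, ()):
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        # A parallel version would synchronize here before the next level
        # (explicit barrier, or the end of one MapReduce iteration).
        frontier = next_frontier
    return level


# Hypothetical example graph: a 4-cycle (0-1-3-2) with a tail vertex 4.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
levels = level_synchronous_bfs(graph, 0)
print(levels)
```

Vertices unreachable from the source simply do not appear in the returned dictionary.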

Fast Approximate Distance Queries in Unweighted Graphs Using Bounded Asynchrony

29th International Workshop on Languages and Compilers for Parallel Computing (LCPC)

Authors: Fidel, Adam; Sabido, Francisco Coral; Riedel, Colton; Amato, Nancy M.; Rauchwerger, Lawrence. Affiliations: Texas A&M Univ, Dept Comp Sci & Engn, Parasol Lab, College Stn, TX 77843, USA

ISBN: (digital) 9783319527093

ISBN: (print) 9783319527093; 9783319527086

We introduce a new parallel algorithm for approximate breadth-first ordering of an unweighted graph by using bounded asynchrony to parametrically control both the performance and error of the algorithm. This work is based on the k-level asynchronous (KLA) paradigm that trades expensive global synchronizations in the level-synchronous model for local synchronizations in the asynchronous model, which may result in redundant work. Instead of correcting errors introduced by asynchrony and redoing work as in KLA, in this work we control the amount of work that is redone and thus the amount of error allowed, leading to higher performance at the expense of a loss of precision. Results of an implementation of this algorithm are presented on up to 32,768 cores, showing 2.27x improvement over the exact KLA algorithm and 3.8x improvement over the level-synchronous version with minimal error on several graph inputs.

Keywords: Parallel graph algorithms; Breadth-first search; Distance query; Approximate algorithms; Asynchronous; Distributed memory

Distributed, Shared-Memory Parallel Triangle Counting

5th Platform for Advanced Scientific Computing Conference (PASC)

Authors: Kanewala, Thejaka Amila; Zalewski, Marcin; Lumsdaine, Andrew. Affiliations: Indiana Univ, Sch Informat Comp & Engn, Bloomington, IN 47405, USA; Pacific Northwest Natl Lab, Seattle, WA, USA; Univ Washington, Seattle, WA 98195, USA

ISBN: (print) 9781450358910

Triangles are the most basic non-trivial subgraphs. Triangle counting is used in a number of different applications, including social network mining, cyber security, and spam detection. In general, triangle counting algorithms are readily parallelizable, but when implemented in distributed, shared-memory, their performance is poor due to high communication, imbalance of work, and the difficulty of exploiting locality available in shared memory. In this paper, we discuss four different (but related) triangle counting algorithms and how their performance can be improved in distributed, shared-memory by reducing in-node load imbalance, improving cache utilization, minimizing network overhead, and minimizing algorithmic work. We generalize the four different triangle counting algorithms into a common framework and show that for all four algorithms the in-node load imbalance can be minimized while utilizing caches by partitioning work into blocks of vertices, the network overhead can be minimized by aggregation of blocks of work, and algorithmic work can be reduced by partitioning vertex neighbors by degree. We experimentally evaluate the weak and the strong scaling performance of the proposed algorithms with two types of synthetic graph inputs and three real-world graph inputs. We also compare the performance of our implementations with the distributed, shared-memory triangle counting algorithms available in PowerGraph-GraphLab and show that our proposed algorithms outperform those algorithms, both in terms of space and time.

Keywords: triangle counting; distributed, shared-memory graph algorithms; parallel graph algorithms
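As a rough illustration of how ordering neighbors by degree can cut triangle-counting work, the sketch below uses a widely known sequential technique: orient each edge from the lower-ranked to the higher-ranked endpoint (rank is degree, ties broken by vertex id) and intersect the resulting neighbor sets. It is an assumption-laden sketch, not any of the four algorithms from the paper and not their distributed framework; the name `count_triangles` and the edge-list input are hypothetical.

```python
def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs."""
    # Build undirected adjacency sets; self-loops and duplicates are ignored.
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Rank by (degree, id) so that every pair of vertices is strictly ordered.
    def rank(x):
        return (len(neighbors[x]), x)

    # Keep only neighbors of higher rank: each edge is stored once,
    # at its lower-ranked endpoint.
    higher = {
        u: {v for v in nbrs if rank(v) > rank(u)}
        for u, nbrs in neighbors.items()
    }

    # A triangle a < b < c (by rank) is found exactly once:
    # at u = a, v = b, with c in higher[a] & higher[b].
    total = 0
    for u, out in higher.items():
        for v in out:
            total += len(out & higher[v])
    return total


# Hypothetical input: the complete graph on 4 vertices plus a pendant edge.
k4_plus_tail = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
print(count_triangles(k4_plus_tail))
```

Because high-degree vertices keep few higher-ranked neighbors, the set intersections stay small even on skewed degree distributions.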

Towards a GraphBLAS Library in Chapel

31st IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPS)

Authors: Azad, Ariful; Buluc, Aydin. Affiliations: Lawrence Berkeley Natl Lab, Computat Res Div, Berkeley, CA 94720, USA

ISBN: (print) 9780769561493

The adoption of a programming language is positively influenced by the breadth of its software libraries. Chapel is a modern and relatively young parallel programming language. Consequently, not many domain-specific software libraries exist that are written for Chapel. Graph processing is an important domain with many applications in cyber security, energy, social networking, and health. Implementing graph algorithms in the language of linear algebra enables many advantages including rapid development, flexibility, high performance, and scalability. The GraphBLAS initiative aims to standardize an interface for linear-algebraic primitives for graph computations. This paper presents initial experiences and findings of implementing a subset of important GraphBLAS operations in Chapel. We analyzed the bottlenecks in both shared and distributed memory. We also provided alternative implementations whenever the default implementation lacked performance or scaling.

Keywords: Chapel; GraphBLAS; parallel graph algorithms; PGAS

Efficient GPU algorithms for parallel decomposition of graphs into strongly connected and maximal end components

FORMAL METHODS IN SYSTEM DESIGN, 2016, Vol. 48, No. 3, pp. 274-300

Authors: Wijs, Anton; Katoen, Joost-Pieter; Bosnacki, Dragan. Affiliations: Eindhoven Univ Technol, Eindhoven, Netherlands; Rhein Westfal TH Aachen, Aachen, Germany

This article presents parallel algorithms for component decomposition of graph structures on general purpose graphics processing units (GPUs). In particular, we consider the problem of decomposing sparse graphs into strongly connected components, and decomposing graphs induced by stochastic games (such as Markov decision processes) into maximal end components. These problems are key ingredients of many (probabilistic) model-checking algorithms. We explain the main rationales behind our GPU-algorithms, and show a significant speed-up over the sequential (as well as existing parallel) counterparts in several case studies.

Keywords: Parallel graph algorithms; Strongly connected components; Maximal end components; Probabilistic model checking; Markov decision processes; GPU

An Efficient GPU Implementation of Inclusion-Based Pointer Analysis

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2016, Vol. 27, No. 2, pp. 353-366

Authors: Su, Yu; Ye, Ding; Xue, Jingling; Liao, Xiang-Ke. Affiliations: UNSW, Sch Comp Sci & Engn, Programming Language & Compilers Grp, Sydney, NSW, Australia; Natl Univ Def Technol, Sch Comp Sci, Changsha, Hunan, Peoples R China

We present an efficient GPU implementation of Andersen's whole-program inclusion-based pointer analysis, a fundamental analysis on which many others are based, including optimising compilers, bug detection and security analyses. Andersen's algorithm makes extensive modifications to the graph that represents the pointer-manipulating statements in a program. These modifications are highly irregular, input-dependent and statically unpredictable, making it much more challenging to balance such graph workloads across a multitude of GPU cores than those dealt with by traditional graph algorithms such as DFS and BFS. To parallelise Andersen's analysis efficiently on GPUs, we introduce an imbalance-aware workload partitioning scheme that divides its workload dynamically among the concurrent warps, initially in a warp-centric manner (during the coarse-grain stage) but later switching to a task-pool-based model when a workload imbalance is detected (during the fine-grain stage). We further improve its performance by using an adaptive group propagation scheme to reduce some redundant traversals. For a set of 14 C benchmarks evaluated, our parallel implementation of Andersen's analysis achieves a significant speedup of 46 percent on average over the state-of-the-art on an NVIDIA Tesla K20c GPU.

Keywords: parallel graph algorithms; GPGPU; pointer analysis; compilers

A Practical Parallel Algorithm for Diameter Approximation of Massive Weighted Graphs

30th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Ceccarello, Matteo; Pietracaprina, Andrea; Pucci, Geppino; Upfal, Eli. Affiliations: Univ Padua, Dept Informat Engn, Padua, Italy; Brown Univ, Dept Comp Sci, Providence, RI, USA

ISBN: (print) 9781509021406

We present a space and time efficient practical parallel algorithm for approximating the diameter of massive weighted undirected graphs on distributed platforms supporting a MapReduce-like abstraction. The core of the algorithm is a weighted graph decomposition strategy generating disjoint clusters of bounded weighted radius. Theoretically, our algorithm uses linear space and yields a polylogarithmic approximation guarantee; moreover, for important practical classes of graphs, it runs in a number of rounds asymptotically smaller than those required by the natural approximation provided by the state-of-the-art Δ-stepping SSSP algorithm, which is its only practical linear-space competitor in the aforementioned computational scenario. We complement our theoretical findings with an extensive experimental analysis on large benchmark graphs, which demonstrates that our algorithm attains substantial improvements on a number of key performance indicators with respect to the aforementioned competitor, while featuring a similar approximation ratio (a small constant less than 1.4, as opposed to the polylogarithmic theoretical bound).

Keywords: Graph Analytics; Parallel Graph Algorithms; Weighted Graph Decomposition; Weighted Diameter Approximation; MapReduce

NetworKit: A tool suite for large-scale complex network analysis

NETWORK SCIENCE, 2016, Vol. 4, No. 4, pp. 508-530

Authors: Staudt, Christian L.; Sazonovs, Aleksejs; Meyerhenke, Henning. Affiliations: Karlsruhe Inst Technol, Inst Theoret Informat, D-76131 Karlsruhe, Germany; Wellcome Trust Sanger Inst, Wellcome Genome Campus, Cambridge CB10 1SA, England

We introduce NetworKit, an open-source software package for analyzing the structure of large complex networks. Appropriate algorithmic solutions are required to handle increasingly common large graph data sets containing up to billions of connections. We describe the methodology applied to develop scalable solutions to network analysis problems, including techniques like parallelization, heuristics for computationally expensive problems, efficient data structures, and modular software architecture. Our goal for the software is to package results of our algorithm engineering efforts and put them into the hands of domain experts. NetworKit is implemented as a hybrid combining the kernels written in C++ with a Python frontend, enabling integration into the Python ecosystem of tested tools for data analysis and scientific computing. The package provides a wide range of functionality (including common and novel analytics algorithms and graph generators) and does so via a convenient interface. In an experimental comparison with related software, NetworKit shows the best performance on a range of typical analysis tasks.

Keywords: complex networks; network analysis; network science; parallel graph algorithms; data analysis software

Counting Triangles in Large Graphs on GPU

30th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Author: Polak, Adam. Affiliation: Jagiellonian Univ, Fac Math & Comp Sci, Dept Theoret Comp Sci, Krakow, Poland

ISBN: (print) 9781509036820

The clustering coefficient and the transitivity ratio are concepts often used in network analysis, which creates a need for fast practical algorithms for counting triangles in large graphs. Previous research in this area focused on sequential algorithms, MapReduce parallelization, and fast approximations. In this paper we propose a parallel triangle counting algorithm for CUDA GPUs. We describe the implementation details necessary to achieve high performance and present the experimental evaluation of our approach. The algorithm achieves a 15 to 35 times speedup over our CPU implementation, and is capable of finding 8.8 billion triangles in a graph with 180 million edges in 12 seconds on the Nvidia GeForce GTX 980 GPU.

Keywords: GPU; CUDA; parallel graph algorithms; triangles; clustering coefficient

The Performance Evaluation of the Java Implementation of Graph500

11th International Conference on Parallel Processing and Applied Mathematics (PPAM)

Authors: Ryczkowska, Magdalena; Nowicki, Marek; Bala, Piotr. Affiliations: Nicolaus Copernicus Univ, Fac Math & Comp Sci, Chopina 12-18, PL-87100 Torun, Poland; Univ Warsaw, Interdisciplinary Ctr Math & Computat Modeling, Pawinskiego 5a, PL-02106 Warsaw, Poland

ISBN: (print) 9783319321523; 9783319321516

Graph-based computations are used in many applications. The increasing size and complexity of analyzed data make graph analysis a challenging task. In this paper we present a performance evaluation of a Java implementation of the Graph500 benchmark. It has been developed with the help of the PCJ (Parallel Computations in Java) library for parallel and distributed computations in Java. PCJ is based on the PGAS (Partitioned Global Address Space) programming paradigm, where all communication details such as threads or network programming are hidden. We present the Java implementation details of the first and second kernels of the Graph500 benchmark. The results are compared with the existing MPI implementations of the Graph500 benchmark, showing good scalability of the PCJ library.

Keywords: High performance computing; Graph processing; PGAS; Parallel and distributed computation; Performance evaluation; Parallel graph algorithms; Java
