Search Results - Inner Mongolia University Library

Understanding parallelism in graph traversal on multi-core clusters

COMPUTER SCIENCE-RESEARCH AND DEVELOPMENT, 2013, Vol. 28, No. 2-3, pp. 193-201

Authors: Lv, Huiwei; Tan, Guangming; Chen, Mingyu; Sun, Ninghui. Affiliations: Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing, Peoples R China; Chinese Acad Sci, Natl Res Ctr Intelligent Comp Syst, Inst Comp Technol, State Key Lab Comp Architecture, Beijing, Peoples R China; Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China; Chinese Acad Sci, Grad Sch, Beijing, Peoples R China

There is an ever-increasing need to explore large-scale graph data sets in computational sciences, social networks, and business analytics. However, because of their irregular and memory-intensive nature, graph applications are notorious for poor performance on parallel computer systems. In this paper we propose a new hybrid MPI/Pthreads breadth-first search (BFS) algorithm featuring (i) overlapping computation and communication by separating them into multiple threads, (ii) maximizing multi-threading parallelism on multi-cores with massive numbers of threads to improve throughput, and (iii) exploiting pipeline parallelism using lock-free queues for asynchronous communication. By comparing it with the traditional MPI-only BFS algorithm, we learned several valuable lessons that help to understand and exploit parallelism in graph traversal applications. Experiments show that our algorithm is 1.9x faster than the MPI-only version and can process 1.45 billion edges per second on a 32-node SMP cluster. At a large scale, our algorithm is 1.49x faster than the MPI-only BFS algorithm in the Combinatorial BLAS Library with 6,144 cores.

Keywords: Breadth-first search; graph algorithms; Hybrid MPI/Pthreads programming; Lock-free queues
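
The per-level compute/communicate cycle of an MPI-only BFS, which the hybrid algorithm splits across threads, can be sketched as a single-process Python simulation. This is an illustrative sketch, not the authors' implementation: the ownership rule `v % nranks`, the name `partitioned_bfs`, and the in-memory "outbox" lists standing in for MPI messages are all assumptions, and the two phases run back to back instead of being overlapped by separate threads with lock-free queues.

```python
from collections import defaultdict


def partitioned_bfs(adj, root, nranks):
    """Level-synchronous BFS over a 1D vertex partition, simulated in one process.

    adj maps each vertex to its neighbours. Vertex v is owned by rank
    v % nranks (a hypothetical ownership rule). Returns a dict vertex -> parent.
    """

    def owner(v):
        return v % nranks

    # Each simulated rank stores parents only for the vertices it owns.
    parent = [dict() for _ in range(nranks)]
    parent[owner(root)][root] = root
    frontier = [[] for _ in range(nranks)]
    frontier[owner(root)].append(root)

    while any(frontier):
        # Compute phase: each rank expands its local frontier and buffers
        # (vertex, proposed_parent) pairs for the rank that owns the vertex.
        outbox = [defaultdict(list) for _ in range(nranks)]
        for r in range(nranks):
            for u in frontier[r]:
                for v in adj.get(u, ()):
                    outbox[r][owner(v)].append((v, u))

        # Communication phase: deliver every buffer to its owner (an
        # all-to-all exchange in a real MPI code); owners keep first arrivals.
        next_frontier = [[] for _ in range(nranks)]
        for src in range(nranks):
            for dst, msgs in outbox[src].items():
                for v, u in msgs:
                    if v not in parent[dst]:
                        parent[dst][v] = u
                        next_frontier[dst].append(v)
        frontier = next_frontier

    # Gather the distributed parent maps for inspection.
    merged = {}
    for part in parent:
        merged.update(part)
    return merged
```

Overlapping the two phases, as the abstract describes, would mean letting a communication thread drain the outboxes while compute threads are still filling them, rather than waiting for the whole compute phase to finish.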

Scaling Techniques for Massive Scale-Free Graphs in Distributed (External) Memory

IEEE 27th International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Pearce, Roger; Gokhale, Maya; Amato, Nancy M. Affiliations: Texas A&M Univ, Parasol Lab, College Stn, TX 77843, USA; Lawrence Livermore Natl Lab, Ctr Appl Sci Comp, Livermore, CA, USA

ISBN: (print) 9780769549712

We present techniques to process large scale-free graphs in distributed memory. Our aim is to scale to trillions of edges, and our research targets leadership-class supercomputers and clusters with local non-volatile memory, e.g., NAND Flash. We apply an edge list partitioning technique designed to accommodate high-degree vertices (hubs), which create scaling challenges when processing scale-free graphs. In addition to partitioning hubs, we use ghost vertices to represent the hubs and reduce communication hotspots. We present a scaling study with three important graph algorithms: Breadth-First Search (BFS), K-Core decomposition, and Triangle Counting. We also demonstrate scalability on BG/P Intrepid by comparing to the best known Graph500 results [1]. We show results on two clusters with local NVRAM storage that are capable of traversing trillion-edge scale-free graphs. By leveraging node-local NAND Flash, our approach can process datasets thirty-two times larger with only a 39% performance degradation in Traversed Edges Per Second (TEPS).

Keywords: parallel algorithms; graph algorithms; big data; distributed computing
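
The edge list partitioning idea can be illustrated with a short Python sketch: sort the edges by source and cut the list into equal-sized contiguous chunks, so that a hub's adjacency list may be split across several partitions instead of overloading one. The function name `edge_list_partition` and the returned `hubs_split` map are hypothetical, and the ghost-vertex mechanism mentioned in the abstract is not modeled.

```python
def edge_list_partition(edges, nparts):
    """Cut a source-sorted edge list into nparts contiguous, nearly equal chunks.

    edges: iterable of (source, target) pairs.
    Returns (parts, hubs_split): parts[p] is the list of edges assigned to
    partition p, and hubs_split maps every source vertex whose adjacency
    list was cut across several partitions to the sorted partition ids.
    """
    edges = sorted(edges)  # group each vertex's out-edges together
    n = len(edges)
    parts = [edges[p * n // nparts:(p + 1) * n // nparts] for p in range(nparts)]

    # Record which partitions hold edges of each source vertex.
    holders = {}
    for p, chunk in enumerate(parts):
        for src, _ in chunk:
            holders.setdefault(src, set()).add(p)
    hubs_split = {v: sorted(ps) for v, ps in holders.items() if len(ps) > 1}
    return parts, hubs_split
```

For example, with a hub 0 of out-degree 8 plus four other edges and `nparts=3`, every partition receives 4 edges and vertex 0's adjacency list is split across partitions 0 and 1, whereas a vertex-based partition would keep all 8 of its edges on one partition.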

Exploring Agent-Based Simulations in Political Science Using Aggregate Temporal Graphs

6th IEEE Symposium on Pacific Visualization (PacificVis)

Authors: Crouser, R. Jordan; Freeman, Jeremy G.; Winslow, Andrew; Chang, Remco. Affiliation: Tufts Univ, Medford, MA 02155, USA

ISBN: (print) 9781467347976

Agent-based simulation has become a key technique for modeling and simulating dynamic, complicated behaviors in social and behavioral sciences. As these simulations become more complex, they generate an increasingly large amount of data. Lacking the appropriate tools and support, it has become difficult for social scientists to interpret and analyze the results of these simulations. In this paper, we introduce the Aggregate Temporal Graph (ATG), a graph formulation that can be used to capture complex relationships between discrete simulation states in time. Using this formulation, we can assist social scientists in identifying critical simulation states by examining graph substructures. In particular, we define the concept of a Gateway and its inverse, a Terminal, which capture the relationships between pivotal states in the simulation and their inevitable outcomes. We propose two real-time computable algorithms to identify these relationships and provide a proof of correctness, complexity analysis, and empirical run-time analysis. We demonstrate the use of these algorithms on a large-scale social science simulation of political power and violence in present-day Thailand, and discuss broader applications of the ATG and associated algorithms in other domains such as analytic provenance.

Keywords: G.2.2 [Discrete Mathematics]: Graph Theory, graph algorithms; H.1.2 [Information Systems]: User/Machine Systems, human information processing
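
The abstract does not give formal definitions of a Gateway or a Terminal, so the following Python sketch uses its own simplified, hypothetical reading: states form a DAG, a terminal is a state with no successors, a state is "committed" when every path from it ends at the same terminal, and a gateway is a committed state with at least one uncommitted predecessor. It only shows how such relationships can be computed in one reverse-topological pass; it is not the paper's algorithm, and `find_gateways` is a hypothetical name. Graphs with cycles raise `graphlib.CycleError`.

```python
from graphlib import TopologicalSorter


def find_gateways(succ):
    """Find hypothetical 'gateway' states in a DAG of simulation states.

    succ maps each state to its successor states. Definitions used here
    (assumptions for illustration): a terminal is a state with no successors;
    a state is committed if every path from it ends at the same terminal;
    a gateway is a committed non-terminal state with an uncommitted predecessor.
    Returns a dict gateway -> terminal.
    """
    nodes = set(succ) | {v for vs in succ.values() for v in vs}
    # TopologicalSorter takes node -> predecessors; passing successors instead
    # makes static_order() emit every state after all of its successors.
    order = TopologicalSorter({u: succ.get(u, []) for u in nodes}).static_order()

    # reach[u] = set of terminals reachable from u.
    reach = {}
    for u in order:
        kids = succ.get(u, [])
        reach[u] = {u} if not kids else set().union(*(reach[v] for v in kids))

    preds = {u: [] for u in nodes}
    for u, vs in succ.items():
        for v in vs:
            preds[v].append(u)

    gateways = {}
    for u in nodes:
        committed = bool(succ.get(u)) and len(reach[u]) == 1
        if committed and any(len(reach[p]) > 1 for p in preds[u]):
            gateways[u] = next(iter(reach[u]))
    return gateways
```

In a toy graph where state `s` can move to `a` (which always ends in `win`) or to `b` (which can end in `win` or `lose`), this sketch reports `a` as the gateway to `win`; `c`, reached only from the already-committed `a`, is not reported.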

In-Core Computation of Geometric Centralities with HyperBall: A Hundred Billion Nodes and Beyond

IEEE 13th International Conference on Data Mining (ICDM)

Authors: Boldi, Paolo; Vigna, Sebastiano. Affiliation: Univ Milan, Dipartimento Informat, I-20122 Milan, Italy

ISBN: (print) 9780769551098

Given a social network, which of its nodes are more central? This question has been asked many times in sociology, psychology and computer science, and a whole plethora of centrality measures (a.k.a. centrality indices, or rankings) have been proposed to account for the importance of the nodes of a network. In this paper, we approach the problem of computing geometric centralities, such as closeness [1] and harmonic centrality [2], on very large graphs; traditionally this task requires an all-pairs shortest-path computation in the exact case, or a number of breadth-first traversals for approximate computations, but these techniques yield very weak statistical guarantees on highly disconnected graphs. We instead assume that the graph is accessed in a semi-streaming fashion, that is, that adjacency lists are scanned almost sequentially, and that a very small amount of memory (on the order of a dozen bytes) per node is available in core memory. We leverage the newly discovered algorithms based on HyperLogLog counters [3], making it possible to approximate a number of geometric centralities at very high speed and with high accuracy. While similar algorithms for approximating closeness have been attempted in the MapReduce [4] framework [5], our use of HyperLogLog counters exponentially reduces the memory footprint, paving the way for in-core processing of networks with a hundred billion nodes using "just" 2 TiB of RAM. Moreover, the computations we describe are inherently parallelizable and scale linearly with the number of available cores.

Keywords: Centrality; Distance distribution; graph algorithms; Probabilistic counters; Ganglia; Intimacy; Core memory; Counters; Arithmetic
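
HyperBall-style ball iteration can be sketched in Python with exact sets in place of HyperLogLog counters, which keeps the example deterministic. The ball B(x, t) of nodes within distance t of x grows as B(x, t) = B(x, t-1) united with B(y, t-1) for every neighbour y of x, so the nodes newly added in round t are exactly those at distance t, and harmonic centrality (the sum of 1/d over reachable nodes) accumulates as (|B(x, t)| - |B(x, t-1)|) / t. HyperBall replaces each set with a HyperLogLog counter, whose union is a register-wise maximum; that substitution, and the memory savings it brings, are not shown. The sketch follows out-edges, so on a directed graph it measures distances from x; the usual harmonic centrality uses distances to x, which would mean running it on the transposed graph. The function name is hypothetical.

```python
def harmonic_by_ball_iteration(adj):
    """Harmonic centrality computed by HyperBall-style ball iteration with exact sets.

    adj maps each node to its out-neighbours (every neighbour must also be a key).
    ball[x] holds the nodes within distance t of x after round t; nodes first
    added in round t are at distance exactly t and each contributes 1/t.
    """
    ball = {x: {x} for x in adj}
    harmonic = {x: 0.0 for x in adj}
    t = 0
    changed = True
    while changed:
        t += 1
        changed = False
        new_ball = {}
        for x in adj:
            # B(x, t) = B(x, t-1) united with B(y, t-1) for every neighbour y.
            grown = set(ball[x])
            for y in adj[x]:
                grown |= ball[y]
            gained = len(grown) - len(ball[x])
            if gained:
                harmonic[x] += gained / t
                changed = True
            new_ball[x] = grown
        ball = new_ball
    return harmonic
```

On the undirected path 0-1-2 this returns `{0: 1.5, 1: 2.0, 2: 1.5}`: the middle node has two neighbours at distance 1, while each end node has one node at distance 1 and one at distance 2.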

NUMA-optimized Parallel Breadth-first Search on Multicore Single-node System

IEEE International Conference on Big Data (Big Data)

Authors: Yasui, Yuichiro; Fujisawa, Katsuki; Goto, Kazushige. Affiliations: Chuo Univ, Tokyo 112, Japan; Intel Corp, Hillsboro, OR 97124, USA

ISBN: (print) 9781479912926; 9781479912933

Breadth-first search (BFS) is one of the most important kernels in graph theory. The Graph500 benchmark measures the performance of a supercomputer performing a BFS in terms of traversed edges per second (TEPS). Previous studies have proposed hybrid approaches that combine the well-known top-down algorithm with an efficient bottom-up algorithm for large frontiers. This reduces unnecessary searching of outgoing edges in the BFS traversal of a small-world graph, such as a Kronecker graph. In this paper, we describe a highly efficient BFS using column-wise partitioning of the adjacency list while carefully considering the non-uniform memory access (NUMA) architecture. We explicitly manage the way in which each working thread accesses a partial adjacency list in local memory during BFS traversal. Our implementation achieved a processing rate of 11.15 billion edges per second on a 4-way Intel Xeon E5-4640 system for a scale-26 problem of a Kronecker graph with 2^26 vertices and 2^30 edges. Some of the speedup techniques in this paper are not limited to NUMA systems. With our winning Green Graph500 submission of June 2013, we achieved 64.12 GTEPS per kilowatt hour on an ASUS Pad TF700T with an NVIDIA Tegra 3 mobile processor.

Keywords: Breadth-first search; graph algorithms; parallel algorithms; multicore processing
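
The hybrid top-down/bottom-up BFS that this work builds on can be sketched in Python for an undirected graph stored as an adjacency dict. The switching rule (go bottom-up when the edges leaving the frontier exceed the remaining unexplored edges divided by a tuning parameter `alpha`) is a simplified assumption in the spirit of direction-optimizing BFS, and `alpha=14` is only an illustrative default; the NUMA-aware column-wise partitioning that is the paper's contribution is not modeled.

```python
def hybrid_bfs(adj, root, alpha=14):
    """Direction-optimizing BFS on an undirected graph; returns vertex -> level.

    adj must list every vertex as a key, with each edge stored in both directions.
    """
    level = {root: 0}
    frontier = {root}
    depth = 0
    # Edge endpoints not yet attached to a visited vertex (heuristic input only).
    unexplored_edges = sum(len(vs) for vs in adj.values()) - len(adj[root])
    while frontier:
        depth += 1
        frontier_edges = sum(len(adj[u]) for u in frontier)
        nxt = set()
        if frontier_edges > unexplored_edges / alpha:
            # Bottom-up: each unvisited vertex stops at its first frontier neighbour.
            for v in adj:
                if v not in level:
                    for u in adj[v]:
                        if u in frontier:
                            nxt.add(v)
                            break
        else:
            # Top-down: each frontier vertex scans all of its neighbours.
            for u in frontier:
                for v in adj[u]:
                    if v not in level:
                        nxt.add(v)
        for v in nxt:
            level[v] = depth
        unexplored_edges -= sum(len(adj[v]) for v in nxt)
        frontier = nxt
    return level
```

Both directions produce the same levels; they differ only in how many edges are examined. The bottom-up step pays off on small-world graphs because, once the frontier is large, most unvisited vertices find a frontier parent after checking only a few neighbours.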

On Fast Parallel Detection of Strongly Connected Components (SCC) in Small-World Graphs

International Conference for High Performance Computing, Networking, Storage and Analysis (SC)

Authors: Hong, Sungpack; Rodia, Nicole C.; Olukotun, Kunle. Affiliations: Oracle Labs, Redwood Shores, CA 94065, USA; Stanford Univ, Pervas Parallelism Lab, Stanford, CA, USA

ISBN: (print) 9781450323789

Detecting strongly connected components (SCCs) in a directed graph is a fundamental graph analysis algorithm that is used in many science and engineering domains. Traditional approaches in parallel SCC detection, however, show limited performance and poor scaling behavior when applied to large real-world graph instances. In this paper, we investigate the shortcomings of the conventional approach and propose a series of extensions that consider the fundamental properties of real-world graphs, e.g. the small-world property. Our scalable implementation offers excellent performance on diverse, small-world graphs resulting in a 5.01x to 29.41x parallel speedup over the optimal sequential algorithm with 16 cores and 32 hardware threads.

Keywords: strongly connected components (SCC); multicore; parallel algorithms; graph algorithms; small-world graphs
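
For reference, the forward-backward (FW-BW) decomposition with trimming, commonly regarded as the conventional parallel SCC approach, can be sketched sequentially in Python. For a pivot vertex, the intersection of its forward- and backward-reachable sets is one SCC, and every remaining SCC lies entirely inside one of the three leftover regions, which can then be processed independently (in parallel, in a real implementation). Trimming peels off vertices with no in- or out-neighbour inside the current region, each of which is a singleton SCC. The function name is hypothetical, and the paper's small-world-specific extensions are not reproduced.

```python
def fw_bw_scc(succ):
    """Forward-backward SCC decomposition with trimming (sequential sketch).

    succ maps each vertex to its successors. Returns a list of frozensets.
    """
    nodes = set(succ) | {v for vs in succ.values() for v in vs}
    fwd = {v: set(succ.get(v, ())) for v in nodes}
    pred = {v: set() for v in nodes}
    for u, vs in succ.items():
        for v in vs:
            pred[v].add(u)

    def reach(start, edges, allowed):
        # Iterative DFS restricted to the vertices in `allowed`.
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in edges[u]:
                if v in allowed and v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    sccs = []
    work = [set(nodes)]
    while work:
        part = work.pop()
        # Trim: repeatedly remove vertices with no in- or out-edge inside part.
        changed = True
        while changed:
            changed = False
            for v in list(part):
                if not (fwd[v] & part) or not (pred[v] & part):
                    sccs.append(frozenset([v]))
                    part.discard(v)
                    changed = True
        if not part:
            continue
        pivot = next(iter(part))
        f = reach(pivot, fwd, part)
        b = reach(pivot, pred, part)
        scc = f & b
        sccs.append(frozenset(scc))
        # Remaining SCCs never straddle these three regions.
        for rest in (f - scc, b - scc, part - f - b):
            if rest:
                work.append(rest)
    return sccs
```

On small-world graphs the first pivot usually lands in one giant SCC, which is one reason the abstract stresses graph properties; the many tiny components left over are where trimming helps.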

Identifying Overlapping Communities and Their Leading Members in Social Networks

15th Conference of the Spanish Association for Artificial Intelligence (CAEPIA)

Authors: Palazuelos, Camilo; Zorrilla, Marta. Affiliation: Univ Cantabria, Dept Math Stat & Comp Sci, Santander 39005, Spain

ISBN: (print) 9783642406423; 9783642406430

With the recent increasing popularity of social networking services like Facebook and Twitter, community structure has become a problem of considerable interest. Although there are more than a hundred algorithms that find communities in networks, only a few are able to detect overlapping communities, and an even smaller number of them follow an approach based on the evolution dynamics of these networks. Thus, we present FRINGE, an algorithm for the detection of overlapping communities in networks, which, based on the ideas of friendship and leadership, not only returns the overlapping communities detected, but also specifies their leading members. We describe the algorithm in detail and compare its results with those obtained by CFinder and iLCD for both synthetic and real-life networks. These results show that our proposal behaves well in networks with a clear social hierarchy, as seen in modern social networks.

Keywords: community detection; graph algorithms; overlapping communities; social influence; social networks

SCALABLE PARALLEL ALGORITHMS FOR MASSIVE SCALE-FREE GRAPHS

Author: Roger Allan Pearce, Texas A&M University

Degree level: Doctoral (Ph.D.)

Efficiently storing and processing massive graph data sets is a challenging problem as researchers seek to leverage "Big Data" to answer next-generation scientific questions. New techniques are required to process large scale-free graphs in shared, distributed, and external memory. This dissertation develops new techniques to parallelize the storage, computation, and communication for scale-free graphs with high-degree vertices. Our work facilitates the processing of large real-world graph datasets through the development of parallel algorithms and tools that scale to large computational and memory resources, overcoming challenges not addressed by existing techniques. Our aim is to scale to trillions of edges, and our research is targeted at leadership-class supercomputers, clusters with local non-volatile memory, and shared memory systems. We present three novel techniques to address scaling challenges in processing large scale-free graphs. We apply an asynchronous graph traversal technique using prioritized visitor queues that is capable of tolerating data latencies to the external graph storage media and message passing communication. To accommodate large high-degree vertices, we present an edge list partitioning technique that evenly partitions graphs containing high-degree vertices. Finally, we propose a technique we call distributed delegates that distributes and parallelizes the storage, computation, and communication when processing high-degree vertices. The edges of high-degree vertices are distributed, providing additional opportunities for parallelism not present in existing methods. We apply our techniques to multiple graph algorithms: Breadth-First Search, Single Source Shortest Path, Connected Components, K-Core decomposition, Triangle Counting, and Page Rank. Our experimental study of these algorithms demonstrates excellent scalability on supercomputers, clusters with non-volatile memory, and shared memory systems. Our study includes multi

Keywords: parallel algorithms; graph algorithms; scale-free graphs; graph partitioning; Thesis
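
A prioritized visitor queue can be sketched in Python for Single Source Shortest Path. Each visitor carries a tentative distance; it is applied only if it improves the target vertex's state, and stale visitors are discarded when popped. With one global priority queue, as here, this reduces to Dijkstra's algorithm with lazy deletion; in the asynchronous distributed setting the dissertation targets, each partition would presumably keep its own queue, and improvements arriving out of order would simply be corrected when they arrive. The function name and data layout are assumptions; distributed delegates and external-memory storage are not modeled.

```python
import heapq


def visitor_sssp(adj, source):
    """Single-source shortest paths driven by a prioritized visitor queue.

    adj maps u -> list of (v, weight) pairs with non-negative weights.
    Returns a dict vertex -> shortest distance from source.
    """
    dist = {source: 0}
    queue = [(0, source)]  # visitors ordered by tentative distance
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist.get(u, float("inf")):
            continue  # stale visitor: u was already reached by a shorter path
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                # Visitor improves v's state: record it and schedule v's neighbours.
                dist[v] = nd
                heapq.heappush(queue, (nd, v))
    return dist
```

The check-before-apply step is what makes the scheme tolerant of latency: a late visitor carrying a worse distance costs one comparison and is dropped, rather than requiring global synchronization.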

Swendsen-Wang Multi-Cluster Algorithm for the 2D/3D Ising Model on Xeon Phi and GPU

International Conference for High Performance Computing, Networking, Storage and Analysis (SC)

Authors: Wende, Florian; Steinke, Thomas. Affiliation: Zuse Inst Berlin, D-14195 Berlin, Germany

ISBN: (print) 9781450323789

Simulations of the critical Ising model by means of local update algorithms suffer from critical slowing down. One way to partially compensate for the influence of this phenomenon on the runtime of simulations is to use increasingly fast parallel computer hardware. Another approach is to use algorithms that do not suffer from critical slowing down, such as cluster algorithms. This paper reports on the Swendsen-Wang multi-cluster algorithm on the Intel Xeon Phi coprocessor 5110P, the Nvidia Tesla M2090 GPU, and x86 multi-core CPUs. We present shared memory versions of this algorithm for the simulation of the two- and three-dimensional Ising model. We use a combination of local cluster search and global label reduction by means of atomic hardware primitives. Further, we describe an MPI version of the algorithm for Xeon Phi and for CPUs. Significant performance improvements over known implementations of the Swendsen-Wang algorithm are demonstrated.

Keywords: Many-core processors; Xeon Phi; GPGPU; CUDA; Ising model; Swendsen-Wang multi-cluster algorithm; performance evaluation; graph algorithms
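
A sequential Python sketch of one textbook Swendsen-Wang update on a 2D Ising lattice may help make the cluster step concrete: bonds between equal neighbouring spins are activated with probability p = 1 - exp(-2 * beta * J), clusters are the connected components of active bonds (found here with union-find), and each cluster is flipped with probability 1/2. Periodic boundaries, ferromagnetic coupling `J = 1.0`, and the function name are assumptions. Merging toward the smaller root index loosely mirrors a min-label reduction, but this is not the paper's Xeon Phi or GPU code, which combines local cluster search with atomic global label reduction.

```python
import math
import random


def swendsen_wang_sweep(spins, beta, rng, J=1.0):
    """One Swendsen-Wang update of an L x L Ising lattice with periodic boundaries.

    spins: list of lists of +1/-1, modified in place and returned.
    rng: a random.Random instance supplying the bond and flip decisions.
    """
    L = len(spins)
    parent = list(range(L * L))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # keep the smaller label as root

    # Bond phase: visit each site's right and down neighbours once.
    p = 1.0 - math.exp(-2.0 * beta * J)
    for x in range(L):
        for y in range(L):
            for nx, ny in (((x + 1) % L, y), (x, (y + 1) % L)):
                if spins[x][y] == spins[nx][ny] and rng.random() < p:
                    union(x * L + y, nx * L + ny)

    # Flip phase: one coin toss per cluster, applied to all of its sites.
    flip = {}
    for x in range(L):
        for y in range(L):
            root = find(x * L + y)
            if root not in flip:
                flip[root] = rng.random() < 0.5
            if flip[root]:
                spins[x][y] = -spins[x][y]
    return spins


# Usage: at very large beta every aligned bond is active, so an all-up
# lattice forms one cluster and stays uniform after the sweep.
lattice = swendsen_wang_sweep([[1] * 4 for _ in range(4)], beta=50.0, rng=random.Random(0))
```

Because whole clusters flip at once, large correlated regions near the critical point change in a single step, which is how cluster algorithms avoid the critical slowing down described in the abstract.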

The generalized split probe problem

Electronic Notes in Discrete Mathematics, 2013, Vol. 44, pp. 39-45

Authors: Dantas, Simone; Faria, Luerbio; de Figueiredo, Celina M.H.; Teixeira, Rafael B. Affiliations: IME, Universidade Federal Fluminense, Brazil; IME, Universidade do Estado do Rio de Janeiro, Brazil; COPPE, Universidade Federal do Rio de Janeiro, Brazil; ICE, Universidade Federal Rural do Rio de Janeiro, Brazil

A generalized split (k, l) partition is a vertex set partition into at most k independent sets and l cliques. We prove that the (2, 1) partitioned probe problem is in P, whereas the (2, 2) partitioned probe problem is NP-complete. The full complexity dichotomy into polynomial time and NP-complete for the class of generalized split partitioned probe problems establishes (2, 2) as the first NP-complete self-complementary partitioned probe problem, and answers the PGC conjecture in the negative by finding a polynomial-time recognition problem whose partitioned probe version is NP-complete. © 2013 Elsevier B.V.

Keywords: Computational complexity; graph algorithms; graph sandwich problems; Partition problems; Probe graphs
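
The definition of a generalized split (k, l) partition can be made concrete with a small Python checker. It verifies that the given blocks cover every vertex exactly once, that there are at most k independent sets and at most l cliques, and that each block has the required edge structure. It only checks a proposed partition; it does not solve the recognition or partitioned probe problems studied in the paper, and the function name and input layout are hypothetical.

```python
from itertools import combinations


def is_kl_partition(adj, parts, k, l):
    """Check that `parts` is a generalized split (k, l) partition of a graph.

    adj: dict vertex -> set of neighbours (undirected, no self-loops).
    parts: (independent_sets, cliques), two lists of vertex sets.
    """
    independent_sets, cliques = parts
    if len(independent_sets) > k or len(cliques) > l:
        return False
    # Every vertex must appear in exactly one block.
    covered = [v for block in list(independent_sets) + list(cliques) for v in block]
    if len(covered) != len(set(covered)) or set(covered) != set(adj):
        return False
    # No edge may join two vertices of the same independent set.
    for block in independent_sets:
        if any(v in adj[u] for u, v in combinations(block, 2)):
            return False
    # Every pair of vertices inside a clique must be adjacent.
    for block in cliques:
        if any(v not in adj[u] for u, v in combinations(block, 2)):
            return False
    return True
```

For example, the 4-cycle 0-1-2-3-0 has the (2, 0) partition into independent sets {0, 2} and {1, 3}, but splitting it into {0, 1} and {2, 3} fails because 0 and 1 are adjacent.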
