Tower of Hanoi is a classical tutorial example traditionally for introducing recursive algorithms in CS1. This poster describes a lecture of teaching graph algorithms using the graphical representation of the game and...
详细信息
ISBN:
(纸本)9781450394338
Tower of Hanoi is a classical tutorial example traditionally for introducing recursive algorithms in CS1. This poster describes a lecture of teaching graph algorithms using the graphical representation of the game and its variants in upper-level courses. More specifically, this poster provides a totally different perspective on solving the tower of Hanoi using DFS, BFS, A*, greedy, and other search algorithms on graphs. Students will not only solve the game and its variants in an inspiring way, but also learn a modeling method to convert problems into graphs.
Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical ...
详细信息
ISBN:
(纸本)0769523803
Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, contiguous memory accesses, and so are inefficient platforms for such algorithms. Few parallel graph algorithms outperform their best sequential implementation on SMP clusters due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two graph algorithms, list ranking and connected components, on two classes of shared-memory computers: symmetric multiprocessors such as the Sun Enterprise servers and multithreaded architectures (MTA) such as the Cray MTA-2. While previous studies have shown that parallel graph algorithms can speedup on SMPs, the systems' reliance on cache microprocessors limits performance. The MTA's latency tolerant processors and hardware support for fine-grain synchronization makes performance a function of parallelism. Since parallel graph algorithms have an abundance of parallelism, they perform and scale significantly better on the MTA. We describe and give a performance model for each architecture. We analyze the performance of the two algorithms and discuss how the features of each architecture affects algorithm development, ease of programming, performance, and scalability.
graph algorithms currently play increasingly important roles, especially in social networks and language modeling scenarios. Recently, accelerating graph algorithms by heterogeneous high performance computers with the...
详细信息
ISBN:
(纸本)9781509048250
graph algorithms currently play increasingly important roles, especially in social networks and language modeling scenarios. Recently, accelerating graph algorithms by heterogeneous high performance computers with the integrated cores and expanded SIMD lanes has been becoming the mainstream. However, the existing methods, restricted by the low-efficiency grouping strategy and the non-optimized selection mechanism of tile size of a graph, are far below our expectations in many ways. Moreover, there are few convenient integrated tools provided for deploying the graph algorithms on MIC architecture. In this paper, we propose a high-efficiency framework MicRun, which is flexible to be used for graph algorithms on SIMD architecture of the Xeon Phi. There are two key components in MicRun, the Bucket Grouping module and Auto-tuning module. In the Grouping module, an optimization algorithm is designed for splitting graph tiles into contlict-free groups, which can be directly processed on SIMD parallelism. In the Auto-tuning module, a novel strategy is proposed for optimizing the tile size to boost execution efficiency of the graph computation. MicRun currently supports Bellman-Ford and PageRank algorithms, we also conduct extensive validation experiments on MicRun. Experimental results show that MicRun outperforms existing mechanisms in terms of storage and time overhead. As a consequence, both graph algorithms achieve an average speedup of 1.1x by MicRun, compared with the state-of-the-art.
Distributed graph platforms like Pregel have used vertex-centric programming models to process the growing corpus of graph datasets using commodity clusters. However, the irregular structure of graphs causes load imba...
详细信息
ISBN:
(纸本)9781509024537
Distributed graph platforms like Pregel have used vertex-centric programming models to process the growing corpus of graph datasets using commodity clusters. However, the irregular structure of graphs causes load imbalances across machines, and this is exacerbated for non-stationary graph algorithms where not all parts of the graph are active at the same time. As a result, such graph platforms do not make efficient use of distributed resources. In this paper, we decouple graph partitioning from placement on hosts, and introduce strategies for elastic placement of graph partitions on Cloud VMs to reduce the cost of execution compared to a static placement, even as we minimize the increase in makespan. These strategies are innovative in modeling the graph algorithm's non-stationary behavior a priori using a metagraph sketch. We validate our strategies for several real-world graphs, using runtime traces for approximate Betweenness Centrality (BC) algorithm on our subgraph-centric GoFFish graph platform. Our strategies are able to reduce the cost of execution by up to 54%, compared to a static placement, while achieving a makespan that is within 25% of the optimal.
The Euler tour technique is a classical tool for designing parallel graph algorithms, originally proposed for the PRAM model. We ask whether it can be adapted to run efficiently on GPU. We focus on two established app...
详细信息
ISBN:
(纸本)9781665440660
The Euler tour technique is a classical tool for designing parallel graph algorithms, originally proposed for the PRAM model. We ask whether it can be adapted to run efficiently on GPU. We focus on two established applications of the technique: (1) the problem of finding lowest common ancestors (LCA) of pairs of nodes in trees, and (2) the problem of finding bridgis in undirected graphs. In our experiments, we compare theoretically optimal algorithms using the Euler tour technique against simpler heuristics supposed to perform particularly well on typical instances. We show that the Euler tour-based algorithms not only fulfill their theoretical promises and outperform practical heuristics on hard instances, but also perform on par with them on easy instances.
We reconsider basic algorithmic graph problems in a setting where an n-vertex input graph is read-only and the computation must take place in a working memory of O(n) bits or little more than that. For computing conne...
详细信息
ISBN:
(纸本)9783939897781
We reconsider basic algorithmic graph problems in a setting where an n-vertex input graph is read-only and the computation must take place in a working memory of O(n) bits or little more than that. For computing connected components and performing breadth-first search, we match the running times of standard algorithms that have no memory restrictions, for depth-first search and related problems we come within a factor of Theta(log log n), and for computing minimum spanning forests and single-source shortest-paths trees we come close for sparse input graphs.
graph problems are common across many fields, from scientific computing to social sciences. Despite their importance and the attention received, implementing graph algorithms effectively on modern computing systems re...
详细信息
ISBN:
(纸本)9798350342543
graph problems are common across many fields, from scientific computing to social sciences. Despite their importance and the attention received, implementing graph algorithms effectively on modern computing systems remains a challenging task that requires significant programming effort and generally results in customized implementations. Current computing and memory hierarchies are not architected for irregular computations, resulting performance that is far from the theoretical architectural peak. In this paper, we propose a compiler framework to simplify the development of graph algorihtm implementations that can achieve high performance on modern computing systems. We provide a high-level domain specific language (DSL) to represent graph algorithms through sparse linear algebra expressions and graph primitives including semiring and masking. The compiler leverages the semantics information expressed through the DSL during the optimization and code transformation passes, resulting in more efficient IR passed to the compiler backend. In particular, we introduce an Index Tree Dialect that preserves the semantic information of the graph algorithm to perform high-level, domain-specific optimizations, including workspace transformation, two-phase computation, and automatic parallelization. We demonstrate that this work outperforms state-of-the-art graph libraries LAgraph by up to 3.7x speedup in semiring operations, 2.19x speedup in an important sparse computational kernel, and 9.05x speedup in graph processing algorithms.
Existing state-of-the-art CPU graph frameworks take advantage of multiple cores, but not the SIMD capability within each core. In this work, we retarget an existing GPU graph algorithm compiler to obtain the first gra...
详细信息
ISBN:
(纸本)9781728186139
Existing state-of-the-art CPU graph frameworks take advantage of multiple cores, but not the SIMD capability within each core. In this work, we retarget an existing GPU graph algorithm compiler to obtain the first graph framework that uses SIMD extensions on CPUs to efficiently execute graph algorithms. We evaluate this compiler on 10 benchmarks and 3 graphs on 3 different CPUs and also compare to the GPU. Evaluation results show that on a 8-core machine, enabling SIMD on a naive multi-core implementation achieves an additional 7.48x speedup, averaged across 10 benchmarks and 3 inputs. Applying our SIMD-targeted optimizations improves the plain SIMD implementation by 1.67x, outperforming a serial implementation by 12.46x. On average, the optimized multi-core SIMD version also outperforms the state-of-the-art graph framework, graphIt, by 1.53x, averaged across 5 (common) benchmarks. SIMD execution on CPUs closes the gap between the CPU and GPU to 1.76x, but the CPU virtual memory performs better when graphs are much bigger than available physical memory.
graph algorithms can be expressed in terms of linear algebra. graphBLAS is a library of low-level building blocks for such algorithms that targets algorithm developers. LAgraph builds on top of the graphBLAS to target...
详细信息
ISBN:
(纸本)9781665435772
graph algorithms can be expressed in terms of linear algebra. graphBLAS is a library of low-level building blocks for such algorithms that targets algorithm developers. LAgraph builds on top of the graphBLAS to target users of graph algorithms with high-level algorithms common in network analysis. In this paper, we describe the first release of the LAgraph library, the design decisions behind the library, and performance using the GAP benchmark suite. LAgraph, however, is much more than a library. It is also a project to document and analyze the full range of algorithms enabled by the graphBLAS. To that end, we have developed a compact and intuitive notation for describing these algorithms. In this paper, we present that notation with examples from the GAP benchmark suite.
In this thesis we give new algorithms for two fundamental graph problems. We develop novel ways of using linear programming formulations, even exponential-sized ones, to extract structure from problem instances and to...
详细信息
In this thesis we give new algorithms for two fundamental graph problems. We develop novel ways of using linear programming formulations, even exponential-sized ones, to extract structure from problem instances and to guide algorithms in making progress. Somewhat surprisingly, similar polyhedral techniques can be harnessed in the two seemingly disparate settings. In the first part of the thesis we address a benchmark problem in combinatorial optimization: the asymmetric traveling salesman problem (ATSP). It consists in finding the shortest tour that visits all vertices of a given directed graph with weights on edges. Due to its NP-hardness, the theoretical study of algorithms for ATSP has focused on approximation algorithms: ones that are provably both efficient and give solutions competitive with the optimum. Specifically, a ρ-approximation algorithm for ATSP is one that runs in polynomial time and always outputs a tour that is at most ρ times longer than the shortest tour. Finding such an approximation algorithm with ρ bounded (i. e., a constant factor) had been a long-standing open problem. In this thesis, we give such an algorithm. Our approximation guarantee is analyzed with re- spect to the standard linear programming relaxation, and thus our result also confirms the conjec- tured constant integrality gap of that relaxation. Our techniques build upon the constant-factor approximation algorithm for the special case of node-weighted metrics due to Svensson [Sve15]. In particular, we give a generic reduction to structured instances that resemble but are more general than those arising from node-weighted metrics. This reduction takes advantage of a laminar family of vertex sets that arises from the linear programming relaxation. In the second part of the thesis we address the perfect matching problem. The first polynomial- time algorithm for it, given by Edmonds in 1965 [Edm65b], is historically associated with the in- troduction of the class P and our notion that
暂无评论