ISBN: (Print) 9781450332057
Large-scale graph-structured computation usually exhibits an iterative, convergence-oriented computing nature, where input data is computed iteratively until a convergence condition is reached. Such features have led to the development of two different computation modes for graph-structured programs, namely synchronous (Sync) and asynchronous (Async) modes. Unfortunately, there is currently no in-depth study of their execution properties, and thus programmers have to choose a mode manually, either requiring a deep understanding of the underlying graph engines or suffering from suboptimal performance. This paper makes the first comprehensive characterization of the performance of the two modes on a set of typical graph-parallel applications. Our study shows that the performance of the two modes varies significantly with different graph algorithms, partitioning methods, execution stages, input graphs, and cluster scales, and no single mode consistently outperforms the other. To this end, this paper proposes Hsync, a hybrid graph computation mode that adaptively switches a graph-parallel program between the two modes for optimal performance. Hsync constantly collects execution statistics on the fly and leverages a set of heuristics to predict future performance and determine when a mode switch could be profitable. We have built online sampling and offline profiling approaches, combined with a set of heuristics, to accurately predict future performance in the two modes. A prototype called PowerSwitch has been built on top of PowerGraph, a state-of-the-art distributed graph-parallel system, to support adaptive execution of graph algorithms. On a 48-node EC2-like cluster, PowerSwitch consistently outperforms the best of both modes, with a speedup ranging from 9% to 73% due to timely switches between the two modes. Copyright 2015 ACM.
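To make the switching decision concrete, here is a minimal Haskell sketch of the kind of throughput-based heuristic the abstract describes; the Stats record, the fixed switching margin, and decideSwitch are illustrative assumptions, not PowerSwitch's actual interface or heuristics.

-- A minimal sketch (not PowerSwitch's actual code) of a throughput-based
-- heuristic for choosing between synchronous and asynchronous execution.
-- The Stats fields and the margin-based comparison are assumptions for
-- illustration only.

data Mode = Sync | Async deriving (Eq, Show)

-- Sampled statistics for the current window of execution.
data Stats = Stats
  { mode           :: Mode
  , edgesProcessed :: Double   -- work completed in the window
  , windowSeconds  :: Double   -- wall-clock length of the window
  }

throughput :: Stats -> Double
throughput s = edgesProcessed s / windowSeconds s

-- Switch only when the predicted throughput of the other mode exceeds the
-- current one by a margin, so that switching overhead is amortised.
decideSwitch :: Double   -- ^ switching-cost margin, e.g. 0.15
             -> Stats    -- ^ current-mode sample
             -> Double   -- ^ predicted throughput of the other mode
                         --   (from online sampling or an offline profile)
             -> Mode
decideSwitch margin current predictedOther
  | predictedOther > (1 + margin) * throughput current = other (mode current)
  | otherwise                                           = mode current
  where
    other Sync  = Async
    other Async = Sync

main :: IO ()
main = print (decideSwitch 0.15 (Stats Sync 1.0e8 2.0) 6.5e7)  -- switches to Async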
ISBN: (Print) 9781450332057
Large-scale graph computing has become critical due to the ever-increasing size of data. However, distributed graph computations are limited in their scalability and performance due to the heavy communication inherent in such computations. This is exacerbated in scale-free networks, such as social and web graphs, which contain hub vertices that have large degrees and therefore send a large number of messages over the network. Furthermore, many graph algorithms and computations send the same data to each of the neighbors of a vertex. Our proposed approach recognizes this and reduces the communication performed by the algorithm, without changes to user code, through a hierarchical machine model imposed upon the input graph. The hierarchical model takes advantage of locale information about neighboring vertices to reduce communication, both in message volume and in the total number of bytes sent. It is also able to better exploit the machine hierarchy to further reduce communication costs by aggregating traffic between different levels of the machine hierarchy. Results of an implementation in the STAPL GL show improved scalability and performance over the traditional level-synchronous approach, with 2.5x to 8x improvements for a variety of graph algorithms at 12,000+ cores.
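The aggregation idea can be illustrated with a small Haskell sketch that groups a vertex's outgoing messages by the locale owning each neighbour, so one combined message is shipped per locale; the AggMsg type, the owner function, and the Int-based locale ids are assumptions for illustration, not the STAPL GL API.

-- A minimal sketch (not the STAPL GL implementation) of locale-level message
-- aggregation: when a vertex sends the same payload to many neighbours, group
-- targets by the locale that owns them and ship one combined message per locale.

import qualified Data.Map.Strict as M

type Vertex = Int
type Locale = Int

-- One aggregated message: the shared payload plus all targets on that locale.
data AggMsg a = AggMsg { payload :: a, targets :: [Vertex] } deriving Show

aggregate :: (Vertex -> Locale)   -- ^ ownership map: which locale holds a vertex
          -> a                    -- ^ payload sent identically to every neighbour
          -> [Vertex]             -- ^ neighbour list of the sending vertex
          -> M.Map Locale (AggMsg a)
aggregate owner msg neighbours =
  M.fromListWith merge [ (owner v, AggMsg msg [v]) | v <- neighbours ]
  where
    merge new old = old { targets = targets new ++ targets old }

main :: IO ()
main =
  -- 6 neighbours spread over 2 locales collapse into 2 network messages.
  print (aggregate (`mod` 2) "rank-update" [3,4,5,6,7,8])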
ISBN: (Print) 9781450332057
Graph-structured analytics has been widely adopted in a number of big data applications such as social computation, web search, and recommendation systems. Though much prior research focuses on scaling graph analytics on distributed environments, the strong desire for performance per core, dollar, and joule has generated considerable interest in processing large-scale graphs on a single server-class machine, which may have several terabytes of RAM and 80 or more cores. However, prior graph-analytics systems are largely neutral to NUMA characteristics and thus have suboptimal performance. This paper presents a detailed study of NUMA characteristics and their impact on the efficiency of graph analytics. Our study uncovers two insights: 1) either random or interleaved allocation of graph data will significantly hamper data locality and parallelism; 2) sequential inter-node (i.e., remote) memory accesses have much higher bandwidth than both intra- and inter-node random ones. Based on these insights, this paper describes Polymer, a NUMA-aware graph-analytics system for multicore machines with two key design decisions. First, Polymer differentially allocates and places the topology data, application-defined data, and mutable runtime states of a graph system according to their access patterns, to minimize remote accesses. Second, for some remaining random accesses, Polymer carefully converts random remote accesses into sequential remote accesses by using lightweight replication of vertices across NUMA nodes. To improve load balance and vertex convergence, Polymer is further built with a hierarchical barrier to boost parallelism and locality, an edge-oriented balanced partitioning for skewed graphs, and adaptive data structures chosen according to the proportion of active vertices. A detailed evaluation on an 80-core machine shows that Polymer often outperforms the state-of-the-art single-machine graph-analytics systems, including Ligra, X-Stream, and Galois, for a set of popular real-world and synthetic graphs.
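The placement and replication decisions can be sketched in Haskell as decision logic only; the greedy edge-balanced placement and the replication-set computation below are illustrative stand-ins under assumed names, not Polymer's actual NUMA allocation code.

-- A minimal sketch of the data-placement decision the abstract describes:
-- partition vertices across NUMA nodes by edge weight rather than vertex
-- count, and record which vertices must be mirrored on which nodes so that
-- later random remote reads become local reads. Illustrative pseudologic only.

import qualified Data.Map.Strict as M
import qualified Data.Set        as S
import           Data.List       (minimumBy)
import           Data.Ord        (comparing)

type Vertex = Int
type Node   = Int   -- NUMA node id

-- Greedy edge-balanced placement: assign each vertex (weighted by its
-- out-degree) to the currently lightest node.
placeVertices :: Int -> [(Vertex, Int)] -> M.Map Vertex Node
placeVertices numNodes degrees = fst (foldl step (M.empty, load0) degrees)
  where
    load0 = M.fromList [ (n, 0 :: Int) | n <- [0 .. numNodes - 1] ]
    step (placement, load) (v, deg) =
      let n = fst (minimumBy (comparing snd) (M.toList load))
      in  (M.insert v n placement, M.adjust (+ deg) n load)

-- Replication sets: a vertex is mirrored on every node that owns one of its
-- in-neighbours, so gather phases read only node-local memory.
replicationSets :: M.Map Vertex Node -> [(Vertex, Vertex)] -> M.Map Node (S.Set Vertex)
replicationSets placement edges =
  M.fromListWith S.union
    [ (srcNode, S.singleton dst)
    | (src, dst) <- edges
    , Just srcNode <- [M.lookup src placement]
    , M.lookup dst placement /= Just srcNode ]

main :: IO ()
main = do
  let placement = placeVertices 2 [(v, v `mod` 3 + 1) | v <- [0 .. 7]]
  print placement
  print (replicationSets placement [(0,5),(1,5),(2,6),(6,0)])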
ISBN: (Print) 9781450329477
We develop a type-theoretic specification of offline partial evaluation for the simply-typed lambda calculus in the dependently typed programming language Agda. We establish the correctness of the specification by proving termination, typing preservation, and semantics preservation using logical relations. Typing preservation is achieved by relying on a typed syntax representation based on De Bruijn indices for the source and the target language. The full calculus contains primitive recursion on natural numbers and higher-order lifting for function, product, and sum types.
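The intrinsically typed De Bruijn representation that makes typing preservation hold by construction can be approximated in Haskell with GADTs; this is a sketch of the general technique under assumed names, not the paper's Agda development.

-- A Haskell analogue (GADTs stand in for Agda's dependent types) of a typed
-- De Bruijn term representation: ill-typed terms cannot be written down, so
-- any transformation over this syntax preserves typing by construction.

{-# LANGUAGE GADTs, DataKinds, KindSignatures, TypeOperators #-}

data Ty = TNat | Ty :-> Ty

-- A typed variable is a position in a typed context.
data Var :: [Ty] -> Ty -> * where
  Here  :: Var (t ': ctx) t
  There :: Var ctx t -> Var (s ': ctx) t

-- Intrinsically typed terms over a context of variable types.
data Tm :: [Ty] -> Ty -> * where
  V    :: Var ctx t -> Tm ctx t
  Lam  :: Tm (s ': ctx) t -> Tm ctx (s ':-> t)
  App  :: Tm ctx (s ':-> t) -> Tm ctx s -> Tm ctx t
  Zero :: Tm ctx 'TNat
  Suc  :: Tm ctx 'TNat -> Tm ctx 'TNat

-- Example: the closed identity function at TNat, |- \x. x : TNat -> TNat
identityNat :: Tm '[] ('TNat ':-> 'TNat)
identityNat = Lam (V Here)

main :: IO ()
main = putStrLn "identityNat is well-typed by construction"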
ISBN: (Print) 9781450329477
We address the problem of designing constraint logic languages that usefully combine backward and forward chaining in a sound and complete way. Following the approach of Constraint Logic Programming, we define a class of programming languages that generalizes both Constraint Logic Programming and Concurrent Constraint Programming. Syntactically, this class corresponds to Constraint Handling Rules with disjunctions, but it differs operationally by featuring set-based semantics instead of multiset-based ones; i.e., conjunction and disjunction are idempotent. The assumption of program confluence is the crux on which both the committed-choice strategy and the logical completeness of the languages rely.
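The operational difference between set-based and multiset-based stores can be shown with a tiny Haskell sketch in which re-adding an existing constraint is a no-op; the string-based constraint representation and function names are assumptions for illustration only.

-- A minimal sketch of the point the abstract highlights: a set-based
-- constraint store makes conjunction idempotent, so re-adding an
-- already-present constraint leaves the store unchanged, unlike a
-- multiset-based store where duplicates accumulate.

import qualified Data.Set as S

type Constraint = String          -- stand-in for a real constraint term

-- Set-based store: adding a duplicate is a no-op.
addSet :: Constraint -> S.Set Constraint -> S.Set Constraint
addSet = S.insert

-- Multiset-based store (modelled as a list): duplicates accumulate.
addMulti :: Constraint -> [Constraint] -> [Constraint]
addMulti = (:)

main :: IO ()
main = do
  print (addSet   "leq(X,Y)" (addSet   "leq(X,Y)" S.empty))  -- one copy
  print (addMulti "leq(X,Y)" (addMulti "leq(X,Y)" []))        -- two copies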
ISBN: (Print) 9781450319225
The proceedings contain 45 papers. The topics discussed include: a peta-scalable CPU-GPU algorithm for global atmospheric simulations; adoption protocols for fanout-optimal fault-tolerant termination detection; betweenness centrality: algorithms and implementations; complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU; fast concurrent queues for x86 processors; FASTLANE: improving performance of software transactional memory for low thread counts; Ligra: a lightweight graph processing framework for shared memory; ownership passing: efficient distributed memory programming on multi-core systems; parallel suffix array and least common prefix for the GPU; StreamScan: fast scan algorithms for GPUs without global barrier synchronization; using hardware transactional memory to correct and simplify a readers-writer lock algorithm; and exploring different automata representations for efficient regular expression matching on GPUs.
ISBN: (Print) 9781450329477
Laziness is a powerful concept in functional programming that enables reusing general functions in a specific context while keeping performance close to the efficiency of dedicated definitions. Lazy evaluation can be used in imperative programming too. Twenty years ago, John Launchbury was already advocating for lazy imperative programming, but the level of laziness of his framework remained limited: a single effect can trigger numerous delayed computations, even if those are not required for the correctness of the evaluation. Twenty years later, the picture has not changed. In this article, we propose a Haskell framework for specifying the computational effects of imperative programs as well as their dependencies. Our framework is based on the operational monad transformer, which encapsulates an algebraic presentation of effectful operations. A lazy monad transformer is then in charge of delaying unnecessary computations by maintaining a trace of imperative closures. We present a semantics of a call-by-need λ-calculus extended with imperative strict and lazy features and prove the correctness of our approach. While originally motivated by a less rigid use of foreign functions, we show that our approach is fruitful for a simple scenario based on sorted mutable arrays. Furthermore, we can take advantage of equations between algebraic operations to dynamically optimize the composition of imperative computations.
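The underlying idea of delaying effects as suspended closures that run at most once and only on demand can be sketched in Haskell with IORefs; this illustrates the concept only and is not the paper's operational-monad-transformer framework, whose names and structure are not reproduced here.

-- A minimal sketch of lazy imperative computation: an effectful operation is
-- recorded as a suspension and its effect happens only when the result is
-- demanded, with the result cached afterwards.

import Data.IORef

-- A delayed imperative computation together with its cached result.
data Lazy a = Lazy (IORef (Either (IO a) a))

delay :: IO a -> IO (Lazy a)
delay act = Lazy <$> newIORef (Left act)

-- Run the suspended effect at most once, on first demand.
force :: Lazy a -> IO a
force (Lazy ref) = do
  st <- readIORef ref
  case st of
    Right v  -> return v
    Left act -> do
      v <- act                     -- the effect happens only now
      writeIORef ref (Right v)
      return v

main :: IO ()
main = do
  a <- delay (putStrLn "effect A runs" >> return (1 :: Int))
  b <- delay (putStrLn "effect B runs" >> return (2 :: Int))
  -- Only the computation we demand is executed; b's effect never happens.
  x <- force a
  print x
  _ <- force a                     -- cached: "effect A runs" is not printed again
  return ()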
Video games are usually not programmed very declaratively. There are a number of reasons for this, from low-level efficiency concerns, via the nature of commonly employed programming languages, libraries, and frameworks...
ISBN: (Print) 9781450326568
This poster proposes an efficient runtime scheduler that provides provable performance guarantees to parallel programs that use data structures, through the use of implicit batching.
ISBN: (Print) 9781450329477
online matching. Antescofo is a real-time system for performance coordination between musicians and computer processes during live music concerts. ATP are used to define complex events that correspond to a combination of perceived events in the musical environment as well as arbitrary logical and metrical temporal conditions. The real-time recognition of such events is used to trigger arbitrary actions in the style of event-condition-action rules. The musical context, the rationale for temporal patterns, and several illustrative examples are introduced to motivate the design of ATP. The semantics of ATP matching is defined to parallel the well-known notions of regular expressions and Brzozowski's derivatives, but extended to handle an infinite alphabet, arbitrary predicates, elapsing time, and inhibitory conditions. This approach is compared to those developed for log auditing and for the runtime verification of real-time logics. ATP are implemented by translation into a core subset of the Antescofo domain-specific language. This compilation has proven efficient enough to avoid extending the real-time runtime of the language and has been validated with composers in actual pieces.
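Derivative-based matching over an infinite alphabet with arbitrary predicates can be sketched in Haskell as follows; elapsing time and inhibitory conditions are deliberately omitted, and all names are assumptions for illustration rather than Antescofo's ATP implementation.

-- A minimal sketch of Brzozowski-style matching where patterns carry
-- predicates over events instead of literal symbols, in the spirit of the
-- semantics the abstract mentions.

data Pat a
  = Empty                 -- matches the empty sequence
  | Sat (a -> Bool)       -- matches one event satisfying a predicate
  | Seq (Pat a) (Pat a)
  | Alt (Pat a) (Pat a)
  | Star (Pat a)
  | Fail

-- Does the pattern accept the empty sequence?
nullable :: Pat a -> Bool
nullable Empty     = True
nullable (Sat _)   = False
nullable (Seq p q) = nullable p && nullable q
nullable (Alt p q) = nullable p || nullable q
nullable (Star _)  = True
nullable Fail      = False

-- Brzozowski derivative with respect to one incoming event.
deriv :: a -> Pat a -> Pat a
deriv _ Empty     = Fail
deriv x (Sat f)   = if f x then Empty else Fail
deriv x (Seq p q)
  | nullable p    = Alt (Seq (deriv x p) q) (deriv x q)
  | otherwise     = Seq (deriv x p) q
deriv x (Alt p q) = Alt (deriv x p) (deriv x q)
deriv x (Star p)  = Seq (deriv x p) (Star p)
deriv _ Fail      = Fail

matches :: Pat a -> [a] -> Bool
matches p = nullable . foldl (flip deriv) p

main :: IO ()
main =
  -- "one event above 60, then any number of events below 60"
  print (matches (Seq (Sat (> 60)) (Star (Sat (< 60)))) [72, 10, 30 :: Int])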