As computation becomes increasingly limited by data movement and energy consumption, exploiting locality throughout the memory hierarchy becomes critical for maintaining the performance scaling that many have come to ...
详细信息
Over the last five years, graphics cards have become a tempting target for scientific computing, thanks to unrivaled peak performance, often producing a runtime speed-up of x10 to x25 over comparable CPU solutions. Ho...
详细信息
ISBN:
(纸本)9781450312059
Over the last five years, graphics cards have become a tempting target for scientific computing, thanks to unrivaled peak performance, often producing a runtime speed-up of x10 to x25 over comparable CPU solutions. However, this increase can be difficult to achieve, and doing so often requires a fundamental rethink. This is especially problematic in scientific computing, where experts do not want to learn yet another architecture. In this paper we develop a method for automatically parallelising recursive functions of the sort found in scientific papers. Using a static analysis of the function dependencies we identify sets - partitions - of independent elements, which we use to synthesise an efficient GPU implementation using polyhedral code generation techniques. We then augment our language with DSL extensions to support a wider variety of applications, and demonstrate the effectiveness of this with three case studies, showing significant performance improvement over equivalent CPU methods, and similar efficiency to hand-tuned GPU implementations.
Most existing reference-based distributed object systems include some kind of acyclic garbage collection, but fail to provide acceptable collection of cyclic garbage. Those that do provide such GC currently suffer fro...
详细信息
Most existing reference-based distributed object systems include some kind of acyclic garbage collection, but fail to provide acceptable collection of cyclic garbage. Those that do provide such GC currently suffer from one or more problems: synchronous operation, the need for expensive global consensus or termination algorithms, susceptibility to communication problems, or an algorithm that does not scale. We present a simple, complete, fault-tolerant, asynchronous extension to the (acyclic) cleanup protocol of the SSP Chains system. This extension is scalable, consumes few resources, and could easily be adapted to work in other reference-based distributed object systems - rendering them usable for very large-scale applications.
We present associated effects, a programminglanguage feature that enables type classes to abstract over the effects of their function signatures, allowing each type class instance to specify its concrete effects. Ass...
详细信息
We present associated effects, a programminglanguage feature that enables type classes to abstract over the effects of their function signatures, allowing each type class instance to specify its concrete effects. Associated effects significantly increase the flexibility and expressive power of a programminglanguage that combines a type and effect system with type classes. In particular, associated effects allow us to (i) abstract over total and partial functions, where partial functions may throw exceptions, (ii) abstract over immutable data structures and mutable data structures that have heap effects, and (iii) implement adaptors that combine type classes with algebraic effects. We implement associated effects as an extension of the Flix programminglanguage and refactor the Flix Standard Library to use associated effects, significantly increasing its flexibility and expressive power. Specifically, we add associated effects to 11 type classes, which enables us to add 28 new type class instances.
This paper presents McNetKAT, a scalable tool for verifying probabilistic network programs. McNetKAT is based on a new semantics for the guarded and history-free fragment of Probabilistic NetKAT in terms of finite-sta...
详细信息
ISBN:
(纸本)9781450367127
This paper presents McNetKAT, a scalable tool for verifying probabilistic network programs. McNetKAT is based on a new semantics for the guarded and history-free fragment of Probabilistic NetKAT in terms of finite-state, absorbing Markov chains. This view allows the semantics of all programs to be computed exactly, enabling construction of an automatic verification tool. Domain-specific optimizations and a parallelizing backend enable McNetKAT to analyze networks with thousands of nodes, automatically reasoning about general properties such as probabilistic program equivalence and refinement, as well as networking properties such as resilience to failures. We evaluate McNetKAT's scalability using real-world topologies, compare its performance against state-of-the-art tools, and develop an extended case study on a recently proposed data center network design.
We present the first shape analysis for multithreaded programs that avoids the explicit enumeration of execution-interleavings. Our approach is to automatically infer a resource invariant associated with each lock tha...
详细信息
ISBN:
(纸本)9781595936332
We present the first shape analysis for multithreaded programs that avoids the explicit enumeration of execution-interleavings. Our approach is to automatically infer a resource invariant associated with each lock that describes the part of the heap protected by the lock. This allows us to use a sequential shape analysis on each thread. We show that resource invariants of a certain class can be characterized as least fixed points and computed via repeated applications of shape analysis only on each individual thread. Based on this approach, we have implemented a thread-modular shape analysis tool and applied it to concurrent heap-manipulating code from Windows device drivers.
This conference contains 30 papers all of which are abstracted and indexed individually. The topics covered are: program debugging techniques;debugging optimized codes;concurrent compilers;abstract machine;false data ...
详细信息
ISBN:
(纸本)0897914759
This conference contains 30 papers all of which are abstracted and indexed individually. The topics covered are: program debugging techniques;debugging optimized codes;concurrent compilers;abstract machine;false data dependency;loop transformations;pointer aliasing;program transformation;register allocation;and code replication.
This paper presents a new static analysis for deriving upper bounds on the expected resource consumption of probabilistic programs. The analysis is fully automatic and derives symbolic bounds that are multivariate pol...
详细信息
ISBN:
(纸本)9781450356985
This paper presents a new static analysis for deriving upper bounds on the expected resource consumption of probabilistic programs. The analysis is fully automatic and derives symbolic bounds that are multivariate polynomials in the inputs. The new technique combines manual state-of-the-art reasoning techniques for probabilistic programs with an effective method for automatic resource-bound analysis of deterministic programs. It can be seen as both, an extension of automatic amortized resource analysis (AARA) to probabilistic programs and an automation of manual reasoning for probabilistic programs that is based on weakest preconditions. An advantage of the technique is that it combines the clarity and compositionality of a weakest-precondition calculus with the efficient automation of AARA. As a result, bound inference can be reduced to off-the-shelf LP solving in many cases and automatically-derived bounds can be interactively extended with standard program logics if the automation fails. Building on existing work, the soundness of the analysis is proved with respect to an operational semantics that is based on Markov decision processes. The effectiveness of the technique is demonstrated with a prototype implementation that is used to automatically analyze 39 challenging probabilistic programs and randomized algorithms. Experiments indicate that the derived constant factors in the bounds are very precise and even optimal for some programs.
A classic problem in parallel computing is to take a high-level parallel program written, for example, in nested-parallel style with fork-join constructs and run it efficiently on a real machine. The problem could be ...
详细信息
ISBN:
(纸本)9781450356985
A classic problem in parallel computing is to take a high-level parallel program written, for example, in nested-parallel style with fork-join constructs and run it efficiently on a real machine. The problem could be considered solved in theory, but not in practice, because the overheads of creating and managing parallel threads can overwhelm their benefits. Developing efficient parallel codes therefore usually requires extensive tuning and optimizations to reduce parallelism just to a point where the overheads become acceptable. In this paper, we present a scheduling technique that delivers provably efficient results for arbitrary nested-parallel programs, without the tuning needed for controlling parallelism overheads. The basic idea behind our technique is to create threads only at a beat (which we refer to as the "heartbeat") and make sure to do useful work in between. We specify our heartbeat scheduler using an abstract-machine semantics and provide mechanized proofs that the scheduler guarantees low overheads for all nested parallel programs. We present a prototype C++ implementation and an evaluation that shows that Heartbeat competes well with manually optimized Cilk Plus codes, without requiring manual tuning.
The proceedings contain 5 papers. The topics discussed include: machine learning in python with no strings attached;triton: an intermediate language and compiler for tiled neural network computations;HackPPL: a univer...
ISBN:
(纸本)9781450367196
The proceedings contain 5 papers. The topics discussed include: machine learning in python with no strings attached;triton: an intermediate language and compiler for tiled neural network computations;HackPPL: a universal probabilistic programminglanguage;neural query expansion for code search;and a case study on machine learning for synthesizing benchmarks.
暂无评论