ISBN (print): 9781450301190
Modern parallel microprocessors deliver high performance on applications that expose substantial fine-grained data parallelism. Although data parallelism is widely available in many computations, implementing data parallel algorithms in low-level languages is often an unnecessarily difficult task. The characteristics of parallel microprocessors and the limitations of current programming methodologies motivate our design of Copperhead, a high-level data parallel language embedded in Python. The Copperhead programmer describes parallel computations via composition of familiar data parallel primitives supporting both flat and nested data parallel computation on arrays of data. Copperhead programs are expressed in a subset of the widely used Python programming language and interoperate with standard Python modules, including libraries for numeric computation, data visualization, and analysis. In this paper, we discuss the language, compiler, and runtime features that enable Copperhead to efficiently execute data parallel code. We define the restricted subset of Python which Copperhead supports and introduce the program analysis techniques necessary for compiling Copperhead code into efficient low-level implementations. We also outline the runtime support by which Copperhead programs interoperate with standard Python modules. We demonstrate the effectiveness of our techniques with several examples targeting the CUDA platform for parallel programming on GPUs. Copperhead code is concise, on average requiring 3.6 times fewer lines of code than CUDA, and the compiler generates efficient code, yielding 45-100% of the performance of hand-crafted, well optimized CUDA code.
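To make the primitive-composition style concrete, here is a minimal sketch in plain Python of the flat and nested data parallelism the abstract describes. The @cu decorator is a no-op stand-in for Copperhead's compiler entry point (the name follows the style of Copperhead's published examples, but everything here runs as ordinary Python rather than through the Copperhead runtime, which would instead compile these functions to CUDA):

    # Stub for Copperhead's @cu decorator (assumed name; no-op here so
    # the example runs without the Copperhead library installed).
    def cu(fn):
        return fn

    @cu
    def axpy(a, x, y):
        # Flat data parallelism: one independent multiply-add per element.
        return [a * xi + yi for xi, yi in zip(x, y)]

    @cu
    def spmv_csr(values, columns, x):
        # Nested data parallelism: an inner reduction per variable-length row.
        return [sum(v * x[c] for v, c in zip(row_v, row_c))
                for row_v, row_c in zip(values, columns)]

    if __name__ == "__main__":
        print(axpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [12.0, 24.0, 36.0]
        # A 2x3 sparse matrix in CSR-like nested-list form times a vector.
        print(spmv_csr([[1.0, 2.0], [3.0]], [[0, 2], [1]], [1.0, 1.0, 1.0]))  # [3.0, 3.0]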
ISBN (print): 9781450329477
We develop a type theoretic specification of offline partial evaluation for the simply-typed lambda calculus in the dependently-typed programming language Agda. We establish the correctness of the specification by proving termination, typing preservation, and semantics preservation using logical relations. Typing preservation is achieved by relying on a typed syntax representation based on De Bruijn indices for the source and the target language. The full calculus contains primitive recursion on natural numbers and higher-order lifting for function, product, and sum types.
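The typed Agda development is far beyond what fits here, but the core idea of offline partial evaluation can be sketched in a few lines of Python: a binding-time annotation marks the exponent of a power function as static and the base as dynamic, and the specializer unfolds all recursion on the static argument, leaving a residual program over the dynamic one. The names and encoding below are purely illustrative, not the paper's formulation:

    # Residualize power(n, x) for a statically known exponent n:
    # the recursion on n is executed at specialization time, and only
    # multiplications over the dynamic variable survive in the output.
    def specialize_power(n, var="x"):
        if n == 0:
            return "1"    # fully static case: computed away entirely
        if n == 1:
            return var
        return f"{var} * ({specialize_power(n - 1, var)})"

    if __name__ == "__main__":
        src = specialize_power(3)
        print(src)                          # x * (x * (x))
        power3 = eval(f"lambda x: {src}")   # "compile" the residual program
        print(power3(2))                    # 8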
ISBN (print): 9781450329477
We address the problem of designing constraint logic languages that usefully combine backward and forward chaining in a sound and complete way. Following the approach of Constraint Logic Programming, we define a class of programming languages that generalize both Constraint Logic and Concurrent Constraint Programming. Syntactically, this class corresponds to Constraint Handling Rules with disjunctions, but it differs operationally by featuring set-based semantics instead of multiset-based ones; i.e., conjunction and disjunction are idempotent. The assumption of program confluence is the crux on which both the committed-choice strategy and the logical completeness of the languages rely.
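A small Python sketch (ordinary forward chaining, not Constraint Handling Rules themselves) may help illustrate the set-based semantics the abstract contrasts with multisets: because derived facts live in a set, re-deriving a fact changes nothing, which is exactly the idempotence of conjunction. The rule encoding and names here are assumptions for illustration:

    # Forward chaining to a least fixpoint over a *set* of facts.
    # Rules are plain functions from the current fact set to derived facts.
    def forward_chain(facts, rules):
        facts = set(facts)
        changed = True
        while changed:
            changed = False
            for rule in rules:
                new = rule(facts) - facts   # set difference: duplicates vanish
                if new:
                    facts |= new
                    changed = True
        return facts

    # Transitive closure of an edge relation as a single propagation rule.
    def transitivity(facts):
        return {("edge", a, c)
                for (tag1, a, b) in facts if tag1 == "edge"
                for (tag2, b2, c) in facts if tag2 == "edge" and b2 == b}

    if __name__ == "__main__":
        base = {("edge", 1, 2), ("edge", 2, 3), ("edge", 3, 4)}
        print(sorted(forward_chain(base, [transitivity])))
        # adds ("edge", 1, 3), ("edge", 1, 4), ("edge", 2, 4)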
ISBN (print): 9781450311601
In this paper, we evaluate the performance and usability of the parallel programming model OpenMP Superscalar (OmpSs), apply it to 10 different benchmarks, and compare its performance with corresponding POSIX threads implementations.
ISBN (print): 9781450362252
In this tutorial participants learn how to build their own parallel programming language features by developing them as language extensions in the ableC [4] extensible C compiler framework. By implementing new parallel programming abstractions as language extensions, one can build on an existing host language and thus avoid re-implementing common language features such as the type checking and code generation of arithmetic expressions and control flow statements. Using ableC, one can build expressive language features that fit seamlessly into the C11 host language.
ISBN (print): 9781450301190
Graphs are powerful data representations favored in many computational domains. Modern GPUs have recently shown promising results in accelerating computationally challenging graph problems but their performance suffers heavily when the graph structure is highly irregular, as most real-world graphs tend to be. In this study, we first observe that the poor performance is caused by work imbalance and is an artifact of a discrepancy between the GPU programming model and the underlying GPU architecture. We then propose a novel virtual warp-centric programming method that exposes the traits of underlying GPU architectures to users. Our method significantly improves the performance of applications with heavily imbalanced workloads, and enables trade-offs between workload imbalance and ALU underutilization for fine-tuning the performance. Our evaluation reveals that our method exhibits up to 9x speedup over previous GPU algorithms and 12x over single-threaded CPU execution on irregular graphs. When properly configured, it also yields up to 30% improvement over previous GPU algorithms on regular graphs. In addition to performance gains on graph algorithms, our programming method achieves 1.3x to 15.1x speedup on a set of GPU benchmark applications. Our study also confirms that the performance gap between GPUs and other multi-threaded CPU graph implementations is primarily due to the large difference in memory bandwidth.
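The workload-imbalance versus ALU-underutilization trade-off can be illustrated with a back-of-the-envelope cost model in plain Python (a model for intuition only, not the paper's implementation and not GPU code): a 32-thread physical warp is split into virtual warps of size K, each processing one vertex's edge list, and the physical warp advances at the pace of its slowest virtual warp. Larger K smooths out degree imbalance across vertices but idles more lanes on low-degree vertices; smaller K does the opposite:

    def virtual_warp_cost(degrees, k, warp=32):
        """Return (total SIMD steps, ALU utilization) for virtual-warp size k."""
        assert warp % k == 0
        groups = warp // k                  # vertices handled per physical warp
        steps = 0
        for i in range(0, len(degrees), groups):
            batch = degrees[i:i + groups]
            # Each virtual warp strides its vertex's edges k lanes at a time;
            # the warp's step count is set by its most heavily loaded vertex.
            steps += max(-(-d // k) for d in batch)   # ceil(d / k)
        useful = sum(degrees)               # one edge touched per useful lane-step
        return steps, useful / (steps * warp)

    if __name__ == "__main__":
        degrees = [1, 2, 1, 100, 3, 1, 2, 50]   # highly irregular degree sequence
        for k in (1, 4, 8, 32):
            steps, util = virtual_warp_cost(degrees, k)
            print(f"K={k:2d}: {steps:4d} steps, {util:.0%} ALU utilization")

On this degree sequence the model shows the hub vertex dominating at small K (many idle lanes while one virtual warp drains 100 edges) and lane waste on degree-1 vertices at K=32, which is the tuning space the abstract describes.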
ISBN (print): 9781450311601
Object-oriented programming languages like Java provide only low-level constructs (e.g., starting a thread) to describe concurrency. High-level abstractions (e.g., thread pools) are merely provided as a library. As a result, a compiler is not aware of the high-level semantics of a parallel library and therefore misses important optimization opportunities. This paper presents a simple source language extension based on which a compiler can perform new optimizations that are particularly effective for parallel code.