Executing multiple OpenCL kernels on the same GPU concurrently is a promising method for improving hardware utilisation and system performance. Scheduling schemes significantly affect the resulting performance by s...
High-Performance Computing (HPC) applications consist of concurrent programs with multi-process and/or multithreaded models with varying degrees of parallelism. Although their design patterns, models, and principles a...
ISBN: (print) 9781450350884
The proceedings contain 6 papers. The topics discussed include: dart2java: running Dart in Java-based environments; VM wrapping; a metaobject protocol for optimizing application-specific run-time variability; diff graphs for a fast incremental pointer analysis; code generation in serializers and comparators of Apache Flink; and a formalization IDE integrated with a verifying compiler.
ISBN: (print) 9781450350884
We present the design and implementation of dart2java, an experimental Dart to Java compiler. It is implemented in Dart and currently supports many but not all Dart language constructs. dart2java is a playground to evaluate performance implications of running Dart code on the JVM and to investigate if it is possible to write Dart code in a largely Java-dominated environment. This paper describes the architecture of dart2java, performance optimizations such as non-nullability of primitive types and generic specialization (and their implications), as well as ideas for language interoperability, i.e., calling Java code from Dart and vice versa.
ISBN: (print) 9781450349154
Computer systems are increasingly heterogeneous, with nodes consisting of CPUs and GPU accelerators. As such systems become mainstream, they move away from specialized high-performance single-application platforms to a more general setting with multiple concurrent application jobs. Determining how best to schedule jobs dynamically onto heterogeneous devices is non-trivial. In certain cases, performance is maximized if jobs are allocated to a single device; in others, sharing is preferable. In this paper, we present a runtime framework which schedules multi-user OpenCL tasks to their most suitable device in a CPU/GPU system. We use a machine learning-based predictive model at runtime to decide whether to merge OpenCL kernels or schedule them separately to the most appropriate devices, without the need for ahead-of-time profiling. We evaluate our approach over a wide range of workloads on two separate platforms. We consistently show significant improvements in performance and turnaround time over the state-of-the-art across programs, workloads, and platforms.
Object-oriented programming has had a long-standing history with simulation systems in terms of human-computer interaction [1], dating back to Simula and early versions of Smalltalk-72 and Smalltalk-76. These framework...
ISBN: (print) 9781450343848
This paper presents implementation and optimization techniques to support objects in Ikra, an array-based parallel extension to Ruby with dynamic compilation. The high-level goal of Ikra is to allow developers to exploit GPU-based high-performance computing without paying much attention to intricate details of the underlying GPU infrastructure and CUDA. Ikra supports dynamically typed object-oriented programming in Ruby and performs a number of optimizations. It supports parallel operations (e.g., map, each) on arrays of polymorphic objects, allowing polymorphic method calls inside a kernel by compiling them to conditional branches. To reduce branch divergence, Ikra shuffles thread assignments to base array elements based on the runtime types of the elements. To facilitate memory coalescing, Ikra stores objects in a structure-of-arrays (SoA) representation (columnar object layout). To eliminate intermediate data in global memory, Ikra merges cascaded parallel sections into one kernel using symbolic execution. Copyright is held by the owner/author(s). Publication rights licensed to ACM.
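The Ikra abstract names three GPU-side ideas: a columnar (structure-of-arrays) object layout, polymorphic calls compiled to branches on the runtime type, and reordering of thread-to-element assignments by type to reduce branch divergence. The following is a minimal Python sketch of those ideas under stated assumptions, not Ikra's implementation (Ikra generates CUDA from Ruby): a sequential loop stands in for GPU threads, a warp size of 2 is assumed for illustration, and the classes `Circle` and `Square` and all helper names are hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical element classes standing in for user-defined Ruby classes.
@dataclass
class Circle:
    radius: float

@dataclass
class Square:
    side: float

TYPE_IDS = {Circle: 0, Square: 1}  # runtime type -> integer tag

def to_soa(objs):
    """Convert an array of objects (AoS) into a columnar (SoA) layout:
    one type-tag column plus one column per field."""
    n = len(objs)
    soa = {"type": [0] * n, "radius": [0.0] * n, "side": [0.0] * n}
    for i, o in enumerate(objs):
        soa["type"][i] = TYPE_IDS[type(o)]
        if isinstance(o, Circle):
            soa["radius"][i] = o.radius
        else:
            soa["side"][i] = o.side
    return soa

def area_kernel(soa, i):
    """A polymorphic `area` call lowered to a conditional branch on the type tag."""
    if soa["type"][i] == TYPE_IDS[Circle]:
        return math.pi * soa["radius"][i] ** 2
    return soa["side"][i] ** 2

def type_sorted_assignment(soa):
    """Thread id -> element index, ordered so neighbouring threads see the same type.
    sorted() is stable, so elements of one type keep their relative order."""
    return sorted(range(len(soa["type"])), key=lambda i: soa["type"][i])

def divergent_warps(soa, order, warp_size=2):
    """Count warps (runs of warp_size consecutive threads) that mix types."""
    tags = [soa["type"][i] for i in order]
    return sum(len(set(tags[k:k + warp_size])) > 1
               for k in range(0, len(tags), warp_size))

def parallel_map_area(objs):
    soa = to_soa(objs)
    result = [0.0] * len(objs)
    # Each iteration stands in for one GPU thread; the result is written
    # back to the element's original position, so output order is unchanged.
    for i in type_sorted_assignment(soa):
        result[i] = area_kernel(soa, i)
    return result
```

For example, with `soa = to_soa([Square(2.0), Circle(1.0), Square(3.0), Circle(2.0)])`, the identity assignment `list(range(4))` gives 2 mixed-type warps, while `type_sorted_assignment(soa)` gives 0, because threads in the same warp now take the same branch.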
ISBN: (print) 9781450343848
The proceedings contain 9 papers. The topics discussed include: data-race detection: the missing piece for an end-to-end semantic equivalence checker for parallelizing transformations of array-intensive programs; array program transformation with *** by example: high-order finite elements; design and GPGPU performance of Futhark's redomap construct; object support in an array-based GPGPU extension for Ruby; the key to a data parallel compiler; TTC: A tensor transposition compiler for multiple architectures; automatic generation of parallel C code for stencil applications written in MATLAB; SSA-based MATLAB-to-C compilation and optimization; and extending C++ with co-array semantics.
Designing efficient concurrent objects often requires abandoning the standard specification technique of linearizability in favor of more relaxed correctness conditions. However, the variety of alternatives makes it d...
ISBN: (print) 9781450344449
The proceedings contain 52 papers. The topics discussed include: automatic parallelization of pure method calls via conditional future synthesis; portable inter-workgroup barrier synchronisation for GPUs; semantics-based program verifiers for all languages; Hoare-style specifications as correctness conditions for non-linearizable concurrent objects; modeling and analysis of remote memory access programming; deriving divide-and-conquer dynamic programming algorithms using solver-aided transformations; speeding up machine-code synthesis; extensible access control with authorization contracts; and gentrification gone too far? affordable 2nd-class values for fun and (co)effect.