Boosted transactions offer an attractive method that enables programmers to create larger transactions that scale well and offer deadlock-free guarantees. However, as boosted transactions get larger, they become more ...
详细信息
ISBN:
(纸本)9781605583976
Boosted transactions offer an attractive method that enables programmers to create larger transactions that scale well and offer deadlock-free guarantees. However, as boosted transactions get larger, they become more susceptible to conflicts and aborts. We describe a linear-time algorithm to detect transactions that cannot make progress, which transactions need to be aborted, and when. The algorithm guarantees zero false positives with minimal aborts. Our proposals, as implemented in DSTM2, increase the transactional throughput of the system, often by more than 30%.
The advent of new parallel architectures has increased the need for parallel optimizing compilers to assist developers in creating efficient code. OpenUH is a state-of-the-art optimizing compiler, but it only performs...
详细信息
ISBN:
(纸本)9781605583976
The advent of new parallel architectures has increased the need for parallel optimizing compilers to assist developers in creating efficient code. OpenUH is a state-of-the-art optimizing compiler, but it only performs a limited set of optimizations for OpenMP programs due to its conservative assumptions of shared memory programming. These limitations may prevent some OpenMP applications from being fully optimized to the extent of its sequential counterpart. This paper describes our design and implementation of a parallel data flow framework, consisting of a parallel Control Flow Graph (PCFG) and a parallel SSA (PSSA) representation in OpenUH, to model data flow for OpenMP programs. This framework enables the OpenUH compiler to perform all classical scalar optimizations for OpenMP programs, in addition to conducting OpenMP specific optimizations.
We introduce a non-blocking full/empty bit primitive, or NB-FEB for short, as a promising synchronization primitive for parallelprogramming on may-core architectures. We show that the NB-FEB primitive is universal, s...
详细信息
ISBN:
(纸本)9781605583976
We introduce a non-blocking full/empty bit primitive, or NB-FEB for short, as a promising synchronization primitive for parallelprogramming on may-core architectures. We show that the NB-FEB primitive is universal, scalable and feasible. NB-FEB, together with registers, can solve the consensus problem for an arbitrary number of processes (universality). NB-FEB is combinable, namely its memory requests to the same memory location can be combined into only one memory request, which consequently mitigates performance degradation due to synchronization "hot spots" (scalability). Since NB-FEB is a variant of the original full/empty bit that always returns a value instead of waiting for a conditional flag, it is as feasible as the original full/empty bit, which has been implemented in many computer systems (feasibility).
In this poster, we present our vision of refactoring support for languages with a partitioned global address space memory model as embodied in the X10 programming language. We examine a novel refactoring, extract conc...
详细信息
A stream processor executes an application that has been decomposed into a sequence of kernels that operate on streams of data elements. During the execution of a kernel, all streams accessed must be communicated thro...
详细信息
ISBN:
(纸本)9781605583976
A stream processor executes an application that has been decomposed into a sequence of kernels that operate on streams of data elements. During the execution of a kernel, all streams accessed must be communicated through the SRF (Stream Register File), a non-bypassing software-managed on-chip memory. Therefore, optimizing utilization of the SRF is crucial for good performance. The key insight is that the interference graphs formed by the streams in stream applications tend to be comparability graphs or decomposable into a set of multiple comparability graphs. We present a compiler algorithm that can find optimal or near-optimal colorings in stream IGs, thereby improving SRF utilization than the First-Fit bin-packing algorithm, the best in the literature.
Live sequence charts (LSCs) have been proposed as an inter-object scenario-based specification and visual programming language for reactive systems. In this paper, we introduce a logic-based framework to check the con...
详细信息
ISBN:
(纸本)9781605585680
Live sequence charts (LSCs) have been proposed as an inter-object scenario-based specification and visual programming language for reactive systems. In this paper, we introduce a logic-based framework to check the consistency of an LSC specification. An LSC simulator has been implemented in logic programming, utilizing a memoized depth-first search strategy, to show how a reactive system in LSCs would response to a set of external event sequences. A formal notation is defined to specify external event sequences, extending the regular expression with a parallel operator and a testing control. The parallel operator allows interleaved parallel external events to be tested in LSCs simultaneously;while the testing control provides users to a new approach to specify and test certain temporal properties (e.g., CTL formula) in a form of LSC. Our framework further provides either a state transition graph or a failure trace to justify the consistency checking results.
This paper proposes a new parallel execution model where programmers augment a sequential program with pieces of code called serializers that dynamically map computational operations into serialization sets of depende...
详细信息
ISBN:
(纸本)9781605583976
This paper proposes a new parallel execution model where programmers augment a sequential program with pieces of code called serializers that dynamically map computational operations into serialization sets of dependent operations. A runtime system executes operations in the same serialization set in program order, and may concurrently execute operations in different sets. Because serialization sets establish a logical ordering on all operations, the resulting parallel execution is predictable and deterministic. We describe the API and design of Prometheus, a C++ library that implements the serialization set abstraction through compile-time template instantiation and a runtime support library. We evaluate a set of parallel programs running on the x86_64 and SPARC-V9 instruction sets and study their performance on multi-core, symmetric multiprocessor, and ccNUMA parallel machines. By contrast with conventional parallel execution models, we find that Prometheus programs are significantly easier to write, test, and debug, and their parallel execution achieves comparable performance.
On multiprocessors with explicitly managed memory hierarchies (EMM), software has the responsibility of moving data in and out of fast local memories. This task can be complex and error-prone even for expert programme...
详细信息
ISBN:
(纸本)9781605583976
On multiprocessors with explicitly managed memory hierarchies (EMM), software has the responsibility of moving data in and out of fast local memories. This task can be complex and error-prone even for expert programmers. Before we can allow compilers to handle the complexity for us, we must identify the abstractions that are general enough to allow us to write applications with reasonable effort, yet specific enough to exploit the vast on-chip memory bandwidth of EMM multi-processors. To this end, we compare two programming models against hand-tuned codes on the STI Cell, paying attention to programmability and performance. The first programming model, Sequoia, abstracts the memory hierarchy as private address spaces, each corresponding to a parallel task. The second, Cellgen, is a new framework which provides OpenMP-like semantics and the abstraction of a shared address spaces divided into private and shared data. We compare three applications programmed using these models against their hand-optimized counterparts in terms of abstractions, programming complexity, and performance.
暂无评论