This paper presents algorithms for concurrently reading and modifying a red-black tree (RBTree). The algorithms allow wait-free, linearly scalable lookups in the presence of concurrent inserts and deletes. They have deterministic response times for a given tree size and uncontended read performance that is at least 60% faster than other known approaches. The techniques used to derive these algorithms arise from a concurrent programming methodology called relativistic programming. Relativistic programming introduces write-side delay primitives that allow the writer to pay most of the cost of synchronization between readers and writers. Only minimal synchronization overhead is placed on readers. Relativistic programming avoids unnecessarily strict ordering of read and write operations while still providing the capability to enforce linearizability. This paper shows how relativistic programming can be used to build a concurrent RBTree with synchronization-free readers and both lock-based and transactional memory-based writers. Copyright (c) 2013 John Wiley & Sons, Ltd.
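The reader/writer asymmetry described in this abstract can be illustrated with a much simpler structure than a red-black tree. The sketch below is not the paper's algorithm: it is a hypothetical copy-then-publish map (`SnapshotMap` and its method names are invented for illustration) in which readers take no lock and the writer pays for copying and publishing. It assumes that rebinding an attribute is observed atomically by readers, as in CPython, and it relies on garbage collection instead of an explicit wait-for-readers primitive.

```python
import threading


class SnapshotMap:
    """Copy-then-publish map: lock-free lookups, serialized writers (illustrative only)."""

    def __init__(self):
        self._snapshot = {}                  # never mutated after it is published
        self._write_lock = threading.Lock()  # writers synchronize only with each other

    def lookup(self, key, default=None):
        # Reader path: no lock. The reader sees whichever snapshot is current.
        return self._snapshot.get(key, default)

    def insert(self, key, value):
        with self._write_lock:
            new = dict(self._snapshot)  # copy the current version
            new[key] = value            # modify the private copy
            self._snapshot = new        # publish with a single reference assignment

    def delete(self, key):
        with self._write_lock:
            new = dict(self._snapshot)
            new.pop(key, None)
            self._snapshot = new
```

A reader that obtained an older snapshot keeps a consistent view of it while writers publish newer versions, which is the sense in which readers and writers need not agree on a single strict ordering of operations.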
This paper presents synchronizing interoperable resources (SIR). SIR extends the concurrent communication mechanisms of the SR concurrent programming language to multi-program environments. This paper discusses design and implementation issues, including implicit binding, a mechanism for providing seamless concurrent communication. It also examines performance results for SIR and presents a qualitative analysis. Finally, it compares SIR with CORBA and other systems that provide interoperability. (C) 2002 Elsevier Science Ltd. All rights reserved.
Concurrent programming is no longer the sole province of those who design and implement operating systems; it has become important to programmers of all types of applications. Among the many constructs offered by concurrent programming languages, the monitor is the most prevalent. Although the monitor is widely used as a synchronization mechanism, it suffers from two problems: invalidation of the execution condition and deadlock due to nested monitor calls. A new synchronization mechanism, the coordinator, is proposed that circumvents these problems while retaining the efficiency and ease of use of the monitor. The coordinator is designed as a basis for partially automating concurrent programming, since currently available concurrent languages are inadequate for that purpose. Novel features of the coordinator's language constructs include the adoption of a guarded region to make the coordinator an active entity and the introduction of quasi-parallelism within the coordinator through parallel guarded regions. The flexibility of the control structure is demonstrated with solutions to versions of the well-known readers-writers problem.
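For readers unfamiliar with the readers-writers problem mentioned above, the following is a minimal sketch of a reader-preference solution. It is not the coordinator construct from the paper: the class `ReadWriteGate` and the helper `run_demo` are hypothetical names, and each "guard" is approximated with `threading.Condition.wait_for`, which blocks the caller until a condition holds.

```python
import threading


class ReadWriteGate:
    """Many concurrent readers or one writer (reader preference, illustrative only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    def start_read(self):
        with self._cond:
            # Guard: a reader may proceed when no writer is active.
            self._cond.wait_for(lambda: not self._writing)
            self._readers += 1

    def end_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def start_write(self):
        with self._cond:
            # Guard: a writer needs exclusive access.
            self._cond.wait_for(lambda: not self._writing and self._readers == 0)
            self._writing = True

    def end_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()


def run_demo(num_writers=3, num_readers=2, rounds=100):
    """Run writers that increment a counter and readers that sample it."""
    gate = ReadWriteGate()
    state = {"value": 0}
    seen = []

    def writer():
        for _ in range(rounds):
            gate.start_write()
            try:
                state["value"] += 1
            finally:
                gate.end_write()

    def reader():
        for _ in range(rounds):
            gate.start_read()
            try:
                seen.append(state["value"])
            finally:
                gate.end_read()

    threads = [threading.Thread(target=writer) for _ in range(num_writers)]
    threads += [threading.Thread(target=reader) for _ in range(num_readers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["value"], seen
```

Because the guard is re-evaluated every time a waiting thread wakes, a thread never proceeds on a condition that has since become false, which is one way to sidestep the invalidated-condition problem the abstract attributes to monitors.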
This paper presents a new approach to programming multiway rendezvous problems in the SR language. The approach uses SR's concurrent invocation statement and rendezvous mechanism to coordinate the interacting processes. This approach is compared with one that suggested an extension to SR's rendezvous mechanism. The two approaches result in differing program structures. The new approach is shown to lead to simpler and cleaner interfaces between the main process and the worker processes, and it uses only existing language mechanisms. The results are of importance to both programmers and designers of concurrent programming languages.
Data movement in high-performance computing systems accelerated by graphics processing units (GPUs) remains a challenging problem. Data communication in popular parallel programming models, such as the Message Passing Interface (MPI), is currently limited to the data stored in the CPU memory space. Auxiliary memory systems, such as GPU memory, are not integrated into such data movement standards, thus providing applications with no direct mechanism to perform end-to-end data movement. We introduce MPI-ACC, an integrated and extensible framework that allows end-to-end data movement in accelerator-based systems. MPI-ACC provides productivity and performance benefits by integrating support for auxiliary memory spaces into MPI. MPI-ACC supports data transfer among CUDA, OpenCL and CPU memory spaces and is extensible to other offload models as well. MPI-ACC's runtime system enables several key optimizations, including pipelining of data transfers, scalable memory management techniques, and balancing of communication based on accelerator and node architecture. MPI-ACC is designed to work concurrently with other GPU workloads with minimal contention. We describe how MPI-ACC can be used to design new communication-computation patterns in scientific applications from domains such as epidemiology simulation and seismology modeling, and we discuss the lessons learned. We present experimental results on a state-of-the-art cluster with hundreds of GPUs, and we compare the performance and productivity of MPI-ACC with MVAPICH, a popular CUDA-aware MPI solution. MPI-ACC encourages programmers to explore novel application-specific optimizations for improved overall cluster utilization.
This article presents an algorithm for detecting deadlocks in concurrent finite-state systems without incurring most of the state explosion due to the modeling of concurrency by interleaving. For systems that have a high level of concurrency, our algorithm can be much more efficient than the classical exploration of the whole state space. Finally, we show that our algorithm can also be used for verifying arbitrary safety properties.
This paper proposes a new concurrent data structure, called the parallel hash table, for synchronizing the access of multiple threads to resources stored in a shared buffer. We theoretically establish the complexity of the operations and an upper limit on the thread conflict probability of the parallel hash table. To empirically evaluate the proposed concurrent data structure, we compare the performance of a multi-threaded TCP server based on the parallel hash table with that of a conventional multi-threaded TCP server based on a shared buffer, both implemented in Java. The experimental results on a network of 36 workstations running Windows NT demonstrate that the parallel hash table-based server outperforms the conventional multi-threaded server. (C) 2006 Elsevier B.V. All rights reserved.
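The abstract does not describe the internal design of the parallel hash table, so the sketch below should be read only as one common way to reduce thread conflicts in a shared table: lock striping, with one lock per bucket. The names `StripedHashTable` and `run_demo` are hypothetical, and the sketch is written in Python although the paper's servers were implemented in Java.

```python
import threading


class StripedHashTable:
    """Hash table with one lock per bucket (illustrative, not the paper's design)."""

    def __init__(self, num_buckets=16):
        self._buckets = [dict() for _ in range(num_buckets)]
        self._locks = [threading.Lock() for _ in range(num_buckets)]

    def _index(self, key):
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        i = self._index(key)
        with self._locks[i]:
            self._buckets[i][key] = value

    def get(self, key, default=None):
        i = self._index(key)
        with self._locks[i]:
            return self._buckets[i].get(key, default)

    def update(self, key, fn, default=None):
        """Atomically replace the value stored under key with fn(old_value)."""
        i = self._index(key)
        with self._locks[i]:
            new = fn(self._buckets[i].get(key, default))
            self._buckets[i][key] = new
            return new


def run_demo(num_threads=4, rounds=200):
    """Each thread bumps one shared counter and writes its own private keys."""
    table = StripedHashTable()

    def worker(tid):
        for n in range(rounds):
            table.update("shared", lambda v: v + 1, 0)
            table.put((tid, n), n)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table
```

Two threads block each other here only when their keys hash to the same bucket, so with well-distributed keys the chance of a conflict shrinks as the number of buckets grows relative to the number of threads.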
The single-instruction multiple-thread (SIMT) architecture found in some of the latest graphics processing units (GPUs) builds on conventional single-instruction multiple-data (SIMD) parallelism while adopting the thread programming model. The architecture suffers from degraded performance caused by inefficient divergence handling, a problem hidden by the programmer's view of independent threads. A loop optimization technique with the potential to increase the efficiency of the core SIMD block while processing embedded divergences is investigated here. Concurrent loops are generally not bound to iterate in lock-step, allowing better alignment of thread flows via iteration scheduling. The efficiency of the concept is analyzed for fixed and flow-adapting scheduling policies. The proposed payoff model captures loop overhead implications, allowing one to assess the tradeoffs of applying the technique to a specific loop instance. Processing speedups can generally be observed in the total running time if kernels are compute-bound, as demonstrated by several examples. The studied iteration scheduling policies do not impose alterations to the core SIMD concept and design, thus preserving the benefits of data-level parallelism.
In this paper, a mechanism is presented for reducing priority inversion in multiprogrammed computing systems. Contrary to well-known approaches from the literature, this paper tackles cases where the dependency relationships among tasks cannot be known in advance by the operating system. The presented mechanism allows tasks to explicitly declare these relationships, enabling the operating system scheduler to take advantage of such information and trigger priority inheritance, resulting in reduced priority inversion. We present a prototype implementation of the concept within the Linux kernel in the form of modifications to the standard Portable Operating System Interface (POSIX) condition variable code, along with an extensive evaluation, including a quantitative assessment of the benefits for applications making use of the technique and comprehensive overhead measurements. In addition, we present an associated technique for the theoretical schedulability analysis of a system using the new mechanism, which is useful for determining whether all tasks can meet their deadlines, in the specific scenario of tasks interacting only through remote procedure calls and under partitioned scheduling.
The monitor concept provides a structured and flexible high-level programming construct to control concurrent accesses to shared resources. It has been widely used in concurrent programming environments for implicitly ensuring mutual exclusion and explicitly achieving process synchronization. This paper proposes an extension to the monitor construct for detecting runtime errors in monitor operations. Monitors are studied and classified according to their functional characteristics. A taxonomy of concurrency control faults over a monitor is then defined. The concepts of a monitor event sequence and a monitor state sequence provide a uniform approach to history information recording and fault detection. Rules for detecting various types of faults are defined. Based on these rules, fault-detection algorithms are developed. A prototype implementation of the proposed monitor construct with runtime fault detection mechanisms has been developed in Java. We briefly report our experience with, and evaluation of, the robust monitor prototype. Copyright (c) 2005 John Wiley & Sons, Ltd.
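The idea of recording a monitor event sequence and checking it against fault-detection rules can be sketched as follows. This is not the paper's construct, taxonomy, or rule set, and it is written in Python rather than the Java used for the prototype. `RecordingMonitor`, `find_faults`, `run_workers`, and the two rules shown (mutual exclusion violated, exit without a matching enter) are assumptions chosen for illustration.

```python
import threading


class RecordingMonitor:
    """Mutual-exclusion monitor that logs enter/exit events (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.events = []  # list of (thread_name, "enter" | "exit")

    def __enter__(self):
        self._lock.acquire()
        # Recorded while holding the lock, so the log order matches real order.
        self.events.append((threading.current_thread().name, "enter"))
        return self

    def __exit__(self, exc_type, exc, tb):
        self.events.append((threading.current_thread().name, "exit"))
        self._lock.release()
        return False


def find_faults(events):
    """Scan an event sequence and return (position, description) for each fault."""
    faults = []
    inside = set()  # threads currently inside the monitor according to the log
    for pos, (who, event) in enumerate(events):
        if event == "enter":
            if inside:
                faults.append((pos, "mutual exclusion violated"))
            inside.add(who)
        elif event == "exit":
            if who not in inside:
                faults.append((pos, "exit without matching enter"))
            inside.discard(who)
    return faults


def run_workers(monitor, num_threads=3, rounds=20):
    """Have several threads repeatedly enter and leave the monitor."""
    def worker():
        for _ in range(rounds):
            with monitor:
                pass

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

A correctly used monitor yields an empty fault list, while a hand-built sequence such as A enters, B enters, B exits, A exits, C exits is flagged at the second and fifth events.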