ISBN:
(Print) 9781450363563
With the rapid growth of the number of cores, together with heterogeneous access latencies, the cost of synchronization and communication between distant components keeps growing. As more general-purpose programs exploit many-core architectures, the achievable speedup becomes limited by the synchronization needed to access shared objects [2]. When building Internet-scale systems, similar concerns lead to the design of scalable systems that limit global synchronization and operate locally when possible. CRDTs [1] succeed in capturing data types with clear concurrency semantics and are now common components in Internet-scale systems. However, they do not migrate trivially to shared-memory architectures due to the high computational cost of merge functions, which becomes apparent once network communication is removed. In this talk, we discuss multi-view data types for shared-memory architectures, which leverage a global-local view model that distinguishes between a fast local state and a distant shared state. By executing operations on the local state without synchronization, and synchronizing with the shared state only when needed, applications can achieve better scalability at the expense of linearizability, the default correctness criterion for concurrent objects.
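The global-local view model described above can be sketched with a toy counter: increments go to an unsynchronized thread-local view, and only an explicit merge touches the shared state. This is our own minimal illustration under assumed names (`MultiViewCounter`, `merge`, `weakRead`), not the data types from the talk.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a global-local multi-view counter: each thread
// increments a local (fast) view without synchronization and only merges
// into the shared (distant) view on demand. All names are illustrative.
class MultiViewCounter {
    private final AtomicLong shared = new AtomicLong();     // global view
    private final ThreadLocal<long[]> local =
        ThreadLocal.withInitial(() -> new long[1]);         // local view

    void increment() {           // fast path: no synchronization at all
        local.get()[0]++;
    }

    void merge() {               // push the local delta into shared state
        long[] mine = local.get();
        shared.addAndGet(mine[0]);
        mine[0] = 0;
    }

    long weakRead() {            // may miss unmerged local increments:
        return shared.get();     // the object is not linearizable
    }
}
```

A read between `increment` and `merge` misses the local delta, which is exactly the linearizability trade-off the abstract mentions.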
ISBN:
(Print) 9781450329446
In this paper we present a novel algorithm for concurrent lock-free internal binary search trees (BSTs) and implement a Set abstract data type (ADT) based on it. We show that in the presented lock-free BST algorithm the amortized step complexity of each set operation (Add, Remove, and Contains) is O(H(n)+c), where H(n) is the height of a BST with n nodes and c is the contention during the execution. Our algorithm adapts its contention measure to the read-write load. If the load is read-heavy, operations avoid helping concurrent Remove operations during traversal and adapt to interval contention. For write-heavy loads, however, we let an operation help a concurrent Remove even when it is not obstructed; in that case, an operation adapts to point contention. The algorithm uses only single-word compare-and-swap (CAS) operations. We show that it has improved disjoint-access parallelism compared to similar existing algorithms, and we prove that it is linearizable. To the best of our knowledge, this is the first algorithm for any concurrent tree data structure in which the modify operations are performed with an additive contention term.
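To illustrate the single-word-CAS style the abstract refers to, here is a heavily simplified sketch, not the paper's algorithm: an internal BST supporting only `add` and `contains`, where a new node is linked in with one CAS on the parent's child pointer. Removal and the helping/contention-adaptation machinery are precisely the hard parts the paper contributes, and they are omitted here.

```java
import java.util.concurrent.atomic.AtomicReference;

// Simplified sketch (insert-only, no Remove): link a new node into an
// internal BST with a single-word CAS on the parent's child pointer.
// Keys must be greater than Integer.MIN_VALUE (the root sentinel).
class CasBst {
    static final class Node {
        final int key;
        final AtomicReference<Node> left = new AtomicReference<>();
        final AtomicReference<Node> right = new AtomicReference<>();
        Node(int key) { this.key = key; }
    }

    private final Node root = new Node(Integer.MIN_VALUE);  // sentinel

    boolean add(int key) {
        Node cur = root;
        while (true) {
            if (key == cur.key) return false;        // already present
            AtomicReference<Node> child =
                key < cur.key ? cur.left : cur.right;
            Node next = child.get();
            if (next == null) {
                if (child.compareAndSet(null, new Node(key))) return true;
                next = child.get();  // lost the race: descend into winner
            }
            cur = next;
        }
    }

    boolean contains(int key) {
        Node cur = root;
        while (cur != null) {
            if (key == cur.key) return true;
            cur = key < cur.key ? cur.left.get() : cur.right.get();
        }
        return false;
    }
}
```

A failed CAS simply means a concurrent insert won the same slot, so the traversal continues into the winner's node; no locks or retries from the root are needed on this insert-only path.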
ISBN:
(Print) 9783319144726; 9783319144719
Combining methods are highly effective for implementing concurrent queues and stacks. These data structures induce heavy competition on one or two contention points. However, it was not known whether combining methods could be made effective for parallel scalable data structures that do not have a small number of contention points. In this paper, we introduce local combining on-demand, a new combining method for highly parallel data structures. The main idea is to apply combining locally, only for resources on which threads actually contend. We demonstrate local combining on-demand on the common linked-list data structure. Measurements show that the resulting linked list incurs low overhead when contention is low and outperforms other known implementations by up to 40% when contention is high.
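The "on-demand" idea, combining only when contention is actually observed, can be sketched on a single contended resource. In this toy version (our own, not the paper's per-node construction), a thread first tries a fast uncontended lock; only on failure does it publish its request for the current lock holder to combine.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.concurrent.locks.ReentrantLock;

// Toy sketch of combining on-demand for one resource (a counter).
// Fast path: tryLock succeeds, apply directly (no combining machinery).
// Slow path (contention detected): publish the request; some lock holder
// combines it. The paper applies this per linked-list node; names are ours.
class CombiningCounter {
    private static final int SLOTS = 64;        // one slot per thread id
    private final ReentrantLock lock = new ReentrantLock();
    private final AtomicIntegerArray pending = new AtomicIntegerArray(SLOTS);
    private volatile int value = 0;             // written only under lock

    void add(int slot, int delta) {
        if (lock.tryLock()) {                   // uncontended fast path
            try {
                value += delta;
                combine();                      // also serve any waiters
            } finally { lock.unlock(); }
            return;
        }
        pending.getAndAdd(slot, delta);         // contended: publish
        while (pending.get(slot) != 0) {        // wait until applied
            if (lock.tryLock()) {               // combiner left: self-serve
                try { combine(); } finally { lock.unlock(); }
            }
        }
    }

    private void combine() {                    // apply all published ops
        for (int i = 0; i < SLOTS; i++) {
            value += pending.getAndSet(i, 0);
        }
    }

    int get() { return value; }
}
```

The real algorithm additionally decides per node whether combining is worthwhile and tears the machinery down when contention subsides; this sketch keeps only the publish-and-combine core.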
ISBN:
(Print) 9781921770258
We discuss ways to effectively parallelize the subset construction algorithm, which converts non-deterministic finite automata (NFAs) to deterministic finite automata (DFAs). This conversion is at the heart of string pattern matching based on regular expressions and thus has many applications in text processing, compilers, scripting languages, web browsers, security, and more recently DNA sequence analysis. We discuss sources of parallelism in the sequential algorithm and their profitability on shared-memory multicore architectures. Our NFA and DFA data structures are designed to improve scalability and keep communication and synchronization overhead to a minimum. We present three different synchronization schemes; the performance of our non-blocking synchronization, based on a compare-and-swap (CAS) primitive, compares favorably to a lock-based approach. We consider structural NFA properties and their relationship to scalability on highly parallel multicore architectures. We demonstrate the efficiency of our parallel subset construction algorithm through several benchmarks run on a 4-CPU (40-core) node of the Intel Manycore Testing Lab, achieving speedups of up to 32x with 40 cores.
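For reference, the sequential baseline being parallelized looks as follows: each DFA state is a set of NFA states, and a worklist explores subsets reachable on each input symbol. This sketch is our own minimal, epsilon-free version, not the paper's scalable data structures.

```java
import java.util.*;

// Sequential subset construction (the baseline the paper parallelizes).
// nfa.get(q) maps an input symbol to the set of NFA successor states of q;
// the NFA is assumed epsilon-free for brevity. Each DFA state is the set
// of NFA states simultaneously reachable on the same input prefix.
class SubsetConstruction {
    static Map<Set<Integer>, Map<Character, Set<Integer>>> determinize(
            List<Map<Character, Set<Integer>>> nfa, int start) {
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfa = new HashMap<>();
        Deque<Set<Integer>> work = new ArrayDeque<>();
        work.push(Set.of(start));
        while (!work.isEmpty()) {
            Set<Integer> subset = work.pop();
            if (dfa.containsKey(subset)) continue;   // already constructed
            Map<Character, Set<Integer>> out = new HashMap<>();
            for (int q : subset)                     // union the successors
                for (var e : nfa.get(q).entrySet())
                    out.computeIfAbsent(e.getKey(), k -> new HashSet<>())
                       .addAll(e.getValue());
            dfa.put(subset, out);
            for (Set<Integer> t : out.values()) work.push(t);
        }
        return dfa;
    }
}
```

The parallelization opportunity is visible here: distinct subsets on the worklist can be expanded concurrently, provided the shared DFA map is synchronized, which is where the CAS-based versus lock-based schemes of the paper come in.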
The Java(TM) developers kit requires a size() operation for all objects, tracking the number of elements in the object. Unfortunately, the best known solution, available in the Java concurrency package, has a blocking concurrent implementation that does not scale. This paper presents a highly scalable wait-free implementation of a concurrent size() operation based on a new lock-free interrupting snapshots algorithm. The key idea behind the new algorithm is to allow snapshot scan methods to interrupt each other until they agree on a shared linearization point with respect to update methods. This contrasts sharply with past approaches to the classical atomic snapshot problem, which have threads coordinate the collection of a shared global view. As we show empirically, the new algorithm scales well, significantly outperforming existing implementations.
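To make the contrast concrete, here is the classical approach the interrupting-snapshots algorithm departs from: a "double collect" scan over stamped registers, which retries until two consecutive collects agree and therefore capture values that coexisted at one instant. This is a textbook sketch of the classical technique, not the paper's algorithm.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Classical double-collect atomic snapshot (single writer per slot).
// Each register carries a version stamp; a scan that sees identical
// stamps in two consecutive collects has a valid linearization point.
class DoubleCollectSnapshot {
    static final class Stamped {
        final long stamp, value;
        Stamped(long s, long v) { stamp = s; value = v; }
    }

    private final AtomicReferenceArray<Stamped> regs;

    DoubleCollectSnapshot(int n) {
        regs = new AtomicReferenceArray<>(n);
        for (int i = 0; i < n; i++) regs.set(i, new Stamped(0, 0));
    }

    void update(int i, long v) {              // one writer per slot i
        regs.set(i, new Stamped(regs.get(i).stamp + 1, v));
    }

    long[] scan() {                           // lock-free, not wait-free
        while (true) {
            Stamped[] a = collect(), b = collect();
            boolean clean = true;
            for (int i = 0; i < a.length; i++)
                if (a[i].stamp != b[i].stamp) { clean = false; break; }
            if (clean) {
                long[] out = new long[a.length];
                for (int i = 0; i < a.length; i++) out[i] = a[i].value;
                return out;
            }
        }
    }

    private Stamped[] collect() {
        Stamped[] c = new Stamped[regs.length()];
        for (int i = 0; i < c.length; i++) c[i] = regs.get(i);
        return c;
    }
}
```

A scan here can starve under continuous updates, which is one reason coordinated-collect approaches struggle to scale; the paper's scans instead interrupt each other until they agree on a shared linearization point.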
ISBN:
(Print) 9781450300797
Traditional data structure designs, whether lock-based or lock-free, provide parallelism via fine-grained synchronization among threads. We introduce a new synchronization paradigm based on coarse locking, which we call flat combining. The cost of synchronization in flat combining is so low that having a single thread, holding a lock, perform the combined access requests of all others delivers better performance than the most effective finely synchronized parallel implementations, up to a certain non-negligible concurrency level. We use flat combining to devise, among other structures, new linearizable stack, queue, and priority queue algorithms that greatly outperform all prior algorithms.
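The flat-combining mechanic can be sketched in a few lines: every thread always publishes its request in a per-thread record, and whichever thread acquires the single coarse lock becomes the combiner and applies everyone's pending requests sequentially. This minimal sketch uses a fixed publication array and a counter as the sequential object; the real algorithm maintains a dynamic publication list and applies the idea to stacks, queues, and priority queues.

```java
import java.util.concurrent.locks.ReentrantLock;

// Minimal flat-combining sketch: one coarse lock, per-thread request
// records, and a combiner that serves all published requests. The
// sequential object here is just a counter; names are illustrative.
class FlatCombining {
    static final class Record {
        volatile int arg;            // request argument
        volatile boolean pending;    // set by owner, cleared by combiner
    }

    private final ReentrantLock lock = new ReentrantLock();
    private final Record[] pub;      // publication array, one slot/thread
    private int value = 0;           // accessed only by the lock holder

    FlatCombining(int threads) {
        pub = new Record[threads];
        for (int i = 0; i < threads; i++) pub[i] = new Record();
    }

    void add(int slot, int delta) {
        Record r = pub[slot];
        r.arg = delta;
        r.pending = true;            // publish the request
        while (r.pending) {          // wait for a combiner to apply it,
            if (lock.tryLock()) {    // or become the combiner ourselves
                try { combine(); } finally { lock.unlock(); }
            }
        }
    }

    private void combine() {         // lock holder serves everyone
        for (Record r : pub) {
            if (r.pending) {
                value += r.arg;
                r.pending = false;
            }
        }
    }

    int get() { return value; }
}
```

The payoff claimed by the paper is cache behavior: one combiner walking the publication list touches each request once, instead of every thread fighting over the object's hot memory locations.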