Common implementations of core memory allocation components handle concurrent allocation/release requests by synchronizing threads via spin-locks. This approach is not prone to scale, a problem that has been addressed...
详细信息
Common implementations of core memory allocation components handle concurrent allocation/release requests by synchronizing threads via spin-locks. This approach is not prone to scale, a problem that has been addressed in the literature by introducing layered allocation services or replicating the core allocators-the bottom-most ones within the layered architecture. Both these solutions tend to reduce the pressure of actual concurrent accesses to each individual core allocator. In this article, we explore an alternative approach to scalability of memory allocation/release, which can be still combined with those literature proposals. We present a fully non-blocking buddy system, where threads performing concurrent allocations/releases do not undergo any spin-lock based synchronization. Our solution allows threads to proceed in parallel, and commit their allocations/releases unless a conflict is materialized while handling the allocator metadata-memory fragmentation and coalescing are also carried out in a fully non-blocking manner. Conflict detection relies in our solution on atomic Read-Modify-Write (RMW) machine instructions, guaranteed to execute atomically by the processor firmware. We also provide a proof of the correctness of our non-blocking buddy system and show the results of an experimental study that outlines the effectiveness of our solution.
In this article, we present mutable locks, a synchronization construct with the same semantic of traditional locks (such as spin locks or sleep locks), but with a self-tuned optimized trade-off between responsiveness ...
详细信息
In this article, we present mutable locks, a synchronization construct with the same semantic of traditional locks (such as spin locks or sleep locks), but with a self-tuned optimized trade-off between responsiveness and CPU-time usage during threads' wait phases. Mutable locks tackle the need for efficient synchronization supports in the era of multicore machines, where the run-time performance should be optimized while reducing resource usage. This goal should be achieved with no intervention by the programmers. Our proposal is intended for exploitation in generic concurrent applications, where scarce or no knowledge is available about the underlying software/hardware stack and the workload. This is an adverse scenario for static choices between spinning and sleeping, which is tackled by our mutable locks thanks to their hybrid waiting phase and self-tuning capabilities.
Mutual exclusion and concurrency are two fundamental and essentially opposite features in distributed systems. However, in some applications such as Computer Supported Cooperative Work (CSCW) we have found it necessar...
详细信息
Mutual exclusion and concurrency are two fundamental and essentially opposite features in distributed systems. However, in some applications such as Computer Supported Cooperative Work (CSCW) we have found it necessary to impose mutual exclusion on different groups of processes in accessing a resource, while allowing processes of the same group to share the resource. To our knowledge, no such design issue has been previously raised in the literature. In this paper we address this issue by presenting a new problem, called Congenial Talking Philosophers, to model group mutual exclusion. We also propose several criteria to evaluate solutions of the problem and to measure their performance. Finally, we provide an efficient and highly concurrent distributed algorithm for the problem in a shared-memory model where processes communicate by reading from and writing to shared variables. The distributed algorithm meets the proposed criteria, and has performance similar to some naive but centralized solutions to the problem.
A snapshot scan algorithm produces an "instantaneous" picture of a region of sharedmemory that may be updated by concurrent processes. Many complex sharedmemoryalgorithms can be greatly simplified by stru...
详细信息
A snapshot scan algorithm produces an "instantaneous" picture of a region of sharedmemory that may be updated by concurrent processes. Many complex sharedmemoryalgorithms can be greatly simplified by structuring them around the snapshot scan abstraction. Unfortunately, the substantial decrease in conceptual complexity quite often is counterbalanced by an increase in computational complexity. In this paper, we introduce the notion of a weak snapshot scan, a slightly weaker primitive that has a more efficient implementation. We propose the following methodology for using this abstraction: first, design and verify an algorithm using the more powerful snapshot scan;second, replace the more powerful but less efficient snapshot with the weaker but more efficient snapshot, and show that the weaker abstraction nevertheless suffices to ensure the correctness of the enclosing algorithm. We give two examples of algorithms whose performance is enhanced while retaining a simple modular structure: bounded concurrent timestamping and bounded randomized consensus. The resulting timestamping protocol dominates all other currently known timestamping protocols: it matches the speed of the fastest known bounded concurrent timestamping protocol while actually reducing the register size by a logarithmic factor. The resulting randomized consensus protocol matches the computational complexity of the best known protocol that uses only bounded values.
Concurrent systems in which there is a known upper bound Delta on memory access time are considered. Two prototypical synchronization problems, mutual exclusion and consensus, are studied, and solutions that have cons...
详细信息
Concurrent systems in which there is a known upper bound Delta on memory access time are considered. Two prototypical synchronization problems, mutual exclusion and consensus, are studied, and solutions that have constant (i.e. independent of Delta and the total number of processes) time complexity in the absence of contention are presented. For mutual exclusion, in the absence of contention, a process needs only five accesses to the sharedmemory to enter its critical section, and in the presence of contention, the winning process may need to delay itself for 4 .Delta time units. For consensus, in absence of contention, a process decides after four accesses to the sharedmemory, and in the presence of contention, it may need to delay itself for Delta time units.
暂无评论