On an atomic read-modify-write (RMW) object one can read the complete old contents s of the object and simultaneously update its contents as a function delta (s) of the old contents in a single, indivisible, atomic op...
详细信息
On an atomic read-modify-write (RMW) object one can read the complete old contents s of the object and simultaneously update its contents as a function delta (s) of the old contents in a single, indivisible, atomic operation. It is known that these RMW objects do not have a wait-free implementation in the asynchronous PRAM model-in which processors can only communicate with each other through atomic read-write registers. For the general case, in which operations P over an object can return a function phi (P)(s) of the old contents s while simultaneously updating the object's state to delta (P)(s), few results are known. We give several characterizations, in terms of phi (P)(.) and delta (P)(.), of such objects for which no wait-free implementation in the asynchronous PRAM model exists. The resulting objects are remarkably similar to RMW objects. Indeed, we also exhibit two objects satisfying weaker conditions which do have a such a wait-free implementation. Our results suggest that only objects as strong as RMW objects do not have wait-free implementation in the asynchronous PRAM model. (C) 2001 Elsevier Science B.V. All rights reserved.
This paper aims at bridging the gap between the lack of synchronization mechanisms in recent graphics processor (GPU) architectures and the need of synchronization mechanisms in parallel applications. Based on the int...
详细信息
ISBN:
(纸本)9781595939890
This paper aims at bridging the gap between the lack of synchronization mechanisms in recent graphics processor (GPU) architectures and the need of synchronization mechanisms in parallel applications. Based on the intrinsic features of recent GPU architectures, we construct strong synchronization objects like wait-free and t-resilient read-modify-write objects for a general model of recent GPU architectures without strong hardware synchronization primitives like test-and-set and compare-and-swap. Accesses to the new wait-free objects have time complexity O(N), where N is the number of concurrent processes. The wait-free objects have space complexity O(N-2): which is optimal. Our result demonstrates that it is possible to construct wait-free synchronization mechanisms for GPUs without the need of strong synchronization primitives in hardware and that wait-free programming is possible for GPUs.
暂无评论