This paper concerns synchronization under read/write atomicity in shared memory multiprocessors. We present a new algorithm for N-process mutual exclusion that requires only read and write operations and that has O(lo...
详细信息
ISBN:
(纸本)0897916131
This paper concerns synchronization under read/write atomicity in shared memory multiprocessors. We present a new algorithm for N-process mutual exclusion that requires only read and write operations and that has O(log2 N) time complexity, where `time' is measured by counting remote memory references. The time complexity of this algorithm is better than that of all prior solutions to the mutual exclusion problem that are based upon atomic read and write instructions;in fact, the time complexity of most prior solutions is unbounded. Performance studies are presented that show that our mutual exclusion algorithm exhibits scalable performance under heavy contention. In the second part of the paper, we discuss two extensions of our mutual exclusion algorithm. In the first extension, we modify the algorithm so that in the absence of contention only O(1) memory references are required. In the second extension, the technique of software combining is introduced for the purpose of increasing concurrency when using the algorithm to implement combinable read-modify-write operations.
This volume of the proceedings contains 42 articles devoted to parallel processing algorithms and architectures used in computers with multiple processors. algorithms are presented for routing the processing steps, ma...
详细信息
ISBN:
(纸本)089791483X
This volume of the proceedings contains 42 articles devoted to parallel processing algorithms and architectures used in computers with multiple processors. algorithms are presented for routing the processing steps, mapping the data paths, sorting and interconnecting networks. Operating system program algorithms are discussed which optimize performance for hypercube, mesh connected arrays and to order sets, to select and label gray scale images and to schedule tasks.
This paper presents a method for mapping binary hypercube-algorithms onto lower dimensional meshes and analyzes this method in a model derived from the architecture of modern mesh machines. We outline the criteria use...
详细信息
ISBN:
(纸本)089791483X
This paper presents a method for mapping binary hypercube-algorithms onto lower dimensional meshes and analyzes this method in a model derived from the architecture of modern mesh machines. We outline the criteria used to evaluate graph embeddings for mapping supercomputer communication networks. The measured sorting rate of more than 2x106 keys per second on an iWarp torus with just 64 processors shows the excellent absolute performance of our approach.
We present two asymptotically optimal Θ(n) time algorithms for labeling the connected components of a gray-scale image on a mesh-connected computer. We assume that the input is an n × n gray-scale image mapped o...
详细信息
ISBN:
(纸本)089791483X
We present two asymptotically optimal Θ(n) time algorithms for labeling the connected components of a gray-scale image on a mesh-connected computer. We assume that the input is an n × n gray-scale image mapped one pixel per processor onto an n × n mesh-connected computer. Our algorithms label the components so that every component is connected, the maximum difference in the gray-scale values of the pixels within any component does not exceed a given value, and no component can be merged with a neighboring component. The first algorithm is based on a divide-and-conquer approach. Although it is simple, this algorithm has the potential drawback of possibly assigning two adjacent pixels with the same gray-scale value to different components. The second algorithm avoids this potential drawback, and exploits the ability of a mesh-connected computer to efficiently determine a maximal independent set of a planar graph.
Traditionally, register files have been the primary agent for inter-operation communication in load/store architectures. As processors start issuing multiple instructions per cycle, a centralized register file can eas...
详细信息
ISBN:
(纸本)0818631759
Traditionally, register files have been the primary agent for inter-operation communication in load/store architectures. As processors start issuing multiple instructions per cycle, a centralized register file can easily become a bottleneck. This paper analyzes the register file traffic in a load/store architecture with a view to motivate the development of alternate inter-operation communication mechanisms that reduce the bandwidth demanded of a centralized register file. We first provide metrics to characterize the register traffic. These metrics deal with the degree and locality of use of the register instances created. We then present the results of a simulation study that uses the MIPS R2000 architecture and the SPEC benchmark programs. We have two major results. First, a large number of the register instances are used only once, and the average degree of use of register instances is about 2. second, most of the register instances are used up soon after they are created (within about 30-40 instructions). This suggests that alternate inter-operation communication mechanisms that exploit the temporal locality of use of register instances are likely to be effective in reducing the traffic burden on the centralized register file. The second result was pivotal in the design of the distributed register file for the multiscalar processing paradigm.
暂无评论