We study integrated prefetching and caching in single and parallel disk systems. There exist two very popular approximation algorithms called Aggressive and Conservative for minimizing the total elapsed time in the si...
详细信息
ISBN:
(纸本)9781581136616
We study integrated prefetching and caching in single and parallel disk systems. There exist two very popular approximation algorithms called Aggressive and Conservative for minimizing the total elapsed time in the single disk problem. For D parallel disks, approximation algorithms are known for both the elapsed time and stall time performance measures. In particular, there exists a D-approximation algorithm for the stall time measure that uses D-1 additional memory locations in cache. In the first part of the paper we investigate approximation algorithms for the single disk problem. We give a refined analysis of the Aggressive algorithm, showing that the original analysis was too pessimistic. We prove that our new bound is tight. Additionally we present a new family of prefetching and caching strategies and give algorithms that perform better than Aggressive and Conservative. In the second part of the paper we investigate the problem of minimizing stall time in parallel disk systems. We present a polynomial time algorithm for computing a prefetching/caching schedule whose stall time is bounded by that of an optimal solution. The schedule uses at most 3(D - 1) extra memory locations in cache. This is the first polynomial time algorithm for computing schedules with a minimum stall time. Our algorithm is based on the linear programming approach of [1]. However, in order to achieve minimum stall times, we introduce the new concept of synchronized schedules in which fetches on the D disks are performed completely in parallel.
Explicit multithreading (XMT) is a parallel programming approach for exploiting on-chip parallelism. XMT introduces a computational framework with (1) a simple programming style that relies on fine-grained PRAM-style ...
详细信息
ISBN:
(纸本)9781581134094
Explicit multithreading (XMT) is a parallel programming approach for exploiting on-chip parallelism. XMT introduces a computational framework with (1) a simple programming style that relies on fine-grained PRAM-style algorithms;(2) hardware support for low-overhead parallel threads, scalable load balancing, and efficient synchronization. The missing link between the algorithmic-programming level and the architecture level is provided by the first prototype XMT compiler. This paper also takes this new opportunity to evaluate the overall effectiveness of the interaction between the programming model and the hardware, and enhance its performance where needed, incorporating new optimizations into the XMT compiler. We present a wide range of applications, which written in XMT obtain significant speedups relative to the best serial programs. We show that XMT is especially useful for more advanced applications with dynamic, irregular access patterns, where for regular computations we demonstrate performance gains that scale up to much higher levels than have been demonstrated before for on-chip systems.
Diffusive schemes have been widely analyzed for parallel and distributed load balancing. It is well known that their convergence rates depend on the eigenvalues of some associated matrices and on the expansion propert...
详细信息
Diffusive schemes have been widely analyzed for parallel and distributed load balancing. It is well known that their convergence rates depend on the eigenvalues of some associated matrices and on the expansion properties of the underlying graphs. In the first part of this paper we make use of these relationships in order to obtain new spectral bounds on the edge and node expansion of graphs. We show that these new bounds are better than the classical bounds for several graph classes. In the second part of the paper, we consider the load balancing problem for indivisible unit size tokens. Since known diffusion schemes do not completely balance the load for such settings, we propose a randomized distributed algorithm based on Markov chains to reduce the load imbalance. We prove that this approach provides the best asymptotic result that can be achieved in l1- or l2-norm concerning the final load situation.
Novel architectures for P2P applications were discussed. A distributed hash table (DHT) approach is studied. The probability a processor participates in a random lookup should be small. The memory requirements from ea...
详细信息
Novel architectures for P2P applications were discussed. A distributed hash table (DHT) approach is studied. The probability a processor participates in a random lookup should be small. The memory requirements from each server should be low. An overlay network should be build that performs lookup operations efficiently.
Previous implementations of out-of-core columnsort limit the problem size to N ≤ √(M/P)3/2, where N is the number of records to sort, P is the number of processors, and M is the total number of records that the enti...
详细信息
ISBN:
(纸本)9781581136616
Previous implementations of out-of-core columnsort limit the problem size to N ≤ √(M/P)3/2, where N is the number of records to sort, P is the number of processors, and M is the total number of records that the entire system can hold in its memory. We implemented two variations to out-of-core columnsort that relax this restriction. Sub-block columnsort is based on an algorithmic modification of the underlying columnsort algorithm, and it improves the problem-size bound to N ≤ (M/P)5/3/42/3 but at the cost of additional disk I/O. M-columnsort changes the notion of the column size in columnsort, improving the maximum problem size to N ≤ √/M3/2 but at the cost of additional computation and communication. Experimental results on a Beowulf cluster show that both subblock columnsort and M-columnsort run well but that M-columnsort is faster. A further advantage of M-columnsort is that it handles a wider range of problem sizes than subblock columnsort.
We propose a new approach for constructing P2P networks based on a dynamic decomposition of a continuous space into cells corresponding to processors. We demonstrate the power of these design rules by suggesting two n...
详细信息
ISBN:
(纸本)9781581136616
We propose a new approach for constructing P2P networks based on a dynamic decomposition of a continuous space into cells corresponding to processors. We demonstrate the power of these design rules by suggesting two new architectures, one for DHT (Distributed Hash Table) and the other for dynamic expander networks. The DHT network, which we call Distance Halving allows logarithmic routing and load, while preserving constant degrees. It offers an optimal tradeoff between the degree and the dilation in the sense that degree d guarantees a dilation of O(logdn). Another advantage over previous constructions is its relative simplicity. A major new contribution of this construction is a dynamic caching technique that maintains low load and storage even under the occurrence of hot spots. Our second construction builds a network that is guaranteed to be an expander. The resulting topologies are simple to maintain and implement. Their simplicity makes it easy to modify and add protocols. A small variation yields a DHT which is robust against random faults. Finally we show that, using our approach, it is possible to construct any family of constant degree graphs in a dynamic environment, though with worst parameters. Therefore we expect that more distributed data structures could be designed and implemented in a dynamic environment.
Quantum computation has become an intriguing technology with which to attack difficult problems and to enhance system security. Quantum algorithms, however, have been analyzed under idealized assumptions without impor...
详细信息
ISBN:
(纸本)9781581136616
Quantum computation has become an intriguing technology with which to attack difficult problems and to enhance system security. Quantum algorithms, however, have been analyzed under idealized assumptions without important physical constraints in mind. In this paper, we analyze two key constraints: the short spatial distance of quantum interactions and the short temporal life of quantum data. In particular, quantum computations must make use of extremely robust error correction techniques to extend the life of quantum data. We present optimized spatial layouts of quantum error correction circuits for quantum bits embedded in silicon. We analyze the complexity of error correction under the constraint that interaction between these bits is near neighbor and data must be propagated via swap operations from one part of the circuit to another. We discover two interesting results from our quantum layouts. First, the recursive nature of quantum error correction circuits requires a additional communication technique more powerful than near-neighbor swaps - too much error accumulates if we attempt to swap over long distances. We show that quantum teleportation can be used to implement recursive structures. We also show that the reliability of the quantum swap operation is the limiting factor in solid-state quantum computation.
We present an optimal parallel selection algorithm on the EREW PRAM. This algorithm runs in O(logn) time with n/log n processors. This complexity matches the known lower bound for parallel selection on the EREW PRAM m...
详细信息
We present an optimal parallel selection algorithm on the EREW PRAM. This algorithm runs in O(logn) time with n/log n processors. This complexity matches the known lower bound for parallel selection on the EREW PRAM model. We therefore close this problem which has been open for more than a decade.
Explicit multithreading (XMT) is a parallel programming approach for exploiting on-chip parallelism. XMT introduces a computational framework with (1) a simple programming style that relies on fine-grained PRAM-style ...
详细信息
Explicit multithreading (XMT) is a parallel programming approach for exploiting on-chip parallelism. XMT introduces a computational framework with (1) a simple programming style that relies on fine-grained PRAM-style algorithms;(2) hardware support for low-overhead parallel threads, scalable load balancing, and efficient synchronization. The missing link between the algorithmic-programming level and the architecture level is provided by the first prototype XMT compiler. This paper also takes this new opportunity to evaluate the overall effectiveness of the interaction between the programming model and the hardware, and enhance its performance where needed, incorporating new optimizations into the XMT compiler. We present a wide range of applications, which written in XMT obtain significant speedups relative to the best serial programs. We show that XMT is especially useful for more advanced applications with dynamic, irregular access patterns, where for regular computations we demonstrate performance gains that scale up to much higher levels than have been demonstrated before for on-chip systems.
暂无评论