ISBN: (Print) 9781581139860
High-end shared storage systems serving multiple independent workloads must ensure that concurrently executing clients receive a fair or agreed-upon share of system I/O resources. In a parallel I/O system, an application makes requests for specific disks at different steps of its computation, depending on the data layout and its computational state. Different applications contend for disk access, making the problem of maintaining fair allocation challenging. We propose a model for differentiated disk bandwidth allocation based on lexicographic minimization, and provide new efficient scheduling algorithms to allocate the I/O bandwidth fairly among contending applications. A major contribution of our model is its ability to handle multiple parallel disks and contention for disks among the concurrent applications. Analysis and simulation-based evaluation show that our algorithms provide performance isolation and weighted allocation of resources, and are work conserving. The solutions are also applicable to other shared-resource environments dealing with non-uniform heterogeneous servers. Copyright 2005 ACM.
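The lexicographic model described here generalizes classic max-min fairness across multiple disks. As a rough illustration of the single-resource base case only (a sketch; the function and client names are hypothetical and not from the paper), a water-filling pass allocates one disk's bandwidth max-min fairly among contending clients:

```python
def max_min_share(capacity, demands):
    """Allocate one disk's bandwidth max-min fairly among clients.

    demands: dict client -> requested bandwidth. Repeatedly split the
    remaining capacity evenly among still-unsatisfied clients; a client
    never receives more than it asked for, so small demands are met
    exactly and the remaining clients share the leftover evenly.
    """
    alloc = {c: 0.0 for c in demands}
    active = {c for c, d in demands.items() if d > 0}
    remaining = capacity
    while active and remaining > 1e-12:
        share = remaining / len(active)
        for c in list(active):
            give = min(share, demands[c] - alloc[c])
            alloc[c] += give
            remaining -= give
        active = {c for c in active if demands[c] - alloc[c] > 1e-12}
    return alloc

# 10 units of bandwidth: small demands are satisfied, the rest split evenly.
alloc = max_min_share(10.0, {"app-a": 2.0, "app-b": 4.0, "app-c": 10.0})
```

Lexicographic minimization extends this idea by optimizing the sorted allocation vector across all disks at once, which is where contention between applications for the same disks enters.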
ISBN: (Print) 9781581139860
This paper introduces a parallel scheduling problem where a directed acyclic graph modeling t tasks and their dependencies needs to be executed on n unreliable workers. Worker i executes task j correctly with probability p_{i,j}. The goal is to find a regimen Σ that dictates how workers get assigned to tasks (possibly in parallel and redundantly) throughout execution, so as to minimize the expected completion time. This fundamental parallel scheduling problem arises in grid computing and project management, and has several practical applications. We show a polynomial-time algorithm for the problem restricted to the case when the dag width is at most a constant and the number of workers is also at most a constant. These two restrictions may appear too severe. However, they are fundamentally required. Specifically, we demonstrate that the problem is NP-hard with a constant number of workers when the dag width can grow, and is also NP-hard with constant dag width when the number of workers can grow. When both the dag width and the number of workers are unconstrained, the problem is inapproximable within a factor less than 5/4, unless P = NP. Copyright 2005 ACM.
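As a back-of-the-envelope illustration of why redundant assignment pays off (a toy single-task computation, not the paper's regimen-optimization algorithm): if several workers attempt the same task every round, the task completes in a round unless all of them fail, so the number of rounds is geometrically distributed:

```python
def expected_rounds(success_probs):
    """Expected rounds to finish one task when all listed workers attempt it
    each round: the per-round success probability is 1 - prod(1 - p_i), and
    the round count is geometric with that parameter."""
    fail_all = 1.0
    for p in success_probs:
        fail_all *= 1.0 - p
    return 1.0 / (1.0 - fail_all)

print(expected_rounds([0.5]))        # one coin-flip worker: 2 rounds on average
print(expected_rounds([0.5, 0.5]))   # redundancy helps: 4/3 rounds on average
```

The hard part the paper addresses is choosing, at every state of the dag, how to split workers across the currently eligible tasks, which couples these per-task calculations.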
ISBN: (Print) 9781581139860
We study the relatively old problem of asymptotically reducing the runtime of serial computations with polynomial-size Boolean circuits. To the best of our knowledge, no progress on this problem has been formally reported in the literature for general computational models, although we observe that early work of Chandra, Stockmeyer, and Vishkin implies the existence of non-uniform unbounded fan-in circuits of t^O(1) size and O(t/log log n) depth for time-t Turing machines. We give an algorithmic size-depth tradeoff for parallelizing time-t random access Turing machines, a model at least as powerful as logarithmic-cost RAMs. Our parallel simulation yields logspace-uniform t^O(1) size, O(t/log t) depth Boolean circuits having semi-unbounded fan-in gates. In fact, for appropriate d, uniform t^O(1)·2^O(t/d) size circuits of depth O(d) can simulate time t. One corollary is that any log-cost time-t RAM can be simulated by a log-cost CRCW PRAM using t^O(1) processors and O(t/log t) time. This is a major improvement over previous parallel speedups, which could only guarantee an Ω(log t) speedup with an exponential number of processors. Copyright 2005 ACM.
ISBN: (Print) 9781581139860
We consider a model with n players and m objects. Each player has a "preference vector" of length m that records his grade for each object. The grades are unknown to the players. A player can learn his grade for an object by probing that object, but performing a probe incurs a cost. The goal of a player is to learn his preference vector with minimal cost by adopting the results of probes performed by other players. To facilitate communication, we assume that players collaborate by posting their grades for objects on a shared billboard: reading from the billboard is free. We consider players whose preference vectors are popular, i.e., players whose preferences are common to many other players. We present distributed and sequential algorithms that solve the problem with logarithmic cost overhead. Copyright 2005 ACM.
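The billboard mechanic can be sketched in a few lines (a minimal illustration; the class, function, and parameter names are invented here, and the "trusted" set stands in for the paper's notion of players with popular, shared preferences):

```python
class Billboard:
    """Shared billboard: posting and reading grades is free; probing costs."""

    def __init__(self):
        self.posts = {}                      # (player_id, obj) -> grade

    def read(self, obj):                     # reading is free
        return [(pid, g) for (pid, o), g in self.posts.items() if o == obj]

    def post(self, player_id, obj, grade):
        self.posts[(player_id, obj)] = grade

def learn_vector(player_id, true_grades, billboard, trusted):
    """Learn this player's grades for all objects, paying cost 1 per probe;
    adopt for free any grade already posted by a trusted (like-minded) player."""
    cost, learned = 0, {}
    for obj, grade in enumerate(true_grades):
        posted = [g for pid, g in billboard.read(obj) if pid in trusted]
        if posted:
            learned[obj] = posted[0]         # adopt an existing result for free
        else:
            learned[obj] = grade             # probe the object ourselves
            billboard.post(player_id, obj, grade)
            cost += 1
    return learned, cost

bb = Billboard()
grades = [3, 1, 2]                           # one popular preference vector
_, cost_first = learn_vector(0, grades, bb, trusted={0, 1})
learned_second, cost_second = learn_vector(1, grades, bb, trusted={0, 1})
# the first player pays for every probe; the second adopts all posts for free
```

The paper's contribution is sharing the probing cost fairly among the popular players, rather than letting one player pay for everything as in this naive sketch.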
ISBN: (Print) 0769524451
In recent years, reconfigurable technology has emerged as a popular choice for implementing various types of cryptographic functions. Nevertheless, an insufficient amount of effort has been placed into fully exploiting the tremendous amount of parallelism intrinsic to FPGAs for this class of algorithms. In this paper, we focus on block cipher architectures and explore design decisions that leverage the multi-grained parallelism inherent in many of these algorithms. We demonstrate the usefulness of this approach with a highly parallel FPGA implementation of the AES standard, and present results detailing the area/delay tradeoffs resulting from our design decisions.
ISBN: (Print) 9781581139860
We present two methods for weighted consistent hashing, also known as weighted distributed hash tables. The first method, called the Linear Method, combines the standard consistent hashing introduced by Karger et al. [9] with a linear weighted distance measure. By using node copies and different partitions of the hash space, the balance of this scheme approximates the fair weight relationship with high probability. The second method, called the Logarithmic Method, uses a logarithmic weighted distance between the peers and the data to find the corresponding node. For distributing one data element it provides perfect weighted balance. To provide this distribution for many data elements we use partitions to achieve a fair balance with high probability. These methods provide small fragmentation, which means that the hash space is divided into at most O(n log n) intervals. Furthermore, there is an efficient data structure that assigns data elements to the nodes in expected time O(log n). If small fragmentation is not an issue, one can replace the use of partitions by a method we call double hash functions. This method needs O(n) time for assigning elements to a node, yet it can be used directly for Storage Area Networks, where the number of nodes is small compared to the number of participating nodes in peer-to-peer networks. Copyright 2005 ACM.
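A weighted distance measure over a hash ring can be sketched as follows (an illustrative reading of the idea, not the paper's Linear Method with node copies and partitions; the node names and weights are made up):

```python
import hashlib

def h(key):
    """Hash a string to a point in [0, 1), the unit hash ring."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) / 2**256

def clockwise_dist(a, b):
    """Clockwise distance from point a to point b on the unit ring."""
    return (b - a) % 1.0

def assign(item, nodes):
    """Assign an item to the node minimizing clockwise distance from the
    item's hash, scaled down by the node's weight, so that a heavier node
    captures a larger share of the ring."""
    x = h(item)
    return min(nodes, key=lambda n: clockwise_dist(x, h(n[0])) / n[1])

nodes = [("node-a", 1.0), ("node-b", 2.0), ("node-c", 1.0)]
counts = {}
for i in range(4000):
    name, _ = assign(f"item-{i}", nodes)
    counts[name] = counts.get(name, 0) + 1
```

Without the paper's node copies and hash-space partitions this simple version only approximates weight-proportional balance, which is exactly the gap their high-probability analysis closes.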
ISBN: (Print) 9781581139860
The generalized windows scheduling problem for n jobs on multiple machines is defined as follows: given a sequence I = 〈(w_1, l_1), (w_2, l_2), ..., (w_n, l_n)〉 of n pairs of positive integers associated with the jobs 1, 2, ..., n, respectively, where the processing length of job i is l_i slots (a slot is the processing time of one length unit), the goal is to repeatedly and non-preemptively schedule all the jobs on the fewest possible parallel machines such that the gap (window) between two consecutive executions of the first slot of job i is at most w_i slots. This problem arises in push broadcast systems in which data is transmitted on parallel channels. The problem is NP-hard even for unit-length jobs, and a (1 + ε)-approximation algorithm is known for this case by approximating the natural lower bound W(I) = Σ_{i=1}^n (1/w_i). The techniques used for approximating unit-length jobs cannot be applied to arbitrary-length jobs, mainly because the optimal number of machines might be arbitrarily larger than the generalized lower bound W(I) = Σ_{i=1}^n (l_i/w_i). Our main result is an 8-approximation algorithm for the generalized problem using new methods, different from those used for the unit-length case. We also present an algorithm that uses 2(1 + ε)W(I) + log w_max machines, and a greedy algorithm that is based on a new tree representation of schedules. The greedy algorithm is optimal for some special cases, and simulations show that it performs very well in practice. Copyright 2005 ACM.
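The generalized lower bound is straightforward to compute; a short sketch (the job data below is made up for illustration):

```python
import math

def generalized_lower_bound(jobs):
    """W(I) = sum(l_i / w_i): scheduling job i every w_i slots for l_i slots
    occupies an l_i/w_i fraction of one machine, so any feasible schedule
    needs at least ceil(W(I)) machines."""
    return sum(l / w for w, l in jobs)

jobs = [(2, 1), (4, 1), (4, 2), (8, 2)]   # (window w_i, length l_i) pairs
W = generalized_lower_bound(jobs)          # 1/2 + 1/4 + 2/4 + 2/8 = 1.5
machines = math.ceil(W)                    # at least 2 machines are required
```

The abstract's point is that, unlike the unit-length case, this bound can be arbitrarily far below the true optimum, so the 8-approximation cannot simply round it up.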
ISBN: (Print) 9781581139860
We study randomized algorithms for placing a sequence of n nodes on a circle with unit perimeter. Nodes divide the circle into disjoint arcs. We desire that a newly-arrived node (which is oblivious of its index in the sequence) choose its position on the circle by learning the positions of as few existing nodes as possible. At the same time, we desire that the variation in arc-lengths be small. To this end, we propose a new algorithm that works as follows: the kth node chooses r random points on the circle, inspects the sizes of v arcs in the vicinity of each random point, and places itself at the mid-point of the largest arc encountered. We show that for any combination of r and v satisfying rv ≥ c log k, where c is a small constant, the ratio of the largest to the smallest arc-length is at most eight w.h.p., for an arbitrarily long sequence of n nodes. This strategy of node placement underlies a novel decentralized load-balancing algorithm that we propose for Distributed Hash Tables (DHTs) in peer-to-peer environments. Underlying the analysis of our algorithm is Structured Coupon Collection over n/b disjoint cliques with b nodes per clique, for any n, b ≥ 1. Nodes are initially uncovered. At each step, we choose d nodes independently and uniformly at random. If all the nodes in the corresponding cliques are covered, we do nothing. Otherwise, from among the chosen cliques with at least one uncovered node, we select one at random and cover an uncovered node within that clique. We show that as long as bd ≥ c log n, O(n) steps are sufficient to cover all nodes w.h.p., and each of the first Ω(n) steps succeeds in covering a node w.h.p. These results are then utilized to analyze a stochastic process for growing binary trees that are highly balanced: the leaves of the tree belong to at most four different levels with high probability. Copyright 2005 ACM.
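The placement rule is simple enough to sketch directly (one possible reading of "v arcs in the vicinity": the v arcs clockwise of each sampled point; the function name and parameter choices below are illustrative):

```python
import bisect
import random

def insert_node(positions, r, v):
    """Place a new node on the unit circle: sample r random points, inspect
    the v arcs clockwise of each point, and settle at the midpoint of the
    largest arc seen. positions is a sorted list of existing node positions."""
    if not positions:
        positions.append(random.random())
        return
    n = len(positions)
    best_len, best_start = -1.0, 0.0
    for _ in range(r):
        p = random.random()
        i = bisect.bisect_left(positions, p)       # first node clockwise of p
        for k in range(v):                          # v consecutive arcs
            j = (i - 1 + k) % n
            a, b = positions[j], positions[(j + 1) % n]
            length = (b - a) % 1.0 or 1.0           # lone node owns the whole circle
            if length > best_len:
                best_len, best_start = length, a
    bisect.insort(positions, (best_start + best_len / 2) % 1.0)

random.seed(7)
positions = []
for _ in range(64):
    insert_node(positions, r=1, v=8)               # r*v comfortably above log k
arcs = [(positions[(i + 1) % 64] - positions[i]) % 1.0 for i in range(64)]
ratio = max(arcs) / min(arcs)                       # small w.h.p. per the analysis
```

Each arrival inspects only O(log k) arcs, which is the point: near-uniform spacing without any node learning the global layout.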
We explore the possibility of using multiple processors to improve the encoding and decoding times of Lempel-Ziv schemes. A new layout of the processors, based on a full binary tree, is suggested, and it is shown how LZSS and LZW can be adapted to take advantage of such parallel architectures. The layout is then generalized to higher-order trees. Experimental results show an improvement in compression over the standard method of parallelization and an improvement in time over the sequential method. (C) 2004 Elsevier B.V. All rights reserved.