ISBN (print): 0897913701
The authors address the apparently difficult problem of computing parallel transitive closure when the (directed) graph is sparse and/or only single-source information is desired. O(e) work is their target for the single-source problem. When the graph is sparse, the all-pairs transitive closure problem can be solved by performing a depth-first search from each node, taking O(ne) time; that is their target for the all-pairs problem. The authors do not reach either target, except in the all-pairs case when e is fairly large. However, they make significant progress, in the sense that they give the first algorithms that simultaneously use time much less than linear and work less than M(n).
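As a point of reference for the O(ne) all-pairs target above, the following is a minimal serial sketch (not the authors' parallel algorithm) of the baseline mentioned in the review: one depth-first search per source node of a sparse digraph. Function names and the edge-list representation are illustrative only.

```python
from collections import defaultdict

def all_pairs_reachability(n, edges):
    """Sequential O(n*e) baseline: one DFS per source node.

    n      -- number of nodes, labeled 0..n-1
    edges  -- list of directed (u, v) pairs
    Returns a list of sets; reach[s] is the set of nodes reachable from s.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)

    reach = []
    for s in range(n):
        seen = {s}
        stack = [s]
        while stack:                      # iterative DFS from source s
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        reach.append(seen)
    return reach

# Example: 0 -> 1 -> 2 -> 0 forms a cycle; node 3 is isolated.
print(all_pairs_reachability(4, [(0, 1), (1, 2), (2, 0)]))
```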
ISBN (print): 9781581139860
This paper introduces a parallel scheduling problem in which a directed acyclic graph modeling t tasks and their dependencies must be executed on n unreliable workers. Worker i executes task j correctly with probability p_{i,j}. The goal is to find a regimen Σ that dictates how workers are assigned to tasks (possibly in parallel and redundantly) throughout execution, so as to minimize the expected completion time. This fundamental parallel scheduling problem arises in grid computing and project management, and has several practical applications. We give a polynomial-time algorithm for the problem restricted to the case where both the dag width and the number of workers are bounded by constants. These two restrictions may appear too severe; however, they are fundamentally required. Specifically, we demonstrate that the problem is NP-hard with a constant number of workers when the dag width can grow, and is also NP-hard with constant dag width when the number of workers can grow. When both the dag width and the number of workers are unconstrained, the problem is inapproximable within any factor less than 5/4, unless P = NP.
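To make the probabilistic model concrete, here is a small illustrative calculation, assuming independent worker outcomes and a fixed redundant assignment repeated until success (a toy special case, not the paper's regimen): a task finishes in a given step with probability one minus the product of the assigned workers' failure probabilities, so the expected number of steps is the reciprocal of that probability. The function names below are hypothetical.

```python
from math import prod

def step_success_probability(p, workers, task):
    """Probability that at least one assigned worker executes `task`
    correctly in a single step, assuming independent outcomes.
    p[i][j] is the success probability of worker i on task j."""
    return 1.0 - prod(1.0 - p[i][task] for i in workers)

def expected_steps_fixed_assignment(p, workers, task):
    """Expected number of steps to finish one task if the same worker set
    is redundantly re-assigned to it until it succeeds (geometric
    distribution); a toy special case, not the paper's regimen."""
    q = step_success_probability(p, workers, task)
    return float('inf') if q == 0.0 else 1.0 / q

# Two workers on task 0 with success probabilities 0.5 and 0.8.
p = [[0.5], [0.8]]
print(expected_steps_fixed_assignment(p, workers=[0, 1], task=0))  # 1 / 0.9
```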
We describe an efficient parallel algorithm to solve the single function coarsest partition problem. The algorithm runs in O(log n) time using O(n log log n) operations on the Arbitrary CRCW PRAM. The previous best kno...
ISBN (print): 9781605586069
This paper introduces a storage format for sparse matrices, called compressed sparse blocks (CSB), which allows both Ax and Aᵀx to be computed efficiently in parallel, where A is an n × n sparse matrix with nnz ≥ n nonzeros and x is a dense n-vector. Our algorithms use Θ(nnz) work (serial running time) and Θ(√n lg n) span (critical-path length), yielding a parallelism of Θ(nnz/(√n lg n)), which is amply high for virtually any large matrix. The storage requirement for CSB is essentially the same as that for the more standard compressed-sparse-rows (CSR) format, for which computing Ax in parallel is easy but Aᵀx is difficult. Benchmark results indicate that on one processor, the CSB algorithms for Ax and Aᵀx run just as fast as the CSR algorithm for Ax, but the CSB algorithms also scale up linearly with processors until limited by off-chip memory bandwidth.
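The sketch below illustrates, in simplified serial form, the blocked-layout idea behind CSB: nonzeros are grouped into β × β blocks with block-local coordinates, and the same structure serves both Ax and Aᵀx. It omits the parallel decomposition and the actual CSB index encoding; all names are illustrative.

```python
def build_blocks(n, triples, beta):
    """Group the nonzeros (i, j, v) of an n x n matrix into beta x beta
    blocks keyed by (block_row, block_col); coordinates are stored
    block-locally. Simplified serial sketch only."""
    blocks = {}
    for i, j, v in triples:
        key = (i // beta, j // beta)
        blocks.setdefault(key, []).append((i % beta, j % beta, v))
    return blocks

def spmv(blocks, beta, x, n, transpose=False):
    """Compute y = A x (or y = A^T x if transpose=True) by visiting each
    block once; the same layout serves both products."""
    y = [0.0] * n
    for (bi, bj), entries in blocks.items():
        for li, lj, v in entries:
            i, j = bi * beta + li, bj * beta + lj
            if transpose:
                y[j] += v * x[i]
            else:
                y[i] += v * x[j]
    return y

# 2 x 2 example: A = [[1, 2], [0, 3]], x = [1, 1].
blocks = build_blocks(2, [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0)], beta=1)
print(spmv(blocks, 1, [1.0, 1.0], 2))                  # Ax   = [3.0, 3.0]
print(spmv(blocks, 1, [1.0, 1.0], 2, transpose=True))  # A^Tx = [1.0, 5.0]
```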
We give tight bounds on the parallel complexity of some problems involving random graphs. Specifically, we show that a Hamiltonian cycle, a breadth first spanning tree, and a maximal matching can all be constructed in...
ISBN (print): 9781450315722
Hardware Transactional Memory (HTM) implementations are becoming available in commercial, off-the-shelf components. While generally comparable, some implementations deviate from the strict all-or-nothing property of pure Transactional Memory. We analyse these deviations and find that, with small modifications, they can be used to accelerate and simplify both transactional and non-transactional programming constructs. At the heart of our extensions, we enable access to the transaction's full register state in the abort handler of an existing HTM without extending the architectural register state. Access to the full register state enables applications in both transactional and non-transactional parallel programming: hybrid transactional memory; transactional escape actions; transactional suspend/resume; and alert-on-update.
A collection of local workpiles (task queues) and a simple load balancing scheme is well suited for scheduling tasks in shared memory parallel machines. Task scheduling on such machines has usually been done through a...
ISBN (print): 9781450395458
State-of-the-art parallel sorting algorithms for distributed-memory architectures are based on computing a balanced partitioning via sampling and histogramming. By finding samples that partition the sorted keys into evenly sized chunks, these algorithms minimize the number of communication rounds required. Histogramming (computing the positions of samples) guides sampling, enabling a decrease in the overall number of samples collected. We derive lower and upper bounds on the number of sampling/histogramming rounds required to compute a balanced partitioning. We improve on prior results to demonstrate that when using p processors, O(log* p) rounds with O(p/log* p) samples per round suffice. We match that with a lower bound showing that any algorithm with O(p) samples per round requires at least Ω(log* p) rounds. Additionally, we prove an Ω(p log p) lower bound on the number of samples for a single round, thus proving that the existing one-round algorithms sample sort, AMS sort [2], and HSS [16] have optimal sample-size complexity. To derive the lower bound, we propose a hard randomized input distribution and apply classical results from the distribution theory of runs.
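To illustrate what a single histogramming round computes, here is a small sketch with illustrative names (a serial stand-in for what would be an all-reduce in a distributed sort): each processor counts its keys per bucket induced by the candidate splitters, and the summed histogram shows how balanced the induced partition is.

```python
from bisect import bisect_right

def local_histogram(local_keys, splitters):
    """Count how many local keys fall into each of the len(splitters)+1
    buckets induced by the (sorted) candidate splitters."""
    counts = [0] * (len(splitters) + 1)
    for k in local_keys:
        counts[bisect_right(splitters, k)] += 1
    return counts

def global_histogram(per_processor_keys, splitters):
    """Sum the local histograms; in a real distributed sort this would be
    an all-reduce. The result shows how evenly the splitters partition the keys."""
    total = [0] * (len(splitters) + 1)
    for keys in per_processor_keys:
        for b, c in enumerate(local_histogram(keys, splitters)):
            total[b] += c
    return total

# Toy example with 2 "processors" and one candidate splitter.
print(global_histogram([[1, 4, 7], [2, 5, 8]], splitters=[5]))  # [3, 3]
```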
ISBN (print): 9781450342100
We study parallel comparison-based algorithms for finding all equivalence classes of a set of n elements, where sorting according to some total order is not possible. Such scenarios arise, for example, in distributed computer security, where each of n agents is working to identify the private group to which it belongs, the only operation available being a zero-knowledge pairwise comparison (sometimes called a "secret handshake") that reveals only whether two agents are in the same group or in different groups. We provide new parallel algorithms for this problem, as well as new lower bounds and a distribution-based analysis.
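As a concrete (sequential) illustration of the comparison model, the sketch below partitions elements using only a same-group oracle, comparing each element against one representative of every class found so far. It is not the paper's parallel algorithm, and the names are illustrative.

```python
def classify(elements, same_group):
    """Partition `elements` into equivalence classes using only a
    same_group(a, b) oracle that says whether two elements belong to the
    same (otherwise hidden) group. Sequential sketch; worst case uses
    O(n * g) comparisons for g groups."""
    classes = []                      # classes[k][0] is the representative of class k
    for e in elements:
        for cls in classes:
            if same_group(e, cls[0]):
                cls.append(e)
                break
        else:
            classes.append([e])       # e starts a new class
    return classes

# Toy example: the hidden group of an agent is its value mod 3.
agents = [0, 1, 2, 3, 4, 5, 6]
print(classify(agents, lambda a, b: a % 3 == b % 3))
# [[0, 3, 6], [1, 4], [2, 5]]
```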
Recently, the subject of allocating tasks to servers has attracted much attention. There are several ways of distinguishing load balancing problems. There are sequential and parallel strategies, that is, placing the tasks one after the other or all of them in parallel. Another approach divides load balancing problems into continuous and static ones: in the continuous case, new tasks are generated and consumed as time proceeds, whereas in the static case the number of tasks is fixed. We present and analyze a parallel randomized continuous load balancing algorithm in a scenario where n processors continuously generate and consume tasks according to some given probability distribution. Each processor initiates load balancing actions only if its load exceeds a certain threshold t, and then tries to find a balancing partner to which it can move some of its tasks. We consider several randomized load generation and consumption schemes that yield an expected load of O(n) in the whole system. We show that the maximum load of any processor is upper bounded by (log log n)², with high probability, and that the expected number of load requests required to find a balancing partner is constant.
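The sketch below illustrates the threshold idea in one balancing round: an overloaded processor picks a random partner and hands over part of the excess. It is a simplified stand-in, not the paper's exact protocol or its analysis; the names and the transfer rule are assumptions.

```python
import random

def balance_step(loads, t, rng=random):
    """One round of threshold-triggered balancing: every processor whose
    load exceeds t picks a random partner and, if the partner is less
    loaded, hands over half of the difference. Simplified sketch of the
    threshold idea only."""
    n = len(loads)
    for i in range(n):
        if loads[i] > t:
            j = rng.randrange(n)              # random balancing partner
            if loads[j] < loads[i]:
                transfer = (loads[i] - loads[j]) // 2
                loads[i] -= transfer
                loads[j] += transfer
    return loads

random.seed(0)
print(balance_step([12, 0, 3, 1], t=4))
```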