When two parallel threads holding no locks in common access the same memory location and at least one of the threads modifies the location, a 'data race' occurs, which is usually a bug. This paper describes th...
详细信息
ISBN:
(纸本)9780897919890
When two parallel threads holding no locks in common access the same memory location and at least one of the threads modifies the location, a 'data race' occurs, which is usually a bug. This paper describes the algorithms and strategies used by a debugging tool, called the Nondeterminator-2, which checks for data races in programs coded in the Cilk multithreaded language. Like its predecessor, the Nondeterminator, which checks for simple 'determinacy' races, the Nondeterminator-2 is a debugging tool, not a verifier, since it checks for data races only in the computation generated by a serial execution of the program on a given input. We give an algorithm, ALL-SETS, that determines whether the computation generated by a serial execution of a Cilk program on a given input contains a race. For a program that runs serially in time T, accesses V shared memory locations, uses a total of n locks, and holds at most k n locks simultaneously, ALL-SETS runs in. O(nkT α(V, V)) time and O(nkV) space, where α is Tarjan's functional inverse of Ackermann's function. Since ALL-SETS may be too inefficient in the worst case, we propose a much more efficient algorithm which can be used to detect races in programs that obey the 'umbrella' locking discipline, a programming methodology that is more flexible than similar disciplines proposed in the literature. We present an algorithm, BRELLY, which detects violations of the umbrella discipline in O(kT α(V, V)) time using O(kV) space. We also prove that any 'abelian' Cilk program, one whose critical sections commute, produces a determinate final state if it is deadlock free and if it generates any computation which is data-race free. Thus, the Nondeterminator-2's two algorithms can verify the determinacy of a deadlock-free abelian program running on a given input.
In this paper we consider the problem of deadlock-free routing in arbitrary parallel and distributed computers. We focus on asynchronous routing algorithms which continuously receive new packets to route and which do ...
详细信息
ISBN:
(纸本)9780897919890
In this paper we consider the problem of deadlock-free routing in arbitrary parallel and distributed computers. We focus on asynchronous routing algorithms which continuously receive new packets to route and which do not discard packets that encounter congestion. Specifically, we examine what we call the deadlock-free routing (DFR) problem. The input to the DFR problem consists of an arbitrary network and an arbitrary set of paths in the network. The output consists of a routing algorithm, which is a list of the buffers used along each of the paths. The routing algorithm is required to be free from deadlock and the goal is to minimize the number of buffers required in any one node. We study the DFR problem by converting it into an equivalent problem which we call the flattest common supersequence (FCS) problem. The input to the FCS problem consists of a set of sequences and the output consists of a single sequence that contains all of the input sequences as (possibly non-contiguous) subsequences. The goal of the FCS problem is to minimize the maximum frequency of any symbol in the output sequence. We present three main results. First, we prove that the decision version of the FCS problem is NP-complete, and has no polynomial-time approximation scheme unless P = NP. Next, we propose and experimentally evaluate a range of heuristics for FCS. Our experimental results show that one of these heuristics performs very well over a wide range of inputs. Lastly, we prove that this heuristic is in fact optimal for certain restricted classes of inputs.
We give an alphabet-independent deterministic parallel algorithm for finding all occurrences of a pattern array of size m(h) x m(w) in a text array of size n(h) x n(w) in the concurrent-read-concurrent-write-parallel-...
详细信息
We give an alphabet-independent deterministic parallel algorithm for finding all occurrences of a pattern array of size m(h) x m(w) in a text array of size n(h) x n(w) in the concurrent-read-concurrent-write-parallel-random-access-machine (CRCW-PRAM) model. Our algorithm runs in O(1) time performing optimal, that is, O(n(h) x n(w)) work, following preprocessing of the pattern. This improves the previous best bound of O(loglogm) time with optimal work [A. Amir, G. Benson, and M. Farach, proceedings 5th annualacmsymposium on parallelalgorithms and architectures, acm, New York, 1993, pp. 79-85], following preprocessing of the pattern, where m = max {m(h),m(w)}. The preprocessing required by our algorithm (and that due to Amir, Benson, and Farach) can be accomplished in O(loglogm) time and O(m(h) x m(w)) work [M. Crochemore et al., manuscript, 1993], [R. Cole et al., manuscript, 1993].
The proceedings contains 32 papers from the 9th annualacmsymposium on parallelalgorithms and architectures. Topics discussed include: Cilk programs;parallel scheduling algorithms;reactive diffracting trees;load bal...
详细信息
The proceedings contains 32 papers from the 9th annualacmsymposium on parallelalgorithms and architectures. Topics discussed include: Cilk programs;parallel scheduling algorithms;reactive diffracting trees;load balancing and data remapping for adaptive grid calculations;spectral partitioners;three-dimensional pattern matching;matrix-factorization algorithms;shared-memory models;fault-prone Bulk-Synchronous parallel (BSP) machines;parallel bandwidth;external memory algorithms;system area network mapping;multi-class routing algorithms;interconnection networks;deadlock-free oblivious wormhole routing algorithms;all-optical networks;approximation algorithms;fine-grain multithreading;speculative retirement and instruction windows.
We prove a number of negative results about practical (i.e., numerically accurate) algorithms for certain matrix factorizations. In particular, we prove that the popular Givens' method for computing the QR decompo...
详细信息
ISBN:
(纸本)9780897918909
We prove a number of negative results about practical (i.e., numerically accurate) algorithms for certain matrix factorizations. In particular, we prove that the popular Givens' method for computing the QR decomposition is inherently sequential over the realistic model of floating point arithmetic. We also prove a number of additional results concerning Gaussian Elimination for computing the LU decomposition. Altogether, the results of this paper support the widespread belief that there is a tradeoff between parallelism and accuracy in numerical algorithms.
Recently there has been an increasing interest in models of parallel computation that account for the bandwidth limitations in communication networks. Some models (e.g., BSP and LOGP) account for bandwidth limitations...
详细信息
Recently there has been an increasing interest in models of parallel computation that account for the bandwidth limitations in communication networks. Some models (e.g., BSP and LOGP) account for bandwidth limitations using a per-processor parameter g>1, such that each processor can send/receive at most h messages in g·h time. Other models (e.g., PRAM(m)) account for bandwidth limitations as an aggregate parameter mΩ(√lg p) separation known previously.
暂无评论