We use O(log2 n) parallel arithmetic steps and n2 processors to compute the least-squares solution x = A+b to a linear system of equations, Ax = b, given a g × h matrix A and a vector b, both filled with integers...
详细信息
ISBN:
(纸本)0897913701
We use O(log2 n) parallel arithmetic steps and n2 processors to compute the least-squares solution x = A+b to a linear system of equations, Ax = b, given a g × h matrix A and a vector b, both filled with integers or with rational numbers, provided that g + h ≤ 2n and that A is given with its displacement generator of length r = O(1) and thus has displacement rank O(1). For a vector b and for a general p × q matrix A (with p + q ≤ n), we compute A+ and A+b by using O(log2 n) parallel arithmetic steps and n2.851 processors, and we may also compute A+b by using O(n2.376) arithmetic operations.
Space-efficient Information Dispersal Algorithm (IDA) [11] is applied to parallel communication in the hypercube. Let N denote the size of the network. Our communication scheme runs in 2·log N + 1 time using cons...
详细信息
ISBN:
(纸本)0897913701
Space-efficient Information Dispersal Algorithm (IDA) [11] is applied to parallel communication in the hypercube. Let N denote the size of the network. Our communication scheme runs in 2·log N + 1 time using constant size buffers. Its probability of successful routing is at least 1 - N-2.419·log N+1.5, proving Rabin's conjecture. The same scheme also tolerates O(N) random link failures with high probability. The scheme runs within the said time bound without long delay. On-line and efficient wire testing and replacement on the hypercube can be realized if our fault-tolerant routing scheme is used. Let α denote the total number of links in the hypercube. It is shown that ≈ α/352 wires can be disabled simultaneously without disrupting the ongoing computation or degrading the routing performance much.
We present parallelalgorithms to construct, for given sets of leaf weights, binary trees with almost optimal weighted path length (i.e. approximate Huffman trees). Specifically, assuming that weights are normalized (...
详细信息
ISBN:
(纸本)0897913701
We present parallelalgorithms to construct, for given sets of leaf weights, binary trees with almost optimal weighted path length (i.e. approximate Huffman trees). Specifically, assuming that weights are normalized (to sum up to one) and error refers to the (absolute) difference between the weighted path length of a given tree and the optimal tree with the same weights, we present: (i) O(logn) time and n loglogn/logn EREW processor algorithm which constructs a tree with error less than 0.172;(ii) O(klogn log*n) time and n CREW processor algorithm which produces a tree with error at most 1/nk, and (iii) an O(k2logn) time and n2 CREW processor algorithm which produces a tree with error at most 1/nk. Our approach also leads to two sequential algorithms: an O(kn) time algorithm which produces a tree with error at most 1/n2(k) and O(kn) time algorithm which produces a tree with error at most 1/2n2(k). The last two algorithms use different computation models. (A detailed description of sequential algorithms is left to an expanded version of the paper [KP89].)
Multithreading has been proposed as an architectural strategy for tolerating latency in multiprocessors and, through limited empirical studies, shown to offer promise. This paper develops an analytical model of multit...
详细信息
ISBN:
(纸本)0897913701
Multithreading has been proposed as an architectural strategy for tolerating latency in multiprocessors and, through limited empirical studies, shown to offer promise. This paper develops an analytical model of multithreaded processor behavior based on a small set of architectural and program parameters. The model gives rise to a large Markov chain, which is solved to obtain a formula for processor efficiency in terms of the number of threads per processor, the remote reference rate, the latency, and the cost of switching between threads. It is shown that a multithreaded processor exhibits three operating regimes: linear (efficiency is proportional to the number of threads), transition, and saturation (efficiency depends only on the remote reference rate and switch cost). Formulae for regime boundaries are derived. The model is embellished to reflect cache degradation due to multithreading, using an analytical model of cache behavior, demonstrating that returns diminish as the number threads becomes large. Predictions from the embellished model correlate well with published empirical measurements. Prescriptive use of the model under various scenarios indicates that multithreading is effective, but the number of useful threads per processor is fairly small.
Let P be a simple rectilinear convex polygon of size O(n) inside which lie n pairwise disjoint rectangular rectilinear obstacles. We provide parallel techniques for computing rectilinear shortest paths that avoid the ...
详细信息
ISBN:
(纸本)0897913701
Let P be a simple rectilinear convex polygon of size O(n) inside which lie n pairwise disjoint rectangular rectilinear obstacles. We provide parallel techniques for computing rectilinear shortest paths that avoid the set of obstacles in P. Specifically, we compute descriptions of shortest paths in O(log2 n) time, with O(n2/log2n) processors in the CREWPRAM model if source and destination are on the boundary of P, with O(n2/log n) processors if the source is an obstacle vertex and the destination a vertex of P, and with O(n2) processors if both source and destination are obstacle vertices. The descriptions we compute enable one processor to obtain the path length of any query pair of vertices in constant time, or O(n/log n) processors to retrieve the shortest path itself in logarithmic time. If the two query points are arbitrary rather than vertices, then one processor takes O(log n) time (instead of constant time) for finding the path length. A number of other related shortest paths problems are solved. The techniques we use involve a fast computation of separator staircases and, most importantly, a scheme for partitioning the obstacles' boundaries into families in a way that ensures that the resulting path length matrices have a monotonicity property that is apparently absent before applying our partitioning scheme.
暂无评论