First we study asymptotically fast algorithms for rectangular matrix multiplication. We begin with new algorithms for multiplication of an $n \times n$ matrix by an $n \times n^2$ matrix in arithmetic time $O(n^{\omega})$, $\omega = 3.333953\ldots$, which is less by 0.041 than the previous record $3.375477\ldots$. Then we present fast multiplication algorithms for matrix pairs of arbitrary dimensions, estimate the asymptotic running time as a function of the dimensions, and optimize the exponents of the complexity estimates. For a large class of input matrix pairs, we improve the known exponents. Finally we show three applications of our results: (a) we decrease from 2.851 to 2.837 the known exponent of *** bounds for fast deterministic (NC) parallel evaluation of the determinant, the characteristic polynomial, and the inverse of an $n \times n$ matrix, as well as for the solution to a nonsingular linear system of $n$ equations, (b) we asymptotically accelerate the known sequential algorithms for the univariate polynomial composition mod $x^n$, yielding the complexity bound $O(n^{1.667})$ versus the old record of $O(n^{1.688})$, and for the univariate polynomial factorization over a finite field, and (c) we improve slightly the known complexity estimates for computing basic solutions to the linear programming problem with $n$ constraints and $n$ variables. © 1998 Academic Press.
This paper resolves the parallel complexity of the graph closure problem, an open question posed by S. Khuller. In particular, we prove that the $2N - k$-closure problem is in L for $k = 5$ and is P-complete for $k \geq 6$. Finally, we show that the $N + k$-closure problem is P-complete for any integer $k$.
We show lower bounds for the depth of arithmetic networks over algebraically closed fields, real closed fields, and the field of the rationals. The parameters used are either the degree or the number of connected components. These lower bounds allow us to show the inefficiency of arithmetic networks for parallelizing several natural problems. For instance, we show a $\sqrt{n}$ lower bound for the parallel time of the Knapsack problem over the reals, and also that the computation of the "integer part" is not well parallelizable by arithmetic networks. Over the rationals we obtain results of similar order, including a $\sqrt{n}$ lower bound for the parallel time of the Knapsack problem measured by networks. A simply exponential lower bound for the parallel time of quantifier elimination is also shown. Finally, separations among the classes P(K) and NC(K) are available for fields K in the above cases.
This paper considers the problem of solving tridiagonal linear systems on parallel distributed-memory environments. In particular, two common direct methods for solving such systems are considered: odd-even cyclic reduction and prefix summing. For each method, a variety of lower bounds on execution time for solving tridiagonal linear systems are presented. Specifically, lower bounds are presented that (a) hold when the number of data items per processor is bounded, (b) are general lower bounds, and (c) hold for specific data layouts commonly used in designing parallel algorithms to solve tridiagonal linear systems. Furthermore, algorithms are presented that have running times within a constant factor of the lower bounds provided. Lastly, a comparison of bounds for odd-even cyclic reduction and prefix summing is given.
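As an illustration of the first method named in this abstract, the following is a minimal serial sketch of textbook odd-even cyclic reduction, not the distributed-memory algorithms analyzed in the paper. The function name `cyclic_reduction` and the array layout (`a` sub-diagonal, `b` diagonal, `c` super-diagonal, `d` right-hand side, with `a[0] = 0` and `c[-1] = 0`) are assumptions made for this example, as is the absence of zero pivots (for instance, a diagonally dominant system).

```python
def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for x.

    Assumes a[0] == 0, c[-1] == 0, and that no pivot becomes zero.
    Each loop body is independent across i, which is where a parallel
    implementation would distribute the work.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Reduction step: eliminate the even-indexed unknowns from every
    # odd-indexed equation, giving a tridiagonal system of size n // 2.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        has_next = i + 1 < n
        alpha = a[i] / b[i - 1]
        gamma = c[i] / b[i + 1] if has_next else 0
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - (gamma * a[i + 1] if has_next else 0))
        rc.append(-gamma * c[i + 1] if has_next else 0)
        rd.append(d[i] - alpha * d[i - 1] - (gamma * d[i + 1] if has_next else 0))

    x = [0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back-substitution: recover the even-indexed unknowns.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0
        right = c[i] * x[i + 1] if i + 1 < n else 0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

The recursion depth is about $\log_2 n$, which is why this scheme is a common starting point for parallel tridiagonal solvers.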
The problem considered is that of approximating the solution of a linear scalar partial differential equation (PDE) at one or more locations in its domain. A lower bound on the amount of data required to satisfy a given error tolerance in the approximation is described. Using this bound, a lower bound on the execution time of parallel algorithms that approximate the solution is derived. The lower bound on the execution time has the form $\alpha \cdot f(+) \cdot \log_2 \epsilon^{-1}$, where $\alpha$ is a problem-dependent constant, $f(+)$ is a measure of the speed of floating point arithmetic, and $\epsilon$ is an upper bound on the error. Thus, when $\alpha > 0$, the execution time increases as $\epsilon$ decreases, independent of the number of processors, the interconnection topology, and the algorithm used. Lower bounds on the execution time are also given for the cases where the interconnection network or the number of processors is specified. Recent research has established that it is often possible to use a large number of processors efficiently when calculating the numerical solution of a PDE if the problem is sufficiently large. In this paper, it is shown that increasing the size of such a problem will usually come at the cost of increasing the execution time. Two examples are described that verify this conclusion: an algorithm-independent analysis of an elliptic PDE and an analysis of a specific algorithm for the approximation of a hyperbolic PDE.
The Orbit problem is defined as follows: given a matrix $A \in \mathbb{Q}^{n \times n}$ and vectors $x, y \in \mathbb{Q}^{n}$, does there exist a non-negative integer $i$ such that $A^{i} x = y$? This problem was shown to be in deterministic polynomial time by Kannan and Lipton (J. ACM 33(4): 808-821, 1986). In this paper we place the problem in the logspace counting hierarchy GapLH. We also show that the problem is hard for $C_{=}L$ with respect to logspace many-one reductions.
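To make the problem statement concrete, here is a small sketch that searches for a witness $i$ by direct iteration over exact rationals. It is only a bounded brute-force check, not the Kannan-Lipton algorithm or the GapLH construction of the paper; the function name `orbit_bounded` and the cutoff parameter `max_steps` are hypothetical, and a `None` result says nothing about larger exponents.

```python
from fractions import Fraction


def mat_vec(A, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def orbit_bounded(A, x, y, max_steps):
    """Return the smallest i in [0, max_steps] with A^i x == y, else None.

    Exact rational arithmetic avoids floating-point equality problems.
    max_steps is an arbitrary cutoff chosen by the caller (an assumption
    of this sketch, not part of the problem definition).
    """
    A = [[Fraction(a) for a in row] for row in A]
    v = [Fraction(a) for a in x]
    target = [Fraction(a) for a in y]
    for i in range(max_steps + 1):
        if v == target:
            return i
        v = mat_vec(A, v)
    return None
```

For example, with `A = [[1, 1], [0, 1]]` and `x = [0, 1]`, the vector `y = [3, 1]` is reached at `i = 3`, while `y = [3, 2]` is never reached because the second coordinate stays fixed at 1.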
ISBN: (print) 0818685913
To achieve practical, efficient execution on a parallel architecture, knowledge of the application's data dependencies is the key to building an efficient schedule. By restricting accesses to shared memory, we show that such a data dependency graph can be computed on-line on a distributed architecture. The overhead introduced is bounded with respect to the parallelism expressed by the user: each basic computation corresponds to a user-defined task; each data dependency to a user-defined data structure. We introduce a language named Athapascan-1 that allows a graph of dependencies to be built from a strong typing of shared memory accesses. We detail the compilation and implementation of the language. In addition, the performance of a code (parallel time, communication and arithmetic work, memory space) is defined from a cost model without the need for a machine model. We exhibit efficient scheduling with respect to these costs on theoretical machine models.
Author: Sedjelmaci, Sidi Mohamed
LIPN, CNRS UMR 7030, Université Paris-Nord, Av. J.-B. Clément, 93430 Villetaneuse, France
We generalize a formula of B. Litow [Parallel Complexity of Integer Coprimality, in Electronic Colloquium on Computational Complexity, Report No. 9, 1998] and propose several new formulas linked with the parallel Inte...
We present a new GCD algorithm for two integers that combines both the Euclidean and the binary gcd approaches. We give its worst case time analysis and we prove that its bit-time complexity is still $O(n^2)$ for two n-...
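For reference, the two classical approaches this abstract mentions can be sketched as follows. These are the standard textbook Euclidean and binary GCD algorithms, shown separately; the combined algorithm proposed in the paper is not reproduced here. The function names `gcd_euclid` and `gcd_binary` are chosen for this example, and inputs are assumed to be non-negative integers.

```python
def gcd_euclid(a, b):
    """Classical Euclidean algorithm: repeated remainder."""
    while b:
        a, b = b, a % b
    return a


def gcd_binary(a, b):
    """Binary (Stein) GCD: uses only shifts, parity tests and subtraction."""
    if a == 0:
        return b
    if b == 0:
        return a
    # Factor out the power of two common to both inputs.
    shift = 0
    while ((a | b) & 1) == 0:
        a >>= 1
        b >>= 1
        shift += 1
    # Make a odd; it stays odd for the rest of the loop.
    while (a & 1) == 0:
        a >>= 1
    while b:
        # Remaining factors of two in b cannot be common, so drop them.
        while (b & 1) == 0:
            b >>= 1
        if a > b:
            a, b = b, a
        b -= a  # both odd, so the difference is even
    return a << shift
```

Both functions return the same value on any pair of non-negative integers; the binary variant trades divisions for shifts, which is the property that hybrid schemes typically try to exploit.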
The paper gives an overview of some models of computation which have proved successful in laying a foundation for a general theory of parallel computation. We present three models of parallel computation, namely boole...