This paper presents a new parallel conjugate direction method for unconstrained, n-dimensional minimization problems. Starting from one point, the method performs n line searches concurrently along n conjugate directions of the approximate Hessian to derive a new point and generate n new conjugate directions. For deriving the new point, a linear formula is presented, and its convergence properties are discussed. The n new conjugate directions are generated mostly in parallel without explicitly using the approximate Hessian. For quadratic problems, it is shown that the method converges in two parallel iterations. For nonquadratic problems, safeguard steps are taken to guarantee monotonic descent and global convergence. A sufficient condition for “rapid” convergence is also presented. Numerical tests indicate that, compared with the conventional conjugate gradient method, the parallel method takes fewer (parallel) iterations and fewer function evaluations on all test problems, and is most effective for problems whose Hessians change little from iteration to iteration.
The compressible, three-dimensional, time-dependent Navier-Stokes equations are solved on a 20 processor Flex/32 computer. The code is a parallel implementation of an existing code operational on the Cray-2 at NASA Am...
A fast parallel algorithm that can be used to find a satisfying truth assignment for a 2-CNF formula is proposed. The input to the algorithm is a formula that is the conjunction of a given number of clauses, each of which is the disjunction of exactly 2 literals, over a given number of Boolean variables. The algorithm determines whether the input formula is satisfiable and, if so, finds a truth assignment to the variables that satisfies it. The implementation of the algorithm on a concurrent-read concurrent-write parallel random access machine (CRCW PRAM) is described. The input data structures are: 1. the number of clauses, 2. the number of variables, and 3. an array whose length is the number of clauses, with the entry for each clause consisting of the indexes of the 2 literals that occur in the clause. The output data structures are: 1. a Boolean variable indicating whether the formula is satisfiable, and 2. an array of length equal to twice the number of variables.
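To make the input and output layout above concrete, here is a minimal sequential sketch in Python. It is not the CRCW PRAM algorithm of the abstract: it uses the standard implication-graph and strongly-connected-components approach (Kosaraju's algorithm). The function name `solve_2sat` and the literal encoding (index 2*v for variable v, 2*v + 1 for its negation) are assumptions made for illustration.

```python
def solve_2sat(num_vars, clauses):
    """Return (satisfiable, literal_values) for a 2-CNF formula.

    Assumed encoding: literal 2*v is variable v, literal 2*v + 1 is its negation.
    `clauses` is a list of (literal, literal) index pairs.
    `literal_values` has one Boolean per literal (2 * num_vars entries).
    """
    n = 2 * num_vars
    graph = [[] for _ in range(n)]
    rgraph = [[] for _ in range(n)]
    for a, b in clauses:
        # (a or b) is equivalent to (not a -> b) and (not b -> a).
        for u, v in ((a ^ 1, b), (b ^ 1, a)):
            graph[u].append(v)
            rgraph[v].append(u)

    # Pass 1: iterative DFS on the graph, recording vertices by finish time.
    order, seen = [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, 0)]
        while stack:
            u, i = stack.pop()
            if i < len(graph[u]):
                stack.append((u, i + 1))
                v = graph[u][i]
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, 0))
            else:
                order.append(u)

    # Pass 2: label components on the reversed graph, latest finish first.
    # Components come out in topological order of the implication graph.
    comp = [-1] * n
    label = 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = label
        stack = [s]
        while stack:
            u = stack.pop()
            for v in rgraph[u]:
                if comp[v] == -1:
                    comp[v] = label
                    stack.append(v)
        label += 1

    # A variable in the same component as its negation makes the formula unsatisfiable.
    if any(comp[2 * v] == comp[2 * v + 1] for v in range(num_vars)):
        return False, []
    # A literal is true when its component comes after its negation's component.
    return True, [comp[lit] > comp[lit ^ 1] for lit in range(n)]
```

For example, `solve_2sat(3, [(0, 2), (1, 2), (3, 4)])` encodes (x0 or x1) and (not x0 or x1) and (not x1 or x2) under the assumed literal numbering.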
This paper presents a parallel algorithm that computes the breadth-first search (BFS) numbering of a directed graph in $O(\log^2 n)$ time using $M(n)$ processors on the exclusive-read exclusive-write (EREW) parallel random access machine (PRAM) model, where $M(n)$ denotes the number of processors needed to multiply two $n \times n$ integer matrices over the ring $(\mathbb{Z}, +, \times)$ in $O(\log n)$ time. The best known bound for $M(n)$ is $O(n^{2.376})$ (Coppersmith and Winograd, 1987). The algorithm presented in this paper uses fewer processors than the classical algorithm for BFS that employs matrix powering over the semiring (dioid) $(\mathbb{N}, \min, +)$, using $O(\log n)$ time and $O(n^3)$ processors on the concurrent-read concurrent-write (CRCW) model, or using $O(\log^2 n)$ time and $n^3/\log n$ processors on the EREW model.
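The classical approach mentioned above, matrix powering over the (min, +) semiring, can be sketched sequentially as follows. This is an illustration only, not the paper's algorithm and not a PRAM implementation: it computes BFS levels (distances from a source) by repeated squaring, with each min-plus product done in plain nested loops. The function names and the adjacency-matrix input format are assumptions.

```python
import math

INF = math.inf


def min_plus_square(d):
    """Square a matrix over the (min, +) semiring."""
    n = len(d)
    return [
        [min(d[i][k] + d[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def bfs_levels_by_powering(adj, source):
    """Return BFS levels from `source`; unreachable vertices get math.inf.

    `adj[i][j]` is truthy when the directed edge i -> j exists (assumed format).
    """
    n = len(adj)
    # Start with paths of length at most 1: 0 on the diagonal, 1 on edges.
    d = [
        [0 if i == j else (1 if adj[i][j] else INF) for j in range(n)]
        for i in range(n)
    ]
    covered = 1  # d currently accounts for paths of up to `covered` edges
    while covered < n - 1:
        d = min_plus_square(d)
        covered *= 2
    return d[source]
```

Each squaring doubles the path length covered, so about log2(n) products suffice. In the parallel setting the abstract compares against, each product is what gets distributed over the processors.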
A modified version of the fast parallel thinning algorithm proposed by Zhang and Suen is presented in this paper. It preserves the original merits, such as contour noise immunity and good results when thinning crossing lines, and overcomes the original demerits, such as serious shrinking and line connectivity problems.
A parallel algorithm is proposed in this paper for solving the problem $\min \{ q(x)|x \in C_1 \cap \cdots \cap C_m \} $ where q is a uniformly convex function and $C_i$ are closed convex sets in $R^n$. In each iteration of the method, we solve in parallel m independent subproblems, each minimizing a definite quadratic function over an individual set $C_i$. The method has attractive convergence properties and can be implemented as parallel algorithms for tackling definite quadratic programs, linear programs, systems of linear equations and systems of generalized nonlinear inequalities.
In this paper, Tseng and Lee's parallel algorithm to solve the stable marriage problem is analyzed. It is shown that the average number of parallel proposals of the algorithm is of order $n$ using $n$ processors on a CREW PRAM, where each parallel proposal requires $O(\log\log n)$ time on a CREW PRAM by applying the parallel selection algorithms of Valiant or Shiloach and Vishkin. Therefore, our parallel algorithm requires $O(n \log\log n)$ time. The speed-up achieved is $\log n / \log\log n$, since the average number of proposals required by applying McVitie and Wilson's algorithm to solve the stable marriage problem is $O(n \log n)$.
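To make the notion of counting proposals concrete, here is a minimal sequential sketch of the standard proposal-based (Gale-Shapley style) procedure that also returns the total number of proposals. It is not Tseng and Lee's parallel algorithm, and it is not claimed to match McVitie and Wilson's formulation in detail. The function name and the preference-list format are assumptions.

```python
from collections import deque


def stable_marriage(men_prefs, women_prefs):
    """Return (wife, proposals) for n men and n women.

    `men_prefs[m]` and `women_prefs[w]` list partners from most to least
    preferred (assumed format). `wife[m]` is the woman matched to man m.
    """
    n = len(men_prefs)
    # rank[w][m] = position of man m in woman w's list (lower is better).
    rank = [{m: r for r, m in enumerate(prefs)} for prefs in women_prefs]
    next_choice = [0] * n   # index of the next woman each man will propose to
    fiance = [None] * n     # fiance[w] = man currently engaged to woman w
    free = deque(range(n))  # men who are not engaged
    proposals = 0

    while free:
        m = free.popleft()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        proposals += 1
        current = fiance[w]
        if current is None:
            fiance[w] = m               # w was free, so she accepts
        elif rank[w][m] < rank[w][current]:
            fiance[w] = m               # w trades up; her old partner is free again
            free.append(current)
        else:
            free.append(m)              # w rejects m; he stays free

    wife = [None] * n
    for w, m in enumerate(fiance):
        wife[m] = w
    return wife, proposals
```

Running this on random preference lists and averaging `proposals` gives an empirical feel for the sequential proposal count that the speed-up in the abstract is measured against.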
This article presents PFCM, a parallel algorithm for fuzzy clustering of large data sets. Being a generalization of FCM, the algorithm enables arbitrary numbers of data points, features and clusters to be handled cost-optimally by hypercube SIMD computers of arbitrary cube dimension, the only limitation being the size of the local memories of the processors. Speedup responds optimally to enlarging the hypercube. PFCM owes its flexibility to the technique employed in its derivation from the sequential fuzzy C-means algorithm FCM: the association of each of the three dimensions of the problem (numbers of data points, features and clusters) with a distinct subset of hypercube dimensions.
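For reference, one iteration of the sequential fuzzy C-means algorithm (FCM) that PFCM is derived from can be sketched as below. This is the textbook update with Euclidean distances, written in plain Python. It says nothing about how PFCM maps the three problem dimensions onto hypercube dimensions. The function name `fcm_step` and the default fuzzifier `m=2.0` are assumptions.

```python
def fcm_step(points, centers, m=2.0):
    """One FCM iteration: update memberships, then cluster centers.

    `points` and `centers` are lists of equal-length coordinate sequences.
    Returns (memberships, new_centers); memberships[i][j] is the degree to
    which point i belongs to cluster j, and each row sums to 1.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5 for c in centers]
        if any(d == 0.0 for d in dists):
            # Point coincides with one or more centers: split membership among them.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            row = [h / sum(hits) for h in hits]
        else:
            row = [1.0 / sum((di / dj) ** exponent for dj in dists) for di in dists]
        memberships.append(row)

    dims = len(points[0])
    new_centers = []
    for j, old in enumerate(centers):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)
        if total == 0.0:
            new_centers.append(list(old))  # no point belongs to cluster j; keep it
            continue
        new_centers.append(
            [sum(w * x[d] for w, x in zip(weights, points)) / total for d in range(dims)]
        )
    return memberships, new_centers
```

The two loops run over data points, features, and clusters, which are the three problem dimensions the abstract associates with distinct subsets of hypercube dimensions.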
We decrease (from $n^{2.876}$ to $o(n^{2.851})$) the current record bound on the number of processors required in $O(\log^2 n)$-step parallel arithmetic algorithms over rationals for the exact evaluation of the inverse and all coefficients of the characteristic polynomial of an $n \times n$ rational, real, or complex matrix $A$. For an integer input matrix $A$, the evaluation involves only $d$-bit numbers where either $d = O(\log p)$ if the computation is modulo a prime $p$ or $d = O(n \log \|A\|)$ in the general case; the Boolean cost of computing $\det A$ is further decreased in a randomized parallel algorithm.
The parallel complexity of the blocking flow problem is examined. Shiloach and Vishkin (1982) gave a time parallel algorithm for finding a blocking flow in an acyclic network, thereby giving an algorithm for max-flow. Then, Goldschlager, Shaw, and Staples (1982) showed that the max-flow problem is log-space complete for P, so it is unlikely to be in NC. This complexity is resolved for a restricted class of acyclic networks called 3-layer networks. The CREW PRAM model is used, in which concurrent reads from the same memory location are allowed, but concurrent writes to the same memory location are disallowed. The problem of finding the lexicographically first blocking flow is shown to be log-space complete for P. A random NC algorithm for finding a blocking flow in a 3-layer network is demonstrated.