Parallel algorithms for the solution of dense systems of nonlinear equations on a message-passing multiprocessor computer are developed. Specifically, a distributed finite-difference Newton method, a multiple secant method, and a rank-1 secant method are proposed. Experimental results, obtained on an Intel hypercube, indicate that these methods exhibit good parallelism.
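As a rough illustration of the finite-difference Newton idea named in this abstract, the following is a minimal sequential sketch, not the authors' distributed algorithm. The function names (`fd_jacobian`, `solve_dense`, `fd_newton`), the step size `h`, and the tolerances are assumptions made for this example. Each Jacobian column needs one extra evaluation of F and is independent of the others, which is the part one would presumably spread across processors.

```python
def fd_jacobian(F, x, h=1e-7):
    """Forward-difference Jacobian of F at x; returns (J, F(x)). Hypothetical helper."""
    n = len(x)
    f0 = F(x)
    cols = []
    for j in range(n):  # each column is independent of the others
        xp = list(x)
        xp[j] += h
        fj = F(xp)
        cols.append([(fj[i] - f0[i]) / h for i in range(n)])
    # cols[j] holds column j; transpose into row-major form
    J = [[cols[j][i] for j in range(n)] for i in range(n)]
    return J, f0


def solve_dense(A, b):
    """Gaussian elimination with partial pivoting on a dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for c in range(k, n + 1):
                M[i][c] -= m * M[k][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fd_newton(F, x0, tol=1e-10, max_iter=50):
    """Newton iteration using the finite-difference Jacobian above."""
    x = list(x0)
    for _ in range(max_iter):
        J, f = fd_jacobian(F, x)
        if max(abs(v) for v in f) < tol:
            break
        step = solve_dense(J, [-v for v in f])
        x = [xi + si for xi, si in zip(x, step)]
    return x


# Example system (an assumption for illustration): x0^2 + x1^2 = 4, x0 = x1
F = lambda x: [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]]
root = fd_newton(F, [1.0, 2.0])
```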
We develop an algorithm for computing the symbolic Cholesky factorization of a large sparse symmetric positive definite matrix. The algorithm is intended for a message-passing multiprocessor system, such as the hypercube, and is based on the concept of an elimination forest. In addition, we provide an algorithm for computing these forests, along with a discussion of the algorithm's complexity and a proof of its correctness.
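For readers unfamiliar with elimination forests, the sketch below computes the parent array with the well-known sequential ancestor-compression algorithm. It is not the message-passing algorithm of this paper. The function name `elimination_forest` and the input layout (`lower_pattern[i]` lists the columns j < i where row i of the matrix has a nonzero) are assumptions for illustration.

```python
def elimination_forest(n, lower_pattern):
    """Return parent[j] for each column j, or -1 if j is a root.

    lower_pattern[i] lists the column indices j < i with A[i][j] != 0
    (strict lower triangle of a symmetric matrix; hypothetical layout).
    """
    parent = [-1] * n
    ancestor = [-1] * n  # compressed shortcut towards the current root
    for i in range(n):
        for j in lower_pattern[i]:
            # Walk from j towards its current root, redirecting the path to i.
            while j != -1 and j < i:
                nxt = ancestor[j]
                ancestor[j] = i
                if nxt == -1:  # j was a root, so i becomes its parent
                    parent[j] = i
                j = nxt
    return parent


# Tridiagonal pattern: a single chain 0 -> 1 -> 2 -> 3
chain = elimination_forest(4, [[], [0], [1], [2]])
# Two decoupled 2x2 blocks: a forest with two roots (1 and 3)
forest = elimination_forest(4, [[], [0], [], [2]])
# Column 0 couples rows 2 and 3, so eliminating it makes 3 the parent of 2
fill = elimination_forest(4, [[], [], [0], [0]])
```

When the matrix is reducible, several entries of the parent array stay at -1, which is why the structure is a forest rather than a single tree.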
Optimizing inter-processor (PE) communication is crucial for parallelizing compilers targeting message-passing parallel machines to achieve high performance. In this paper, we propose a technique to eliminate redundant inter-PE messages. The technique uses data-flow analysis to find the definition point that corresponds to a use point when the definition and the use occur in different PEs. If several read accesses in the same PE use data defined at the same definition point in another PE, redundant inter-PE messages are eliminated as follows: only one inter-PE communication is performed, for the earliest read access, and the previously received data are reused for the subsequent reads. To guarantee the consistency of the data, a valid flag and a sent flag are provided for each chunk of received data. The control of these flags is equivalent to coherence control by self-invalidation in a compiler-aided cache coherence scheme.
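The receiver-side effect of the valid flag can be pictured with a toy model. The class below is a hypothetical sketch, not the compiler-generated code described in the paper: `ReceivedChunk`, `fetch`, and `messages` are names invented for illustration, a plain function call stands in for one inter-PE message, and the sent flag is omitted because the abstract does not spell out its exact semantics.

```python
class ReceivedChunk:
    """Toy model of one chunk of data received from another PE."""

    def __init__(self, fetch):
        self._fetch = fetch   # hypothetical stand-in for one inter-PE message
        self._data = None
        self.valid = False    # valid flag: is the local copy usable?
        self.messages = 0     # counts simulated inter-PE messages

    def read(self):
        # Only the earliest read after an invalidation triggers communication.
        if not self.valid:
            self._data = self._fetch()
            self.messages += 1
            self.valid = True
        return self._data

    def invalidate(self):
        # Self-invalidation, e.g. before a new definition in the owner PE is used.
        self.valid = False


owner = {"v": 10}                         # value held by the defining PE
chunk = ReceivedChunk(lambda: owner["v"])
reads = [chunk.read() for _ in range(3)]  # three uses, one message
owner["v"] = 20                           # a new definition point
chunk.invalidate()
latest = chunk.read()                     # second message
```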
It has been claimed that the solution of a triangular system with column storage cannot be achieved with much better than serial speed. We show that the classical inner product algorithms can be nearly as efficient as the usual column sweep algorithm.
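The two orderings being compared can be sketched sequentially for a lower triangular system L x = b. This is a plain serial illustration with assumed function names, not the parallel formulation studied in the paper. The column sweep reads L one column at a time, while the inner product form reads it one row at a time.

```python
def solve_lower_column_sweep(L, b):
    """Forward substitution, column-oriented: finish x[j], then update the rest."""
    n = len(b)
    x = list(b)
    for j in range(n):
        x[j] = x[j] / L[j][j]
        for i in range(j + 1, n):      # sweep down column j
            x[i] -= L[i][j] * x[j]
    return x


def solve_lower_inner_product(L, b):
    """Forward substitution, row-oriented: one inner product per unknown."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * x[j] for j in range(i))   # inner product with row i
        x[i] = (b[i] - s) / L[i][i]
    return x


L = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [4.0, 5.0, 6.0]]
b = [2.0, 7.0, 32.0]
x_sweep = solve_lower_column_sweep(L, b)
x_inner = solve_lower_inner_product(L, b)
```

Both routines perform the same arithmetic; they differ only in the order in which the entries of L are visited, which is what matters once the columns are distributed across processors.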
The authors recently proposed a new parallel algorithm, based on the sequential Levenberg-Marquardt method, for the nonlinear least-squares problem. The algorithm is suitable for message-passing multiprocessor computers. In this paper a parallel efficiency analysis is provided and computational results are reported. The experiments were performed on an Intel iPSC/2 multiprocessor with 32 nodes and compare the given parallel algorithm with sequential MINPACK code executed on a single processor. The results show that essentially full efficiency is obtained for problems where the row size is sufficiently larger than the number of processors.