We investigate several ways to improve the performance of sparse LU factorization with partial pivoting, as used to solve unsymmetric linear systems. We introduce the notion of unsymmetric supernodes to perform most of the numerical computation in dense matrix kernels. We introduce unsymmetric supernode-panel updates and two-dimensional data partitioning to better exploit the memory hierarchy. We use Gilbert and Peierls's depth-first search with Eisenstat and Liu's symmetric structural reductions to speed up symbolic factorization. We have developed a sparse LU code using all these ideas. We present experiments demonstrating that it is significantly faster than earlier partial pivoting codes. We also compare its performance with UMFPACK, which uses a multifrontal approach; our code is very competitive in time and storage requirements, especially for large problems.
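A minimal Python sketch of how a supernodal partial-pivoting LU code in this family is typically invoked, using SciPy's `splu` wrapper around the SuperLU library; the test matrix and right-hand side are arbitrary illustrations, not data from the paper:

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

# A small unsymmetric sparse system; the values are arbitrary.
A = csc_matrix(np.array([[4.0, 0.0, 1.0, 0.0],
                         [2.0, 5.0, 0.0, 0.0],
                         [0.0, 1.0, 3.0, 2.0],
                         [1.0, 0.0, 0.0, 6.0]]))
b = np.array([1.0, 2.0, 3.0, 4.0])

# splu computes a sparse LU factorization with partial pivoting
# (SuperLU under the hood) and returns a reusable factor object.
lu = splu(A)
x = lu.solve(b)
print(np.allclose(A @ x, b))        # True
# lu.L and lu.U hold the sparse triangular factors;
# lu.perm_r and lu.perm_c hold the row and column permutations.
```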
Let $Ax = b$ be a large sparse nonsingular system of linear equations to be solved using Gaussian elimination with partial pivoting. The factorization obtained can be expressed in the form $A = P_1 M_1 P_2 M_2 \cdots P_{n - 1} M_{n - 1} U$, where $P_k$ is an elementary permutation matrix reflecting the row interchange that occurs at step $k$ during the factorization, $M_k$ is a unit lower triangular matrix whose $k$th column contains the multipliers, and $U$ is an upper triangular matrix.
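To make the form $A = P_1 M_1 P_2 M_2 \cdots P_{n - 1} M_{n - 1} U$ concrete, the following NumPy sketch (dense and with made-up data, purely for illustration) records the elementary permutation and multiplier matrices produced by Gaussian elimination with partial pivoting and checks that their product reproduces $A$:

```python
import numpy as np

def gepp_factor(A):
    """Gaussian elimination with partial pivoting on a small dense matrix.

    Returns the elementary permutation matrices P_k, the unit lower
    triangular multiplier matrices M_k, and the upper triangular U so that
    A = P_1 M_1 P_2 M_2 ... P_{n-1} M_{n-1} U.
    """
    A = A.astype(float).copy()
    n = A.shape[0]
    Ps, Ms = [], []
    for k in range(n - 1):
        # P_k swaps row k with the row holding the largest pivot candidate.
        p = k + np.argmax(np.abs(A[k:, k]))
        P = np.eye(n)
        P[[k, p]] = P[[p, k]]
        A = P @ A
        # M_k is unit lower triangular; its kth column holds the multipliers.
        m = A[k + 1:, k] / A[k, k]
        M = np.eye(n)
        M[k + 1:, k] = m
        # Eliminate below the pivot.
        A[k + 1:, k:] -= np.outer(m, A[k, k:])
        Ps.append(P)
        Ms.append(M)
    return Ps, Ms, np.triu(A)

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
Ps, Ms, U = gepp_factor(A)
prod = np.eye(3)
for P, M in zip(Ps, Ms):
    prod = prod @ P @ M
print(np.allclose(prod @ U, A))     # True: A = P_1 M_1 P_2 M_2 U
```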
The elimination tree is central to the study of Cholesky factorization of sparse symmetric positive definite matrices. In this paper, the elimination tree is generalized to a structure appropriate for the sparse LU factorization of unsymmetric matrices. A pair of directed acyclic graphs, called elimination dags, is defined and used to characterize the zero-nonzero structures of the lower and upper triangular factors. These elimination structures are applied in a new algorithm to compute fill for sparse LU factorization. Experimental results indicate that the new algorithm is usually faster than earlier methods.
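Elimination dags are transitive reductions of the directed graphs of the triangular factors' nonzero patterns. The toy Python sketch below (with a hypothetical pattern and a brute-force reduction, not the paper's algorithm) illustrates the construction:

```python
# Hypothetical nonzero pattern of an upper triangular factor U,
# viewed as a directed graph: row -> columns of its off-diagonal nonzeros.
upper = {0: [1, 3], 1: [2, 3], 2: [3], 3: []}

def reachable(graph, src, dst, skip_edge):
    """DFS reachability from src to dst, ignoring one edge."""
    stack, seen = [src], set()
    while stack:
        u = stack.pop()
        for v in graph[u]:
            if (u, v) == skip_edge or v in seen:
                continue
            if v == dst:
                return True
            seen.add(v)
            stack.append(v)
    return False

# Transitive reduction: drop edge (u, v) if v is still reachable without it.
edag = {u: [v for v in vs if not reachable(upper, u, v, (u, v))]
        for u, vs in upper.items()}
print(edag)   # {0: [1], 1: [2], 2: [3], 3: []}
```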
This paper develops and compares several fine-grained parallel algorithms to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed-memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special-purpose algorithms in which the matrix structure conforms to the connection structure of the machine, this paper focuses on matrices with arbitrary sparsity structure. The most promising alternative is a supernodal, multifrontal algorithm whose inner loop performs several dense factorizations simultaneously on a two-dimensional grid of processors. The key subroutine is a fine-grained parallel, dense-factorization algorithm. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and used to analyze these algorithms. Asymptotic analysis combined with experimental measurement of parameters is accurate enough to be useful in choosing among alternative algorithms for a complicated problem.
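As a point of reference for the dense kernel such a supernodal, multifrontal code relies on, here is a sequential NumPy sketch of an unblocked dense Cholesky factorization; it is only a serial stand-in for the fine-grained parallel subroutine discussed in the paper, and the test matrix is made up:

```python
import numpy as np

def dense_cholesky(A):
    """Unblocked dense Cholesky: A = L L^T for symmetric positive definite A."""
    A = A.astype(float).copy()
    n = A.shape[0]
    L = np.zeros_like(A)
    for k in range(n):
        L[k, k] = np.sqrt(A[k, k])
        L[k + 1:, k] = A[k + 1:, k] / L[k, k]
        # Rank-1 update of the trailing submatrix.
        A[k + 1:, k + 1:] -= np.outer(L[k + 1:, k], L[k + 1:, k])
    return L

M = np.array([[4.0, 2.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])
L = dense_cholesky(M)
print(np.allclose(L @ L.T, M))   # True
```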
The matrix computation language and environment MATLAB is extended to include sparse matrix storage and operations. The only change to the outward appearance of the MATLAB language is a pair of commands to create full or sparse matrices. Nearly all the operations of MATLAB now apply equally to full or sparse matrices, without any explicit action by the user. The sparse data structure represents a matrix in space proportional to the number of nonzero entries, and most of the operations compute sparse results in time proportional to the number of arithmetic operations on nonzeros.
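The space-proportional-to-nonzeros property corresponds to a compressed-column data structure. The SciPy sketch below is an analogue of that representation (not MATLAB itself) and shows the three arrays such a format keeps:

```python
import numpy as np
from scipy.sparse import csc_matrix

# A mostly-zero matrix stored in compressed sparse column (CSC) form.
A = csc_matrix(np.array([[0.0, 0.0, 3.0],
                         [1.0, 0.0, 0.0],
                         [0.0, 2.0, 0.0]]))

print(A.data)     # nonzero values, one entry per nonzero: [1. 2. 3.]
print(A.indices)  # row index of each nonzero:             [1 2 0]
print(A.indptr)   # where each column starts in data:      [0 1 2 3]
# Total storage is O(nnz + n), not O(n^2); A.toarray() recovers the full matrix.
```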
For a general $m$ by $n$ sparse matrix $A$, a new scheme is proposed for the structural representation of the factors of its sparse orthogonal decomposition by Householder transformations. The storage scheme is row-oriented and is based on the structure of the upper triangular factor obtained in the decomposition. The storage of the orthogonal matrix factor is particularly efficient in that the overhead required is only $m + n$ items, independent of the actual number of nonzeros in the
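The point about cheap implicit storage of the orthogonal factor can be illustrated with a generic dense Householder QR sketch (made-up data, not the paper's row-oriented sparse scheme), in which $Q$ is never formed explicitly and only one reflector vector per column is kept:

```python
import numpy as np

def householder_qr(A):
    """Householder QR (assumes full column rank); Q is kept only implicitly."""
    R = A.astype(float).copy()
    m, n = R.shape
    vs = []                               # one reflector vector per column
    for k in range(n):
        x = R[k:, k]
        v = x.copy()
        v[0] += np.linalg.norm(x) if x[0] >= 0 else -np.linalg.norm(x)
        v /= np.linalg.norm(v)
        # Apply the reflector I - 2 v v^T to the trailing columns.
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])
        vs.append(v)
    return vs, np.triu(R)

def apply_qt(vs, b):
    """Apply Q^T to a vector using only the stored reflector vectors."""
    b = b.astype(float).copy()
    for k, v in enumerate(vs):
        b[k:] -= 2.0 * v * (v @ b[k:])
    return b

A = np.array([[3.0, 1.0],
              [4.0, 2.0],
              [0.0, 1.0]])               # made-up data
vs, R = householder_qr(A)
print(np.allclose(apply_qt(vs, A[:, 0]), R[:, 0]))   # True: Q^T A = R
```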
We investigate the effect of load balancing when performing Cholesky factorization on a massively parallel SIMD computer. In particular, we describe a supernodal algorithm for performing sparse Cholesky factorization. The way the matrix is mapped onto the processors has a significant effect on the algorithm's efficiency. We show that this assignment problem can be modeled as a graph coloring problem in a weighted graph. By a simple greedy algorithm, we obtain substantial speedup compared with previously suggested data mapping schemes. Experimental runs have been made on a 16K-processor MasPar MP-2 parallel computer using symmetric test matrices with irregular sparsity structure. On these problems our implementation achieves performance rates of well above 200 Mflops in double precision arithmetic.
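A minimal sketch of a greedy assignment heuristic in the spirit of the one alluded to above (a simple list-scheduling variant, not the paper's weighted graph coloring formulation): hypothetical supernode work estimates are handed out largest-first to whichever processor currently carries the least load.

```python
import heapq

def greedy_balance(weights, nprocs):
    """Assign weighted tasks to processors, largest weight first,
    always choosing the currently least-loaded processor."""
    loads = [(0.0, p) for p in range(nprocs)]   # heap of (load, processor id)
    heapq.heapify(loads)
    assignment = {}
    for task, w in sorted(enumerate(weights), key=lambda t: -t[1]):
        load, p = heapq.heappop(loads)
        assignment[task] = p
        heapq.heappush(loads, (load + w, p))
    return assignment, loads

# Hypothetical supernode work estimates (arbitrary numbers).
work = [9.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
assignment, loads = greedy_balance(work, 3)
print(assignment)
print(sorted(loads))   # per-processor loads end up nearly equal
```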
Existing sparse partial pivoting algorithms can spend asymptotically more time manipulating data structures than doing arithmetic, although they are tuned to be efficient on many large problems. We present an algorithm to factor sparse matrices by Gaussian elimination with partial pivoting in time proportional to the number of arithmetic operations.
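The key observation behind factoring in time proportional to arithmetic is that the nonzero pattern of each sparse triangular solve can be found by depth-first search on the graph of the current factor before any floating-point work is done. A small Python sketch of that symbolic step follows (made-up pattern; the full algorithm also records a topological order for the numerical substitution, which is omitted here):

```python
# Symbolic step of a sparse lower-triangular solve L x = b: the nonzeros of x
# are the vertices reachable from the nonzeros of b in the graph of L,
# with an edge j -> i whenever l_ij != 0.
L_struct = {0: [2, 4], 1: [3], 2: [4], 3: [], 4: []}   # column j -> rows i > j
b_struct = [0, 1]                                       # rows where b is nonzero

def reach(L_struct, b_struct):
    """Depth-first search returning the predicted nonzero rows of x."""
    seen = set()
    stack = list(b_struct)
    while stack:
        j = stack.pop()
        if j in seen:
            continue
        seen.add(j)
        stack.extend(L_struct[j])
    return sorted(seen)

print(reach(L_struct, b_struct))   # [0, 1, 2, 3, 4]
```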
Partitioning graphs into two or more subgraphs is a fundamental operation in computer science, with applications in large-scale graph analytics, distributed and parallel data processing, and fill-reducing orderings in sparse matrix algorithms. Computing balanced and minimally connected subgraphs is a common pre-processing step in these areas, and must therefore be done quickly and efficiently. Since graph partitioning is NP-hard, heuristics must be used. These heuristics must balance the need to produce high-quality partitions with that of providing practical performance. Traditional methods of partitioning graphs rely heavily on combinatorics, but recent developments in continuous optimization formulations have led to the development of hybrid methods that combine the best of both approaches. This work describes numerical optimization formulations for two classes of graph partitioning problems, edge cuts and vertex separators. Optimization-based formulations for each of these problems are described, and hybrid algorithms combining these optimization-based approaches with traditional combinatoric methods are presented. Efficient implementations and computational results for these algorithms are presented in a C++ graph partitioning library that is competitive with the state of the art. Additionally, an optimization-based approach to hypergraph partitioning is proposed.
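As a small illustration of the continuous-optimization side of such hybrid methods, here is a spectral bisection sketch, a classical relaxation of the minimum edge cut problem rather than the specific formulations of this work: the sign pattern of the Fiedler vector of the graph Laplacian splits the vertices into two balanced parts.

```python
import numpy as np

def spectral_bisection(edges, n):
    """Split vertices 0..n-1 into two parts using the Fiedler vector
    of the graph Laplacian (a continuous relaxation of min edge cut)."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1.0
        L[v, v] += 1.0
        L[u, v] -= 1.0
        L[v, u] -= 1.0
    # Eigenvector for the second-smallest eigenvalue of L.
    _, V = np.linalg.eigh(L)
    fiedler = V[:, 1]
    # Split at the median so the two sides are balanced.
    part = fiedler > np.median(fiedler)
    return part.astype(int)

# A made-up graph: two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(spectral_bisection(edges, 6))   # e.g. [0 0 0 1 1 1]
```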