作者:
BARON, INYU
COURANT INST MATH SCINEW YORKNY 10012
We give a practical parallel algorithm for solving band symmetric positive definite systems of linear equations in O(m* log n) time using nm/log n processors. Here n denotes the system size and m its bandwidth. Hence,...
详细信息
We give a practical parallel algorithm for solving band symmetric positive definite systems of linear equations in O(m* log n) time using nm/log n processors. Here n denotes the system size and m its bandwidth. Hence, the algorithm is efficient. For tridiagonal systems, the algorithm runs in O(log n) time using n/log n processors. Furthermore, an improved version runs in O(log m log n) time using nm2/(log m log n) processors.
This paper presents some practical ways of using polynomial preconditions for solving large sparse linear systems of equations issued from discretizations of partial differential equations. For a symmetric positive de...
详细信息
This paper presents some practical ways of using polynomial preconditions for solving large sparse linear systems of equations issued from discretizations of partial differential equations. For a symmetric positive definite matrix A these techniques are based on least squares polynomials on the interval $[0,b]$ where b is the Gershgorin estimate of the largest eigenvalue. Therefore, as opposed to previous work in the field, there is no need for computing eigenvalues of A. We formulate a version of the conjugate gradient algorithm that is more suitable for parallel architectures and discuss the advantages of polynomial preconditioning in the context of these architectures.
Coarse grain message passing and shared memory algorithms for solving the quasi-triangular Sylvester equation are discussed. The basic algorithm is of block type, i.e., rich in matrix-matrix operations. The focus is o...
详细信息
Coarse grain message passing and shared memory algorithms for solving the quasi-triangular Sylvester equation are discussed. The basic algorithm is of block type, i.e., rich in matrix-matrix operations. The focus is on computing reliable estimates of the sep-1 function (a natural condition number for the Sylvester equation and the invariant subspace problem). Estimators based on the Frobenius norm and the 1-norm, respectively, are presented. Accuracy, efficiency, and reliability results are presented. The applicability of the estimators to both the shared memory and distributed memory paradigms are discussed. Some performance results of the parallel block algorithms with condition estimators are also presented. The reliability of both estimators are very good. The Frobenius norm-based estimator is much more efficient in both sequential and parallel settings (on average between four to five times). Further, it is applicable to both the standard and generalized problems.
There is a group of problems that require big amount of computing power to solve. Computer grids allow building effective computing platforms at relatively low cost. It is expected that algorithms like Genetic Algorit...
详细信息
There is a group of problems that require big amount of computing power to solve. Computer grids allow building effective computing platforms at relatively low cost. It is expected that algorithms like Genetic Algorithm will perform well on the grid. In this paper, grid implementation of multiobjective distributed genetic algorithm is proposed. A distributed version of the algorithm is based on a modified island algorithm where genetic data exchange is replaced by introduced new Forgetting Island Elitism. The algorithm is applied to booster station allocation in Chojnice water distribution system.
parallel Jacobi-like algorithms are presented for computing a singular-value decomposition of an $m \times n$ matrix $(m \geqq n)$ and an eigenvalue decomposition of an $n \times n$ symmetric matrix. A linear array of...
详细信息
parallel Jacobi-like algorithms are presented for computing a singular-value decomposition of an $m \times n$ matrix $(m \geqq n)$ and an eigenvalue decomposition of an $n \times n$ symmetric matrix. A linear array of $O(n)$ processors is proposed for the singular-value problem; the associated algorithm requires time $O(mnS)$, where S is the number of sweeps (typically $S \leqq 10$). A square array of $O(n^2 )$ processors with nearest-neighbor communication is proposed for the eigenvalue problem; the associated algorithm requires time $O(nS)$.
This paper develops locally adapted hierarchical basis functions for effectively preconditioning large optimization problems that arise in computer graphics applications such as tone mapping, gradient-domain blending,...
详细信息
This paper develops locally adapted hierarchical basis functions for effectively preconditioning large optimization problems that arise in computer graphics applications such as tone mapping, gradient-domain blending, colorization, and scattered data interpolation. By looking at the local structure of the coefficient matrix and performing a recursive set of variable eliminations, combined with a simplification of the resulting coarse level problems, we obtain bases better suited for problems with inhomogeneous ( spatially varying) data, smoothness, and boundary constraints. Our approach removes the need to heuristically adjust the optimal number of preconditioning levels, significantly outperforms previously proposed approaches, and also maps cleanly onto data-parallel architectures such as modern GPUs.
A unified framework is presented for a fully parallel solution of large, sparse nonsymmetric linear systems on distributed memory multiprocessors. Unlike earlier work, both symbolic and numeric steps are parallelized....
详细信息
A unified framework is presented for a fully parallel solution of large, sparse nonsymmetric linear systems on distributed memory multiprocessors. Unlike earlier work, both symbolic and numeric steps are parallelized. parallel Cartesian nested dissection is used to compute a fill-reducing ordering of A using a compact representation of the column intersection graph, and the resulting separator tree is used to estimate the structure of the factor and to distribute data and perform, multifrontal numeric computations. When the matrix is nonsymmetric but square, the numeric computations involve Gaussian elimination with partial pivoting;when the matrix is overdetermined, row-oriented Householder transforms are applied to compute the triangular factor of an orthogonal factorization. Extensive empirical results are provided to demonstrate that the approach is effective both in preserving sparsity and achieving good parallel performance on an Intel iPSC/860.
We have employed evolutionary computation to solve the optimization problem of sensor deployment in battlefield environments. A genetic algorithm has the advantage of delivering results of a higher quality than simple...
详细信息
We have employed evolutionary computation to solve the optimization problem of sensor deployment in battlefield environments. A genetic algorithm has the advantage of delivering results of a higher quality than simple computational algorithms, but it has the drawback of requiring too much computing time. This study aimed not only to shorten the computing time to as close to real-time as possible by using the Compute Unified Device Architecture (CUDA) but also to maintain a solution quality that is as good as or better than the case when the proposed algorithm is not used. In the proposed genetic algorithm, parallelization was applied to speed up the fitness evaluation requiring heavy computation time. The proposed CUDA-based design approach for complex and various sensor deployments is validated by means of simulation. We parallelized two parts in Monte Carlo simulation for the fitness evaluation: moving lots of tested vehicles and calculating the probability of detection (POD) for each vehicle. The experiment was divided into CPU and GPU experiments depending on arithmetic unit types. In the GPU experiment, the results showed similar levels for the detection probability by GPU and CPU, and the computing time decreased by approximately 55-56 times.
In this paper, we give two algorithms for the 1-1 routing problems on a mesh-connected computer. The first algorithm, with queue size 28, solves the 1-1 routing problem on an n × n mesh-connected computer in 2n +...
详细信息
In this paper, we give two algorithms for the 1-1 routing problems on a mesh-connected computer. The first algorithm, with queue size 28, solves the 1-1 routing problem on an n × n mesh-connected computer in 2n + O(1) steps. This improves the previous queue size of 75. The second algorithm solves the 1-1 routing problem in 2n - 2 steps with queue size 12ts/s where ts is the time for sorting an s × s mesh into a row major order for all s ≥ 1. This result improves the previous queue size 18.67ts/s.
We propose a multiprocessor structure for solving a dense system of n linear equations. The solution is obtained in two stages. First, the matrix of coefficients is reduced to upper triangular form via Givens rotation...
详细信息
We propose a multiprocessor structure for solving a dense system of n linear equations. The solution is obtained in two stages. First, the matrix of coefficients is reduced to upper triangular form via Givens rotations. Second, a back substitution process is applied to the triangular system. A two-dimensional array of $\theta (n^2 )$ processors is employed to implement the first step, and (using a previously known scheme) a one-dimensional array of $\theta (n)$ processors is employed to implement the second step. These processor arrays allow both stages to be carried out in time $O(n)$, and they are well suited for VLSI implementation as identical processors with a simple and regular interconnection pattern are required.
暂无评论