We investigate fast parallel algorithms to compute normal forms of matrices and the corresponding transformations. Given a matrix B in M-n,M-n(K), where K is an arbitrary commutative field, we establish that computing...
详细信息
We investigate fast parallel algorithms to compute normal forms of matrices and the corresponding transformations. Given a matrix B in M-n,M-n(K), where K is an arbitrary commutative field, we establish that computing a similarity transformation P such that F = P-1 BP is in Frobenius normal form can be done in NCK2. Using a reduction to this first problem, a similar fact is then proved for the Smith normal form S(x) of a polynomial matrix A(x) in M-n,M-m(K[x]);to compute unimodular matrices U(x) and V(x) such that S(x) = U(x)A(x)V(x) can be done in NCK2. We get that over concrete fields such as the rationals, these problems are in NC2. Using our previous results we have thus established that the problems of computing transformations over a field extension for the Jordan normal form, and transformations over the input field for the Frobenius and the Smith normal form are all in NCK2. As a corollary we establish a polynomial-time sequential algorithm to compute transformations for the Smith form over K[x].
An efficient algorithm for implementing the finite-element time-domain (FETD) method on parallel computers is presented. An unconditionally stable implicit FETD algorithm is combined with the finite-element tearing an...
详细信息
An efficient algorithm for implementing the finite-element time-domain (FETD) method on parallel computers is presented. An unconditionally stable implicit FETD algorithm is combined with the finite-element tearing and interconnecting (FETI) method. This domain decomposition algorithm converges at a rate dependent solely on the subdomain size, resulting in a highly scalable algorithm. (C) 1997 John Wiley & Sons, Inc.
In this paper we present a new parallel algorithm for computing the LL(T) decomposition of real symmetric positive-definite tridiagonal matrices. The algorithm consists of a preprocessing and a factoring stage. In the...
详细信息
In this paper we present a new parallel algorithm for computing the LL(T) decomposition of real symmetric positive-definite tridiagonal matrices. The algorithm consists of a preprocessing and a factoring stage. In the preprocessing stage it determines a rank-(p - 1) correction to the original matrix (p,= number of processors) by precomputing selected components x(k) of the L factor, k = 1,...,p - 1. In the factoring stage it performs independent factorizations of p matrices of order n/p. The algorithm is especially suited for machines with both vector and processor parallelism, as confirmed by the experiments carried out on a Connection Machine CM5 with 32 nodes. Let <(x)over cap(k)>, and <(x)over cap'(k)> denote the components computed in the preprocessing stage and the corresponding values (re)computed in the factorization stage, respectively. Assuming that is small, k = 1,...,p-1, we are able to prove that the algorithm is stable in the backward sense. The above assumption is justified both experimentally and theoretically. In fact, we have found experimentally that is small even for ill-conditioned matrices, and we have proven by an a priori analysis that the above ratios are small provided that preprocessing is performed with suitably larger precision.
We present a high-order parallel algorithm, which requires only the minimum interprocessor communication dictated by the physical nature of the problem at hand. The parallelization is achieved by domain decomposition....
详细信息
This paper presents an efficient parallel algorithm for computing the mutual range-join of N sets of numbers on shared-nothing hypercube computers. The algorithm iteratively joins each set to the mutual range-join of ...
详细信息
ISBN:
(纸本)0780342291
This paper presents an efficient parallel algorithm for computing the mutual range-join of N sets of numbers on shared-nothing hypercube computers. The algorithm iteratively joins each set to the mutual range-join of the preceding sets. Each join is performed on all processors of the hypercube in parallel. The algorithm uses a global sorting method to distribute the elements of the first set evenly across all processors in increasing order, a new data balancing technique to distribute the elements of subsequent sets to match the intermediate set at each processor and to compensate for join skew, and a new efficient local range-join procedure. We analyse the performance of this algorithm and demonstrate that it improves on the best previously published algorithm for this problem when the join selectivity factor is small. The method can also be applied to similar problems such as band-join and equi-join.
There is at present a worldwide effort to overcome the technological barriers to nanoelectronics. Microscopic simulation can significantly enhance our understanding of the physics of nanoscale structures, and constitu...
详细信息
There is at present a worldwide effort to overcome the technological barriers to nanoelectronics. Microscopic simulation can significantly enhance our understanding of the physics of nanoscale structures, and constitutes a valuable tool for designing nanoelectronic functional devices. In nanodevices, novel physics effects are used to attain logic functionality which conventional technology can not achieve. Therefore it is necessary to develop quantum-transport simulation methods which include novel physical effects. Moreover, Simulation of realistic nanodevices require enormous computing resource, necessitating parallel supercomputing. In this paper, we present massively parallel algorithms for simulating large-scale nanoelectronic networks based on the single-electron tunneling effect, which is arguably the quantum effect of greatest significance to nanoelectronic technology. A MIMD implementation of our simulation algorithm is carried out on a 64-processor nCUBE 2, and a SIMD implementation is carried out on a 16,384-processor MasPar MP-1. By exploiting massive parallelism, both parallel implementations achieve very high parallel efficiency and nearly linear scalability. The result of this work is that we are able to simulate large-scale nanoelectronic network, within a reasonable time period, which would be impractical on conventional workstations.
For unrooted and cyclically ordered trees (CO-tree), the distance based on the ancestor-descendant relation-preserving mapping (TD), the distance based on the structure-preserving mapping (SPD), and the distance based...
详细信息
For unrooted and cyclically ordered trees (CO-tree), the distance based on the ancestor-descendant relation-preserving mapping (TD), the distance based on the structure-preserving mapping (SPD), and the distance based on the strongly structure-preserving mapping (SSPD) have been defined and their computing methods have been proposed. Let T-a and T-b be CO-trees. Let the numbers of their vertices be N-a and N-b, and the numbers of their leaves be L-a and L-b respectively. Let. the maximum orders of the vertices of T-a and T-b be m(a) and m(b), respectively. In this paper, improved serial computing methods for TD and SPD are presented. The time complexities of those methods are O(NaNbLaLb) and O(m(a)N(a)N(b)L(b)), respectively. Based on the improved serial computing methods for TD and SPD, the parallel computing methods for TD, SPD, and SSPD are proposed. The time complexities of those methods are O(max{N-a, N-b}) The numbers of required processors are O(NaLaLb), O(m(a)N(a)L(b)), and O(m(a)m(b)max{N-a,N-b}), respectively. Each of those parallel methods is optimal in the sense that the cost is equal to the time complexity of the serial computing method. (C) 1998 Scripta Technica.
In this paper, we propose a new parallel bidirectional algorithm, based on Cholesky factorization, for the solution of sparse symmetric system of linear equations. Unlike the existing algorithms, the numerical factori...
详细信息
In this paper, we propose a new parallel bidirectional algorithm, based on Cholesky factorization, for the solution of sparse symmetric system of linear equations. Unlike the existing algorithms, the numerical factorization phase of our algorithm is carried out in such a manner that the entire back substitution component of the substitution phase is replaced by a single step division. Since there is a substantial reduction in the time taken by the repeated execution of the substitution phase, our algorithm is particularly suited for the solution of systems with multiple b-vectors. The effectiveness of our algorithm is demonstrated by comparing it with the existing parallel algorithm, based on Cholesky factorization, using extensive simulation studies on two-dimensional problems discretized by FEM.
In this paper, we present two parallel multiplicative algorithms for convex programming. If the objective function has compact level sets and has a locally Lipschitz continuous gradient, we discuss convergence of the ...
详细信息
In this paper, we present two parallel multiplicative algorithms for convex programming. If the objective function has compact level sets and has a locally Lipschitz continuous gradient, we discuss convergence of the algorithms. The proofs are essentially based on the results of sequential methods shown by Eggermontt[1].
This paper presents an algorithm for the Gaussian elimination problem that reduces the length of the critical path compared to the algorithm of Lord et al. This is done by redefining the notion of a task. For all prac...
详细信息
This paper presents an algorithm for the Gaussian elimination problem that reduces the length of the critical path compared to the algorithm of Lord et al. This is done by redefining the notion of a task. For all practical purposes, the issues of communication overhead and pivoting cannot be overlooked. We consider these issues for the new algorithm as well. Timing results of this algorithm as executed on the CM-2 model of the Connection Machine are presented. Another contribution of this paper is the use of logical pivoting for stable computation of the Gaussian elimination algorithm. Pivoting is essential is producing stable results. When pivoting occurs, an interchange of two rows is required. A physical interchange of the values can be avoided by providing a permutation vector in a globally accessible location. We show experimental results that substantiate the use of logical pivoting.
暂无评论