In solving application problems, many large-scale nonlinear systems of equations result in sparse Jacobian matrices. Such nonlinear systems are called sparse nonlinear systems. The irregularity of the locations of the nonzero elements of a general sparse matrix makes it very difficult to map sparse matrix computations onto multiprocessors in a well-balanced manner for parallel processing. To overcome this difficulty, we define a new storage scheme for general sparse matrices in this paper. With the new storage scheme, we develop parallel algorithms to solve large-scale general sparse systems of equations by interval Newton/generalized bisection methods, which reliably find all numerical solutions within a given domain. In Section 1, we provide an introduction to the addressed problem and the interval Newton methods. In Section 2, some currently used storage schemes for sparse systems are reviewed. In Section 3, new index schemes to store general sparse matrices are reported. In Section 4, we present a parallel algorithm to evaluate a general sparse Jacobian matrix. In Section 5, we present a parallel algorithm to solve the corresponding interval linear system by the all-row preconditioned scheme. Conclusions and future work are discussed in Section 6.
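The abstract does not describe the new index scheme itself; as background for the storage schemes reviewed in Section 2, the following minimal sketch (Python, with illustrative function names) shows the widely used compressed sparse row (CSR) layout and a sparse matrix-vector product over it, the kind of kernel that a parallel Jacobian evaluation must balance across processors.

import numpy as np

def dense_to_csr(a):
    """Convert a dense matrix to CSR arrays (values, column indices, row pointers)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def csr_matvec(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product y = A @ x using the CSR layout."""
    n = len(row_ptr) - 1
    y = np.zeros(n)
    for i in range(n):
        start, end = row_ptr[i], row_ptr[i + 1]
        y[i] = np.dot(values[start:end], x[col_idx[start:end]])
    return y

# Example: a small sparse Jacobian-like matrix
A = np.array([[4.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [2.0, 0.0, 5.0]])
vals, cols, ptrs = dense_to_csr(A)
print(csr_matvec(vals, cols, ptrs, np.array([1.0, 1.0, 1.0])))  # -> [5. 3. 7.]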
Distance transform is extensively used in image processing, for operations such as expanding, shrinking, thinning, and computing shape factors. There are many approximate Euclidean distance transform (EDT) algorithms in the literature, but computing the exact EDT with respect to the Euclidean distance metric is preferable in various application fields. Unless the digital image is very small, it is rather time consuming to find the exact EDT of an image, so it is important to improve the computing speed. In this paper we study the parallel computation of the exact EDT on two parallel architectures: the EREW PRAM model and the SIMD hypercube computer. A parallel algorithm is given that computes the exact EDT of all pixels with respect to the black pixels in an N × N black-and-white image. The running time is O(log² N) on both the EREW PRAM model and the hypercube computer with N × N processors.
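For reference, the brute-force serial sketch below (illustrative only) defines the quantity being computed: the exact Euclidean distance from every pixel to the nearest black pixel. It runs in O(N^4) time for an N × N image and stands in for neither the EREW PRAM nor the hypercube algorithm of the paper.

import numpy as np

def exact_edt(image):
    """Brute-force exact Euclidean distance transform of a binary image.

    For every pixel, returns the Euclidean distance to the nearest black
    pixel (value 1). This only defines the quantity the parallel algorithm
    computes, not the fast method itself.
    """
    rows, cols = image.shape
    black = np.argwhere(image == 1)          # coordinates of black pixels
    dist = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            d2 = (black[:, 0] - i) ** 2 + (black[:, 1] - j) ** 2
            dist[i, j] = np.sqrt(d2.min())
    return dist

img = np.zeros((4, 4), dtype=int)
img[1, 1] = 1                                # single black pixel
print(exact_edt(img).round(2))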
Many numerical simulations of natural phenomena are formulated as large tridiagonal and block tridiagonal linear systems. In this paper, an efficient parallel algorithm to solve a tridiagonal linear system is proposed. The algorithm, named the bi-recurrence algorithm, has an inherent parallelism that is suitable for parallel processing. Its time complexity is 8N - 4 for a tridiagonal linear system of order N, only slightly higher than that of the Gaussian elimination algorithm. For a parallel implementation with two processors, the time complexity is 4N - 1. Based on the bi-recurrence algorithm, a VLSI-oriented tridiagonal solver is designed with a 1-D linear systolic array architecture of three processing cells. The systolic tridiagonal solver finds the solution of a tridiagonal linear system in 3N + 6 units of time. A highly parallel systolic tridiagonal solver is also presented. The solver is characterized by high parallel computability, which originates in the divide-and-conquer strategy, and high cost performance, which originates in the systolic architecture. This solver finds the solution in 10(N/p) + 6p + 23 time units, where p is the number of partitions of the system.
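The bi-recurrence algorithm itself is not given in the abstract; the sketch below shows the standard serial Thomas algorithm (Gaussian elimination specialized to tridiagonal systems), the sequential baseline the abstract compares against. Function and variable names are illustrative.

import numpy as np

def thomas_solve(a, b, c, d):
    """Serial Thomas algorithm for a tridiagonal system.

    a: sub-diagonal (length n, a[0] unused)
    b: main diagonal (length n)
    c: super-diagonal (length n, c[-1] unused)
    d: right-hand side (length n)
    """
    n = len(d)
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                     # forward elimination sweep
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):            # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# 4x4 example: main diagonal 2, off-diagonals -1
a = np.array([0.0, -1.0, -1.0, -1.0])
b = np.array([2.0, 2.0, 2.0, 2.0])
c = np.array([-1.0, -1.0, -1.0, 0.0])
d = np.array([1.0, 0.0, 0.0, 1.0])
print(thomas_solve(a, b, c, d))               # -> [1. 1. 1. 1.]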
This paper presents a parallel move generation scheme for a chess machine system aimed at reducing the number of move-generation cycles. The parallel system is composed of five move-generation modules that share the move-generation cycles to reduce the time needed to build a game tree. Simulation results show that the proposed parallel move-generation architecture takes about half the number of move-generation cycles to build a game tree identical to the one built by a sequential move-generation module.
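As a toy illustration of sharing move generation across parallel modules (this is not the paper's hardware design; the piece type, board representation, and round-robin split are assumptions), the sketch below distributes the squares to be processed over five generator "modules" and merges their pseudo-legal knight moves.

from itertools import chain

KNIGHT_OFFSETS = [(1, 2), (2, 1), (2, -1), (1, -2),
                  (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def knight_moves(square):
    """Pseudo-legal knight moves from a (file, rank) square on an 8x8 board."""
    f, r = square
    return [(f + df, r + dr) for df, dr in KNIGHT_OFFSETS
            if 0 <= f + df < 8 and 0 <= r + dr < 8]

def parallel_move_generation(knight_squares, n_modules=5):
    """Round-robin the pieces over n_modules generator 'modules'; each module
    produces moves only for its share, and the results are merged."""
    shares = [knight_squares[m::n_modules] for m in range(n_modules)]
    per_module = [list(chain.from_iterable(knight_moves(sq) for sq in share))
                  for share in shares]
    return list(chain.from_iterable(per_module))

print(len(parallel_move_generation([(1, 0), (6, 0), (3, 3)])))   # -> 3 + 3 + 8 = 14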
Simulated annealing is a general-purpose optimization technique capable of finding an optimal or near-optimal solution in various applications. However, the long execution time required for a good-quality solution has been a major drawback in practice. Extensive studies have been carried out to develop parallel algorithms for simulated annealing. Most of them were not very successful, mainly because multiple processing elements (PEs) were required to follow a single Markov chain and, therefore, only limited parallelism was exploited. In this paper, we propose new parallel simulated annealing algorithms that allow multiple Markov chains to be traced simultaneously by PEs that may communicate with each other. We have considered both synchronous and asynchronous implementations of the algorithms. Their performance has been analyzed in detail and also verified by extensive experimental results. It has been shown that for graph partitioning the proposed parallel simulated annealing schemes can find a solution of equivalent (or even better) quality up to an order of magnitude faster than the conventional parallel schemes. Among the proposed schemes, the one where PEs exchange information dynamically (not with a fixed period) performs best.
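A minimal serial sketch of the multiple-Markov-chain idea follows; the toy continuous objective, the fixed exchange period, and the "restart all chains from the best state" rule are illustrative assumptions, not the paper's graph-partitioning scheme or its dynamic exchange criterion.

import math
import random

def anneal_multi_chain(objective, init_states, n_sweeps=200, exchange_every=20,
                       t0=1.0, alpha=0.95, step=0.5, seed=0):
    """Toy sketch of multiple-Markov-chain simulated annealing.

    Each chain runs independent Metropolis moves; every `exchange_every`
    sweeps all chains restart from the best state found so far
    (a simple stand-in for inter-PE communication).
    """
    rng = random.Random(seed)
    states = list(init_states)
    energies = [objective(s) for s in states]
    best_state, best_energy = min(zip(states, energies), key=lambda p: p[1])
    t = t0
    for sweep in range(1, n_sweeps + 1):
        for k in range(len(states)):
            cand = states[k] + rng.uniform(-step, step)        # propose a move
            de = objective(cand) - energies[k]
            if de < 0 or rng.random() < math.exp(-de / t):     # Metropolis test
                states[k], energies[k] = cand, energies[k] + de
                if energies[k] < best_energy:
                    best_state, best_energy = states[k], energies[k]
        if sweep % exchange_every == 0:                        # periodic exchange
            states = [best_state] * len(states)
            energies = [best_energy] * len(states)
        t *= alpha                                             # cool down
    return best_state, best_energy

f = lambda x: (x - 3.0) ** 2 + 0.5 * math.sin(5 * x)           # toy objective
print(anneal_multi_chain(f, init_states=[-5.0, 0.0, 5.0, 10.0]))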
A parallel version of preconditioned iterative techniques is developed for matrices arising from the panel boundary element method for three-dimensional simply connected domains with Dirichlet boundary conditions. Results obtained on an nCube-2 parallel computer show that preconditioned iterative methods are also very well suited to implementation on an MIMD computer in three-dimensional cases and that they are much more efficient than the usual direct solution techniques.
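The abstract does not name the specific preconditioner or iteration used; purely as an illustration of the structure of a preconditioned iterative solve (as opposed to a direct factorization), here is a minimal diagonally preconditioned (Jacobi-type) iteration on a small, strictly diagonally dominant system.

import numpy as np

def jacobi_preconditioned_iteration(a, b, tol=1e-10, max_iter=500):
    """Simplest diagonally preconditioned iteration x <- x + D^{-1}(b - A x).

    Converges for strictly diagonally dominant systems; shown only to
    illustrate a preconditioned iterative solve versus a direct one.
    """
    d_inv = 1.0 / np.diag(a)            # diagonal (Jacobi) preconditioner
    x = np.zeros_like(b)
    for k in range(max_iter):
        r = b - a @ x                   # residual
        if np.linalg.norm(r) < tol:
            return x, k
        x = x + d_inv * r               # preconditioned update
    return x, max_iter

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])
b = np.array([5.0, 8.0, 8.0])
x, iters = jacobi_preconditioned_iteration(A, b)
print(x, iters)                         # x approx. [1, 1, 1]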
Discrete-time optimal control (DTOC) problems are potentially large, highly structured optimization problems. In such a problem there are N + 1 stages, or time steps; corresponding to time step i are a vector of control variables u(i) and a vector of state variables x(i). When N is large relative to the dimension of each u(i) and x(i), efficient methods for unconstrained DTOC problems have long been known. We propose a parallel method for unconstrained DTOC problems via stagewise decomposition: given J > 1 processors and breakpoints I(1) = 1 < I(2) < ... < I(J+1) = N + 1, J subproblems are formed by grouping stages 1 to I(2) - 1, I(2) to I(3) - 1, and so on. The subproblems are independent except for the separator variables x(I(2)), ..., x(I(J)), which are treated specially to retain the quadratic local convergence typical of serial DTOC algorithms. Computational examples using the Intel Paragon OSF/1 computer are given.
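A small sketch of the stagewise grouping follows (illustrative code, not from the paper): given the breakpoints I(1) = 1 < I(2) < ... < I(J+1) = N + 1, it lists the stages assigned to each of the J subproblems and the separator states x(I(2)), ..., x(I(J)).

def stagewise_partition(n_stages, breakpoints):
    """Group DTOC stages into subproblems by the breakpoints
    I(1) = 1 < I(2) < ... < I(J+1) = N + 1 (1-based stage indices).

    Returns the list of stage ranges per subproblem and the indices of the
    separator state variables x(I(2)), ..., x(I(J)).
    """
    assert breakpoints[0] == 1 and breakpoints[-1] == n_stages + 1
    subproblems = [list(range(breakpoints[j], breakpoints[j + 1]))
                   for j in range(len(breakpoints) - 1)]
    separators = breakpoints[1:-1]
    return subproblems, separators

# N = 9 stages split over J = 3 processors
subs, seps = stagewise_partition(9, [1, 4, 7, 10])
print(subs)   # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(seps)   # separator states: x(4), x(7)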
A unified parallel algorithm for grid adaptation by local refinement/coarsening is presented. It is designed to be independent of the grid type. This is achieved by employing a generic data template that can be configured to capture the data structures of any computational grid regardless of structure and dimensionality. Furthermore, the algorithm itself is specified in terms of generic parallel primitives that are completely independent of the underlying parallel architecture. The unified parallel algorithm is employed for dynamic adaptation of three-dimensional unstructured tetrahedral grids on a partitioned-memory, multiple-instruction multiple-data architecture. Performance results are presented for the Intel iPSC/860.
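The paper's generic data template is not specified in the abstract; the following sketch (illustrative field names and refinement rule) indicates the kind of grid-independent element record such a template might capture: node connectivity, refinement level, and parent/child links, which are enough to express local refinement and coarsening for any element type.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class GridElement:
    """Illustrative generic element record, configurable for any element type."""
    element_id: int
    node_ids: Tuple[int, ...]          # e.g. 3 nodes for a triangle, 4 for a tetrahedron
    level: int = 0                     # refinement level
    parent: int = -1                   # -1 marks a root element of the initial grid
    children: List[int] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

def refine(grid: List[GridElement], elem_id: int, new_node_sets) -> None:
    """Mark an element as refined and append its children to the grid."""
    parent = grid[elem_id]
    for nodes in new_node_sets:
        child = GridElement(element_id=len(grid), node_ids=tuple(nodes),
                            level=parent.level + 1, parent=elem_id)
        parent.children.append(child.element_id)
        grid.append(child)

# A single tetrahedron refined into two illustrative children
grid = [GridElement(0, (0, 1, 2, 3))]
refine(grid, 0, [(0, 1, 2, 4), (1, 2, 3, 4)])
print([e.element_id for e in grid if e.is_leaf()])   # -> [1, 2]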
Permutation generation is an important problem in combinatorial computing. In this paper we present an optimal parallel algorithm to generate all N! permutations of N objects. The algorithm is designed to be executed on a very simple computation model: a linear array of N identical processors. Because of the simplicity and regularity of the processors, the model is very suitable for VLSI implementation. Another advantage of this design is that it generates all the permutations in minimal change order.
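The abstract does not reproduce the linear-array algorithm; as a point of reference, the sketch below implements the classic serial Steinhaus-Johnson-Trotter generator, which produces all N! permutations in minimal change order (each permutation differs from its predecessor by one adjacent transposition).

def sjt_permutations(n):
    """Serial Steinhaus-Johnson-Trotter generator of all permutations of 1..n
    in minimal change order. Shown as the classic sequential counterpart,
    not the paper's linear-array algorithm.
    """
    perm = list(range(1, n + 1))
    direction = [-1] * n                       # -1: looking left, +1: looking right
    yield tuple(perm)
    while True:
        # find the largest "mobile" element (one pointing at a smaller neighbor)
        mobile = -1
        for i in range(n):
            j = i + direction[i]
            if 0 <= j < n and perm[i] > perm[j] and (mobile == -1 or perm[i] > perm[mobile]):
                mobile = i
        if mobile == -1:
            return
        j = mobile + direction[mobile]
        perm[mobile], perm[j] = perm[j], perm[mobile]
        direction[mobile], direction[j] = direction[j], direction[mobile]
        # reverse the direction of every element larger than the one just moved
        for i in range(n):
            if perm[i] > perm[j]:
                direction[i] = -direction[i]
        yield tuple(perm)

print(list(sjt_permutations(3)))
# [(1, 2, 3), (1, 3, 2), (3, 1, 2), (3, 2, 1), (2, 3, 1), (2, 1, 3)]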
In this paper, we improve the bounds for computing a network decomposition distributively and deterministically. Our algorithm computes an (n^ε(n), n^ε(n))-decomposition in n^O(ε(n)) time, where ε(n) = 1/√(log n). As a corollary we obtain improved deterministic bounds for distributively computing several graph structures such as maximal independent sets and Δ-vertex colorings. We also show that the class of graphs G whose maximum degree is n^O(δ(n)), where δ(n) = 1/log log n, is complete for the task of computing a near-optimal decomposition, i.e., a (log n, log n)-decomposition, in polylog(n) time. This is a corollary of a more general characterization, which pinpoints the weak points of existing network decomposition algorithms. Completeness is meant in the following sense: if we have an algorithm A that computes a near-optimal decomposition in polylog(n) time for graphs in G, then we can compute a near-optimal decomposition in polylog(n) time for all graphs. (C) 1996 Academic Press, Inc.