A unified framework is presented for a fully parallel solution of large, sparse, nonsymmetric linear systems on distributed-memory multiprocessors. Unlike earlier work, both the symbolic and the numeric steps are parallelized. Parallel Cartesian nested dissection is used to compute a fill-reducing ordering of A from a compact representation of the column intersection graph, and the resulting separator tree is used to estimate the structure of the factor, to distribute data, and to perform multifrontal numeric computations. When the matrix is nonsymmetric but square, the numeric computations involve Gaussian elimination with partial pivoting; when the matrix is overdetermined, row-oriented Householder transformations are applied to compute the triangular factor of an orthogonal factorization. Extensive empirical results demonstrate that the approach is effective both in preserving sparsity and in achieving good parallel performance on an Intel iPSC/860.
A new parallel normalized explicit preconditioned conjugate gradient method, used in conjunction with normalized approximate inverse matrix techniques, is presented for efficiently solving sparse linear systems on multi-computer systems. Application of the proposed method to a three-dimensional boundary value problem is discussed and numerical results are given. The implementation and performance on a distributed-memory MIMD machine, using the Message Passing Interface (MPI), are also investigated.
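As a concrete reference point, the following is a minimal serial sketch of explicit preconditioned conjugate gradient iteration; a simple Jacobi (diagonal) approximate inverse stands in here for the paper's normalized approximate inverse preconditioner, and the code makes no attempt at MPI parallelism:

```python
# Explicit preconditioned conjugate gradient (PCG) sketch.
# The preconditioner is applied as an explicit matrix-vector product;
# a Jacobi (diagonal) approximate inverse is used as a stand-in.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, tol=1e-10, max_iter=100):
    n = len(b)
    Minv = [1.0 / A[i][i] for i in range(n)]      # approximate inverse (diagonal)
    x = [0.0] * n
    r = b[:]                                      # residual r = b - A x, with x = 0
    z = [Minv[i] * r[i] for i in range(n)]        # preconditioned residual
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [Minv[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

# Small symmetric positive definite test system.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = pcg(A, b)
```

In the parallel setting of the abstract, the matrix-vector products and inner products are the operations distributed across processors.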
We have employed evolutionary computation to solve the optimization problem of sensor deployment in battlefield environments. A genetic algorithm has the advantage of delivering higher-quality results than simple computational algorithms, but it has the drawback of requiring a great deal of computing time. This study aimed not only to shorten the computing time to as close to real time as possible by using the Compute Unified Device Architecture (CUDA), but also to maintain a solution quality as good as or better than that obtained without it. In the proposed genetic algorithm, parallelization was applied to speed up the fitness evaluation, which requires heavy computation time. The proposed CUDA-based design approach for complex and varied sensor deployments is validated by means of simulation. We parallelized two parts of the Monte Carlo simulation used for the fitness evaluation: moving the many test vehicles and calculating the probability of detection (POD) for each vehicle. The experiments were divided into CPU and GPU experiments depending on the arithmetic unit type. In the GPU experiment, the results showed detection probabilities at similar levels for the GPU and the CPU, while the computing time decreased by a factor of approximately 55-56.
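The parallel-fitness-evaluation pattern described above can be sketched as follows; a Python thread pool stands in for the paper's CUDA kernels, and the fitness function is a toy placeholder rather than the Monte Carlo probability-of-detection model:

```python
# Genetic algorithm with parallel fitness evaluation.
# The GA loop stays serial; only the (expensive) fitness calls are
# farmed out in parallel. ThreadPoolExecutor stands in for GPU kernels.
from concurrent.futures import ThreadPoolExecutor
import random

def fitness(individual):
    # Placeholder: coverage-like score for a bit-string "deployment".
    return sum(individual)

def evolve(pop_size=20, genome_len=16, generations=10, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    with ThreadPoolExecutor() as pool:
        for _ in range(generations):
            scores = list(pool.map(fitness, pop))        # parallel evaluation
            ranked = [ind for _, ind in sorted(zip(scores, pop), reverse=True)]
            parents = ranked[: pop_size // 2]            # truncation selection
            children = []
            while len(children) < pop_size:
                a, b = rng.sample(range(len(parents)), 2)
                cut = rng.randrange(1, genome_len)       # one-point crossover
                child = parents[a][:cut] + parents[b][cut:]
                if rng.random() < 0.1:                   # bit-flip mutation
                    i = rng.randrange(genome_len)
                    child[i] ^= 1
                children.append(child)
            pop = children
        scores = list(pool.map(fitness, pop))
    return max(scores)
```

The speedup reported in the abstract comes from the fact that, as here, each individual's fitness can be evaluated independently of the others.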
In this paper, we give two algorithms for the 1-1 routing problem on a mesh-connected computer. The first algorithm, with queue size 28, solves the 1-1 routing problem on an n × n mesh-connected computer in 2n + O(1) steps. This improves on the previous queue size of 75. The second algorithm solves the 1-1 routing problem in 2n - 2 steps with queue size 12t_s/s, where t_s is the time for sorting an s × s mesh into row-major order, for all s ≥ 1. This result improves on the previous queue size of 18.67t_s/s.
We propose a multiprocessor structure for solving a dense system of n linear equations. The solution is obtained in two stages. First, the matrix of coefficients is reduced to upper triangular form via Givens rotations. Second, a back substitution process is applied to the triangular system. A two-dimensional array of Θ(n²) …
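A serial sketch of the two-stage solve (Givens triangularization followed by back substitution) may help make the stages concrete; the paper's contribution is mapping these operations onto a two-dimensional processor array, which is not reflected here:

```python
# Two-stage dense solve: (1) reduce the augmented matrix [A | b] to
# upper triangular form with Givens rotations, (2) back-substitute.
import math

def solve_givens(A, b):
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for j in range(n):                             # annihilate below the diagonal
        for i in range(j + 1, n):
            if M[i][j] == 0.0:
                continue
            r = math.hypot(M[j][j], M[i][j])
            c, s = M[j][j] / r, M[i][j] / r        # rotation zeroing M[i][j]
            for k in range(j, n + 1):
                t = c * M[j][k] + s * M[i][k]
                M[i][k] = -s * M[j][k] + c * M[i][k]
                M[j][k] = t
    x = [0.0] * n                                  # back substitution
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x
```

The rotations that zero different entries are largely independent, which is what makes the first stage amenable to a processor array.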
The use of high-performance libraries for dense linear algebra operations is of great importance in many numerical scientific applications. The most common operations form the backbone of the Basic Linear Algebra Subroutines (BLAS) library. In this paper, we consider the performance and auto-tuning of level 1 and level 2 BLAS routines on graphics processing units. As examples, we develop single-precision Compute Unified Device Architecture kernels for three of the most popular operations: the Euclidean norm (SNRM2), the matrix-vector multiplication (SGEMV), and the triangular solve (STRSV). The target hardware is the most recent Nvidia (Santa Clara, CA, USA) Tesla 20-series (Fermi architecture), which is designed from the ground up for high-performance computing. We show that achieving high performance for level 1 and level 2 BLAS operations is essentially a matter of fully utilizing the fine-grained parallelism of the many-core graphics processing unit. We also show that auto-tuning can be successfully applied to the kernels for these operations so that they perform well for all input sizes. Copyright (c) 2012 John Wiley & Sons, Ltd.
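For reference, straightforward serial versions of the three operations look as follows; the paper's contribution is fast, auto-tuned CUDA kernels for them, not the definitions themselves:

```python
# Reference (serial) semantics of the three BLAS routines discussed.
import math

def snrm2(x):
    # Level 1: Euclidean norm ||x||_2.
    return math.sqrt(sum(xi * xi for xi in x))

def sgemv(alpha, A, x, beta, y):
    # Level 2: y := alpha * A @ x + beta * y.
    return [alpha * sum(a * xi for a, xi in zip(row, x)) + beta * yi
            for row, yi in zip(A, y)]

def strsv_lower(L, b):
    # Level 2: solve L @ x = b with L lower triangular (forward substitution).
    x = []
    for i, row in enumerate(L):
        s = sum(row[j] * x[j] for j in range(i))
        x.append((b[i] - s) / row[i])
    return x
```

SNRM2 is a reduction, SGEMV is embarrassingly parallel over rows, and STRSV carries a sequential dependence between rows, which is why the three stress a GPU in quite different ways.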
This paper describes an iterative method for reducing a general matrix to upper triangular form by unitary similarity transformations. The method is similar to Jacobi’s method for the symmetric eigenvalue problem in that it uses plane rotations to annihilate off-diagonal elements, and when the matrix is Hermitian it reduces to a variant of Jacobi’s method. Although the method cannot compete with the QR algorithm in serial implementation, it admits of a parallel implementation in which a double sweep of the matrix can be done in time proportional to the order of the matrix.
The routing problem in VLSI layout design is very compute-intensive. Consequently, the routing task often turns out to be a bottleneck in the layout design of large circuits. Parallel processing of the routing problem holds promise for mitigating this situation. In this context, we present a parallel channel routing algorithm that is targeted to run on loosely coupled computers such as hypercubes. The proposed parallel algorithm employs the simulated annealing technique to obtain near-optimum solutions. Initially, the number of tracks in the channel is made equal to the number of nets, and partitions of the channel are appropriately assigned to the nodes of the hypercube. Each node carries out concurrent perturbations to obtain new channel states that satisfy the constraints for a given net list. The algorithm minimizes the number of tracks iteratively by using the simulated annealing technique. For efficient execution, we attempt to reduce the communication overheads by restricting broadcast updates to cases of interprocessor net transfers only. Performance evaluation studies of the algorithm show promising results.
Authors:
M. NIVAT, L.I.T.P., Université Paris VII, 2 Place Jussieu, 75251 Paris Cedex 05, France
A. SAOUDI, L.I.T.P., Université Paris VII, Centre Scientifique et Polytechnique, Avenue J. B. Clément, 93400 Villetaneuse, France
We investigate the complexity of the recognition of images generated by a class of context-free image grammars. We show that the sequential time complexity of recognizing an n × n image generated by a context-free grammar is O(nM(n)), where M(n) is the time to multiply two Boolean n × n matrices. The space complexity of this recognition is O(n³). Using a parallel random access machine (PRAM), the recognition can be done in O(log²(n)) time with n⁷ processors or in O(n log²(n)) time with n⁶ processors. We also introduce high-dimensional context-free grammars and prove that their recognition problem is polylogarithmic.
In this paper we describe a general framework for parallel optimization based on the island model of evolutionary algorithms. The framework runs a number of optimization methods in parallel with periodic communication, essentially creating a parallel ensemble of optimization methods. At the same time, the system contains a planner that decides which of the available optimization methods should be used to solve the given optimization problem and changes the distribution of these methods during the run. Thus, the system effectively solves the problem of online parallel portfolio selection. The proposed system is evaluated on a number of common benchmarks with various problem encodings as well as on two real-life problems: optimization in recommender systems and the training of neural networks for controlling electric vehicle charging.
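The island model itself can be illustrated with a minimal sketch: several populations evolve independently and periodically exchange their best individuals. Simple hill climbers on a toy one-dimensional objective stand in here for the framework's heterogeneous portfolio of optimization methods and its planner:

```python
# Island-model sketch: independent populations with periodic ring
# migration of the best individual. Mutation-based hill climbers are
# a stand-in for the framework's portfolio of optimization methods.
import random

def step(island, objective, rng):
    # One mutation-based hill-climbing step per individual.
    out = []
    for x in island:
        y = x + rng.uniform(-0.5, 0.5)
        out.append(y if objective(y) > objective(x) else x)
    return out

def island_model(objective, n_islands=4, island_size=5,
                 epochs=20, migrate_every=5, seed=0):
    rng = random.Random(seed)
    islands = [[rng.uniform(-10.0, 10.0) for _ in range(island_size)]
               for _ in range(n_islands)]
    for epoch in range(1, epochs + 1):
        islands = [step(isl, objective, rng) for isl in islands]
        if epoch % migrate_every == 0:          # ring migration of the best
            bests = [max(isl, key=objective) for isl in islands]
            for i, isl in enumerate(islands):
                isl[0] = bests[(i - 1) % n_islands]
    return max((x for isl in islands for x in isl), key=objective)

# Toy objective: maximized at x = 3.
best = island_model(lambda x: -(x - 3.0) ** 2)
```

In the full framework the islands would run different optimization methods, and the planner would reassign methods to islands between migration epochs.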