A new tridiagonal Toeplitz linear system (TTLS) solver is proposed. The solver first decomposes an n-dimensional strictly diagonally dominant TTLS equation into a number of m-dimensional subsystems employing a modifie...
详细信息
A new tridiagonal Toeplitz linear system (TTLS) solver is proposed. The solver first decomposes an n-dimensional strictly diagonally dominant TTLS equation into a number of m-dimensional subsystems employing a modified Gaussian elimination method. An analytic solution of a continued fraction is obtained to derive the solver. The solver based on the modified Gaussian elimination method fully exploits parallelism. Computation and communication complexities of the proposed algorithm are all shown to be O(n/m).
In this paper, we discuss the problem of finding all the regions, which are congruent to a testing region, in a planar figure. A parallel algorithm is developed to find all of these regions. This algorithm takes O( n ...
详细信息
In this paper, we discuss the problem of finding all the regions, which are congruent to a testing region, in a planar figure. A parallel algorithm is developed to find all of these regions. This algorithm takes O( n 2 ) computation time and needs O( m ) processors where n and m are the numbers of edges of the input planar figure and the testing region respectively.
Clusters of workstations connected by local area networks are in common use in many organizations. The combined processing power of these clusters is rarely exploited owing to the lack of suitable parallel algorithms....
详细信息
Clusters of workstations connected by local area networks are in common use in many organizations. The combined processing power of these clusters is rarely exploited owing to the lack of suitable parallel algorithms. The paper describes a parallel programming paradigm called supervisor-worker, suitable for the workstation environment, which can be used to speed up the execution of a large class of existing sequential programs. Simple formulae are developed to predict the speed-up of a parallel algorithm developed in this way. The predictions depend on two easily-determined parameters of the sequential program and the characteristic communication cost of the workstation cluster. Consequently, it is possible to estimate the benefits of the parallel program before proceeding with detailed implementation. As an example, the parallel version of a travelling salesman program is developed and the measured speed-up compared with the predicted speed-up.
Search of discrete spaces is important in combinatorial optimization. Such problems arise in artificial intelligence, computer vision, operations research, and other areas. For realistic problems, the search spaces to...
详细信息
Search of discrete spaces is important in combinatorial optimization. Such problems arise in artificial intelligence, computer vision, operations research, and other areas. For realistic problems, the search spaces to be processed are usually huge, necessitating long computation times, pruning heuristics, or massively parallel processing. We present an algorithm that reduces the computation time for graph matching by employing both branch-and-bound pruning of the search tree and massively-parallel search of the as-yet-unpruned portions of the space. Most research on parallel search has assumed that a multiple-instruction-stream/multiple-data-stream (MIMD) parallel computer is available. Since massively parallel single-instruction-stream/multiple-data-stream (SIMD) computers are much less expensive than MIMD systems with equal numbers of processors, the question arises as to whether SIMD systems can efficiently handle state-space search problems. We demonstrate that the answer is yes, and in particular, that graph matching has a natural acid efficient implementation on SIMD machines.
We present a parallel algorithm for the score-based optimal structure search of Bayesian networks. This algorithm is based on a dynamic programming (DP) algorithm having O(n . 2(n)) time and space complexity, which is...
详细信息
We present a parallel algorithm for the score-based optimal structure search of Bayesian networks. This algorithm is based on a dynamic programming (DP) algorithm having O(n . 2(n)) time and space complexity, which is known to be the fastest algorithm for the optimal structure search of networks with n nodes. The bottleneck of the problem is the memory requirement, and therefore, the algorithm is currently applicable for up to a few tens of nodes. While the recently proposed algorithm overcomes this limitation by a space-time trade-off, our proposed algorithm realizes direct parallelization of the original DP algorithm with O(n(sigma)) time and space overhead calculations, where sigma > 0 controls the communication-space trade-off. The overall time and space complexity is O(n(sigma+1)2(n)). This algorithm splits the search space so that the required communication between independent calculations is minimal. Because of this advantage, our algorithm can run on distributed memory supercomputers. Through computational experiments, we confirmed that our algorithm can run in parallel using up to 256 processors with a parallelization efficiency of 0.74, compared to the original DP algorithm with a single processor. We also demonstrate optimal structure search for a 32-node network without any constraints, which is the largest network search presented in literature.
A parallel algorithm for efficient calculation of the second derivatives (Hessian) of the conformational energy in internal coordinates is proposed, This parallel algorithm is based on the master/slave model. A master...
详细信息
A parallel algorithm for efficient calculation of the second derivatives (Hessian) of the conformational energy in internal coordinates is proposed, This parallel algorithm is based on the master/slave model. A master processor distributes the calculations of components of the Hessian to one or more slave processors that, after finishing their calculations, send the results to the master processor that assembles all the components of the Hessian. Our previously developed molecular analysis system for conformational energy optimization, norm,al mode analysis, and Monte Carlo simulation for internal coordinates is extended to use this parallel algorithm for Hessian calculation on a massively parallel computer. The implementation of our algorithm uses the message passing Interface and works effectively on both distributed-memory parallel computers and shared-memory parallel computers. We applied this system to the Newton-Raphson energy optimization of the structures of glutaminyl transfer RNA (Gln-tRNA) with 74 nucleotides and glutaminyl-tRNA synthetase (GlnRS) with 540 residues to analyze the performance of our system. The parallel speedups for the Hessian calculation were 6.8 for Gln-tRNA with 24 processors and 11.2 for GlnRS with 54 processors. The parallel speedups for the Newton-Raphson optimization were 6.3 for Gln-tRNA with 30 processors and 12.0 for GlnRS with 62 processors. (C) 1998 John Wiley & Sons, Inc.
The n-star graph, denoted by S(n), is one of the graph networks that have been recently proposed as attractive alternatives to the n-cube topology for interconnecting processors in parallel computers. In this paper, w...
详细信息
The n-star graph, denoted by S(n), is one of the graph networks that have been recently proposed as attractive alternatives to the n-cube topology for interconnecting processors in parallel computers. In this paper, we present a parallel algorithm for the computation of the Fourier transform on the star graph. The algorithm requires O(n2)multiply-add steps for an input sequence of n! elements, and is hence cost-optimal with respect to the sequential algorithm on which it is based. To the best of our knowledge, this is the first algorithm, and the only one to date, for the computation of the Fourier transform on the star graph.
This paper presents the implementation details of a parallel algorithm on graphics processing units (GPUs) to compute the optimal switching angles for the harmonic minimization in multilevel inverters with unequal dc ...
详细信息
This paper presents the implementation details of a parallel algorithm on graphics processing units (GPUs) to compute the optimal switching angles for the harmonic minimization in multilevel inverters with unequal dc voltage sources. Two algorithms, the Newton- Raphson method and the bisection method, and three different parallel implementations are investigated. Both algorithms considered have a low time complexity and offer a superior converging rate allowing for the real- time control of inverters with a very large number of levels. By exploiting the massively parallel architecture of GPUs, the execution time of the program is reduced significantly. The proposed parallel implementation offers a maximum speedup of 534x compared with a sequential execution on CPU, and allows for the calculation of the optimal switching angles for inverters with up to 1000 dc sources in less than 16.4 mu s.
The solution of special linear, circulant-tridiagonal systems is considered. In this paper, a fast parallel algorithm for solving the special tridiagonal systems, which includes the skew-symmetric and tridiagonal-Toep...
详细信息
The solution of special linear, circulant-tridiagonal systems is considered. In this paper, a fast parallel algorithm for solving the special tridiagonal systems, which includes the skew-symmetric and tridiagonal-Toeplitz systems, is presented. Employing the diagonally dominant property, our parallel solver does need only local communications between adjacent processors on a ring network. An error analysis is also given. On the nCUBE/2E multiprocessors, some experimental results demonstrate the good performance of our stable parallel solver.
The computation of the probability of the top event or minimal cut sets of fault trees is known as intractable NP-hard problems. Modularization can be used to reduce the computational cost of basic operations on fault...
详细信息
The computation of the probability of the top event or minimal cut sets of fault trees is known as intractable NP-hard problems. Modularization can be used to reduce the computational cost of basic operations on fault trees efficiently. The idea of the linear time algorithm, as a very efficient and compact modules detecting algorithm, is visiting the nodes one by one with top-down depth-first left-most traversal of the tree. So the efficiency of the linear time algorithm is limited by nodes visiting time successively and serially, especially when confronting large-scale fault trees. Aiming at improving the efficiency of modularizing large-scale fault trees, this paper proposes a new parallel method to find all possible modules. Firstly, we transform the fault tree into a directed acyclic graph (DAG) and treat the terminal basic nodes as entries of the algorithm. And then, according to the proposed rules in this paper, we traverse the graph bottom-up from the terminal nodes and mark the internal nodes in a parallel way. Therefore, we can compare all internal nodes and decide which nodes are modules. Eventually, an experiment is carried out to compare the linear and parallel algorithm, and the result shows that the proposed parallel algorithm is efficient on handling large-scale fault trees. (C) 2015 Elsevier Ltd. All rights reserved.
暂无评论