ISBN: 078039433X (print)
A parallel multilevel fast multipole method for computing large-scale electromagnetic scattering problems is proposed and implemented in this paper. In recent years, the multilevel fast multipole algorithm (MLFMA) has been widely used for analyzing electromagnetic scattering from electrically large objects. To extend the range of problems that can be solved with this algorithm, we implement a parallel fast multipole method that runs on distributed-memory computer systems. The parallel algorithm is based on the Message Passing Interface (MPI). A compressed-octree-based parallel domain subdivision algorithm is used to avoid the load imbalance between CPUs that is caused by irregular object shapes. The parallel efficiency of the algorithm is demonstrated with several computing examples. We have solved electromagnetic scattering problems for many practical military targets with more than 5,000,000 unknowns using our parallel multilevel fast multipole code on the Dawning 4000A Super-Computer at Shanghai Super-Computing Center.
ISBN: 3540259201 (print)
The task of approximate string matching is to find all locations at which a pattern string p of length m matches a substring of a text string t of length n with at most k differences. It is common to use the Levenshtein distance [5], which allows the differences to be single-character insertions, deletions, and substitutions. Recently, in [3], the IndelMYE, IndelWM and IndelBYN algorithms were introduced as modified versions of the bit-parallel algorithms of Myers [6], Wu & Manber [10] and Baeza-Yates & Navarro [1], respectively. These modified versions were made to support the indel distance (only single-character insertions and/or deletions are allowed). In this paper we present an improved version of IndelMYE that makes better use of the bit operations and runs 24.5 percent faster in practice. Finally, we present a complete set of experimental results to support our findings.
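The matching problem in the abstract above can be illustrated with a plain dynamic-programming sketch. This is not IndelMYE or any of the bit-parallel algorithms named there; it is the textbook column-wise recurrence restricted to insertions and deletions, and the function name `indel_search` is an assumption made for illustration.

```python
def indel_search(pattern, text, k):
    """Return 1-based end positions j in text where pattern matches some
    substring ending at j with indel distance <= k.

    Illustrative O(m*n) dynamic programming, not a bit-parallel method.
    """
    m = len(pattern)
    col = list(range(m + 1))      # column j = 0: D[i][0] = i
    hits = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]        # D[0][j-1]
        col[0] = 0                # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]          # D[i][j-1]
            if pattern[i - 1] == ch:
                col[i] = prev_diag               # characters agree: no cost
            else:
                col[i] = 1 + min(col[i - 1], old)  # delete or insert
            prev_diag = old
        if col[m] <= k:
            hits.append(j)
    return hits
```

Because substitutions are not allowed, replacing one character costs 2 (one deletion plus one insertion), so `indel_search("abc", "axc", 1)` reports no match.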
ISBN: 078039433X (print)
Digital signal processing (DSP) has become an integral part of wireless communication systems, and equivalents of traditional analog systems can be developed with the required fidelity at reasonable cost. In a previous work, a DSP technique was employed for radar upconversion using combinations of upsampling and narrow-band FIR filtering. An efficient FPGA implementation of the DSP modulator is possible by exploiting filter symmetry, since the symmetry properties of word-serial bit-parallel (WSBP) FIR filters improve system throughput. However, the arithmetic processing rate of the WSBP symmetric models is higher than that of the non-symmetric models. In this paper we present a modified WSBP symmetric algorithm that reduces the arithmetic processing for the implementation of a direct-conversion ionospheric radar. In the WSBP approach, processing is performed on integers as blocks of bits. Matching and buffering criteria are used to reduce computations by up to fifty percent. The algorithm can be extended to applications with similar characteristics, particularly for system-on-chip (SoC) techniques.
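The general idea of exploiting FIR filter symmetry can be sketched as follows. This is the standard folded (pre-adder) structure for symmetric coefficients, not the modified WSBP algorithm or its matching and buffering criteria; the function names and integer test data are assumptions for illustration.

```python
def fir_direct(x, h):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n-k], with zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_symmetric(x, h):
    """Same filter for symmetric taps (h[k] == h[N-1-k]).

    Mirrored input samples are added first, so each coefficient pair
    needs one multiplication instead of two.
    """
    n_taps = len(h)
    if any(h[k] != h[n_taps - 1 - k] for k in range(n_taps)):
        raise ValueError("coefficients are not symmetric")

    def sample(i):
        return x[i] if i >= 0 else 0   # samples before the start are zero

    half = n_taps // 2
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(half):
            acc += h[k] * (sample(n - k) + sample(n - (n_taps - 1 - k)))
        if n_taps % 2:                 # unpaired middle tap for odd lengths
            acc += h[half] * sample(n - half)
        y.append(acc)
    return y
```

For N taps the folded form uses about N/2 multiplications per output sample while producing the same result as the direct form.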
ISBN: 078039402X (print)
This paper presents a method for computing all deadlocks and traps in a Petri net. The method is based on Thelen's method [9] and was proposed in [10]. Methods for calculating all deadlocks and traps in Petri nets are very time-consuming, so optimizing the computation is important. A parallel computation method is proposed to reduce the running time. Experimental results for the presented method are also discussed.
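To make the objects being computed concrete, here is a brute-force sketch, assuming "deadlock" is used in the structural sense (a siphon): a place set S is a siphon if every transition that outputs into S also takes input from S, and a trap if every transition that takes input from S also outputs into S. This is exhaustive enumeration, not Thelen's method or the parallel method of the paper, and the net encoding is an assumption for illustration.

```python
from itertools import combinations


def siphons_and_traps(places, transitions):
    """Enumerate all non-empty siphons and traps by exhaustive search.

    transitions maps a transition name to (input_places, output_places),
    both given as sets. Runs in exponential time; for small nets only.
    """
    siphons, traps = [], []
    for size in range(1, len(places) + 1):
        for subset in combinations(places, size):
            s = set(subset)
            # Preset of S: transitions that put tokens into S.
            pre = {t for t, (ins, outs) in transitions.items() if outs & s}
            # Postset of S: transitions that take tokens from S.
            post = {t for t, (ins, outs) in transitions.items() if ins & s}
            if pre <= post:
                siphons.append(s)
            if post <= pre:
                traps.append(s)
    return siphons, traps


# Hypothetical example net: a two-place cycle plus a sink transition t3.
net = {
    "t1": ({"p1"}, {"p2"}),
    "t2": ({"p2"}, {"p1"}),
    "t3": ({"p1"}, set()),
}
print(siphons_and_traps(["p1", "p2"], net))
```

In this example {p1, p2} is a siphon but not a trap, because t3 removes tokens from the set without returning any.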
We explore the possibility of using multiple processors to improve the encoding and decoding times of Lempel-Ziv schemes. A new layout of the processors, based on a full binary tree, is suggested and it is shown how LZSS and LZW can be adapted to take advantage of such parallel architectures. The layout is then generalized to higher order trees. Experimental results show an improvement in compression over the standard method of parallelization and an improvement in time over the sequential method. (C) 2004 Elsevier B.V. All rights reserved.
ISBN: 9780898715934 (print)
While kernel support vector machines are powerful classification algorithms, their computational overhead can be significant, especially for large and high-dimensional data sets. A recent biomedical dataset, for instance, could take as long as 3 weeks to compute its RBF kernel matrix on a modern, single-processor workstation. In this paper, we develop methods for high-performance parallel computation of kernel matrices. There are two key components to a parallel implementation: distribution of the computation across nodes and communication to combine the results. To address the first, we employ a dimension-wise data partition that yields efficient computation and low communication overhead during the initial phase. This partition provides dramatic speedups on large and high-dimensional data, applies to a wide variety of kernel functions, and is an exact computation, producing the same kernel matrix as its sequential implementation. To address communication needs during the second phase, we introduce an approximation specific to the Gaussian RBF kernel that yields sparse partial kernel matrices and, thus, efficient communication. We analyze the approximation error of this method, demonstrating that it falls off exponentially with N, the parameter of the approximation. We also examine the positive definiteness of the approximation with respect to Mercer's condition and show that (a) in the limit of N our approximation becomes positive definite for any data set and (b) for a fixed data set, there exists a finite N yielding a positive definite kernel matrix. We also give a simple iterative method for selecting N to yield a positive definite kernel matrix on any fixed data set. In practice, we find that positive definiteness is achieved on all of the data sets we examine with very small N (2-5). Finally, we test the empirical performance of our two methods on a variety of large, real-world data sets, demonstrating large computational speedups with little or no impact on
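The dimension-wise partition described above can be sketched in a few lines, assuming the Gaussian RBF kernel has the form K(x, y) = exp(-gamma * ||x - y||^2). Each group of feature columns stands in for one node; the partial squared distances are summed before the exponential is applied. The function names, the parameter `gamma`, and the round-robin column split are assumptions for illustration, and the loop runs sequentially rather than on a real cluster.

```python
import math


def partial_sq_dists(data, dims):
    """Squared-distance contribution from the feature columns in dims."""
    n = len(data)
    return [[sum((data[i][d] - data[j][d]) ** 2 for d in dims)
             for j in range(n)] for i in range(n)]


def rbf_kernel_dimensionwise(data, gamma, n_parts):
    """RBF kernel matrix built from per-partition partial distances."""
    n, n_dims = len(data), len(data[0])
    parts = [range(p, n_dims, n_parts) for p in range(n_parts)]
    total = [[0.0] * n for _ in range(n)]
    for dims in parts:                 # each iteration mimics one node
        partial = partial_sq_dists(data, dims)
        for i in range(n):
            for j in range(n):
                total[i][j] += partial[i][j]   # combine step
    return [[math.exp(-gamma * total[i][j]) for j in range(n)]
            for i in range(n)]


def rbf_kernel_sequential(data, gamma):
    """Reference: the same matrix computed over all columns at once."""
    return rbf_kernel_dimensionwise(data, gamma, 1)
```

Because squared Euclidean distance is a sum over dimensions, the partitioned result agrees with the sequential one up to floating-point rounding.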
ISBN: 0780392213 (print)
To improve the computation speed of the ordered subset expectation maximization (OSEM) algorithm for fully 3-D single photon emission computed tomography (SPECT) reconstruction, a parallelization scheme for the OSEM reconstruction algorithm was implemented on an experimental Beowulf-type cluster, and the factors affecting parallel efficiency were investigated. Two approaches were employed to improve the efficiency: (1) the communication cost was minimized by overlapping communication with computation, and (2) the idle time of processes was reduced by automatic load balancing. The performance of the optimized parallel algorithm was evaluated in terms of computation time, speedup factor, and parallel efficiency. Improvements were observed after optimization: the efficiency rose from 83.86% to 92.07% for fully 3-D 128 x 128 x 128 SPECT reconstruction.
ISBN: 3540290311 (print)
This paper presents a parallel blocked algorithm for the algebraic path problem (APP). It is known that the complexity of the APP is the same as that of classical matrix-matrix multiplication; however, solving the APP takes much more running time because its unique data dependencies drastically limit data reuse. We examine a parallel implementation of a blocked algorithm for the APP on the one-chip Intrinsity FastMATH adaptive processor, which consists of a scalar MIPS processor extended with a SIMD matrix coprocessor. The matrix coprocessor supports native matrix instructions on an array of 4 x 4 processing elements. Implementing with matrix instructions requires us to reformulate algorithms in terms of matrix-matrix operations. Conventional vectorization for SIMD vector processing deals with only the innermost loop; however, on the FastMATH processor, we need to vectorize two or three nested loops in order to convert them into one equivalent matrix operation. Our experimental results show a peak performance of 9.27 GOPS and high usage rates of matrix instructions for solving the APP. Our findings indicate that a SIMD matrix extension to (super)scalar processors would be very useful for the fast solution of many matrix-formulated problems.
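One well-known instance of the APP is all-pairs shortest paths over the (min, +) semiring, and a blocked formulation of it can be sketched as below. This is the standard three-phase blocked Floyd-Warshall scheme, not the FastMATH implementation from the paper; the function names and the requirement that the block size divide the matrix size are assumptions for illustration.

```python
INF = float("inf")


def _update_block(d, i0, j0, k0, b):
    """Relax block (i0, j0) through the pivot indices k0 .. k0+b-1."""
    for k in range(k0, k0 + b):
        for i in range(i0, i0 + b):
            for j in range(j0, j0 + b):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]


def blocked_floyd_warshall(dist, b):
    """All-pairs shortest paths with b x b blocking (n must be a multiple of b)."""
    n = len(dist)
    if n % b:
        raise ValueError("block size must divide the matrix size")
    d = [row[:] for row in dist]
    starts = range(0, n, b)
    for k0 in starts:
        _update_block(d, k0, k0, k0, b)            # phase 1: diagonal block
        for x0 in starts:
            if x0 != k0:
                _update_block(d, k0, x0, k0, b)    # phase 2: pivot block row
                _update_block(d, x0, k0, k0, b)    # phase 2: pivot block column
        for i0 in starts:                          # phase 3: remaining blocks
            for j0 in starts:
                if i0 != k0 and j0 != k0:
                    _update_block(d, i0, j0, k0, b)
    return d


def floyd_warshall(dist):
    """Unblocked reference version."""
    return blocked_floyd_warshall(dist, len(dist))
```

Phase 3 is a (min, +) block product accumulated into the target block, which is the part that maps most directly onto matrix-matrix instructions; phases 1 and 2 carry the data dependencies that limit reuse.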
A basic problem in graphs and hypergraphs is that of finding a large independent set: one of guaranteed size. Understanding the parallel complexity of this and related independent set problems on hypergraphs is a fundamental open issue in parallel computation. Caro and Tuza [J. Graph Theory, 15 (1991), pp. 99-107] have shown a certain lower bound alpha_k(H) on the size of a maximum independent set in a given k-uniform hypergraph H and have also presented an efficient sequential algorithm to find an independent set of size alpha_k(H). They also show that alpha_k(H) is the size of the maximum independent set for various hypergraph families. Here, we show that an RNC algorithm due to Beame and Luby [in Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 1990, pp. 212-218] finds an independent set of expected size alpha_k(H), and we also derandomize it for certain special cases. (An intriguing conjecture of Beame and Luby implies that understanding this algorithm better may yield an RNC algorithm to find a maximal independent set in hypergraphs, which is among the outstanding open questions in parallel computation.) We also present lower bounds on independent set size for nonuniform hypergraphs using this algorithm. For graphs, we get an NC algorithm to find independent sets of size essentially that guaranteed by the general (degree-sequence based) version of Turán's theorem.
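For ordinary graphs, the degree-sequence bound mentioned in the last sentence is the sum over vertices v of 1/(d(v) + 1), and it can be illustrated with the classical random-permutation argument. The sketch below is that argument only, not the Beame and Luby algorithm or the NC algorithm of the paper; the adjacency-dictionary encoding and function names are assumptions for illustration.

```python
import random
from fractions import Fraction


def degree_sequence_bound(adj):
    """Sum over vertices of 1 / (degree + 1)."""
    return sum(Fraction(1, len(nbrs) + 1) for nbrs in adj.values())


def independent_set_from_order(adj, order):
    """Keep each vertex that appears before all of its neighbours in order.

    For any edge, the endpoint that comes later is rejected, so the
    result is always an independent set.
    """
    rank = {v: r for r, v in enumerate(order)}
    return {v for v in adj if all(rank[v] < rank[u] for u in adj[v])}


def random_independent_set(adj, rng):
    """Apply the rule above to a uniformly random vertex order."""
    order = list(adj)
    rng.shuffle(order)
    return independent_set_from_order(adj, order)


# Hypothetical example: the path a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(degree_sequence_bound(path))
print(random_independent_set(path, random.Random(0)))
```

A vertex of degree d comes first among itself and its neighbours with probability 1/(d + 1), so the expected size of the returned set equals the bound; each vertex's test depends only on its own neighbourhood, which is why the rule parallelizes naturally.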
The Bean critical-state model describes the penetration of a magnetic field into type-II superconductors. Mathematically, it is a free boundary problem, and fast algorithms for its solution are needed in applied superconductivity. Existence and uniqueness of the solution, parallel algorithms, stability, and error estimation for this model are discussed.