This paper studies the data redundancy in the coefficient matrix of the discrete system arising from an integral equation whose kernel includes a convolution function factor; this redundancy forms a basis for fast solution algorithms. We develop lossless matrix compression strategies that reduce both the cost of integral evaluations and the storage to linear complexity, i.e., the same order as the dimension of the approximation space. We establish that the resulting algorithm preserves the convergence order of the approximate solution. We also propose a hardware-aware parallel algorithm for these strategies.
An efficient parallel algorithm is developed for second-order Møller-Plesset perturbation theory with the resolution-of-identity approximation of two-electron repulsion integrals (RI-MP2) to perform MP2 energy calculations of large molecules on distributed memory processors. Benchmark calculations are carried out for taxol (C47H51NO14), valinomycin (C54H90N6O18), and two-layer nanographene sheets (C96H24)2, which show the high parallel efficiency of the developed algorithm. © 2009 Wiley Periodicals, Inc. Int J Quantum Chem 109: 2121-2130, 2009
ISBN: 9789532330762 (print)
This paper presents parallel matrix multiplication in C#. The threaded computation is distributed across all available cores of a shared-memory system. The number of threads created to parallelize the for-loops equals the number of cores in the system. The main aim of this work was to achieve parallelization similar to what Microsoft .NET 4.0 provides in its parallel extensions library and to make it easy to use.
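As a rough illustration of the row-partitioned scheme this abstract describes, the following is a minimal sketch in Python rather than the paper's C#. The function names `multiply_rows` and `parallel_matmul` are hypothetical, and the one-block-of-rows-per-thread split is an assumption about how the loops might be divided, not the authors' implementation.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def multiply_rows(a, b, row_start, row_end):
    """Compute rows [row_start, row_end) of the matrix product a * b."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(row_start, row_end)
    ]


def parallel_matmul(a, b, workers=None):
    """Multiply a (n x m) by b (m x p), giving each thread one block of rows."""
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("incompatible matrix shapes")
    # Default to one thread per core, mirroring the abstract's thread count.
    workers = workers or os.cpu_count() or 1
    n = len(a)
    block = -(-n // workers)  # ceiling division: rows per thread
    ranges = [(start, min(start + block, n)) for start in range(0, n, block)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: multiply_rows(a, b, *r), ranges)
        # pool.map preserves input order, so row blocks concatenate correctly.
        return [row for part in parts for row in part]
```

Calling `parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`. In standard CPython the global interpreter lock limits the speed-up of pure-Python arithmetic across threads, so this sketch shows the work partitioning only, not the performance the paper reports.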
ISBN: 9783642330179 (print)
We propose a hybrid algorithm for constructing an efficient Aho-Corasick automaton designed for data-parallel processing in knowledge-based intrusion detection systems (IDS). The automaton supports regular expressions in the patterns, and we validate its use in signature matching, a critical component of modern intrusion detection systems. Our approach uses a hybrid memory storage mechanism and an adaptation of the Smith-Waterman local sequence alignment algorithm, and additionally employs path compression and bitmapped nodes. Using a set of the latest virus signatures from the ClamAV database as a test bed, we show that the automata obtained through our approach significantly reduce memory usage compared to the unoptimized version while keeping throughput at similar levels.
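For orientation, the sketch below shows a plain textbook Aho-Corasick automaton in Python, i.e. roughly the kind of unoptimized baseline such work starts from. It handles literal patterns only and includes none of the abstract's regular-expression support, hybrid storage, path compression, or bitmapped nodes. The names `build_automaton` and `find_all` are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure and output tables for a set of literal patterns."""
    goto = [{}]  # goto[state][char] -> next state
    fail = [0]   # failure link of each state
    out = [[]]   # patterns recognised on reaching each state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)
    # Breadth-first pass: set failure links and inherit outputs from them.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, pattern) for every match, in scan order."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits
```

With the patterns `["he", "she", "his", "hers"]`, scanning `"ushers"` yields `[(1, "she"), (2, "he"), (2, "hers")]`. Each state here stores a full dictionary of transitions, which is the memory cost that compressed node layouts aim to reduce.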
ISBN: 9788988678558 (print)
Text clustering algorithms that rely only on word frequency neglect the semantic relationships between words, so their results are imprecise. This paper proposes a semantic-tree-based text clustering algorithm built on WordNet. To reduce the running time, we adopt a parallel algorithm in a multi-process model that starts several processes at the same time. The master process handles data partitioning, sending and collecting information, and clustering the results. The slave processes are mainly responsible for computing word frequencies, calculating weights, and retrieving the hypernyms of words from the semantic tree. Experimental results show that the algorithm achieves both higher precision and lower time complexity.
Based on the work of Xu and Zhou [Math. Comp., 69 (2000), pp. 881-909], this paper combines the local defect-correction technique and the shifted-inverse power method to establish new local and parallel finite element three-scale schemes for a class of eigenvalue problems. It is proved that, with these schemes, the solution of an eigenvalue problem on a fine grid π_h is reduced to the solution of an eigenvalue problem on a much coarser grid π_H, the solution of a linear algebraic system on a globally mesoscopic grid π_w, and the solutions of linear systems on several locally fine grids in parallel. A principle for determining the diameters of the three grid scales is given. In particular, this paper devises a new local and parallel finite element multiscale discretization scheme. Theoretical analysis and numerical experiments show that the proposed computational approach is simple to carry out and can solve singular eigenvalue problems efficiently.
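The shifted-inverse power method named in this abstract can be sketched for a small dense symmetric matrix as follows. This shows only the algebraic iteration, not the finite element three-scale scheme or the local defect correction. The helper names `solve` and `shifted_inverse_iteration` are hypothetical, and the sketch assumes the shift is not exactly an eigenvalue.

```python
def solve(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def shifted_inverse_iteration(a, shift, start, steps=20):
    """Approximate the eigenpair of symmetric a whose eigenvalue is nearest shift."""
    n = len(a)
    shifted = [
        [a[i][j] - (shift if i == j else 0.0) for j in range(n)] for i in range(n)
    ]
    x = start[:]
    for _ in range(steps):
        y = solve(shifted, x)                    # y = (A - shift*I)^{-1} x
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]                # keep the iterate at unit length
    ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Rayleigh quotient of the unit vector x estimates the eigenvalue.
    return sum(x[i] * ax[i] for i in range(n)), x
```

For the matrix `[[2.0, 1.0], [1.0, 2.0]]`, whose eigenvalues are 1 and 3, a shift of 2.5 and start vector `[1.0, 0.0]` drive the iteration toward the eigenvalue 3. The closer the shift lies to the target eigenvalue, the faster the iteration converges, which is why a coarse-grid eigenvalue makes a useful shift.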
The goal of this paper is to develop a parallel algorithm for the direct solution of large sparse linear systems and to integrate it into domain decomposition methods. Such systems arise often in finite element simulations of structural mechanics problems, and solving them is costly in both run time and memory. This paper exploits two levels of parallelism. The lower level is a parallel direct solver based on a nested dissection algorithm, which is integrated into FETI (Finite Element Tearing and Interconnecting) methods. This direct solver has the advantage of handling zero-energy modes in floating structures automatically and properly. The upper level is coarse-grain parallelism between the FETI substructures. Numerical tests are carried out to evaluate the performance of the direct solver.
ISBN: 9788192024974 (print)
In this paper we provide a parallel formulation of the dancing links algorithm described by Donald E. Knuth. This algorithm uses an efficient encoding of the exact set cover problem. Using backtracking, the state space is searched in a depth-first manner. We derive the parallel algorithm and outline some implementation details. We conclude with experimental results for the n-queens problem that show nearly linear speed-up for the presented approach.
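One way to picture a parallel exact-cover search is to hand each top-level branch of the backtracking tree to a separate worker. The sketch below does this in Python, with dictionaries of sets standing in for Knuth's doubly linked lists. It is an assumed decomposition for illustration, not the formulation derived in the paper, and all function names are hypothetical. It also assumes the universe is the union of the items in the given rows.

```python
from concurrent.futures import ThreadPoolExecutor


def build_columns(rows):
    """Map each item to the set of row names that cover it."""
    cols = {}
    for name, items in rows.items():
        for item in items:
            cols.setdefault(item, set()).add(name)
    return cols


def select(cols, rows, r):
    """Choose row r: drop the columns it covers and every conflicting row."""
    removed = []
    for j in rows[r]:
        for i in cols[j]:
            for k in rows[i]:
                if k != j:
                    cols[k].remove(i)
        removed.append(cols.pop(j))
    return removed


def deselect(cols, rows, r, removed):
    """Undo select(), restoring columns in reverse order."""
    for j in reversed(rows[r]):
        cols[j] = removed.pop()
        for i in cols[j]:
            for k in rows[i]:
                if k != j:
                    cols[k].add(i)


def search(cols, rows, partial):
    """Depth-first backtracking over the remaining columns."""
    if not cols:
        yield list(partial)
        return
    c = min(cols, key=lambda col: len(cols[col]))  # fewest candidates first
    for r in list(cols[c]):
        partial.append(r)
        removed = select(cols, rows, r)
        yield from search(cols, rows, partial)
        deselect(cols, rows, r, removed)
        partial.pop()


def solve_branch(rows, first_row):
    """Explore the subtree below one first choice on a private copy of the state."""
    cols = build_columns(rows)
    select(cols, rows, first_row)
    return [[first_row] + rest for rest in search(cols, rows, [])]


def parallel_exact_cover(rows, workers=4):
    """Split the search at the root: one task per candidate row of the first column."""
    cols = build_columns(rows)
    if not cols:
        return [[]]
    c = min(cols, key=lambda col: len(cols[col]))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        branches = pool.map(lambda r: solve_branch(rows, r), sorted(cols[c]))
        return [solution for branch in branches for solution in branch]
```

On Knuth's small example, `{"A": [1, 4, 7], "B": [1, 4], "C": [4, 5, 7], "D": [3, 5, 6], "E": [2, 3, 6, 7], "F": [2, 7]}`, the function returns `[["B", "D", "F"]]`. Because every branch works on its own copy of the column table, the tasks share no mutable state, at the price of rebuilding that table once per branch.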
ISBN: 9780769550886 (print)
It is becoming increasingly common to use GPUs (Graphics Processing Units) as accelerators to speed up compute-intensive sections of applications. Since block ciphers are meant for high-speed encryption, it is important to implement them as fast as possible. The block cipher ARIA is a newer encryption standard that uses four different S-boxes. This paper proposes three methods for high-performance implementation of the ARIA encryption algorithm on GPU. To reduce data dependency, the round functions of ARIA are merged into lookup tables and XOR operations. Encryption is performed in parallel, and the data are arranged appropriately across the different GPU memory spaces. Experimental results demonstrate that these techniques accelerate ARIA encryption significantly: the quantitative performance comparison shows speedups of 18 to 45 times as the plaintext size varies from 4M to 256M.
ISBN: 9780819498021 (print)
To form a false scene in a Synthetic Aperture Radar (SAR) image, a deceptive jammer needs to obtain the relevant SAR parameters. Among these parameters, the squint angle and beamwidth usually change, which renders pre-generated jamming signals useless. To solve this problem, a strategy is proposed that transforms the pre-generated jamming signals to counter SAR with arbitrary squint angle and beamwidth in real time. First, the jamming effects under estimation errors of the SAR's squint angle and beamwidth are analyzed. Next, a parallel algorithm using Graphics Processing Units (GPU) is proposed to generate jamming signals for varying squint angle and azimuth beamwidth. Then, the paper describes a method that implements the signal transformation between the wide-beam and narrow-beam conditions. Based on the generated signals, jamming under arbitrary squint angle and beamwidth can be realized in real time. The simulation results show that this strategy is effective for jamming SAR across a variety of squint angles and beamwidths.