A new parallel algorithm has been developed for second-order Møller-Plesset perturbation theory (MP2) energy calculations. Its main projected applications are for large molecules, for instance, for the calculation of dispersion interaction. Tests on a moderate number of processors (2-16) show that the program has high CPU and parallel efficiency. Timings are presented for two relatively large molecules, taxol (C47H51NO14) and luciferin (C11H8N2O3S2), the former with the 6-31G* and 6-311G** basis sets (1032 and 1484 basis functions, 164 correlated orbitals), and the latter with the aug-cc-pVDZ and aug-cc-pVTZ basis sets (530 and 1198 basis functions, 46 correlated orbitals). An MP2 energy calculation on C130H10 (1970 basis functions, 265 correlated orbitals) completed in less than 2 h on 128 processors. (c) 2006 Wiley Periodicals, Inc.
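The quantity this algorithm parallelizes has a compact closed form. As a point of reference, here is a minimal serial sketch of the closed-shell MP2 correlation energy using small synthetic orbital energies and integrals, not the paper's parallel implementation:

```python
import numpy as np

# Closed-shell MP2 correlation energy from MO-basis two-electron
# integrals (ia|jb) in chemists' notation. Synthetic data only:
# sizes, orbital energies, and integrals are invented for illustration.
rng = np.random.default_rng(0)
n_occ, n_vir = 3, 5

# Occupied energies below virtual ones, so every denominator
# e_i + e_j - e_a - e_b is strictly negative.
e_occ = np.sort(rng.uniform(-2.0, -0.5, n_occ))
e_vir = np.sort(rng.uniform(0.5, 2.0, n_vir))

# Synthetic (ia|jb) tensor with the (i,a)<->(j,b) exchange symmetry.
g = rng.normal(size=(n_occ, n_vir, n_occ, n_vir)) * 0.05
g = 0.5 * (g + g.transpose(2, 3, 0, 1))

denom = (e_occ[:, None, None, None] + e_occ[None, None, :, None]
         - e_vir[None, :, None, None] - e_vir[None, None, None, :])

# E_MP2 = sum_{ijab} (ia|jb) [2 (ia|jb) - (ib|ja)] / (e_i + e_j - e_a - e_b)
e_mp2 = np.sum(g * (2.0 * g - g.transpose(0, 3, 2, 1)) / denom)
print(f"MP2 correlation energy (synthetic): {e_mp2:.6f}")
```

The dominant cost in practice is producing the (ia|jb) tensor from atomic-orbital integrals, which is the step a parallel MP2 code must distribute.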
Given a natural number e, an addition chain for e is a finite sequence of numbers having the following properties: (1) the first number is one, (2) every element is the sum of two earlier elements, and (3) the given number occurs at the end of the sequence. We introduce a fast optimal algorithm to generate a chain of short length for an n-bit number e. The algorithm is based on the right-to-left binary strategy and a barrel-shifter circuit, and runs on the exclusive-read exclusive-write (EREW) parallel random-access machine model.
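The right-to-left binary strategy can be sketched serially: collect the powers of two up to the top bit of e together with the running sums of the set-bit powers. This yields a short (not necessarily optimal) chain; the barrel shifter and PRAM parallelism of the paper are omitted.

```python
def addition_chain(e: int) -> list[int]:
    """Right-to-left binary addition chain for e: powers of two up to
    the top bit plus running sums of the set-bit powers. A sketch of
    the strategy named in the abstract, not the paper's algorithm."""
    chain = {1}
    acc = 0              # running sum of set-bit powers seen so far
    power = 1            # current power of two in the right-to-left scan
    n = e
    while n:
        if n & 1:
            acc += power
            chain.add(acc)
        n >>= 1
        if n:
            power *= 2   # doubling step 2^k -> 2^(k+1)
            chain.add(power)
    return sorted(chain)

def is_addition_chain(chain: list[int], e: int) -> bool:
    """Check properties (1)-(3) from the definition above."""
    if chain[0] != 1 or chain[-1] != e:
        return False
    return all(any(a + b == c for a in chain[:k] for b in chain[:k])
               for k, c in enumerate(chain) if k > 0)

print(addition_chain(23))   # [1, 2, 3, 4, 7, 8, 16, 23]
```

For 23 = 10111 in binary, the chain interleaves the doublings 1, 2, 4, 8, 16 with the partial sums 3, 7, 23 of the set-bit powers.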
Currently, a tremendous amount of space debris in Earth's orbit imperils operational spacecraft. It is essential to undertake risk assessments of collisions and predict dangerous encounters in space. However, collision predictions for an enormous amount of space debris give rise to large-scale computations. In this paper, a parallel algorithm is established on the Compute Unified Device Architecture (CUDA) platform of NVIDIA Corporation for collision prediction. According to the parallel structure of NVIDIA graphics processors, a block decomposition strategy is adopted in the algorithm. Space debris is divided into batches, and the computation and data transfer operations of adjacent batches overlap. As a consequence, the latency to access shared memory during the entire computing process is significantly reduced, and a higher computing speed is reached. Theoretically, a collision-prediction simulation can be executed for any number of debris objects and any time span. To verify this algorithm, a simulation example including 1382 pieces of debris, whose operational time scales vary from 1 min to 3 days, is conducted on an NVIDIA Tesla C2075 GPU. The simulation results demonstrate that with the same computational accuracy as that of a CPU, the computing speed of the parallel algorithm on a GPU is 30 times that on a CPU. Based on this algorithm, collision prediction of over 150 Chinese spacecraft for a time span of 3 days can be completed in less than 3 h on a single computer, which meets the timeliness requirement of the initial screening task. Furthermore, the algorithm can be adapted for multiple tasks, including particle filtration, constellation design, and Monte-Carlo simulation of an orbital computation. (C) 2017 COSPAR. Published by Elsevier Ltd. All rights reserved.
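The batch-wise block decomposition can be illustrated on a CPU with synthetic straight-line trajectories. All names, units, and the 50 km screening threshold below are illustrative assumptions; the real algorithm propagates orbits and overlaps host-device transfers with GPU computation.

```python
import numpy as np

# Screen debris against a spacecraft path in fixed-size batches,
# flagging any object whose minimum distance drops below a threshold.
rng = np.random.default_rng(1)
n_debris, n_steps, batch = 1000, 64, 256

t = np.linspace(0.0, 1.0, n_steps)                     # screening window (illustrative)
sc = np.stack([7000.0 + 0 * t, 10.0 * t, 0 * t], -1)   # spacecraft path, km

pos0 = np.array([7000.0, 0.0, 0.0]) + rng.uniform(-200, 200, (n_debris, 3))
vel = rng.uniform(-20, 20, (n_debris, 3))              # drift per window, illustrative

threshold = 50.0                                       # close-approach distance, km
alerts = []
for start in range(0, n_debris, batch):                # block decomposition into batches
    p0, v = pos0[start:start + batch], vel[start:start + batch]
    pos = p0[:, None, :] + v[:, None, :] * t[None, :, None]   # (batch, n_steps, 3)
    dmin = np.linalg.norm(pos - sc[None, :, :], axis=-1).min(axis=1)
    alerts.extend(int(i) for i in np.flatnonzero(dmin < threshold) + start)

print(f"{len(alerts)} debris objects pass within {threshold} km")
```

On a GPU, each batch would be launched as a kernel while the next batch's state vectors are transferred, which is the overlap the abstract credits for the speedup.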
Accurate long-term runoff forecasting is crucial for managing and allocating water resources. Due to the complexity and variability of natural runoff, the most difficult problems currently faced by long-term runoff forecasting are the difficulty of model construction, poor prediction accuracy, and time-intensive forecasting processes. Therefore, this study proposes a hybrid long-term runoff forecasting framework that uses the antecedent inflow and specific meteorological factors as the inputs, is modeled by ensemble empirical mode decomposition (EEMD) coupled with an artificial neural network (ANN), and is computed by a parallel algorithm. First, the framework can transform monthly inflow and meteorological series into stationary signals via EEMD to more comprehensively explore the relationships of the input factors through the ANN. Second, the selected meteorological factors that are closely related to inflow formation can be filtered out by the single correlation coefficient method, which contributes to reducing coupling between input factors and increases the accuracy of the prediction models. Finally, a multicore parallel algorithm that is easily accessed everywhere and that fully utilizes multiple calculation resources while flexibly contending with various optimization requirements will improve forecasting efficiency. The Xiaowan Hydropower Station (XW) is selected as the study area, and the final results of the study show that (1) the addition of targeted meteorological factors does indeed greatly enhance the performance of the prediction models; (2) the five criteria for evaluating the prediction accuracy show that the EEMD-ANN model is far superior in prediction performance to the ordinary ANN model when run under the same input conditions; and (3) the optimization time of the 32-core model can be reduced by as much as 25 times, which significantly saves time during the forecast process.
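The single-correlation-coefficient screening step can be sketched as follows; the series, factor names, and the 0.5 cutoff are illustrative assumptions, not values from the study.

```python
import numpy as np

# Keep only meteorological factors whose Pearson correlation with the
# inflow series is strong enough. All data here are synthetic.
rng = np.random.default_rng(2)
n = 240                                   # 20 years of monthly values

inflow = np.sin(np.arange(n) * 2 * np.pi / 12) + 0.3 * rng.normal(size=n)
factors = {
    "rainfall":    inflow + 0.2 * rng.normal(size=n),      # strongly related
    "temperature": np.roll(inflow, 1) + 0.8 * rng.normal(size=n),
    "pressure":    rng.normal(size=n),                     # unrelated noise
}

def screen_factors(target, candidates, threshold=0.5):
    """Keep factors whose |Pearson r| with the target exceeds threshold."""
    kept = {}
    for name, series in candidates.items():
        r = np.corrcoef(target, series)[0, 1]
        if abs(r) > threshold:
            kept[name] = r
    return kept

selected = screen_factors(inflow, factors)
print(sorted(selected))
```

The surviving factors become the ANN inputs alongside the antecedent inflow, which is the decoupling effect the abstract describes.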
This paper presents a very high-speed image processing algorithm applied to multi-faceted asymmetric radiation from the edge (MARFE) detection on the Joint European Torus. The algorithm was built in serial and parallel versions and written in C/C++ using OpenCV, cvBlob, and LibSVM libraries. The code implemented was characterized by its accuracy and run-time performance. The final result of the parallel version achieves a correct detection rate of 97.6% for MARFE identification and an image processing rate of more than 10,000 frames per second. The parallel version divides the image processing chain into two groups and seven tasks. One group is responsible for the Background Image Estimation and Image Binarization modules, and the other is responsible for Region Feature Extraction and Pattern Classification. At the same time, and to maximize the workload distribution, the parallel code uses data parallelism and pipeline strategies for these two groups, respectively. A master thread is responsible for opening, signaling, and transferring images between both groups. The algorithm has been tested on a dedicated Intel symmetric-multiprocessing computer architecture with a Linux operating system.
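The two-group structure with a master thread feeding frames can be sketched as a thread-and-queue pipeline. Frames are stand-in integers here, and the binarization and classifier stages are placeholders for the OpenCV/cvBlob/LibSVM processing.

```python
import queue
import threading

# Group A (background estimation + binarization) feeds group B
# (feature extraction + classification) through a queue; the main
# thread plays the master role of transferring frames.
frames_in = queue.Queue()
between = queue.Queue()
results = []

def group_a():
    while True:
        frame = frames_in.get()
        if frame is None:                 # poison pill: propagate shutdown
            between.put(None)
            return
        between.put(frame % 2)            # stand-in for binarization

def group_b():
    while True:
        binary = between.get()
        if binary is None:
            return
        results.append("MARFE" if binary else "no-MARFE")  # stand-in classifier

threads = [threading.Thread(target=group_a), threading.Thread(target=group_b)]
for th in threads:
    th.start()
for frame in range(8):                    # master thread transfers frames
    frames_in.put(frame)
frames_in.put(None)
for th in threads:
    th.join()
print(results)
```

In the real system, group A would additionally be data-parallel over frames and group B pipelined over its tasks, which this two-thread toy collapses into one stage each.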
Permutation generation is an important problem in combinatorial computing. In this paper we present an optimal parallel algorithm to generate all N! permutations of N objects. The algorithm is designed to be executed on a very simple computation model, a linear array of N identical processors. Because of the simplicity and regularity of the processors, the model is very suitable for VLSI implementation. Another advantageous characteristic of this design is that it can generate all the permutations in minimal change order.
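Minimal change order means consecutive permutations differ by a single swap of adjacent elements. A serial sketch of a generator with that property, the classic Steinhaus-Johnson-Trotter scheme rather than the paper's linear-array design, is:

```python
def sjt_permutations(n):
    """Steinhaus-Johnson-Trotter: yield all n! permutations of 1..n so
    that consecutive permutations differ by one adjacent transposition."""
    perm = list(range(1, n + 1))
    direction = [-1] * n                  # every element initially looks left
    yield perm.copy()
    while True:
        # find the largest mobile element: its neighbour in its
        # direction exists and is smaller
        mobile, idx = -1, -1
        for i, v in enumerate(perm):
            j = i + direction[i]
            if 0 <= j < n and perm[j] < v and v > mobile:
                mobile, idx = v, i
        if mobile == -1:
            return                        # no mobile element: done
        j = idx + direction[idx]
        perm[idx], perm[j] = perm[j], perm[idx]
        direction[idx], direction[j] = direction[j], direction[idx]
        for i, v in enumerate(perm):      # reverse directions of larger elements
            if v > mobile:
                direction[i] = -direction[i]
        yield perm.copy()

perms = list(sjt_permutations(3))
print(perms)   # [[1,2,3], [1,3,2], [3,1,2], [3,2,1], [2,3,1], [2,1,3]]
```

The adjacent-swap property is exactly what makes the method natural on a linear processor array: each step only exchanges data between neighbouring processors.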
This paper presents a simple yet effective algorithm to improve an arbitrary Poisson disk sampling to reach the maximal property, i.e., no more Poisson disks can be inserted. Taking a non-maximal Poisson disk sampling as input, our algorithm efficiently detects the regions allowing additional samples and then generates Poisson disks in these regions. The key idea is to convert the complicated plane or space searching problem into a simple search on circles or spheres, which is one dimension lower than the original sampling domain. Our algorithm is memory-efficient and flexible, generating maximal Poisson disk samplings in an arbitrary 2D polygon or 3D polyhedron. Moreover, our parallel algorithm can be extended from Euclidean space to curved surfaces in an intrinsic manner. Thanks to its parallel structure, our method can be implemented easily on modern graphics hardware. We have observed a significant performance improvement compared to existing techniques. (C) 2013 Elsevier Ltd. All rights reserved.
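The dimension-reduction idea, searching circles instead of the whole plane, can be sketched in 2D on the unit square: any admissible insertion point lies on some circle of radius r around an existing sample, so we discretize those circles and insert greedily. The radius, circle discretization, and insertion order are illustrative choices, not the paper's scheme.

```python
import numpy as np

r = 0.2                                    # Poisson disk radius (illustrative)
samples = [np.array([0.5, 0.5])]           # deliberately non-maximal input set

def admissible(p, pts):
    """p may be inserted: inside the unit square and >= r from every sample."""
    return (0.0 <= p[0] <= 1.0 and 0.0 <= p[1] <= 1.0
            and all(np.linalg.norm(p - q) >= r - 1e-9 for q in pts))

angles = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
inserted = True
while inserted:                            # repeat until no circle point fits
    inserted = False
    for s in list(samples):
        for a in angles:
            p = s + r * np.array([np.cos(a), np.sin(a)])
            if admissible(p, samples):
                samples.append(p)
                inserted = True
print(f"{len(samples)} samples after gap filling")
```

Each candidate test is a one-dimensional search over circle angles rather than a two-dimensional search over the square, which is the reduction the abstract describes; the paper parallelizes the detection of the uncovered regions.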
We present a parallel algorithm for computing the Voronoi diagram of a set of spheres S in R^3. The spheres have varying radii and do not intersect. We compute each Voronoi cell independently using a two-stage iterative procedure, assuming the input spheres are in general position. In the first stage, an initial Voronoi cell for a sphere s_i is computed using an iterative lower-envelope approach restricted to a subset of spheres L_i ⊆ S. This helps to avoid defining the bisectors between all pairs of input spheres and to develop a distributed-memory parallel algorithm. We use the Delaunay graph of sample points from the input spheres to select the subset L_i for computing each Voronoi cell. In the second stage, the Voronoi cells obtained from the first stage are matched to update the subsets. If additional spheres are added to a subset L_i, the correctness of the computed vertices is verified against the bisectors of the spheres newly added to L_i. Results demonstrate the robustness and speed of the algorithm in handling large sets of spheres.
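Voronoi diagrams of non-intersecting spheres are defined by the additively weighted distance d(x, s_i) = |x - c_i| - r_i. A minimal sketch of the underlying nearest-sphere predicate, with synthetic spheres, is:

```python
import numpy as np

# Three non-intersecting spheres with varying radii (synthetic data).
centers = np.array([[0.0, 0.0, 0.0],
                    [4.0, 0.0, 0.0],
                    [0.0, 5.0, 0.0]])
radii = np.array([1.0, 0.5, 1.5])

def nearest_sphere(x):
    """Index of the sphere minimizing the additively weighted
    distance d(x, s_i) = |x - c_i| - r_i; the Voronoi cell of s_i is
    the region where s_i wins this comparison."""
    d = np.linalg.norm(centers - x, axis=1) - radii
    return int(np.argmin(d))

print(nearest_sphere(np.array([1.5, 0.0, 0.0])))
```

Because the radii enter the distance, the bisector between two spheres is a sheet of a hyperboloid rather than a plane, which is why the paper restricts bisector construction to the small subsets L_i.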
Seismic interferometry is a technique for extracting deterministic signals (i.e., ambient-noise Green's functions) from recordings of ambient-noise wavefields through cross-correlation and other related signal processing techniques. The extracted ambient-noise Green's functions can be used in ambient noise tomography for constructing seismic structure models of the Earth's interior. The amount of calculations involved in the seismic interferometry procedure can be significant, especially for ambient noise datasets collected by large seismic sensor arrays (i.e., "large-N" data). We present an efficient parallel algorithm, named pSIN (parallel Seismic INterferometry), for solving seismic interferometry problems on conventional distributed-memory computer clusters. The design of the algorithm is based on a two-dimensional partition of the ambient-noise data recorded by a seismic sensor array. We pay special attention to the balance of the computational load, inter-process communication overhead and memory usage across all MPI processes and we minimize the total number of I/O operations. We have tested the algorithm using a real ambient-noise dataset and obtained a significant amount of savings in processing time. Scaling tests have shown excellent strong scalability from 80 cores to over 2000 cores. (C) 2016 Elsevier Ltd. All rights reserved.
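The core cross-correlation step can be sketched with two synthetic noise traces: the peak of their cross-correlation recovers the relative delay, which is the essence of extracting an empirical Green's function. The sampling rate and delay below are invented for the demo.

```python
import numpy as np

# Two stations record the same random wavefield, station B lagging
# station A by a fixed number of samples.
rng = np.random.default_rng(4)
fs = 100                       # samples per second (illustrative)
delay = 25                     # B lags A by 25 samples (0.25 s)
n = 20000

noise = rng.normal(size=n + delay)
a = noise[delay:]              # station A sees the wavefield first
b = noise[:n]                  # station B, delayed copy

# frequency-domain cross-correlation (circular; fine for this demo)
spec = np.fft.rfft(b) * np.conj(np.fft.rfft(a))
ccf = np.fft.irfft(spec, n)
lag = int(np.argmax(ccf))      # lag of B relative to A
print(f"peak at lag {lag} samples = {lag / fs:.2f} s")
```

A production code like pSIN repeats this for every station pair over long, windowed records, which is why the 2D data partition and I/O balance matter.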
An efficient L0-stable parallel algorithm is developed for the two-dimensional diffusion equation with non-local time-dependent boundary conditions. The algorithm is based on subdiagonal Padé approximation to the matrix exponentials arising from the use of the method of lines and may be implemented on a parallel architecture using two processors running concurrently, with each processor employing tridiagonal solvers at every time-step. The algorithm is tested on two model problems from the literature for which discontinuities between initial and boundary conditions exist. The CPU times together with the associated error estimates are compared.
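The "Padé approximation plus tridiagonal solver at every time-step" structure can be illustrated in 1D with the simplest L-stable subdiagonal Padé approximant to e^z, namely 1/(1 - z) (backward Euler). This is only a structural sketch: the paper's scheme is a higher-order L0-stable variant in 2D with non-local boundary conditions.

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve a tridiagonal system; a = sub-, b = main, c = super-diagonal."""
    n = len(b)
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Method of lines for u_t = u_xx on (0,1) with homogeneous Dirichlet
# boundaries: semi-discretize to u' = A u, then replace exp(k A) by
# the (0,1) Pade approximant, giving (I - kA) u_new = u_old.
m, steps = 49, 200
h = 1.0 / (m + 1)
k = 1e-4                                   # time step
x = np.linspace(h, 1 - h, m)
u = np.sin(np.pi * x)                      # exact solution decays as exp(-pi^2 t)

r = k / h**2
sub = np.full(m, -r); main = np.full(m, 1 + 2 * r); sup = np.full(m, -r)
sub[0] = sup[-1] = 0.0
for _ in range(steps):                     # one tridiagonal solve per time-step
    u = thomas(sub, main, sup, u)

exact = np.exp(-np.pi**2 * k * steps) * np.sin(np.pi * x)
print(f"max error: {np.abs(u - exact).max():.2e}")
```

Higher-order subdiagonal Padé schemes factor the rational approximant into linear terms, and the paper assigns those independent tridiagonal solves to the two processors.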