An efficient parallel algorithm is presented for computing selected components of A(-1) where A is a structured symmetric sparse matrix. Calculations of this type are useful for several applications, including electro...
详细信息
An efficient parallel algorithm is presented for computing selected components of A(-1) where A is a structured symmetric sparse matrix. Calculations of this type are useful for several applications, including electronic structure analysis of materials in which the diagonal elements of the Green's functions are needed. The algorithm proposed here is a direct method based on a block LDLT factorization. The selected elements of A(-1) we compute lie in the nonzero positions of L+L-T. We use the elimination tree associated with the block LDLT factorization to organize the parallel algorithm, and reduce the synchronization overhead by passing the data level by level along this tree using the technique of local buffers and relative indices. We demonstrate the efficiency of our parallel implementation by applying it to a discretized two dimensional Hamiltonian matrix. We analyze the performance of the parallel algorithm by examining its load balance and communication overhead, and show that our parallel implementation exhibits an excellent weak scaling on a large-scale high performance distributed-memory parallel machine.
A micro-digital holographic particle tracking velocimetry with high-speed system is constructed by a PC grid environment that employs Windows XP with AD-POWERs as parallel tool. Two algorithms for high-speed system ar...
详细信息
A micro-digital holographic particle tracking velocimetry with high-speed system is constructed by a PC grid environment that employs Windows XP with AD-POWERs as parallel tool. Two algorithms for high-speed system are evaluated under the same PC grid environment. Both methods are based on a computer-generated hologram algorithm. One method is a division algorithm based on time development for the measurements, while the other is a division algorithm based on spatial reconstruction for the measurement. In case of the former, the performance is increased by a factor of 3.3 by using 4 PCs. The present system can compute huge hologram images and output them "on-site" at an experimental facility. (c) 2007 Elsevier B.V. All fights reserved.
Computed tomographic imaging spectrometers capture hyperspectral images in real-time. However, postprocessing the imagery can require enormous computational resources;thus, limiting its application to nonrealtime scen...
详细信息
Computed tomographic imaging spectrometers capture hyperspectral images in real-time. However, postprocessing the imagery can require enormous computational resources;thus, limiting its application to nonrealtime scenarios. To overcome these challenges, we developed a highly parallelizable algorithm that exploits spatial shift-invariance. To demonstrate the versatility of our algorithm, we developed implementations on a desktop and an embedded graphics processing unit. To our knowledge, our results show the fastest image reconstruction times reported. (C) 2020 Society of Photo-Optical Instrumentation Engineers (SPIE)
A new algorithm for eigenvalue problems for linear differential operators with fractional derivatives is proposed and justified. The algorithm is based on the approximation (perturbation) of the coefficients of a part...
详细信息
A new algorithm for eigenvalue problems for linear differential operators with fractional derivatives is proposed and justified. The algorithm is based on the approximation (perturbation) of the coefficients of a part of the differential operator by piecewise constant functions where the eigenvalue problem for the last one is supposed to be simpler than the original one. Another milestone of the algorithm is the homotopy idea which results at the possibility for a given eigenpair number to compute recursively a sequence of the approximate eigenpairs. This sequence converges to the exact eigenpair with a super-exponential convergence rate. The eigenpairs can be computed in parallel for all prescribed indexes. The proposed method possesses the following principal property: its convergence rate increases together with the index of the eigenpair. Numerical examples confirm the theory.
The goal of this paper is to develop a parallel algorithm for the direct solution of large sparse linear systems and integrate it into domain decomposition methods. The computational effort for these linear systems, o...
详细信息
The goal of this paper is to develop a parallel algorithm for the direct solution of large sparse linear systems and integrate it into domain decomposition methods. The computational effort for these linear systems, often encountered in numerical simulation of structural mechanics problems by finite element codes, is very significant in terms of run-time and memory requirements. In this paper, a two-level parallelism is exploited. The exploitation of the lower level of parallelism is based on the development of a parallel direct solver with a nested dissection algorithm and to introduce it into the FETI methods. This direct solver has the advantage of handling zero-energy modes in floating structures automatically and properly. The upper level of parallelism is a coarse-grain parallelism between substructures of FETI. Some numerical tests are carried out to evaluate the performance of the direct solver.
We present a new parallel algorithm to solve the all-pairs shortest path problem in a given graph which is considerably faster than the most recently published algorithm [7] for the same problem. Next we propose a sui...
详细信息
We present a new parallel algorithm to solve the all-pairs shortest path problem in a given graph which is considerably faster than the most recently published algorithm [7] for the same problem. Next we propose a suitable VLSI systolic architecture to map our algorithm and evaluate the performance of the proposed architecture in terms of execution time and inter-processor communication time. We show that our implementation has O(log2n) execution time (compare-exchange time) and O(nlogn) communication time compared to O(nlogn) and O(n2) in [7].
Dislocation dynamics (DD), a discrete dynamic simulation method in which dislocations are the fundamental entities, is a powerful tool for investigation of plasticity, deformation and fracture of materials at the micr...
详细信息
Dislocation dynamics (DD), a discrete dynamic simulation method in which dislocations are the fundamental entities, is a powerful tool for investigation of plasticity, deformation and fracture of materials at the micron length scale. However, severe computational difficulties arising from complex, long-range interactions between these curvilinear line defects limit the application of DD in the study of large-scale plastic deformation. We present here the development of a parallel algorithm for accelerated computer simulations of DD. By representing dislocations as a 3D set of dislocation particles, we show here that the problem of an interacting ensemble of dislocations can be converted to a problem of a particle ensemble, interacting with a long-range force field. A grid using binary space partitioning is constructed to keep track of node connectivity across domains. We demonstrate the computational efficiency of the parallel micro-plasticity code and discuss how O(N) methods map naturally onto the parallel data structure. Finally, we present results from applications of the parallel code to deformation in single crystal fee metals. (c) 2006 Elsevier Inc. All rights reserved.
Two parallel algorithms are proposed in this paper for solving the problem of finding exact exclusive-or sum of products (ESOP) expressions for an arbitrary Boolean function. This minimization problem is a very diffic...
详细信息
Two parallel algorithms are proposed in this paper for solving the problem of finding exact exclusive-or sum of products (ESOP) expressions for an arbitrary Boolean function. This minimization problem is a very difficult one and solutions have been proposed only for up to seven variables. The processing time for some symmetric functions of seven variables is of the order of weeks. The proposed algorithm is a hybrid one (OpenMP, MPI) and a speed-up of more than nine could be achieved, for a cluster of three nodes with four cores each.
A new parallel algorithm has been developed for calculating the analytic energy derivatives of full accuracy second order Moller-Plesset perturbation theory (MP2). Its main projected application is the optimization of...
详细信息
A new parallel algorithm has been developed for calculating the analytic energy derivatives of full accuracy second order Moller-Plesset perturbation theory (MP2). Its main projected application is the optimization of geometries of large molecules, in which noncovalent interactions play a significant role. The algorithm is based on the two-step MP2 energy calculation algorithm developed recently and implemented into the quantum chemistry program, GAMESS. Timings are presented for test calculations on taxol (C47H51NO14) With the 6-31G and 6-31G(d) basis sets (660 and 1032 basis functions, 328 correlated electrons) and luciferin (C11H8N2O3S2) with aug-cc-pVDZ and aug-cc-pVTZ (530 and 1198 basis functions, 92 correlated electrons). The taxol 6-31G(d) calculations are also performed with up to 80 CPU cores. The results demonstrate the high parallel efficiency of the program. (c) 2007 Wiley Periodicals, Inc.
The dynamic lot-sizing model (DLS) is one of the most frequently used models in production and inventory system because lot decisions can greatly affect the performance of the system. The practicality of DLS algorithm...
详细信息
The dynamic lot-sizing model (DLS) is one of the most frequently used models in production and inventory system because lot decisions can greatly affect the performance of the system. The practicality of DLS algorithms is hindered by the huge amount of computer resources required for solving these models, even for a modest problem. This study developed a parallel algorithm to solve the lot-sizing problem efficiently. Given that n is the size of the problem, the complexity of the proposed parallel algorithm is O(n(2)p) with p processors. Numerical experiments are provided to verify the complexity of the proposed algorithm. The empirical results demonstrate that the speedup of this parallel algorithm approaches linearity, which means that the proposed algorithm can take full advantage of the distributed computing power as the size of the problem increases. (C) 2001 Elsevier Science Ltd. All rights reserved.
暂无评论