Computed tomographic imaging spectrometers capture hyperspectral images in real-time. However, postprocessing the imagery can require enormous computational resources;thus, limiting its application to nonrealtime scen...
详细信息
Computed tomographic imaging spectrometers capture hyperspectral images in real-time. However, postprocessing the imagery can require enormous computational resources;thus, limiting its application to nonrealtime scenarios. To overcome these challenges, we developed a highly parallelizable algorithm that exploits spatial shift-invariance. To demonstrate the versatility of our algorithm, we developed implementations on a desktop and an embedded graphics processing unit. To our knowledge, our results show the fastest image reconstruction times reported. (C) 2020 Society of Photo-Optical Instrumentation Engineers (SPIE)
A new algorithm for eigenvalue problems for linear differential operators with fractional derivatives is proposed and justified. The algorithm is based on the approximation (perturbation) of the coefficients of a part...
详细信息
A new algorithm for eigenvalue problems for linear differential operators with fractional derivatives is proposed and justified. The algorithm is based on the approximation (perturbation) of the coefficients of a part of the differential operator by piecewise constant functions where the eigenvalue problem for the last one is supposed to be simpler than the original one. Another milestone of the algorithm is the homotopy idea which results at the possibility for a given eigenpair number to compute recursively a sequence of the approximate eigenpairs. This sequence converges to the exact eigenpair with a super-exponential convergence rate. The eigenpairs can be computed in parallel for all prescribed indexes. The proposed method possesses the following principal property: its convergence rate increases together with the index of the eigenpair. Numerical examples confirm the theory.
The goal of this paper is to develop a parallel algorithm for the direct solution of large sparse linear systems and integrate it into domain decomposition methods. The computational effort for these linear systems, o...
详细信息
The goal of this paper is to develop a parallel algorithm for the direct solution of large sparse linear systems and integrate it into domain decomposition methods. The computational effort for these linear systems, often encountered in numerical simulation of structural mechanics problems by finite element codes, is very significant in terms of run-time and memory requirements. In this paper, a two-level parallelism is exploited. The exploitation of the lower level of parallelism is based on the development of a parallel direct solver with a nested dissection algorithm and to introduce it into the FETI methods. This direct solver has the advantage of handling zero-energy modes in floating structures automatically and properly. The upper level of parallelism is a coarse-grain parallelism between substructures of FETI. Some numerical tests are carried out to evaluate the performance of the direct solver.
We present a new parallel algorithm to solve the all-pairs shortest path problem in a given graph which is considerably faster than the most recently published algorithm [7] for the same problem. Next we propose a sui...
详细信息
We present a new parallel algorithm to solve the all-pairs shortest path problem in a given graph which is considerably faster than the most recently published algorithm [7] for the same problem. Next we propose a suitable VLSI systolic architecture to map our algorithm and evaluate the performance of the proposed architecture in terms of execution time and inter-processor communication time. We show that our implementation has O(log2n) execution time (compare-exchange time) and O(nlogn) communication time compared to O(nlogn) and O(n2) in [7].
Dislocation dynamics (DD), a discrete dynamic simulation method in which dislocations are the fundamental entities, is a powerful tool for investigation of plasticity, deformation and fracture of materials at the micr...
详细信息
Dislocation dynamics (DD), a discrete dynamic simulation method in which dislocations are the fundamental entities, is a powerful tool for investigation of plasticity, deformation and fracture of materials at the micron length scale. However, severe computational difficulties arising from complex, long-range interactions between these curvilinear line defects limit the application of DD in the study of large-scale plastic deformation. We present here the development of a parallel algorithm for accelerated computer simulations of DD. By representing dislocations as a 3D set of dislocation particles, we show here that the problem of an interacting ensemble of dislocations can be converted to a problem of a particle ensemble, interacting with a long-range force field. A grid using binary space partitioning is constructed to keep track of node connectivity across domains. We demonstrate the computational efficiency of the parallel micro-plasticity code and discuss how O(N) methods map naturally onto the parallel data structure. Finally, we present results from applications of the parallel code to deformation in single crystal fee metals. (c) 2006 Elsevier Inc. All rights reserved.
A new parallel algorithm has been developed for calculating the analytic energy derivatives of full accuracy second order Moller-Plesset perturbation theory (MP2). Its main projected application is the optimization of...
详细信息
A new parallel algorithm has been developed for calculating the analytic energy derivatives of full accuracy second order Moller-Plesset perturbation theory (MP2). Its main projected application is the optimization of geometries of large molecules, in which noncovalent interactions play a significant role. The algorithm is based on the two-step MP2 energy calculation algorithm developed recently and implemented into the quantum chemistry program, GAMESS. Timings are presented for test calculations on taxol (C47H51NO14) With the 6-31G and 6-31G(d) basis sets (660 and 1032 basis functions, 328 correlated electrons) and luciferin (C11H8N2O3S2) with aug-cc-pVDZ and aug-cc-pVTZ (530 and 1198 basis functions, 92 correlated electrons). The taxol 6-31G(d) calculations are also performed with up to 80 CPU cores. The results demonstrate the high parallel efficiency of the program. (c) 2007 Wiley Periodicals, Inc.
Two parallel algorithms are proposed in this paper for solving the problem of finding exact exclusive-or sum of products (ESOP) expressions for an arbitrary Boolean function. This minimization problem is a very diffic...
详细信息
Two parallel algorithms are proposed in this paper for solving the problem of finding exact exclusive-or sum of products (ESOP) expressions for an arbitrary Boolean function. This minimization problem is a very difficult one and solutions have been proposed only for up to seven variables. The processing time for some symmetric functions of seven variables is of the order of weeks. The proposed algorithm is a hybrid one (OpenMP, MPI) and a speed-up of more than nine could be achieved, for a cluster of three nodes with four cores each.
The dynamic lot-sizing model (DLS) is one of the most frequently used models in production and inventory system because lot decisions can greatly affect the performance of the system. The practicality of DLS algorithm...
详细信息
The dynamic lot-sizing model (DLS) is one of the most frequently used models in production and inventory system because lot decisions can greatly affect the performance of the system. The practicality of DLS algorithms is hindered by the huge amount of computer resources required for solving these models, even for a modest problem. This study developed a parallel algorithm to solve the lot-sizing problem efficiently. Given that n is the size of the problem, the complexity of the proposed parallel algorithm is O(n(2)p) with p processors. Numerical experiments are provided to verify the complexity of the proposed algorithm. The empirical results demonstrate that the speedup of this parallel algorithm approaches linearity, which means that the proposed algorithm can take full advantage of the distributed computing power as the size of the problem increases. (C) 2001 Elsevier Science Ltd. All rights reserved.
We present a randomized algorithm to find a minimum spanning forest (MSF) in an undirected graph. With high probability, the algorithm runs in logarithmic time and linear work on an exclusive read exclusive write (ERE...
详细信息
We present a randomized algorithm to find a minimum spanning forest (MSF) in an undirected graph. With high probability, the algorithm runs in logarithmic time and linear work on an exclusive read exclusive write (EREW) PRAM. This result is optimal w.r.t. both work and parallel time, and is the first provably optimal parallel algorithm for this problem under both measures. We also give a simple, general processor allocation scheme for tree-like computations.
parallel computing is a useful technology for scientific and engineering algorithms/applications. LU-SGS (lower-upper Symmetric-Gauss-Seidel method) is an efficient and robust scheme for CFD (Computational fluid dynam...
详细信息
parallel computing is a useful technology for scientific and engineering algorithms/applications. LU-SGS (lower-upper Symmetric-Gauss-Seidel method) is an efficient and robust scheme for CFD (Computational fluid dynamics) and has strong data dependence in its computation. In this paper, we present an efficient wavefront parallel algorithm for 3D (three dimensional) LU-SGS with structured meshes. The corresponding data structure and memory access method with better data locality and communication optimization is designed. The performances of the presented parallel algorithm are reported with different problem sizes. Some discussion and performance issues are also reported. The results show that the overall performance speedup of one Intel E5540 CPU (4 CPU cores) ranges from 2.23 to 2.95 compared with one E5540 core. The parallel efficiency of 1024, 128 processes are up to 35.68%, 72.69% compared with 32 processes on a distributed memory cluster system. The CFD simulation of M6 wing model shows the effect of the presented parallel algorithm. (C) 2016 Elsevier Ltd. All rights reserved.
暂无评论