In biological networks, some nodes are more influential than others. The most influential nodes are those whose elimination induces a network collapse, and detecting these nodes is crucial in many circumstances. However, this is a difficult task when the biological networks are large. In this paper, we have designed and implemented an efficient parallel algorithm for detecting influential nodes in large biological networks by exploiting a Graphics Processing Unit (GPU). The essential concept behind the proposed parallel algorithm is that several computationally expensive procedures in detecting influential nodes are redesigned and transformed into efficient GPU-accelerated primitives such as parallel sort, scan, and reduction. Four local metrics, namely Degree Centrality (DC), Companion Behavior (CB), Clustering Coefficient (CC), and H-Index, are used to measure the nodal influence. To evaluate the efficiency of the proposed parallel algorithm, five large real biological networks are employed in the experiments. The experimental results show that (1) the proposed parallel algorithm achieves speedups of approximately 48 to 94 over the corresponding serial algorithm; (2) compared to a baseline parallel algorithm developed on a multi-core CPU, the proposed parallel algorithm yields speedups of 5 to 9 for DC and H-Index, while it is slightly slower for CB and CC due to the uneven degree distribution; and (3) when using DC and H-Index, the proposed parallel algorithm is capable of detecting the influential nodes in a large biological network consisting of 150 million edges in less than 3 s. (C) 2019 Elsevier B.V. All rights reserved.
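As a rough illustration of two of the local metrics named in this abstract, the following serial Python sketch computes Degree Centrality and the H-Index from an undirected edge list. It is not the authors' GPU implementation: the function names and the edge-list input format are assumptions made for the example, and the per-node sort here only hints at where a parallel sort primitive could be applied.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: degree} for an undirected edge list (hypothetical helper)."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


def h_index(edges):
    """Return {node: h}, where h is the largest value such that the node
    has at least h neighbors whose degree is at least h."""
    degree = degree_centrality(edges)
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    result = {}
    for node, nbrs in neighbors.items():
        # Sort neighbor degrees in descending order, then scan for the
        # last rank at which the degree is still >= the rank.
        ranked = sorted((degree[n] for n in nbrs), reverse=True)
        h = 0
        for rank, d in enumerate(ranked, start=1):
            if d >= rank:
                h = rank
            else:
                break
        result[node] = h
    return result
```

For example, calling `h_index([(0, 1), (0, 2), (0, 3), (1, 2)])` ranks the hub node 0 by how many of its neighbors are themselves well connected, rather than by its raw degree alone.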
We present details of our efficient implementation of full-accuracy unrestricted open-shell second-order canonical Møller-Plesset (MP2) energies, both serial and parallel. The algorithm is based on our previous restricted closed-shell MP2 code using the Saebø-Almlöf direct integral transformation. Depending on system details, UMP2 energies take from less than 1.5 to about 3.0 times as long as a closed-shell RMP2 energy on a similar system using the same algorithm. Several examples are given, including timings for some large stable radicals with 90+ atoms and over 3600 basis functions. (C) 2011 Wiley Periodicals, Inc. J Comput Chem 32: 3304-3312, 2011
In the literature, there are quite a few sequential and parallel algorithms for solving problems on distance-hereditary graphs. Given an n-vertex and m-edge distance-hereditary graph G, we show that the efficient domination problem on G can be solved in $O(\log^2 n)$ time using $O(n + m)$ processors on a CREW PRAM. Moreover, if a binary tree representation of G is given, the problem can be optimally solved in $O(\log n)$ time using $O(n/\log n)$ processors on an EREW PRAM.
High-performance computing of atmospheric general circulation models (AGCMs) has been receiving increasing attention in earth science research. However, when scaling to large-scale multi-core computing, the parallelization of an AGCM, which demands fast parallel computing for long-time integration or climate simulation, becomes extremely challenging due to its complex internal numerical calculations. The previous Institute of Atmospheric Physics of the Chinese Academy of Sciences Atmospheric General Circulation Model version 4.0 (IAP AGCM4.0), with one-dimensional domain decomposition, can only run on dozens of CPU cores, so this paper proposes a two-dimensional domain decomposition parallel algorithm for it. In the parallel implementation of the IAP AGCM4.0, its dynamical core utilizes a hybrid form of latitude-longitude decomposition and vertical direction/longitude circle direction decomposition. Through experiments on a multi-core cluster, we confirmed that our algorithm is efficient and scalable. The parallel efficiency of the IAP AGCM4.0 can reach up to 50.88% on 512 CPU cores, and the IAP AGCM4.0 can run long-term simulations for climate change research. (C) 2017 Elsevier B.V. All rights reserved.
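To make the idea of a two-dimensional domain decomposition concrete, here is a minimal Python sketch that splits a latitude-longitude grid into rectangular blocks, one per process. It is a generic block decomposition written for illustration, not the hybrid scheme used in the IAP AGCM4.0; the function names, the row-major rank ordering, and the grid sizes in the usage note are all assumptions.

```python
def block_ranges(n, parts):
    """Split range(n) into `parts` contiguous half-open (start, stop) ranges
    whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def decompose_2d(n_lat, n_lon, p_lat, p_lon):
    """Map each rank in a p_lat x p_lon process grid (row-major order, an
    assumption) to the (lat_range, lon_range) block of grid points it owns."""
    lat_blocks = block_ranges(n_lat, p_lat)
    lon_blocks = block_ranges(n_lon, p_lon)
    layout = {}
    for i, lat_range in enumerate(lat_blocks):
        for j, lon_range in enumerate(lon_blocks):
            layout[i * p_lon + j] = (lat_range, lon_range)
    return layout
```

With a one-dimensional decomposition over latitude, the process count is capped by the number of latitude rows; `decompose_2d(128, 256, 16, 32)` (hypothetical sizes) instead yields 512 blocks, which is why a second decomposition direction allows scaling to more cores.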
In precise integration methods for solving structural dynamic responses, the corresponding time integration formula is composed of two parts: the multiplication of an exponential matrix with a vector, and the integration term. The second term can be solved by the series solution. Two hybrid-granularity parallel algorithms are designed, in which the exponential matrix and the first term are computed by a fine-grained parallel algorithm and the second term is computed by a coarse-grained parallel algorithm. Numerical examples show that these two hybrid-granularity parallel algorithms obtain higher speedup and parallel efficiency than two existing parallel algorithms.
Radix sorting is an essential basic data processing operation in many computer fields, and accelerating it on a Graphics Processing Unit (GPU) is of important practical significance. Heterogeneous parallel computing technology attracts much attention and is widely applied for its computational efficiency and parallel real-time data processing capability. Taking advantage of the parallelism of the GPU in numerical computation, a parallelization design method of the Binary_Least Significant Digit (LSD) first Radix Sorting (B_LSD_RS) algorithm based on Open Computing Language (OpenCL) is proposed. The radix sorting algorithm is divided into multiple kernel tasks, and the kernels are sequentially controlled by the event information transfer. The parallel algorithm is implemented and verified on the GPU + CPU heterogeneous platform. The experimental results show that, on the NVIDIA GTX 1070 computing platform, the B_LSD_RS parallel algorithm based on OpenCL obtains speedups of 28.86 times, 11.01 times, and 2.14 times over the B_LSD_RS sequential algorithm on an AMD Ryzen5 1600X CPU, the B_LSD_RS parallel algorithm based on Open Multi-Processing (OpenMP), and the B_LSD_RS parallel algorithm based on Compute Unified Device Architecture (CUDA), respectively. The algorithm thus achieves not only high performance but also performance portability among different GPU computing platforms.
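The structure of a binary LSD radix sort can be sketched in a few lines of serial Python. This is only an illustration of the general technique, assuming non-negative integer keys and a scan-based split per bit; it is not the OpenCL kernel design from the paper, and the function names are hypothetical. Each of the three steps inside the loop (flagging, scanning, scattering) is the kind of data-parallel stage that could be mapped to a separate kernel.

```python
def exclusive_scan(flags):
    """Return (prefix sums excluding the current element, total sum)."""
    out, total = [], 0
    for f in flags:
        out.append(total)
        total += f
    return out, total


def binary_lsd_radix_sort(keys, bits=32):
    """Sort non-negative integers below 2**bits, one bit per pass,
    starting from the least significant bit. Each pass is a stable split."""
    keys = list(keys)
    for b in range(bits):
        # Step 1: flag keys whose current bit is 0.
        zero_flags = [1 - ((k >> b) & 1) for k in keys]
        # Step 2: a scan of the flags gives each 0-key its output slot.
        zero_pos, num_zeros = exclusive_scan(zero_flags)
        # Step 3: scatter. 1-keys go after all 0-keys, in original order;
        # i - zero_pos[i] counts the 1-keys that precede position i.
        out = [0] * len(keys)
        for i, k in enumerate(keys):
            if zero_flags[i]:
                out[zero_pos[i]] = k
            else:
                out[num_zeros + i - zero_pos[i]] = k
        keys = out
    return keys
```

Because every pass preserves the relative order of keys with equal bits, the ordering established by lower bits survives later passes, which is what makes the least-significant-digit-first order correct.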
An efficient parallel algorithm is developed for second-order Møller-Plesset perturbation theory with the resolution-of-identity approximation of two-electron repulsion integrals (RI-MP2) to perform MP2 energy calculations of large molecules on distributed memory processors. Benchmark calculations are carried out for taxol (C47H51NO14), valinomycin (C54H90N6O18), and two-layer nanographene sheets (C96H24)2, which show the high parallel efficiency of the developed algorithm. (C) 2009 Wiley Periodicals, Inc. Int J Quantum Chem 109: 2121-2130, 2009
A parallel algorithm is described for computing the minimum spanning tree of an undirected, connected and weighted graph with $n$ vertices. We assume a shared-memory single-instruction-stream, multiple-data-stream model of computation which does not allow read or write conflicts. The algorithm is adaptive in the sense that it uses $n^{1-\epsilon}$ processors and runs in $O(n^{1+\epsilon})$ time, where $\epsilon$ lies between 0 and 1 and depends on the number of available processors. In view of the obvious $\Omega(n^2)$ lower bound on the number of operations required to compute a minimum spanning tree, the algorithm is also cost-optimal.
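The cost-optimality claim follows from a one-line calculation, taking the cost of a parallel algorithm to be the product of processor count and running time (the standard definition, assumed here since the abstract does not restate it):

$$
\text{cost} = \underbrace{n^{1-\epsilon}}_{\text{processors}} \times \underbrace{O\!\left(n^{1+\epsilon}\right)}_{\text{time}} = O\!\left(n^{(1-\epsilon)+(1+\epsilon)}\right) = O(n^2),
$$

which matches the $\Omega(n^2)$ lower bound on the number of operations for every admissible value of $\epsilon$.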
A new algorithm for nonlinear eigenvalue problems is proposed. The numerical technique is based on a perturbation of the coefficients of the differential equation combined with the Adomian decomposition method for the nonlinear part. The approach provides an exponential convergence rate with a base which is inversely proportional to the index of the eigenvalue under consideration. The eigenpairs can be computed in parallel. Numerical examples are presented to support the theory. They are in good agreement with the spectral asymptotics obtained by other authors.
Maximum flow is an important classical combinatorial optimization problem. However, the time complexity of sequential maximum flow algorithms remains high. In this paper, we present a two-stage distributed parallel algorithm (TSDPA) based on the Message Passing Interface to improve the computational performance. The strategy of TSDPA has two stages, which push excess flows separately along cheap and expensive paths identified by a new distance estimate function. In TSDPA, stage 1 enhances the parallel efficiency by omitting high-cost paths and decentralizing calculations, and stage 2 guarantees the achievement of an optimal solution through a divide-and-conquer method. Experimental tests demonstrate that TSDPA runs 1.2-15.5 times faster than sequential algorithms and is faster than or almost as fast as the H_PRF and Q_PRF codes.