ISBN: 9781728133201 (print)
Detecting strongly connected components (SCCs) on the GPU has become a fundamental operation for accelerating graph computing. Existing multi-GPU SCC detection methods introduce massive unnecessary data transformation between GPUs. In this paper, we propose a novel distributed SCC detection approach using multiple GPUs plus a CPU. Our approach includes three key ideas: (1) segmentation and labeling over large-scale datasets; (2) collecting and merging the segmented SCCs; and (3) assigning running tasks across multiple GPUs and the CPU. We implement our approach under a hybrid distributed architecture with multiple GPUs plus a CPU. Our approach achieves device-level optimization and is compatible with state-of-the-art algorithms. We conduct extensive theoretical and experimental analysis to demonstrate the efficiency and accuracy of our approach. The experimental results show that, on an NVIDIA K80, our approach achieves 11.2x, 1.2x, and 1.2x speedups for SCC detection compared with Tarjan's, FB-Trim, and FB-Hybrid algorithms, respectively.
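For orientation, FB-Trim, one of the baselines named in the abstract, is usually described as forward-backward reachability from a pivot combined with a trimming pass that peels off trivial single-vertex components. The following is a minimal serial sketch of that general idea in Python. It is not the paper's multi-GPU implementation, and the function names `reachable` and `fb_trim_scc` are illustrative assumptions.

```python
from collections import defaultdict


def reachable(start, adj, alive):
    """Return the vertices of `alive` reachable from `start` along `adj`."""
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w in alive and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def fb_trim_scc(vertices, edges):
    """Serial forward-backward SCC detection with trimming (illustrative only)."""
    fwd = defaultdict(list)  # out-neighbours
    bwd = defaultdict(list)  # in-neighbours
    for u, v in edges:
        fwd[u].append(v)
        bwd[v].append(u)

    sccs = []
    work = [set(vertices)]  # each entry is an independent subproblem
    while work:
        alive = work.pop()
        # Trim: a vertex with no in-neighbour or no out-neighbour inside
        # the current subgraph (self-loops ignored) is its own SCC.
        changed = True
        while changed:
            changed = False
            for v in list(alive):
                has_in = any(w in alive and w != v for w in bwd[v])
                has_out = any(w in alive and w != v for w in fwd[v])
                if not (has_in and has_out):
                    alive.remove(v)
                    sccs.append({v})
                    changed = True
        if not alive:
            continue
        # Forward-backward step: the pivot's SCC is the intersection of
        # its forward and backward reachable sets.
        pivot = next(iter(alive))
        f = reachable(pivot, fwd, alive)
        b = reachable(pivot, bwd, alive)
        scc = f & b
        sccs.append(scc)
        # No SCC can straddle these three remainders, so recurse on each.
        for part in (f - scc, b - scc, alive - f - b):
            if part:
                work.append(part)
    return sccs
```

Calling `fb_trim_scc(range(5), [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)])` yields the components {0, 1, 2}, {3}, and {4}, in no guaranteed order. The three remainder sets are independent subproblems, which is what makes this family of algorithms attractive for parallel hardware.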
ISBN: 9781479941162 (print)
The fast multipole method (FMM) is often used to accelerate the calculation of particle interactions in particle-based methods for simulating incompressible flows. To evaluate the most time-consuming kernels (the Biot-Savart equation and the stretching term of the vorticity equation), we mathematically reformulated them so that only two Laplace scalar potentials are used instead of six, which automatically ensures divergence-free far-field computation. Based on this formulation, we developed a new FMM-based vortex method on heterogeneous architectures, which distributes the work between multicore CPUs and GPUs to best utilize the hardware resources and achieve excellent scalability. The algorithm uses new data structures that dynamically manage inter-node communication and load balance efficiently, with only a small parallel construction overhead. The algorithm scales to large clusters, showing both strong and weak scalability. Careful error and timing trade-off analyses are also performed for the cutoff functions induced by the vortex particle method. Our implementation can perform one time step of the velocity+stretching calculation for one billion particles on 32 nodes in 55.9 seconds, which yields 49.12 Tflop/s.
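To make the accelerated kernel concrete, the sketch below evaluates the Biot-Savart velocity of a set of vortex particles by direct O(N^2) summation, which is the baseline that an FMM replaces for far-field interactions. It is an illustration only and does not reproduce the abstract's two-potential reformulation. The function names and the softening parameter `eps` are assumptions; in particular, `eps` is a simple stand-in for a cutoff function, since the abstract does not say which cutoff the authors use.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def biot_savart_direct(targets, sources, strengths, eps=0.0):
    """Direct-sum Biot-Savart velocity for vortex particles.

    u(x) = -1/(4*pi) * sum_j (x - x_j) x alpha_j / (|x - x_j|^2 + eps^2)^(3/2)

    `eps` is a hypothetical softening length; eps=0 gives the singular kernel.
    """
    velocities = []
    for x in targets:
        u = [0.0, 0.0, 0.0]
        for xj, alpha in zip(sources, strengths):
            r = (x[0] - xj[0], x[1] - xj[1], x[2] - xj[2])
            d2 = r[0] ** 2 + r[1] ** 2 + r[2] ** 2 + eps * eps
            if d2 == 0.0:
                continue  # skip the singular self-interaction
            c = cross(r, alpha)
            scale = -1.0 / (4.0 * math.pi * d2 ** 1.5)
            for k in range(3):
                u[k] += scale * c[k]
        velocities.append(tuple(u))
    return velocities
```

For a single particle of strength (0, 0, 1) at the origin, the velocity at (1, 0, 0) points in the +y direction with magnitude 1/(4π), the expected counter-clockwise swirl about the z-axis. Every target visits every source here, so the cost grows quadratically with particle count, which is why a fast summation method is needed at the billion-particle scale.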
ISBN: 9781467308052 (print)
We use a particle-based method to simulate incompressible flows, where the Fast Multipole Method (FMM) is used to accelerate the calculation of particle interactions. The most time-consuming kernels (the Biot-Savart equation and the stretching term of the vorticity equation) are mathematically reformulated so that only two Laplace scalar potentials are used instead of six, while automatically ensuring divergence-free far-field computation. Based on this formulation, and on our previous work on a scalar heterogeneous FMM algorithm, we develop a new FMM-based vortex method capable of simulating general flows, including turbulence, on heterogeneous architectures. Our work for this poster focuses on the computational perspective: our implementation can perform one time step of the velocity+stretching calculation for one billion particles on 32 nodes in 55.9 seconds, which yields 49.12 Tflop/s.
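The reported performance figures can be unpacked with a little arithmetic. The snippet below uses only the numbers stated in the abstract; the derived quantities (total work, work per particle, rate per node) are implications of those numbers, not values reported by the authors, and the variable names are illustrative.

```python
# Figures stated in the abstract
particles = 1_000_000_000   # one billion particles
nodes = 32
seconds = 55.9              # wall time for one time step
rate = 49.12e12             # sustained flop/s (49.12 Tflop/s)

# Derived quantities (implied by the figures above, not reported directly)
total_flop = rate * seconds                 # about 2.75e15 flop per time step
flop_per_particle = total_flop / particles  # about 2.75e6 flop per particle
rate_per_node = rate / nodes                # about 1.54e12 flop/s per node

print(f"total work per step : {total_flop:.3e} flop")
print(f"work per particle   : {flop_per_particle:.3e} flop")
print(f"rate per node       : {rate_per_node / 1e12:.3f} Tflop/s")
```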
ISBN: 9781467308052 (print)
We use a particle-based method to simulate incompressible flows, where the Fast Multipole Method (FMM) is used to accelerate the calculation of particle interactions. The most time-consuming kernels (the Biot-Savart equation and the stretching term of the vorticity equation) are mathematically reformulated so that only two Laplace scalar potentials are used instead of six, while automatically ensuring divergence-free far-field computation. Based on this formulation, and on our previous work on a scalar heterogeneous FMM algorithm, we develop a new FMM-based vortex method capable of simulating general flows, including turbulence, on heterogeneous architectures. The method distributes the work between multi-core CPUs and GPUs to best utilize the hardware resources and achieve excellent scalability. The algorithm also uses new data structures that dynamically manage inter-node communication and load balance efficiently, with only a small parallel construction overhead. The algorithm scales to large clusters, showing both strong and weak scalability. Careful error and timing trade-off analyses are also performed for the cutoff functions induced by the vortex particle method. Our implementation can perform one time step of the velocity+stretching calculation for one billion particles on 32 nodes in 55.9 seconds, which yields 49.12 Tflop/s.
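The abstract claims both strong and weak scalability without giving timings. As a reminder of what those terms conventionally measure, here is a small sketch of the standard efficiency definitions. The function names are illustrative, and the timings in the usage lines are hypothetical placeholders, not results from the paper.

```python
def strong_scaling_efficiency(t1, tp, p):
    """Fixed total problem size: ideal time on p nodes is t1 / p."""
    return t1 / (p * tp)


def weak_scaling_efficiency(t1, tp):
    """Fixed problem size per node: ideal time stays equal to t1."""
    return t1 / tp


# Hypothetical timings in seconds, for illustration only
print(strong_scaling_efficiency(t1=100.0, tp=30.0, p=4))
print(weak_scaling_efficiency(t1=50.0, tp=62.5))
```

An efficiency close to 1.0 in the first measure means that adding nodes shortens the run almost proportionally. In the second, it means the run time stays nearly flat as the problem and the machine grow together.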