ISBN: (electronic) 9798350391954
ISBN: (print) 9798350391961; 9798350391954
Conventional radix sort algorithms based on GPU parallel computing generally run on a single GPU. However, as data scale grows, single-GPU sorting increasingly shows performance bottlenecks. This paper proposes an efficient radix sort algorithm based on multi-GPU parallel computing, which uses different bucket classifications on multiple GPUs to improve sorting performance and efficiency on large-scale datasets. With multi-GPU parallel computing, more buckets can be used to classify the data in a single traversal, which reduces the number of sorting passes, lowers time complexity, and improves sorting speed and throughput. Experiments show that the algorithm significantly improves operational efficiency, indicating good application prospects. The proposed algorithm also scales well and can adapt to continually growing data volumes.
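The pass-count argument can be illustrated with a plain least-significant-digit (LSD) radix sort. This is a single-process Python sketch, not the paper's multi-GPU algorithm; the function name `lsd_radix_sort` and its `key_bits` and `bits_per_pass` parameters are hypothetical. It assumes non-negative integer keys of at most `key_bits` bits and shows that widening each digit (more buckets, `2**bits_per_pass`) reduces the number of passes over the data.

```python
def lsd_radix_sort(keys, key_bits=32, bits_per_pass=8):
    """Sort non-negative integers below 2**key_bits with an LSD radix sort.

    Returns (sorted_keys, passes). Each pass distributes the keys into
    2**bits_per_pass buckets by one digit, so wider digits mean more
    buckets per pass and fewer passes in total.
    """
    num_buckets = 1 << bits_per_pass
    mask = num_buckets - 1
    passes = -(-key_bits // bits_per_pass)  # ceil(key_bits / bits_per_pass)
    out = list(keys)
    for p in range(passes):
        shift = p * bits_per_pass
        # Counting phase: histogram of the current digit.
        counts = [0] * num_buckets
        for k in out:
            counts[(k >> shift) & mask] += 1
        # Exclusive prefix sum gives each bucket's starting offset.
        offsets = [0] * num_buckets
        total = 0
        for b in range(num_buckets):
            offsets[b] = total
            total += counts[b]
        # Stable scatter: keys keep their relative order within a bucket.
        tmp = [0] * len(out)
        for k in out:
            d = (k >> shift) & mask
            tmp[offsets[d]] = k
            offsets[d] += 1
        out = tmp
    return out, passes
```

For 32-bit keys, 4-bit digits need 8 passes, 8-bit digits need 4, and 16-bit digits need 2. The price is a larger per-pass histogram (65,536 buckets at 16 bits); the abstract suggests that spreading bucket classification over several GPUs is what makes such wide digits affordable.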
3D finite-difference time-domain numerical simulation and wavefield reconstruction based on the domain decomposition technique are essential parts of high-performance computing for reverse-time migration and full-waveform inversion. However, low GPU utilization when computing small models and tremendous memory consumption for large models can lead to low computational efficiency and high memory costs. This paper proposes a contiguous memory management (CMM) method and a variable-order wavefield reconstruction (VWR) method. The CMM allocates the many small arrays used for MPI communication within one larger contiguous memory block, which reduces the number of MPI communications between subdomains and improves communication bandwidth, thereby reducing MPI time overhead and improving GPU utilization. The VWR can flexibly set the number of boundary-wavefield layers used for source wavefield reconstruction according to host memory capacity and accuracy requirements. Because the VWR can store as few as one layer of the boundary wavefield, host memory consumption can be significantly reduced. Numerical experiments show that the CMM method improves GPU utilization from 25% to 90% for a model of size 121³, and that the VWR method reduces memory consumption by about 86% while maintaining good wavefield reconstruction accuracy. In addition, this paper discusses how to obtain a domain decomposition scheme with optimal performance.
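The idea behind the CMM method, replacing many small MPI messages with one message over a contiguous buffer, and the memory knob exposed by the VWR method can both be sketched in a few lines. This is an illustrative Python sketch under stated assumptions, not the paper's implementation: `pack_halos`, `unpack_halos`, and `boundary_storage_bytes` are hypothetical names, halo arrays are modeled as float32 `array('f')` buffers, and the storage estimate assumes a cubic n×n×n model whose six faces each store `layers` planes of n×n values per time step (edge and corner overlap ignored).

```python
from array import array


def pack_halos(faces):
    """Copy several small halo arrays into one contiguous float32 buffer.

    Returns the buffer and a list of (offset, length) pairs so the receiver
    can split it again; one send of `buf` replaces len(faces) small sends.
    """
    buf = array("f")
    layout = []
    for face in faces:
        layout.append((len(buf), len(face)))
        buf.extend(face)
    return buf, layout


def unpack_halos(buf, layout):
    """Split a received contiguous buffer back into the individual halo arrays."""
    return [buf[off:off + n] for off, n in layout]


def boundary_storage_bytes(n, layers, nt, bytes_per_value=4):
    """Approximate host memory for saved boundary wavefields of an n^3 model.

    Six faces, `layers` planes of n*n values each, saved at every one of the
    `nt` time steps; storage grows linearly with `layers`.
    """
    return 6 * layers * n * n * nt * bytes_per_value
```

Packing three faces turns three sends into one, and `boundary_storage_bytes(121, 1, nt)` is exactly one quarter of `boundary_storage_bytes(121, 4, nt)`. The roughly 86% saving reported above depends on the paper's stencil order and setup, which this estimate does not model.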
Airframe/engine integrated configurations with a dorsal intake can effectively decouple the inlet from the lifting surface, giving excellent aerodynamic performance at small angles of attack at hypersonic speeds. A comprehensive and systematic investigation of the dorsal intake configuration requires comparing it with a ventral intake configuration of equal scale. However, most existing assessments of dorsal intake configurations focus on fundamental aerodynamic characteristics and lack stability and control characteristics. Therefore, computational fluid dynamics (CFD) numerical simulation based on multi-GPU parallel computing is used to compare the basic static stability and control characteristics of the dorsal and ventral intake configurations. The results show that the dorsal intake configuration has better three-channel static stability at hypersonic speeds than the ventral intake configuration. However, it also has worse control characteristics, which may be remedied by relaxing the static stability margin or installing canards. Moreover, the choice of intake configuration affects only the coupling of the pitch channel and does not affect the coupling of the yaw and roll channels. (c) 2023 Elsevier Masson SAS. All rights reserved.
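Static stability in each of the three channels is usually judged from the sign of a moment-coefficient derivative extracted from CFD runs at several attitudes. The sketch below assumes the common body-axis sign conventions (pitch stable if dCm/dα < 0, directionally stable if dCn/dβ > 0, laterally stable if dCl/dβ < 0) and fits each slope by least squares. The function names and the coefficient values in the check are hypothetical, not results from the paper.

```python
def ls_slope(x, y):
    """Least-squares slope of y against x (e.g. Cm versus angle of attack)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den


def static_stability(alpha, cm, beta, cn, cl):
    """Return (derivative, is_stable) for the pitch, yaw, and roll channels.

    alpha, beta: attitude angles of the CFD runs (one unit throughout).
    cm: pitching-moment coefficients at each alpha.
    cn, cl: yawing- and rolling-moment coefficients at each beta.
    """
    cm_alpha = ls_slope(alpha, cm)
    cn_beta = ls_slope(beta, cn)
    cl_beta = ls_slope(beta, cl)
    return {
        "pitch": (cm_alpha, cm_alpha < 0),  # nose-up disturbance gives a nose-down moment
        "yaw": (cn_beta, cn_beta > 0),      # weathercock stability
        "roll": (cl_beta, cl_beta < 0),     # dihedral effect
    }
```

In these terms, relaxing the static stability margin means moving Cm_α toward zero, which generally lets a given control surface trim and maneuver the vehicle more easily.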
As high-density seismic acquisition progressively becomes an essential mode of seismic exploration, developing high-precision imaging methods for 3D datasets has become an urgent industry requirement. Unfortunately, two-dimensional least-squares reverse time migration (LSRTM) can hardly account for the third dimension, because reflector structures change in nearly all directions. Furthermore, anisotropy is very common in subsurface media, so LSRTM based on an isotropic assumption fails to locate reflectors accurately, which seriously limits the identification of geological stratification. To address these two problems, this paper derives least-squares reverse time migration in three-dimensional vertical transversely isotropic media (3D-VTI-LSRTM) and develops a workflow that realizes 3D-LSRTM in VTI media based on this theory. The workflow uses multi-GPU acceleration and heterogeneous CPU/GPU parallelism. Compared with a general parallel CPU workflow or a single-GPU workflow, the GPU implementation significantly improves computing efficiency in practical applications. In addition, this paper extends VTI-LSRTM from 2D to 3D, which solves the problem that the 2D algorithm cannot eliminate reflection artifacts from the lateral direction. The self-made 3D-VTI depression model and the 3D-VTI-SEG/EAGE salt model are tested on four NVIDIA Tesla K40c graphics cards. The test results are correct and significantly improved compared with the 3D-VTI-RTM method.
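LSRTM is commonly posed as minimizing ||Lm − d||², where L is the linearized (Born) modeling operator, m the reflectivity, and d the observed data; conventional RTM corresponds to applying the adjoint once, Lᵀd. The sketch below solves a tiny instance with conjugate gradient least squares (CGLS), one common iterative solver for this kind of problem. The dense matrix standing in for L and the names `cgls`, `matvec`, and `rmatvec` are assumptions for illustration; a real 3D-VTI implementation applies L and Lᵀ with wave-equation solvers on GPUs instead of storing a matrix.

```python
def matvec(a, x):
    """Forward operator a @ x (stands in for Born modeling)."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def rmatvec(a, y):
    """Adjoint operator a.T @ y (stands in for migration)."""
    return [sum(a[i][j] * y[i] for i in range(len(a))) for j in range(len(a[0]))]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def cgls(a, d, iters=50, tol=1e-20):
    """Minimize ||a m - d||^2 by conjugate gradients on the normal equations."""
    m = [0.0] * len(a[0])
    r = list(d)            # data residual d - a m
    s = rmatvec(a, r)      # descent direction a.T r; on the first pass this is a.T d
    p = list(s)
    gamma = dot(s, s)
    for _ in range(iters):
        if gamma < tol:    # normal-equation residual is negligible
            break
        q = matvec(a, p)
        step = gamma / dot(q, q)
        m = [mi + step * pi for mi, pi in zip(m, p)]
        r = [ri - step * qi for ri, qi in zip(r, q)]
        s = rmatvec(a, r)
        gamma_new = dot(s, s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return m
```

Stopping after the first adjoint application gives an RTM-like image; the additional iterations are what remove the operator's blurring, at the cost of one forward and one adjoint application per iteration, which is why GPU acceleration matters for the 3D case.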
Integrated design of the airframe and inlet is the preferred choice for the aerodynamic configuration of air-breathing hypersonic vehicles. To better understand integrated configurations with a dorsal intake, this paper uses the computational fluid dynamics (CFD) method to comprehensively compare the performance of integrated configurations with a dorsal intake and with the traditionally used ventral intake. The performance indexes include aerodynamic performance (such as the lift-to-drag ratio), inlet performance, and thrust-to-drag balance performance. First, under the same geometric constraints, two integrated aerodynamic configurations, one with a dorsal and one with a ventral intake, are designed based on a truncated Busemann inlet. Second, a CFD numerical simulation method based on multi-GPU parallel computing is developed and verified. Then, the flow-field characteristics and the aerodynamic, inlet, and thrust-to-drag balance performance of the two configurations are comprehensively compared over a wide range of Mach numbers using the efficient GPU-based numerical simulation method. Finally, the advantages and disadvantages of the two configurations are analyzed to provide a meaningful reference for the aerodynamic configuration design of air-breathing hypersonic vehicles. (c) 2022 Elsevier Masson SAS. All rights reserved.
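Once CFD supplies force coefficients, the lift-to-drag and thrust-to-drag indexes reduce to simple arithmetic. The sketch assumes a calorically perfect gas (γ = 1.4, R = 287.05 J/(kg·K)) and computes dynamic pressure, the lift-to-drag ratio, and the thrust-minus-drag margin. The function names and all input numbers are hypothetical, and the paper's own thrust-to-drag bookkeeping (for example, how inlet and nozzle forces are allocated) may differ.

```python
import math

GAMMA = 1.4      # ratio of specific heats for air (assumed constant)
R_AIR = 287.05   # specific gas constant of air, J/(kg*K)


def dynamic_pressure(mach, rho, temp):
    """q = 0.5 * rho * V^2 with V = Mach * local speed of sound."""
    speed_of_sound = math.sqrt(GAMMA * R_AIR * temp)
    v = mach * speed_of_sound
    return 0.5 * rho * v * v


def performance(cl, cd, mach, rho, temp, s_ref, thrust):
    """Lift-to-drag ratio and thrust-drag balance from force coefficients."""
    q = dynamic_pressure(mach, rho, temp)
    lift = q * s_ref * cl
    drag = q * s_ref * cd
    return {
        "L/D": cl / cd,
        "lift_N": lift,
        "drag_N": drag,
        "net_thrust_N": thrust - drag,  # > 0 means thrust exceeds drag
    }
```

Because q = 0.5·ρ·V² equals 0.5·γ·p·M², the same function can be checked against freestream pressure instead of density; a positive `net_thrust_N` over the Mach range of interest is the thrust-drag balance condition in its simplest form.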
Computations of under-resolved unsteady incompressible flows on coarser grids are presented. The Kinetically Reduced Local Navier-Stokes (KRLNS) equations are a recently proposed method for unsteady incompressible flows; they require no sub-iterations and can capture the correct transient behavior. To stabilize the computations and achieve high accuracy, the KRLNS equations are discretized with higher-order standard and optimized central finite difference methods (FDM), combined with artificial dissipation or a spatial filter, and are integrated in time with a 4-stage Runge-Kutta method. Numerical simulations of 2D doubly periodic shear layers are carried out on coarser regular grids, and the computed solutions are compared with those obtained on finer grids. The parallel computations are implemented on a multi-GPU system with 4 GPUs (Tesla K40) based on the domain decomposition method, and the resulting speedup is investigated. The solution obtained with the optimized FDM is found to be more accurate than that of the standard FDM, especially on much coarser grids, and the proposed approach is easy to parallelize and achieves a large speedup that grows with the number of GPUs used. (C) 2019 Elsevier Ltd. All rights reserved.
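The discretization pattern described above (higher-order central differences in space, a 4-stage Runge-Kutta method in time) can be shown on a much simpler model problem. The sketch below applies the standard fourth-order central difference and the classical four-stage Runge-Kutta scheme to 1D linear advection on a periodic grid. It does not implement the KRLNS equations, artificial dissipation, or a spatial filter, and the function names are hypothetical. An optimized central scheme would change (and often widen) the stencil weights to reduce dispersion error at poorly resolved wavenumbers.

```python
import math


def ddx_central4(u, dx):
    """Standard fourth-order central first derivative on a periodic grid."""
    n = len(u)
    return [
        (-u[(i + 2) % n] + 8.0 * u[(i + 1) % n] - 8.0 * u[(i - 1) % n] + u[(i - 2) % n])
        / (12.0 * dx)
        for i in range(n)
    ]


def rhs(u, c, dx):
    """Semi-discrete right-hand side of u_t + c u_x = 0."""
    return [-c * du for du in ddx_central4(u, dx)]


def rk4_step(u, dt, c, dx):
    """One step of the classical four-stage Runge-Kutta method."""
    k1 = rhs(u, c, dx)
    k2 = rhs([ui + 0.5 * dt * ki for ui, ki in zip(u, k1)], c, dx)
    k3 = rhs([ui + 0.5 * dt * ki for ui, ki in zip(u, k2)], c, dx)
    k4 = rhs([ui + dt * ki for ui, ki in zip(u, k3)], c, dx)
    return [
        ui + dt / 6.0 * (a + 2.0 * b + 2.0 * g + h)
        for ui, a, b, g, h in zip(u, k1, k2, k3, k4)
    ]


def advect_one_period(n=64, cfl=0.5, c=1.0):
    """Advect sin(x) once around [0, 2*pi) and return the max pointwise error."""
    dx = 2.0 * math.pi / n
    period = 2.0 * math.pi / c
    steps = round(period / (cfl * dx / c))
    dt = period / steps  # land exactly on one full period
    u0 = [math.sin(i * dx) for i in range(n)]
    u = u0
    for _ in range(steps):
        u = rk4_step(u, dt, c, dx)
    return max(abs(a - b) for a, b in zip(u, u0))
```

With the CFL number held fixed, halving the grid spacing should shrink the error by roughly a factor of 16, which is the fourth-order behavior of both the spatial and the temporal scheme.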
Numerical simulations of 2D doubly periodic shear layers and 3D decaying homogeneous isotropic turbulence are presented using the Kinetically Reduced Local Navier-Stokes (KRLNS) equations, which apply to unsteady incompressible flows without the need for sub-iterations and can capture the correct transient behavior. To achieve high accuracy, the KRLNS equations are discretized with higher-order central difference approximations and a 4-stage Runge-Kutta method. The results are compared with solutions obtained by the lattice Boltzmann method (LBM) and the pseudo-spectral method (PSM), the standard approach for this problem. Parallel computations are carried out on multiple GPUs (Tesla K40), with a maximum of 4 GPUs available, based on the domain decomposition method, and the speedup of the KRLNS equations is investigated. All three methods are found to capture the transient flow fields of unsteady incompressible flow, and a large speedup is obtained for the KRLNS equations. (C)18 Elsevier Ltd. All rights reserved.
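Domain decomposition suits central-difference schemes because each subdomain only needs a few neighboring values (a halo) to evaluate its stencil. The sketch below splits a periodic 1D field into equal subdomains, pads each with a two-cell halo for the fourth-order stencil, and checks that the stitched result matches the undecomposed computation; it also defines the usual speedup and parallel-efficiency ratios. It runs serially in one process, the function names are hypothetical, and the timings in the check are made-up numbers, not measurements from the paper.

```python
HALO = 2  # the fourth-order central stencil reaches two cells on each side


def ddx_global(u, dx):
    """Reference: fourth-order central derivative on the whole periodic field."""
    n = len(u)
    return [
        (-u[(i + 2) % n] + 8.0 * u[(i + 1) % n] - 8.0 * u[(i - 1) % n] + u[(i - 2) % n])
        / (12.0 * dx)
        for i in range(n)
    ]


def split_with_halos(u, parts):
    """Cut u into `parts` equal subdomains, each padded with HALO ghost cells per side."""
    n = len(u)
    if n % parts != 0:
        raise ValueError("grid size must be divisible by the number of parts")
    size = n // parts
    return [
        [u[i % n] for i in range(p * size - HALO, (p + 1) * size + HALO)]
        for p in range(parts)
    ]


def ddx_local(block, dx):
    """Derivative on one subdomain's interior, reading halo cells at its edges."""
    return [
        (-block[i + 2] + 8.0 * block[i + 1] - 8.0 * block[i - 1] + block[i - 2])
        / (12.0 * dx)
        for i in range(HALO, len(block) - HALO)
    ]


def speedup(t_serial, t_parallel):
    """Ratio of single-device time to multi-device time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, workers):
    """Speedup divided by the number of devices; 1.0 is ideal scaling."""
    return speedup(t_serial, t_parallel) / workers
```

On real GPUs the halo cells are refreshed by communication every stage, so efficiency stays below 1.0 to the extent that this exchange is not hidden behind interior computation.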