Search Results - Inner Mongolia University Library

Multi-GPU Radix Sort Algorithm in High Performance Computing Environment

27th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)

Authors: Sun Hongdi; Jia Minzheng; Gao Zhu (Beijing Polytech Coll, Beijing, Peoples R China; Beijing Guang Yu Online Technol Co Ltd, Beijing, Peoples R China)

ISBN: (digital) 9798350391954

ISBN: (print) 9798350391961; 9798350391954

Conventional radix sort algorithms based on GPU parallel computing generally run on a single GPU. As the data scale grows, however, single-GPU sorting increasingly shows performance bottlenecks. This paper proposes an efficient radix sort algorithm based on multi-GPU parallel computing, which assigns different bucket classifications to multiple GPUs to improve sorting performance and efficiency on large-scale datasets. With multi-GPU parallel computing, more buckets can be used for data classification in one traversal, which reduces the number of sorting passes, lowers time complexity, and improves sorting speed and throughput. Experiments show that the algorithm significantly improves operational efficiency and has good application prospects. The algorithm also scales well, so it can adapt to continued growth in data scale.

Keywords: Radix Sort; multi-GPU parallel computing; Bucket Calculation Strategy
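
The abstract's central idea, that more buckets per traversal mean fewer passes, can be illustrated with a small CPU-only sketch. This is not the authors' implementation: the function name `multi_device_radix_sort`, the parameter `buckets_per_device`, and the rule that each simulated device owns a contiguous slice of the bucket range are all assumptions made for illustration, and no real GPU is involved.

```python
from typing import List, Tuple


def multi_device_radix_sort(
    keys: List[int], num_devices: int = 4, buckets_per_device: int = 16
) -> Tuple[List[int], int]:
    """LSD radix sort of non-negative integers; returns (sorted keys, passes).

    Hypothetical simulation: each "device" is just a list of buckets, and the
    radix grows with the number of devices (an assumption, not the paper's rule).
    """
    if any(k < 0 for k in keys):
        raise ValueError("this sketch handles non-negative integers only")

    radix = num_devices * buckets_per_device  # total buckets per traversal
    out = list(keys)
    max_key = max(out, default=0)
    place = 1   # weight of the digit examined in the current pass
    passes = 0

    while max_key // place > 0:
        # device d owns global buckets [d * buckets_per_device, (d + 1) * buckets_per_device)
        device_buckets = [
            [[] for _ in range(buckets_per_device)] for _ in range(num_devices)
        ]
        for k in out:
            digit = (k // place) % radix
            device, local = divmod(digit, buckets_per_device)
            device_buckets[device][local].append(k)  # appending keeps the pass stable
        # gather in device order, then bucket order, to form the next input
        out = [k for dev in device_buckets for bucket in dev for k in bucket]
        place *= radix
        passes += 1

    return out, passes
```

With 16 buckets per device and a maximum key of 65535, this sketch needs 4 passes on 1 simulated device, 3 passes on 4, and 2 passes on 16, which is the effect the abstract describes in general terms.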

Simulation and reconstruction for 3D elastic wave using multi-GPU and CUDA-aware MPI

COMPUTERS & GEOSCIENCES, 2024, Vol. 190

Authors: Cai, Wei; Zhu, Peimin; Li, Ziang (China Univ Geosci, Sch Geophys & Geomat, Wuhan, Peoples R China)

3D finite-difference time-domain numerical simulation and reconstruction based on the domain decomposition technique are essential parts of high-performance computation for reverse-time migration and full-waveform inversion. However, low GPU utilization for small-sized models and tremendous memory consumption for large-sized models may result in low computational efficiency and high memory costs. This paper proposes a contiguous memory management (CMM) method and a variable-order wavefield reconstruction (VWR) method. The CMM allocates the many small-sized arrays used for MPI communications on a larger contiguous memory block, which aims to reduce the number of MPI communications between subdomains and improve the communication bandwidth, thus reducing the MPI time overhead and improving GPU utilization. Meanwhile, the VWR can flexibly set the number of layers of boundary wavefield used for source wavefield reconstruction according to the host memory capacity and accuracy requirements. Since as little as one layer of boundary wavefield can be stored using the VWR, host memory consumption can be significantly reduced. Numerical experiments show that GPU utilization for a model of size 121³ can be improved from 25% to 90% using the CMM method, and that the VWR method can reduce memory consumption by about 86% while maintaining good accuracy in wavefield reconstruction. In addition, the paper discusses how to obtain a domain decomposition scheme with optimal performance.

Keywords: 3D elastic FDTD; Wavefield reconstruction; High performance computing; Domain decomposition technique; multi-GPU parallel computing; CUDA-aware MPI
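
The CMM idea described in the abstract, placing many small communication arrays in one contiguous block so that one transfer replaces many, can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `pack_halos` and `unpack_halos` and the field names in the usage note are hypothetical, Python lists stand in for device memory, and no MPI call is made.

```python
from typing import Dict, List, Tuple

# name -> (offset into the contiguous buffer, number of elements)
Layout = Dict[str, Tuple[int, int]]


def pack_halos(arrays: Dict[str, List[float]]) -> Tuple[List[float], Layout]:
    """Copy several small arrays into one contiguous buffer and record offsets."""
    buffer: List[float] = []
    layout: Layout = {}
    for name, values in arrays.items():
        layout[name] = (len(buffer), len(values))
        buffer.extend(values)
    return buffer, layout


def unpack_halos(buffer: List[float], layout: Layout) -> Dict[str, List[float]]:
    """Recover the individual arrays on the receiving side from the same layout."""
    return {
        name: buffer[offset:offset + count]
        for name, (offset, count) in layout.items()
    }
```

For example, packing hypothetical boundary arrays `{"vx": [...], "vy": [...], "sxx": [...]}` yields a single buffer that could be handed to one send operation instead of three, and `unpack_halos` restores the originals as long as both sides agree on the layout.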

Basic static stability and control characteristics of hypersonic vehicles with dorsal and ventral intake

AEROSPACE SCIENCE AND TECHNOLOGY, 2023, Vol. 134, Issue 1

Authors: Luo, Shibin; Sun, Yuhang; Liu, Jun; Song, Jiawen; Cao, Wenbin (Cent South Univ, Res Inst Aerosp Technol, Changsha 410083, Peoples R China)

Airframe/engine integrated configurations with a dorsal intake can effectively decouple the inlet from the lifting surface, leading to excellent aerodynamic performance at small angles of attack at hypersonic speeds. For a comprehensive and systematic investigation of the dorsal intake configuration, it is necessary to compare it with a ventral intake configuration of equal scale. However, most present assessments of dorsal intake configurations focus on the fundamental aerodynamic characteristics and neglect stability and control characteristics. Therefore, computational fluid dynamics numerical simulation based on multi-GPU parallel computing is introduced to compare the basic static stability and control characteristics of the dorsal and ventral intake configurations. The results show that the dorsal intake configuration has better three-channel static stability at hypersonic speeds than the ventral intake configuration. However, this also results in worse control characteristics, which may be remedied by relaxing the static stability margin or installing canards. Moreover, the selection of intake configuration only interferes with the coupling of the pitch channel and does not affect the coupling of the yaw and roll channels. (c) 2023 Elsevier Masson SAS. All rights reserved.

Keywords: Airframe engine integration; Dorsal intake configuration; Static stability and control; multi-GPU parallel computing; Hypersonic vehicle

3D least-squares reverse time migration in VTI media based on pseudoacoustic wave equation and multi-GPU parallel acceleration

JOURNAL OF APPLIED GEOPHYSICS, 2023, Vol. 213, Issue 1

Authors: Ding, Yi; Li, Zhenchun; Zhang, Kai; Gao, Xue; Chen, Feixu (China Univ Petr East China, Sch Geosci, Qingdao 266580, Peoples R China; China Univ Petr East China, Key Lab Deep Oil & Gas, Qingdao 266580, Peoples R China; SINOPEC Jiangsu Oilfield Jiangsu Oilfield Subco Oil Prod Plant 1, Yangzhou 225009, Peoples R China; PetroChina Tarim Oilfield Co, Res Inst Petr Explorat & Dev, Korla 841000, Peoples R China)

As high-density seismic acquisition progressively becomes an essential mode of seismic exploration, developing high-precision imaging methods for 3D datasets has become an urgent industry requirement. Unfortunately, two-dimensional least-squares reverse time migration (LSRTM) can hardly deal with the influence of the third dimension because the structure of reflectors changes in nearly all directions. Furthermore, anisotropy is very common in underground media, so LSRTM based on the isotropic assumption fails to locate reflectors accurately, which seriously limits the identification of geological stratification. To address these two problems, this paper derives least-squares reverse time migration in three-dimensional vertical transversely isotropic media (3D-VTI-LSRTM) and develops a workflow that realizes 3D-LSRTM in VTI media based on this theory. The workflow uses multi-GPU acceleration and heterogeneous CPU/GPU parallelism. Compared with a general parallel CPU workflow or a single-GPU workflow, the multi-GPU workflow significantly improves computing efficiency in practical applications. In addition, this paper extends VTI-LSRTM from 2D to 3D, which solves the problem that the 2D algorithm cannot eliminate lateral-direction reflection artifacts. For the self-made 3D-VTI depression model and the 3D-VTI-SEG/EAGE salt model, four NVIDIA Tesla K40c graphics cards are used for testing. The test results are correct and significantly improved compared with the 3D-VTI-RTM method.

Keywords: Least-squares reverse time migration; Anisotropy; 3D case; multi-GPU parallel computing

Performance analysis of the hypersonic vehicle with dorsal and ventral intake

AEROSPACE SCIENCE AND TECHNOLOGY, 2022, Vol. 131, Part A

Authors: Luo, Shibin; Sun, Yuhang; Liu, Jun; Song, Jiawen; Cao, Wenbin (Cent South Univ, Res Inst Aerosp Technol, Changsha 410083, Peoples R China)

The integrated design of the airframe and inlet is the preferred choice for the aerodynamic configuration design of air-breathing hypersonic vehicles. To better understand integrated configurations with a dorsal intake, this paper uses the computational fluid dynamics (CFD) method to comprehensively compare the performance of integrated configurations with the dorsal intake and the traditionally used ventral intake. The performance indexes include aerodynamic performance (such as lift-to-drag ratio), inlet performance, and thrust-to-drag balance performance. First, under the same geometric constraints, two integrated aerodynamic configurations with dorsal and ventral intakes are designed based on a truncated Busemann inlet. Second, a CFD numerical simulation method based on multi-GPU parallel computing is developed and verified. Then, the flow-field characteristics and the aerodynamic, inlet, and thrust-to-drag balance performance of the two types of configurations over a wide range of Mach numbers are comprehensively compared using the GPU-based efficient numerical simulation method. The advantages and disadvantages of the two types of configurations are then analyzed to provide a meaningful reference for the aerodynamic configuration design of air-breathing hypersonic vehicles. (c) 2022 Elsevier Masson SAS. All rights reserved.

Keywords: Integrated aerodynamic configuration; Dorsal intake configuration; multi-GPU parallel computing; Hypersonic vehicle; Aerodynamic performance; Inlet performance

Optimized finite difference method with artificial dissipation for under-resolved unsteady incompressible flow computations using kinetically reduced local Navier-Stokes equations

COMPUTERS & FLUIDS, 2019, Vol. 184, pp. 21-28

Authors: Hashimoto, T.; Tanno, I.; Yasuda, T.; Tanaka, Y.; Morinishi, K.; Satofuka, N. (Kindai Univ, 3-4-1 Kowakae, Osaka 5778502, Japan; Tsukuba Univ Technol, 4-3-15 Amakubo, Tsukuba, Ibaraki 3058520, Japan; Univ Shiga Prefecture, 2500 Hassaka, Hikone, Shiga 5228533, Japan; Toyo Tire & Rubber Co Ltd, 2-2-13 Fujinoki, Itami, Hyogo 6640847, Japan; Kyoto Inst Technol, Sakyo Ku, Kyoto 6068585, Japan)

Under-resolved unsteady incompressible flow computations on coarser grids are presented. The Kinetically Reduced Local Navier-Stokes (KRLNS) equations are a new method for unsteady incompressible flows that requires no sub-iterations and is capable of capturing the correct transient behavior. To stabilize the computations and achieve high accuracy, the KRLNS equations are discretized with higher-order standard and optimized types of central finite difference method (FDM), together with artificial dissipation or a spatial filter, and are integrated using a 4-stage Runge-Kutta method. Numerical simulations of 2D doubly periodic shear layers are carried out on coarser regular grids, and the computed solutions are compared with those obtained on finer grids. The parallel computations are implemented on a multi-GPU (Tesla K40) system with 4 GPUs, based on the domain decomposition method, and the acceleration is investigated. It is found that the solution obtained by the optimized type of FDM is more accurate than that of the standard one, especially on much coarser grids, and that the proposed approach is easy to parallelize and obtains a large acceleration according to the number of GPUs used. (C) 2019 Elsevier Ltd. All rights reserved.

Keywords: Unsteady incompressible viscous flows; Kinetically reduced local Navier-Stokes equations; Artificial dissipation; Spatial filter; multi-GPU parallel computing
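
The domain decomposition approach mentioned in the abstract can be illustrated on a much simpler problem. The sketch below is an assumption-laden stand-in, not the paper's solver: it uses a 1D periodic grid, a second-order central difference rather than the higher-order and optimized schemes of the paper, and plain Python lists in place of GPUs. The function names are hypothetical. It shows that splitting the grid into subdomains and exchanging one halo cell per side reproduces the single-domain result.

```python
from typing import List


def central_diff_global(u: List[float], dx: float) -> List[float]:
    """Second-order central difference on a periodic grid, single domain."""
    n = len(u)
    # u[i - 1] wraps around for i == 0 through negative indexing
    return [(u[(i + 1) % n] - u[i - 1]) / (2 * dx) for i in range(n)]


def central_diff_decomposed(
    u: List[float], dx: float, num_subdomains: int = 4
) -> List[float]:
    """Same derivative, computed per subdomain after a one-cell halo exchange."""
    n = len(u)
    if n % num_subdomains != 0:
        raise ValueError("grid size must be divisible by the number of subdomains")
    size = n // num_subdomains
    parts = [u[p * size:(p + 1) * size] for p in range(num_subdomains)]

    result: List[float] = []
    for p, local in enumerate(parts):
        # halo exchange: one cell from each periodic neighbour
        left_halo = parts[p - 1][-1]
        right_halo = parts[(p + 1) % num_subdomains][0]
        padded = [left_halo] + local + [right_halo]
        # each subdomain now updates its interior cells independently
        result.extend(
            (padded[i + 1] - padded[i - 1]) / (2 * dx) for i in range(1, size + 1)
        )
    return result
```

A wider stencil would need a correspondingly wider halo; the one-cell halo here matches only the three-point stencil used in this sketch.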

Multi-GPU parallel computation of unsteady incompressible flows using kinetically reduced local Navier-Stokes equations

COMPUTERS & FLUIDS, 2018, Vol. 167, pp. 215-220

Authors: Hashimoto, T.; Yasuda, T.; Tanno, I.; Tanaka, Y.; Morinishi, K.; Satofuka, N. (Kindai Univ, 3-4-1 Kowakae, Higashiosaka, Osaka 5778502, Japan; Univ Shiga Prefecture, 2500 Hassaka, Hikone, Shiga 5228533, Japan; Tsukuba Univ Technol, 4-3-15 Amakubo, Tsukuba, Ibaraki 3058520, Japan; TOYO TIRE&RUBBER CO LTD, 2-2-13 Fujinoki, Itami, Hyogo 6640847, Japan; Kyoto Inst Technol, Sakyo Ku, Kyoto 6068585, Japan)

Numerical simulations of 2D doubly periodic shear layers and 3D decaying homogeneous isotropic turbulence are presented using the Kinetically Reduced Local Navier-Stokes (KRLNS) equations, which are applicable to unsteady incompressible flows without the need for sub-iterations and are capable of capturing the correct transient behavior. To achieve high accuracy, the KRLNS equations are discretized with higher-order central difference approximations and a 4-stage Runge-Kutta method. The results are compared with the solutions obtained by the Lattice Boltzmann method (LBM) and the pseudo-spectral method (PSM), which is the standard approach for this problem. Parallel computations are carried out on multiple GPUs (Tesla K40), with a maximum of 4 GPUs available, based on the domain decomposition method, and the speedup of the KRLNS equations is investigated. It is found that all three methods can capture the transient flow fields of unsteady incompressible flow and that a large speedup for the KRLNS equations is obtained. (C) 2018 Elsevier Ltd. All rights reserved.

Keywords: multi-GPU parallel computing; CUDA C programming; Unsteady incompressible viscous flows; Kinetically reduced local Navier-Stokes equations; Artificial compressibility method; Lattice Boltzmann method; Pseudo-spectral method
