To simulate welding-induced transient thermal stress and deformation in large-scale FE models, an accelerated explicit method (ACEXP) and a graphics processing unit (GPU) parallel computing program for the finite element method (FEM) were developed. In the accelerated explicit method, a two-stage computation scheme is employed. The first computation stage is based on a dynamic explicit method that accounts for the characteristics of the welding mechanical process by controlling both the temperature increment and the time scaling parameter. In the second computation stage, a static equilibrium computation scheme is applied after dynamic thermal loading to obtain a static solution for the transient thermal stress and welding deformation. The developed GPU parallel computing program is shown to scale well for large-scale models with more than 20 million degrees of freedom. The validity of the accelerated explicit method is verified by comparing the transient thermal stress and deformation with those computed by an implicit FEM. Finally, welding deformation and residual stress in a structure model assembled from nine high-strength steel plates and 26 weld lines were analyzed by ACEXP and GPU parallel computing within 45 h. The computed welding deformation agreed well with the measured results.
This paper proposes a comprehensive framework for accelerating population balance-Monte Carlo (PBMC) simulation of particle coagulation dynamics. By combining a Markov jump model, a weighted majorant kernel, and GPU (graphics processing unit) parallel computing, a significant gain in computational efficiency is achieved. The Markov jump model constructs a coagulation-rule matrix of differentially weighted simulation particles, so as to capture the time evolution of the particle size distribution with low statistical noise over the full size range and to reduce the number of time loops as far as possible. Three coagulation rules are highlighted, and it is found that constructing an appropriate coagulation rule provides a route to a compromise between the accuracy and cost of PBMC methods. Further, to avoid double looping over all simulation particles when considering two-particle events (typically, particle coagulation), the weighted majorant kernel is introduced to estimate the maximum coagulation rates used for the acceptance-rejection process by a single loop over all particles; meanwhile, the mean time step of a coagulation event is estimated by summing the coagulation kernels of rejected and accepted particle pairs. The computational load of these fast differentially weighted PBMC simulations (based on the Markov jump model) is reduced greatly, becoming proportional to the number of simulation particles in a zero-dimensional system (single cell). Finally, for a spatially inhomogeneous multi-dimensional (multi-cell) simulation, the proposed fast PBMC is performed in each cell, and multiple cells are processed in parallel by the many cores of a GPU, which can execute massively threaded data-parallel tasks to obtain a remarkable speedup ratio (compared with CPU computation, the speedup ratio of GPU parallel computing is as high as 200 in a case of 100 cells with 10000 simulation particles per cell). These accelerating approaches of PBMC are demonstrated
Authors: Ma, Ninshu; Yuan, Shijian
Affiliations: Osaka Univ, Joining & Welding Res Inst, 11-1 Mihogaoka, Osaka 5670047, Japan; JSOL Corp, Engn Technol Div, Nishi Ku, 2-2-4 Tosabori, Osaka 5500001, Japan; Harbin Inst Technol, Sch Mat Sci & Technol, 92 West Dazhi St, Harbin 15001, Peoples R China
An accelerated explicit method and a GPU parallel computing program for the finite element method (FEM) are developed for simulating transient thermal stress and welding deformation in large-scale models. In the accelerated explicit method, a two-stage computation scheme is employed. The first computation stage is based on a dynamic explicit method that accounts for the characteristics of the welding mechanical process by controlling both the temperature increment and the time scaling parameter. In the second computation stage, a static equilibrium computation scheme is applied after thermal loading to obtain a static solution for the transient thermal stress and welding deformation. The developed GPU parallel computing program is shown to scale well for large-scale models with more than 20 million degrees of freedom (DOFs). The validity of the accelerated explicit method is verified by comparing the transient thermal deformation and residual stresses with those computed by the implicit FEM and with experimental measurements. Finally, the thermal stress and strain in an automotive engine cradle model with more than 12 million DOFs were computed efficiently, and the results are discussed.
ISBN: 9781728154534 (print)
The Fourier transform converts a signal from its original domain to a representation in the frequency domain. Applications of the Fourier transform are far-reaching, spanning fields such as intelligent information processing, machine vision, physics, mathematics, medical science, and telecommunications; hence, it has become an indispensable part of daily life. It is therefore essential to construct efficient and highly reliable schemes that guarantee the smooth performance of systems using Fourier transforms. This study compares the performance of fast Fourier transforms (FFTs) on a host CPU, with GPU parallel computing, and with GPU parallel computing plus memory allocation optimization. The experimental results show that GPU parallel computing is effective in increasing the computation speed of the FFT; the speedup ratio of GPU parallel computing over the CPU can reach 48 when operating on 32678 8-byte complex input data. In addition, optimizing GPU memory allocation further increases the computation speed of the FFT; the speedup ratio of GPU parallel computing with memory allocation optimization over the CPU can reach 114.7 when operating on 32678 8-byte complex input data.
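As a point of reference for what is being accelerated, the following is a minimal CPU-only sketch of a radix-2 FFT checked against a naive discrete Fourier transform. It is not the implementation benchmarked in the study; the function names `dft` and `fft` and the sample `signal` are illustrative assumptions, and a GPU version would typically rely on a vendor FFT library instead.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform, used here only as a reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# Hypothetical 8-sample input, chosen only for illustration.
signal = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, -2.0, 1.5]
spectrum = fft(signal)
```

The two halves of each recursion level are independent of each other, which is the property that parallel FFT implementations exploit.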
ISBN: 9781479986880 (print)
Petroleum geoscience big data is defined in this paper. A CPU/GPU hybrid system is used to accelerate the processing of petroleum geoscience big data, taking the chaotic quantum particle swarm optimization (CQPSO) inversion method as an example, and the computing time of CQPSO is reduced significantly.
ISBN: 9781538637425 (print)
The RoboCup Middle Size League (MSL) robot soccer competition is a standard test platform for distributed multi-robot systems. There are many challenges in the vision system for MSL soccer robots. For example, the huge amount of data from the Kinect v2 sensor places a heavy computational burden on the robot's onboard industrial computer, the obstacle-detection algorithm depends mainly on the obstacles' colors, and the omnidirectional vision system cannot detect a ball above the camera or obtain the objects' height information. In this paper, we propose an object detection algorithm based on GPU parallel computing, with the Kinect v2 and Jetson TX1 as the hardware platform. Parallel computing is used in every step of the object detection algorithm, so the speed and accuracy of the algorithm are greatly improved. We test the real-time performance and accuracy of the algorithm using our NuBot soccer robots. The experimental results show that objects can be detected and their 3-D information obtained accurately, satisfying the real-time requirements of the MSL competition and reducing the CPU load on the robot's onboard computer. In addition, the proposed obstacle detection algorithm does not depend on a specific color.
High resolution remains a primary goal in the advancement of synthetic aperture radar (SAR) technology. The backprojection (BP) algorithm, which does not introduce any approximation throughout the imaging process, is broadly applicable and effectively meets the demands of high-resolution imaging. Nonetheless, the BP algorithm requires substantial interpolation during point-by-point processing, and the precision and efficiency of current interpolation methods limit its imaging performance. This paper proposes TSU-ICSI (Time-shift Upsampling-Improved Cubic Spline Interpolation), an interpolation method that integrates time-shift upsampling with improved cubic spline interpolation. The method is applied to the BP algorithm, and an efficient implementation on the GPU architecture is presented. TSU-ICSI not only maintains the accuracy of BP imaging but also significantly boosts performance. The effectiveness of the BP algorithm based on TSU-ICSI is confirmed through simulation experiments and by processing measured data collected from both airborne and spaceborne SAR.
Computing speed is a significant issue in large-scale flood simulation for real-time response in disaster prevention and mitigation. Even today, most large-scale flood simulations are run on supercomputers because of the massive amounts of data and computation required. In this work, a two-dimensional shallow water model based on an unstructured Godunov-type finite volume scheme is proposed for flood simulation. To achieve fast simulation of large-scale floods on a personal computer, a Graphics Processing Unit (GPU)-based high-performance computing method using OpenACC is adopted to parallelize the shallow water model. An unstructured data management method is presented to control data transfer between the GPU and the CPU (Central Processing Unit) with minimum overhead; both computation and data are then offloaded from the CPU to the GPU, exploiting the computational capability of the GPU as much as possible. The parallel model is validated using various benchmarks and real-world case studies. The results demonstrate that speed-ups of up to one order of magnitude can be achieved in comparison with the serial model. The proposed parallel model provides a fast and reliable tool for quickly assessing flood hazards over large areas and thus has promising applications in dynamic inundation risk identification and disaster assessment.
Conventional gradient-based full waveform inversion (FWI) is a local optimization method that is highly dependent on the initial model and prone to being trapped in local minima. Globally optimal FWI, which can overcome this limitation, is particularly attractive but is currently limited by its huge computational cost. In this paper, we propose a globally optimal FWI framework based on GPU parallel computing, which greatly improves efficiency and is expected to make globally optimal FWI more widely used. In this framework, we simplify and recombine the model parameters and optimize the model iteratively. Each iteration contains hundreds of individuals; the individuals are independent of one another, and each requires forward modeling and a cost function calculation. The framework is suitable for a variety of global optimization algorithms, and we test it using the particle swarm optimization algorithm as an example. Both the synthetic and field examples achieve good results, indicating the effectiveness of the framework.
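The structure described above, in which every individual in an iteration is evaluated independently, can be sketched with a minimal particle swarm optimizer. This is an illustrative CPU sketch under stated assumptions, not the authors' framework: the `sphere` cost function stands in for forward modeling plus misfit calculation, and the names `pso`, `n_particles`, `n_iters`, `bounds`, and the coefficient values are hypothetical choices.

```python
import random


def sphere(position):
    """Stand-in cost; a real FWI cost would run forward modeling here (assumption)."""
    return sum(v * v for v in position)


def pso(cost, dim, n_particles=30, n_iters=200, bounds=(-5.0, 5.0), seed=0):
    """Minimal synchronous particle swarm optimization over a box-bounded domain."""
    rng = random.Random(seed)
    lo, hi = bounds
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients (assumed values)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g][:], best_cost[g]
    for _ in range(n_iters):
        # Move every particle using its personal best and the swarm best.
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
        # Each cost evaluation is independent of the others, so this list
        # comprehension is the step that could be distributed across GPU threads.
        costs = [cost(p) for p in pos]
        for i, c in enumerate(costs):
            if c < best_cost[i]:
                best_cost[i], best_pos[i] = c, pos[i][:]
                if c < g_cost:
                    g_cost, g_pos = c, pos[i][:]
    return g_pos, g_cost


best_position, best_value = pso(sphere, dim=2)
```

Keeping the evaluation step separate from the position update mirrors the abstract's point that the expensive per-individual work has no cross-dependencies within an iteration.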
In the AI sports training system, the traditional optical imaging technology limits the resolution of the image. Therefore, the use of optical super-resolution imaging technology to improve image resolution can promot...