Search Results - Inner Mongolia University Library

An improved mixed Lagrangian-Eulerian (IMLE) method for modelling incompressible Navier-Stokes flows with CUDA programming on multi-GPUs

COMPUTERS & FLUIDS, 2019, vol. 184, pp. 99-106

Authors: Liu, Rex Kuan-Shuo; Wu, Cheng-Tao; Kao, Neo Shih-Chao; Sheu, Tony Wen-Hann

Affiliations: Natl Taiwan Univ, Dept Engn Sci & Ocean Engn, 1 Sec 4 Roosevelt Rd, Taipei, Taiwan; CR Classificat Soc, Res Dept, 8F 103 Sec 3 Nanjing E Rd, Taipei, Taiwan; Natl Taiwan Univ, Inst Appl Math Sci, Taipei, Taiwan; Natl Taiwan Univ, Ctr Adv Study Theoret Sci, Taipei, Taiwan

In this study, a GPU-accelerated improved mixed Lagrangian-Eulerian (IMLE) method is proposed to solve the three-dimensional incompressible Navier-Stokes equations. To improve prediction accuracy, the proposed IMLE method approximates the total derivative term in a Lagrangian sense, while the spatial derivative terms are approximated on Eulerian coordinates. The transfer of data from Lagrangian particles to Eulerian grids is carried out accurately by moving least squares (MLS) interpolation. The velocity-pressure decoupling issue is overcome by adopting a pressure-free projection method in which the pressure field is calculated by solving a pressure Poisson equation (PPE). The MLS interpolation is time consuming because it is a pointwise scheme in which a local matrix equation must be solved at each grid point. In addition, the discretized PPE forms a large sparse matrix that is computationally intensive to solve with the conjugate gradient (CG) method. Therefore, CUDA and OpenMP programming are used to accelerate the computation. In this study, the multi-GPU code runs up to 27 times faster than the multithreaded CPU code. (C) 2019 Elsevier Ltd. All rights reserved.

Keywords: Incompressible Navier-Stokes equations; Moving least squares (MLS) interpolation; Conjugate gradient (CG) method; CUDA programming; OpenMP programming
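
As a rough illustration of the conjugate gradient iteration that the abstract above names as the solver for the discretized pressure Poisson equation, the following is a minimal serial sketch in Python. The 1D Poisson matrix, the function names `poisson_matvec`, `dot`, and `conjugate_gradient`, and the tolerance are assumptions chosen for illustration; this is not the authors' CUDA implementation and does not reproduce their 3D discretization.

```python
def poisson_matvec(x):
    """Apply the 1D Poisson matrix (2 on the diagonal, -1 beside it) without storing it."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A given only as a matvec."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rs_old / dot(p, Ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # The new direction is the residual made A-conjugate to the old direction.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    b = [1.0] * 8
    x = conjugate_gradient(poisson_matvec, b)
    residual = [bi - yi for bi, yi in zip(b, poisson_matvec(x))]
    print("largest residual entry:", max(abs(v) for v in residual))
```

The matrix-vector product and the dot products dominate the cost of each iteration, and both are data-parallel, which is the kind of work a GPU or multithreaded port would target.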

Speeding up an Adaptive Filter based ECG Signal Pre-processing on Embedded Architectures

INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, vol. 12, no. 5, pp. 361-369

Authors: Mejhoudi, Safa; Latif, Rachid; Saddik, Amine; Jenkal, Wissam; El Ouardi, Abdelhafid

Affiliations: Ibn Zohr Univ, ENSA, Lab Syst Engn & Informat Technol, Agadir, Morocco; Paris Saclay Univ, Digiteo Labs, SATIE, Orsay, France

Medical applications increasingly require complex calculations under tight processing-time constraints. These applications are therefore oriented towards the integration of high-performance embedded architectures. In this context, the detection of cardiac abnormalities remains a high priority in emergency medicine. ECG analysis is a complex task that requires significant computing time, since a large amount of information must be analyzed in parallel at high frequencies. Real-time processing is the biggest challenge for researchers working on time-constrained applications such as cardiac activity monitoring. This work evaluates the Adaptive Dual Threshold Filter (ADTF) algorithm dedicated to ECG signal filtering on two embedded architectures: a Raspberry 3B+ and an Odroid XU4. The implementation is based on C/C++ and OpenMP to exploit the parallelism of these architectures. The evaluation was validated using several ECG signals from the MIT-BIH Arrhythmia database with a sampling frequency of 360 Hz. Based on an algorithmic complexity study and a parallelization of the functional blocks that present significant workloads, the evaluation results show a mean execution time of 7.5 ms on the Raspberry 3B+ and 0.34 ms on the Odroid XU4. With an efficient parallelization on the Odroid XU4 architecture, real-time performance can be achieved.

Keywords: ECG signal denoising; ADTF algorithm; OpenMP programming; embedded architectures

An evaluation of MPI and OpenMP paradigms in finite-difference explicit methods for PDEs on shared-memory multi- and manycore systems

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2020, vol. 32, no. 20, article e5642

Authors: Cabral, Frederico L.; Gonzaga de Oliveira, Sanderson L.; Osthoff, Carla; Costa, Gabriel P.; Brandao, Diego N.; Kischinhevsky, Mauricio

Affiliations: LNCC, CENAPAD, Petropolis, RJ, Brazil; Univ Fed Lavras, DCC, Lavras, MG, Brazil; CEFET, EIC, Rio De Janeiro, RJ, Brazil; Univ Fed Fluminense, IC, Niteroi, RJ, Brazil; Ave Getulio Vargas 333, BR-25651075 Petropolis, RJ, Brazil

This paper focuses on parallel implementations of three two-dimensional explicit numerical methods on the Intel(R) Xeon(R) Scalable Processor and the Knights Landing coprocessor. In this study, the performance of a hybrid parallel implementation using the message passing interface (MPI) and Open Multi-Processing (OpenMP), and of a pure MPI implementation used with two thread binding policies, is compared with an improved OpenMP-based implementation in three explicit finite-difference methods for solving partial differential equations on shared-memory multicore and manycore systems. Specifically, the improved OpenMP-based version is a strategy that synchronizes adjacent threads and eliminates the implicit barriers of a naive OpenMP-based implementation. The experiments show that the most suitable approach depends on several characteristics related to the nonuniform memory access (NUMA) effect and load balancing, such as the size of the MPI domain and the number of synchronization points used in the parallel implementation. In algorithms that use four and five synchronization points, hybrid MPI/OpenMP approaches yielded better speedups than the other versions did in runs performed on both systems. The pure MPI-based strategy, however, achieved better results than the other proposed approaches did in the method that employs only one synchronization point.

Keywords: high-performance computing; hybrid MPI; OpenMP programming; MPI; multicore architectures; parallelism; parallel processing

Using hybrid MPI and OpenMP programming to optimize communications in parallel loop self-scheduling schemes for multicore PC clusters

JOURNAL OF SUPERCOMPUTING, 2012, vol. 60, no. 1, pp. 31-61

Authors: Wu, Chao-Chin; Lai, Lien-Fu; Yang, Chao-Tung; Chiu, Po-Hsun

Affiliations: Natl Changhua Univ Educ, Dept Comp Sci & Informat Engn, Changhua 500, Taiwan; Tunghai Univ, Dept Comp Sci & Informat Engn, High Performance Comp Lab, Taichung 40704, Taiwan

Recently, a series of parallel loop self-scheduling schemes have been proposed, especially for heterogeneous cluster systems. However, they employed the MPI programming model to construct the applications without considering whether the computing node has a multicore architecture. As a result, every processor core has to communicate directly with the master node to request new tasks, even though the processor cores on the same node can communicate with each other through the underlying shared memory. To address this higher communication overhead, in this paper we propose to adopt a hybrid MPI and OpenMP programming model to design two-level parallel loop self-scheduling schemes. In the first level, each computing node runs an MPI process for inter-node communications. In the second level, each processor core runs an OpenMP thread to execute the iterations assigned to its resident node. Experimental results show that our method outperforms the previous works.

Keywords: Parallel loop scheduling; Cluster computing; Multicore architecture; MPI programming; OpenMP programming; Hybrid programming
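
The two-level idea described in the abstract above can be sketched in plain Python, with threads standing in for both levels. This is only an illustration under stated assumptions: the class names `Master` and `Node`, the function `run`, the fixed chunk size, and the squaring loop body are hypothetical, and a real implementation would use MPI processes between nodes and OpenMP threads within a node rather than Python threads.

```python
import threading


class Master:
    """First level: stands in for the master that owns the loop iterations."""

    def __init__(self, total_iterations, chunk_size):
        self._next = 0
        self._total = total_iterations
        self._chunk = chunk_size
        self._lock = threading.Lock()
        self.requests = 0  # how many times any node contacted the master

    def request_chunk(self):
        """Hand out the next fixed-size chunk; an empty range means no work is left."""
        with self._lock:
            self.requests += 1
            start = self._next
            self._next = min(start + self._chunk, self._total)
            return range(start, self._next)


class Node:
    """Second level: the cores of one node share a chunk through shared memory."""

    def __init__(self, master):
        self._master = master
        self._pending = iter(())
        self._lock = threading.Lock()

    def next_iteration(self):
        """Return the next iteration index for a core, or None when work is done."""
        with self._lock:
            i = next(self._pending, None)
            if i is None:
                # Local chunk exhausted: only now does the node contact the master.
                self._pending = iter(self._master.request_chunk())
                i = next(self._pending, None)
            return i


def run(total_iterations=100, chunk_size=10, nodes=2, cores_per_node=4):
    master = Master(total_iterations, chunk_size)
    results = [None] * total_iterations

    def core(node):
        while True:
            i = node.next_iteration()
            if i is None:
                return
            results[i] = i * i  # placeholder loop body

    threads = []
    for _ in range(nodes):
        node = Node(master)
        for _ in range(cores_per_node):
            threads.append(threading.Thread(target=core, args=(node,)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, master.requests


if __name__ == "__main__":
    results, requests = run()
    print("iterations done:", len(results), "master requests:", requests)
```

Because cores draw iterations from their node's local chunk, the master is contacted once per chunk rather than once per iteration, which is the communication saving the abstract argues for.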

Parallel implementation of the EGSnrc Monte Carlo simulation of ionizing radiation transport using OpenMP

MEDICAL PHYSICS, 2017, vol. 44, no. 12, pp. 6672-6677

Authors: Doerner, Edgardo; Caprile, Paola

Affiliation: Pontificia Univ Catolica Chile, Inst Phys, Santiago 7820436, Chile

Purpose: To present the implementation of a new option for parallel processing of the EGSnrc Monte Carlo system using the OpenMP API, as an alternative to the provided method based on the use of a batch queuing system (BQS). Methods: The parallel solution presented, called OMP_EGS, makes use of OpenMP features to control the workload distribution between the compute units. These features were inserted into the original EGSnrc source code through properly defined macros. In order to validate the platform, the possibility of producing results in exact agreement with the serial implementation was assessed. The performance of OMP_EGS was evaluated against the BQS method in terms of parallel speedup and efficiency. Results: As the OpenMP features can be activated or deactivated depending on the compilation options, the implementation of the platform allowed the direct recovery of the original serial implementation. The validation tests showed that OMP_EGS was able to reproduce exactly the same results as the serial implementation. The performance and scalability tests showed that OMP_EGS is a better alternative than the EGSnrc BQS parallel implementation, both in terms of runtime and parallel efficiency. Conclusions: The presented solution has several advantages over the BQS-based parallel implementation available for the EGSnrc system. One of the main advantages is that, in contrast to the BQS alternative, it can be implemented using different compilers and operating systems, which makes it a compact and portable solution that can be used in a wide range of working environments. It does not introduce artifacts in the simulated distributions, as it only handles the distribution of work among the available computing resources, and it proved to have better performance. (C) 2017 American Association of Physicists in Medicine.

Keywords: Monte Carlo methods; multicore systems; OpenMP programming; parallel programming; particle transport simulation

Parallel computing of multiobjective optimization of air bearing

71st Society of Tribologists and Lubrication Engineers Annual Meeting and Exhibition 2016

Authors: Chen, Hsin-Yi; Wang, Nenzi

Affiliation: Department of Mechanical Engineering, Chang Gung University, Taiwan

IDM - A New Parallel Methodology to Calculate the Determinant of Matrices of the Order n, with Computational Complexity O(n)

IEEE LATIN AMERICA TRANSACTIONS, 2012, vol. 10, no. 1, pp. 1357-1363

Authors: Menezes, M. P.; Pereira, C. E. M.; Sato, L. M.

Affiliations: Univ Sao Paulo, Dept Automacao Energia Eletr, Escola Politecn, BR-09500900 Sao Paulo, Brazil; Univ Sao Paulo, Dept Energia & Automacao Eletr, Escola Politecn, BR-09500900 Sao Paulo, Brazil; Univ Sao Paulo, Inst Tecnol Aeronaut, BR-09500900 Sao Paulo, Brazil

This paper presents a new parallel methodology for calculating the determinant of matrices of order n, with computational complexity O(n), using the Gauss-Jordan Elimination Method and Chio's Rule as references. We present our step-by-step methodology in clear mathematical language and demonstrate how to calculate the determinant of a matrix of order n in an analytical format. We also present a computational model with one sequential algorithm and one parallel algorithm, both in pseudocode.

Keywords: Parallel Computing; Parallel Methodology; OpenMP programming; Chio's Rule; Gauss-Jordan Elimination Method
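
For readers unfamiliar with Chio's Rule, which the abstract above cites as a reference, the following is a minimal serial sketch of classical Chio condensation in Python. It is not the IDM method from the paper, whose details the abstract does not give; the function name `chio_determinant`, the use of exact rational arithmetic, and the row swap for a zero pivot are assumptions made for illustration. A non-empty square matrix is assumed.

```python
from fractions import Fraction


def chio_determinant(matrix):
    """Determinant by repeated Chio condensation, in exact rational arithmetic."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    sign = 1
    divisor = Fraction(1)
    while n > 1:
        # Chio's rule needs a nonzero pivot in the top-left corner.
        pivot_row = next((r for r in range(n) if a[r][0] != 0), None)
        if pivot_row is None:
            return Fraction(0)  # first column is all zeros
        if pivot_row != 0:
            a[0], a[pivot_row] = a[pivot_row], a[0]
            sign = -sign  # a row swap flips the sign of the determinant
        p = a[0][0]
        # Condense to an (n-1) x (n-1) matrix of 2x2 determinants.
        # Each entry depends only on the current matrix, so the entries are independent.
        a = [[p * a[i][j] - a[i][0] * a[0][j] for j in range(1, n)]
             for i in range(1, n)]
        # det(original) = det(condensed) / p**(n - 2)
        divisor *= p ** (n - 2)
        n -= 1
    return sign * a[0][0] / divisor


if __name__ == "__main__":
    print(chio_determinant([[2, 1, 3], [0, 4, 1], [5, 2, 6]]))
```

The sketch performs n - 1 condensation steps. Within a step, every new entry is an independent 2x2 determinant, which is where a parallel version could distribute the work.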
