Search results - Inner Mongolia University Library

A two-level parallelization method for distributed hydrological models

ENVIRONMENTAL MODELLING & SOFTWARE, 2016, vol. 80, pp. 175-184

Authors: Liu, Junzhi; Zhu, A-Xing; Qin, Cheng-Zhi; Wu, Hui; Jiang, Jingchao

Affiliations: Nanjing Normal Univ Minist Educ Key Lab Virtual Geog Environm 1 Wenyuan Rd Nanjing 210023 Jiangsu Peoples R China; Jiangsu Ctr Collaborat Innovat Geog Informat Reso 1 Wenyuan Rd Nanjing 210023 Jiangsu Peoples R China; Chinese Acad Sci Inst Geog Sci & Nat Resources Res State Key Lab Resources & Environm Informat Syst Beijing 100101 Peoples R China; Univ Wisconsin Dept Geog Madison WI 53706 USA; Hangzhou Dianzi Univ 115 Wenyi Rd Hangzhou 310012 Zhejiang Peoples R China; Smart City Res Ctr Zhejiang Prov 115 Wenyi Rd Hangzhou 310012 Zhejiang Peoples R China

This paper proposes a scalable two-level parallelization method for distributed hydrological models that can exploit parallelism at both the sub-basin level and the basic simulation-unit level (e.g., grid cell) simultaneously. The approach first uses the message-passing programming model to dispatch parallel tasks at the sub-basin level to different nodes with multi-core CPUs in the cluster. Each node is responsible for some of the sub-basins. Parallel tasks for each sub-basin at the basic simulation-unit level are then dispatched to multiple cores within each node using the shared-memory programming model. A grid-based distributed hydrological model was parallelized to demonstrate the performance of the proposed method, which was tested in different scenarios (e.g., different data volumes, different numbers of sub-basins). Results show that the proposed two-level parallelization method had better scalability than parallel computation at the sub-basin level alone, and that parallel performance increased with data volume and the number of sub-basins. (C) 2016 Elsevier Ltd. All rights reserved.
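The two levels described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the round-robin assignment, the function names, and the placeholder per-cell computation are all assumptions made for illustration, and the loop over ranks only stands in for separate message-passing processes. Upstream-to-downstream routing between sub-basins is ignored, and Python threads show the structure of the shared-memory level rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def sub_basins_for_rank(n_sub_basins, rank, n_ranks):
    """Level 1: round-robin assignment of sub-basin ids to one node (rank)."""
    return list(range(rank, n_sub_basins, n_ranks))

def simulate_cell(value):
    """Hypothetical per-cell computation standing in for the model physics."""
    return 0.5 * value

def simulate_sub_basin(cells, n_threads):
    """Level 2: spread the grid cells of one sub-basin over the node's cores."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(simulate_cell, cells))

def run_rank(basins, rank, n_ranks, n_threads=4):
    """Work that one node would do; `basins` maps sub-basin id -> cell values."""
    mine = sub_basins_for_rank(len(basins), rank, n_ranks)
    return {b: simulate_sub_basin(basins[b], n_threads) for b in mine}

# Six sub-basins with eight cells each (made-up values).
basins = {b: [float(b + c) for c in range(8)] for b in range(6)}
n_ranks = 3
results = {}
for rank in range(n_ranks):  # in a real run, each rank is its own process
    results.update(run_rank(basins, rank, n_ranks))
```

In a cluster setting each call to `run_rank` would execute on a different node, and the per-rank dictionaries would be gathered with a message-passing library instead of `results.update`.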

Keywords: Distributed hydrological model; Two-level parallelization; Multi-core cluster; Sub-basin; Basic simulation unit

Two-level dynamic load-balanced p-adaptive discontinuous Galerkin methods for CFD simulations

COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2024, vol. 176, pp. 165-178

Authors: Jang, Yongseok; Martin, Emeric; Chapelier, Jean-Baptiste; Couaillier, Vincent

Affiliation: Polytech Inst Paris French Aerosp Lab DAAA ONERA F-92320 Chatillon France

We present a novel approach utilizing two-level dynamic load balancing for p-adaptive discontinuous Galerkin (DG) methods in compressible Computational Fluid Dynamics (CFD) simulations. A high-order explicit first stage, singly diagonally implicit Runge-Kutta (ESDIRK) method is employed for time integration, with pseudo-transient continuation combined with the restarted generalized minimal residual (GMRES) method to solve the nonlinear equations at each stage of ESDIRK except the initial stage. Relying on smoothness indicators, we carry out the refinement/coarsening process for p-adaptation with dynamic load balancing. This approach involves a coarse-level (distributed-memory) decomposition based on the MPI paradigm and a fine-level (shared-memory) decomposition based on the OpenMP paradigm, enhancing parallel efficiency. Dynamic load balancing is achieved by computing weights based on degrees of freedom, ensuring balanced computational loads across processors. The parallel computing framework adopts either a graph-based type (ParMETIS and Zoltan) or a space-filling-curve type (GeMPa) for coarse-level partitioning, and a graph-based type (METIS and Zoltan) for fine-level partitioning. The effectiveness of the method is demonstrated through numerical examples, highlighting its potential to significantly improve the scalability and efficiency of compressible flow simulations. The numerical simulations were conducted using the CODA flow solver, a state-of-the-art tool developed collaboratively by the French National Aerospace Center (ONERA), the German Aerospace Center (DLR), and Airbus.
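The idea of weighting elements by their degrees of freedom can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weight formula `(p + 1) ** dim` assumes tensor-product elements, and the greedy "heaviest element to the lightest part" rule is a simple stand-in for the graph and space-filling-curve partitioners named in the abstract, which also account for mesh connectivity. All function names are hypothetical.

```python
import heapq

def dof_weight(p, dim=3):
    """Degrees of freedom of one element of polynomial degree p
    (assumption: tensor-product basis, e.g. hexahedra in 3D)."""
    return (p + 1) ** dim

def balance(degrees, n_parts):
    """Greedy balancing: assign the heaviest remaining element to the
    currently lightest part. Returns (owner per element, load per part)."""
    weights = [dof_weight(p) for p in degrees]
    heap = [(0, part) for part in range(n_parts)]  # (load, part id)
    heapq.heapify(heap)
    owner = [None] * len(degrees)
    for elem in sorted(range(len(degrees)), key=lambda e: -weights[e]):
        load, part = heapq.heappop(heap)
        owner[elem] = part
        heapq.heappush(heap, (load + weights[elem], part))
    loads = [0] * n_parts
    for elem, part in enumerate(owner):
        loads[part] += weights[elem]
    return owner, loads

# Polynomial degree per element after a hypothetical p-adaptation step.
degrees = [1, 1, 1, 1, 2, 2, 3, 3]
owner, loads = balance(degrees, 2)
```

After each refinement or coarsening step the degrees change, so calling `balance` again with the updated list mimics the dynamic part of the load balancing.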

Keywords: Dynamic load balancing; p-adaptation; High-order discontinuous Galerkin method; Two-level parallelization; CFD

A hybrid parallel Delaunay image-to-mesh conversion algorithm scalable on distributed-memory clusters

COMPUTER-AIDED DESIGN, 2018, vol. 103, pp. 34-46

Authors: Feng, Daming; Chernikov, Andrey N.; Chrisochoides, Nikos P.

Affiliation: Old Dominion Univ Dept Comp Sci Norfolk VA 23529 USA

In this paper, we present a scalable three-dimensional parallel Delaunay image-to-mesh conversion algorithm. A nested master-worker model is used to simultaneously explore process- and thread-level parallelization. The mesh generation includes two stages: coarse and fine meshing. First, a coarse mesh is constructed in parallel by the threads of the master process. Then the coarse mesh is partitioned. Finally, the fine mesh refinement procedure is executed until all the elements in the mesh satisfy the quality and fidelity criteria. The communication and computation are separated during the fine mesh refinement procedure. The master thread of each process, which initializes the MPI environment, is in charge of the inter-node MPI communication for data (submesh) movement, while the worker threads of each process are responsible for the local mesh refinement within the node. We conducted a set of experiments to test the performance of the algorithm on distributed-memory clusters and observed that the granularity of the coarse-level data decomposition, which affects the coarse-level concurrency, has a significant influence on the performance of the algorithm. With the proper value of granularity, the algorithm is scalable to 45 distributed-memory compute nodes (900 cores). (C) 2017 Elsevier Ltd. All rights reserved.
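The separation of communication and computation inside one process can be pictured with a small Python sketch. It is an assumption-laden illustration, not the authors' code: the list `incoming` stands in for submeshes that the master thread would receive over MPI, `refine` is a placeholder for local mesh refinement, and all names are hypothetical.

```python
import queue
import threading

def refine(submesh):
    """Placeholder for local mesh refinement; returns a made-up result."""
    return submesh * 2

def run_process(incoming, n_workers=3):
    """One process: the master thread moves data, workers only refine."""
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:          # sentinel: no more work
                break
            refined = refine(item)
            with lock:
                results.append(refined)

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    # Master thread: the only thread that handles "communication".
    for submesh in incoming:          # stands in for receiving data via MPI
        tasks.put(submesh)
    for _ in workers:                 # one sentinel per worker
        tasks.put(None)
    for w in workers:
        w.join()
    return sorted(results)

refined = run_process([1, 2, 3, 4, 5])
```

Because the workers pull from a queue while the master keeps feeding it, data movement and refinement can overlap, which is the point of keeping the two roles in different threads.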

Keywords: Hybrid programming; Parallel mesh generation; Image-to-mesh conversion; Two-level parallelization

Improvement and verification of the DeCART code for HTGR core physics analysis

NUCLEAR ENGINEERING AND TECHNOLOGY, 2019, vol. 51, no. 1, pp. 13-30

Authors: Cho, Jin Young; Han, Tae Young; Park, Ho Jin; Hong, Ser Gi; Lee, Hyun Chul

Affiliations: Korea Atom Energy Res Inst 111 Daedeok Daero 989 Beongil Daejeon 34057 South Korea; Kyung Hee Univ 1732 Deogyeong Daero Yongin 17104 Gyeonggi Do South Korea; Pusan Natl Univ 2 Busandaehak Ro 63beon Gil Busan 46241 South Korea

This paper presents the recent improvements in the DeCART code for HTGR analysis. A new 190-group DeCART cross-section library based on ENDF/B-VII.0 was generated using the KAERI library processing system for HTGR. Two methods for the eigen-mode adjoint flux calculation were implemented. An azimuthal angle discretization method based on the Gaussian quadrature was implemented to reduce the error from the azimuthal angle discretization. A two-level parallelization using MPI and OpenMP was adopted for massively parallel computations. A quadratic depletion solver was implemented to reduce the error involved in the Gd depletion. A module to generate equivalent group constants was implemented for the nodal codes. The capabilities of the DeCART code were improved for geometry handling, including an approximate treatment of a cylindrical outer boundary, an explicit border model, the R-G-B checkerboard model, and a super-cell model for a hexagonal geometry. The newly improved and implemented functionalities were verified against various numerical benchmarks, such as OECD/MHTGR-350 benchmark phase III problems, two-dimensional high temperature gas cooled reactor benchmark problems derived from the MHTGR-350 reference design, and numerical benchmark problems based on the compact nuclear power source experiment, by comparing the DeCART solutions with the Monte-Carlo reference solutions obtained using the McCARD code. (C) 2018 Korean Nuclear Society, Published by Elsevier Korea LLC.

Keywords: DeCART; HTGR; Adjoint solver; Two-level parallelization; Numerical benchmark

A Hybrid Parallel Delaunay Image-to-Mesh Conversion Algorithm Scalable on Distributed-Memory Clusters

25th International Meshing Roundtable (IMR)

Authors: Feng, Daming; Chernikov, Andrey N.; Chrisochoides, Nikos P.

Affiliation: Old Dominion Univ Dept Comp Sci Norfolk VA 23508 USA

In this paper, we present a scalable three-dimensional hybrid MPI+Threads parallel Delaunay image-to-mesh conversion algorithm. A nested master-worker communication model for parallel mesh generation is implemented which simultaneously explores process-level parallelization and thread-level parallelization: inter-node communication using MPI and inter-core communication inside one node using threads. In order to overlap the communication (task request and data movement) and computation (parallel mesh refinement), the inter-node MPI communication and the intra-node local mesh refinement are separated. The master thread that initializes the MPI environment is in charge of the inter-node MPI communication, while the worker threads of each process are only responsible for the local mesh refinement within the node. We conducted a set of experiments to test the performance of the algorithm on Turing, a distributed-memory cluster at the Old Dominion University High Performance Computing Center, and observed that the granularity of the coarse-level data decomposition, which affects the coarse-level concurrency, has a significant influence on the performance of the algorithm. With the proper value of granularity, the algorithm shows impressive performance potential and is scalable to 30 distributed-memory compute nodes with 20 cores each (the maximum number of nodes available to us in the experiments). (C) 2016 The Authors. Published by Elsevier Ltd.

Keywords: Hybrid Programming; Parallel Mesh Generation; Nested Master-Worker Model; Two-level parallelization

A Hybrid Parallel Delaunay Image-to-mesh Conversion Algorithm Scalable on Distributed-memory Clusters

Procedia Engineering, 2016, vol. 163, pp. 59-71

Authors: Daming Feng; Andrey N. Chernikov; Nikos P. Chrisochoides

Affiliation: Computer Science Department Old Dominion University Norfolk VA 23508 USA

In this paper, we present a scalable three-dimensional hybrid MPI+Threads parallel Delaunay image-to-mesh conversion algorithm. A nested master-worker communication model for parallel mesh generation is implemented which simultaneously explores process-level parallelization and thread-level parallelization: inter-node communication using MPI and inter-core communication inside one node using threads. In order to overlap the communication (task request and data movement) and computation (parallel mesh refinement), the inter-node MPI communication and the intra-node local mesh refinement are separated. The master thread that initializes the MPI environment is in charge of the inter-node MPI communication, while the worker threads of each process are only responsible for the local mesh refinement within the node. We conducted a set of experiments to test the performance of the algorithm on Turing, a distributed-memory cluster at the Old Dominion University High Performance Computing Center, and observed that the granularity of the coarse-level data decomposition, which affects the coarse-level concurrency, has a significant influence on the performance of the algorithm. With the proper value of granularity, the algorithm shows impressive performance potential and is scalable to 30 distributed-memory compute nodes with 20 cores each (the maximum number of nodes available to us in the experiments).

Keywords: Hybrid Programming; Parallel Mesh Generation; Nested Master-Worker Model; Two-level parallelization
