Particle shape plays a vital role in granular material behavior, but simulations with realistic particle shapes are uncommon due to the significant computational demands of representing complex particle geometry. In this work, the BLOKS3D polyhedral Discrete Element Method (DEM) and impulse-based DEM (iDEM) codes are parallelized to enable large-scale simulations with realistic particle shapes on readily accessible multi-core machines. Data structures used in the original codes were redesigned and optimized, improving the performance of the original serial codes by 15%. New parallel algorithms were developed, yielding a 28-fold performance improvement on a 48-core (quad-CPU) shared-memory system over the single-core serial algorithm. The parallelized 3D polyhedral DEM and iDEM were applied to a series of column collapse simulations. The codes successfully reproduced the runout distance observed in granular column collapse experiments, and the particle force data from both parallelized DEM and iDEM matched data from the serial algorithm. The new parallel implementation of iDEM was then demonstrated with unprecedented simulations of 52 million 3D polyhedral particles. This work will benefit future granular material studies by introducing the capacity to run large-scale simulations with realistic particle shapes on shared-memory hardware platforms readily accessible to many engineers and researchers.
Cavitating and bubbly flows are encountered in many engineering problems involving propellers, pumps, valves, ultrasonic biomedical applications, etc. In this contribution, an OpenMP-parallelized Euler-Lagrange model of two-phase flow problems and cavitation is presented. The two-phase medium is treated as a continuum and solved on an Eulerian grid, while the discrete bubbles are tracked in a Lagrangian fashion with their dynamics computed. The intimate coupling between the two description levels is realized through the local void fraction, which is computed from the instantaneous bubble volumes and locations and provides the continuum properties. Since, in practice, any such flows will involve large numbers of bubbles, schemes for significant speedup are needed to reduce computation times. We present here a shared-memory parallelization scheme combining domain decomposition for the continuum domain and number decomposition for the bubbles, both selected to realize maximum speedup and good load balance. The Eulerian computational domain is subdivided based on geometry into several subdomains, while for the Lagrangian computations the bubbles are subdivided based on their indices into several subsets. The number of fluid subdomains and bubble subsets matches the number of central processing unit (CPU) cores available in a shared-memory system. Computation of the continuum solution and the bubble dynamics proceeds sequentially. During each computation time step, all selected OpenMP threads are first used to evolve the fluid solution, with each handling one subdomain. Upon completion, the OpenMP threads selected for the Lagrangian solution are then used to execute the bubble computations. All data exchanges are executed through the shared memory. Extra steps are taken to localize the memory access pattern and minimize non-local data fetch latency, since a severe performance penalty may occur on a non-uniform memory architecture (NUMA) multiprocessing system where threads access remote memory.
Scheduling problems occur in a broad range of real-world application fields and have attracted a large body of research. However, there is little research on exact algorithms for scheduling problems, many of which are NP-hard in the strong sense. We investigate the single-machine problem with a total weighted tardiness objective function and sequence-dependent setup times. First, we adopt a serial branch-and-price algorithm from the literature and present a modified branching strategy and a primal heuristic. Second, we exploit the potential of parallel computing architectures by presenting two parallel versions of the branch-and-price algorithm. Third, we conduct extensive computational experiments to show that our parallelization approaches provide substantial parallel speedups on well-known benchmark instances from the literature. We further observe that the speedups achieved by our parallel algorithms are very robust across all tested instances.
OpenMP is a parallel computing framework that provides programmers with a set of directives and clauses to use when writing parallel applications. The most important task in adopting OpenMP is deciding which parallel pattern, with its associated clauses, to employ in an existing sequential program. Shared-memory parallelization is complicated by parallel directives with different roles. Some tools have been developed to assist programmers in developing parallel programs using OpenMP. Many tools, however, have constraints on the size of program they can analyze, on OpenMP scoping, and on scalar and array reduction. Manually selecting clauses with the necessary data-sharing attributes is also error-prone. In this study, we target the classification of variables in directives to exploit loop-level parallelism. We cast the variable classification problem as a type inference task solved with a machine learning method, which understands the attributes of variables in certain contexts and relations. We propose an aligned corpus of tokens and types to predict the attributes of variables used inside the target loop, and we support the reduction clause whenever it is applicable. Experimental results indicate that our method is very promising and well suited to dealing with real-world complex programs, showing high accuracy. (c) 2022 Elsevier B.V. All rights reserved.
The recently developed Hierarchical Poincare-Steklov (HPS) method is a high-order discretization technique that comes with a direct solver. Results from previous papers demonstrate the method's ability to solve Helmholtz problems to high accuracy without the so-called pollution effect. While the asymptotic scaling of the direct solver's computational cost is the same as that of the nested dissection method, serial implementations of the solution technique are not practical for large-scale numerical simulations. This manuscript presents the first parallel implementation of the HPS method. Specifically, we introduce an approach for a shared-memory implementation of the solution technique utilizing parallel linear algebra. This approach is the foundation for future large-scale simulations on supercomputers and clusters with large-memory nodes. Performance results on a desktop computer (resembling a large-memory node) are presented. (C) 2019 Elsevier Ltd. All rights reserved.
The fast marching method is commonly used in expanding-front simulations in various fields, such as fluid dynamics, computer graphics, and microelectronics, to restore the signed-distance-field property of the level-set function, a step also known as re-distancing. To improve the performance of the re-distancing step, parallel algorithms for the fast marching method as well as support for hierarchical meshes have been developed; the latter locally supports higher resolutions of the simulation domain whilst limiting the impact on the overall computational demand. In this work, the previously developed multi-mesh fast marching method is extended by a so-called block-based decomposition step to improve serial and parallel performance on hierarchical meshes. OpenMP tasks are used for the underlying coarse-grained parallelization on a per-mesh basis. The developed approach offers improved load balancing, as the algorithm employs a high mesh-partitioning degree, enabling it to balance mesh partitions of varying sizes. Various benchmarks and parameter studies are performed on representative geometries of varying complexity. The serial performance is increased by up to 21%, and parallel speedups ranging from 7.4 to 19.1 have been achieved for various test cases on a 24-core Intel Skylake computing platform, effectively doubling the parallel efficiency of the previous approach. (C) 2021 The Author(s). Published by Elsevier B.V.
The fast marching method is well known for its worst-case optimal computational complexity in solving the Eikonal equation, and it has been employed in numerous scientific and engineering fields. However, it has barely benefited from fast-advancing multicore and many-core architectures because of the challenges presented by its apparently sequential nature. In this paper, we present a straightforward block-based approach for a highly scalable parallelization of the fast marching method on shared-memory computers. Central to our new algorithm is a simplified restarted narrow-band strategy, which replaces the global bound for terminating the front marching with an incremental bound that increases by a given stride in each restart. Our algorithm greatly reduces load imbalance among blocks through a synchronized exchange step after the marching step. Furthermore, simple activation mechanisms are introduced to skip blocks not involved in either step. Thus the new algorithm mainly consists of two loops, each performing one step on a group of blocks. Notably, it does not contain any data races, which is ideal for a direct loop-level parallelization using multiple threads. A systematic performance study is carried out for several point-source problems with various speed functions on grids with up to 1024^3 points using two different computers. Substantial parallel speedups, e.g., up to 3-5 times the number of cores on a 16-core/32-thread computer and up to 2 orders of magnitude on a 64-core/256-thread computer, are demonstrated in all cases. As an extra bonus, our new algorithm also gives much improved sequential performance in most scenarios. Detailed pseudocodes are provided to illustrate, step by step, the modification from the single-block to the multiblock sequential algorithm.
As a typical Gauss-Seidel method, lower-upper symmetric Gauss-Seidel (LU-SGS) has an inherently strong data dependency that poses tough challenges for shared-memory parallelization. On early multi-core processors, the pipelined parallel LU-SGS approach achieves promising scalability. However, on emerging many-core processors such as Xeon Phi, experience with our in-house high-order CFD program shows that the parallel efficiency drops dramatically to less than 25%. In this paper, we model and analyze the performance of the pipelined parallel LU-SGS algorithm and present a two-level pipeline (TL-Pipeline) approach using nested OpenMP to further exploit fine-grained parallelism and mitigate the parallel performance bottlenecks. Our TL-Pipeline approach achieves 20% performance gains for a regular problem (256 x 256 x 256) on Xeon Phi. We also discuss practical issues, including domain decomposition and algorithm parameter tuning, for realistic CFD simulations. Generally, our work is applicable to the shared-memory parallelization of all Gauss-Seidel-like methods with intrinsically strong data dependencies.
Petroleum system modeling is a crucial technology for numerically simulating the generation, migration, accumulation, and loss of oil and gas through geologic time. The OpenMP programming paradigm is used to achieve modest parallelism on a shared-memory computer for the industrial petroleum system modeling package PetroMod. The significant advantage of this shared-memory parallelization approach is the simplicity of the OpenMP paradigm, which allows a smooth transition to a parallel program by incrementally parallelizing a serial code. The process of using OpenMP to parallelize PetroMod is outlined, and performance results on a Sun Fire E2900 are reported. (C) 2008 Elsevier Ltd. All rights reserved.
To date, there has been a lack of efficient and practical distributed- and shared-memory parallelizations of the data association problem for multitarget tracking. Filling this gap is one of the primary focuses of the present work. We begin by describing our data association algorithm in terms of an Interacting Multiple Model (IMM) state estimator embedded into an optimization framework, namely, a two-dimensional (2D) assignment problem (i.e., weighted bipartite matching). Contrary to conventional wisdom, we show that the data association (or optimization) problem is not the major computational bottleneck; instead, the interface to the optimization problem, namely, computing the rather numerous gating tests, IMM state estimates, covariance calculations, and likelihood function evaluations (used as cost coefficients in the 2D assignment problem), is the primary source of the workload. Hence, for both a general-purpose shared-memory MIMD (Multiple Instruction Multiple Data) multiprocessor system and a distributed-memory Intel Paragon high-performance computer, we developed parallelizations of the data association problem that focus on the interface problem. For the former, a coarse-grained dynamic parallelization was developed that realizes excellent performance (i.e., superlinear speedups) independent of numerous factors influencing problem size (e.g., many models in the IMM, dense/cluttered environments, contentious target-measurement data, etc.). For the latter, an SPMD (Single Program Multiple Data) parallelization was developed that realizes near-linear speedups using relatively simple dynamic task allocation algorithms. Using a real measurement database based on two FAA air traffic control radars, we show that the parallelizations developed in this work offer great promise in practice.