This paper describes an implementation of a highly scalable parallel computational facility with high speedup efficiency using relatively low-cost hardware, which consists of a cluster of desktop personal computers (PCs) connected via 10-Gigabit Ethernet. Two levels of parallelization were implemented. Communication between different PCs was achieved using the Message Passing Interface (MPI) protocol. Domain decomposition was automated and based on element numbering. Domain continuity was assured largely by renumbering the elements using a "front squasher" code prior to decomposition. Within each PC, shared-memory parallelization was implemented using either Open Multi-Processing (OpenMP) or the MPI protocol. Analysis of three different problems, with the number of degrees of freedom ranging from about 129,000 to about 2,260,000, shows a speedup efficiency generally above 70%. Super-linear speedup was achieved in several of the cases examined in this study, with the hybrid MPI-OpenMP approach generally performing better than the pure MPI method. The results demonstrate the feasibility of acquiring a parallel computing facility with a relatively modest outlay that is within the reach of consulting or engineering offices. (C) 2016 Elsevier Ltd. All rights reserved.
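As a rough illustration of two ideas in this abstract, the sketch below splits a (re)numbered element list into contiguous blocks, one per process, and computes speedup and speedup efficiency from wall-clock times. It is a minimal sketch, not the authors' code: the function names are hypothetical, the "front squasher" renumbering is assumed to have already been applied, and the timings are made-up illustrative values rather than results from the paper.

```python
def partition_by_element_number(n_elements, n_parts):
    """Split element numbers 0..n_elements-1 into contiguous, near-equal ranges."""
    base, extra = divmod(n_elements, n_parts)
    ranges = []
    start = 0
    for rank in range(n_parts):
        # The first `extra` ranks take one additional element each.
        size = base + (1 if rank < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def speedup_efficiency(t_serial, t_parallel, n_workers):
    """Return (speedup, efficiency); efficiency above 1.0 is super-linear."""
    speedup = t_serial / t_parallel
    return speedup, speedup / n_workers


# Hypothetical example: 10 elements on 4 processes.
parts = partition_by_element_number(10, 4)
print([list(p) for p in parts])  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]

# Hypothetical timings (seconds), chosen only to show a super-linear case.
speedup, eff = speedup_efficiency(t_serial=100.0, t_parallel=20.0, n_workers=4)
print(speedup, eff)  # 5.0 1.25
```

Contiguous blocks only yield connected subdomains when neighbouring elements carry neighbouring numbers, which is presumably why the renumbering step precedes the decomposition.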
ISBN: (print) 9781479900305
Finite element analysis of large-scale, complicated structures such as aluminum reduction cells places high demands on memory capacity and calculation speed, so traditional serial computation frequently fails or is inefficient for such problems. The parallelization of the Jacobi preconditioned conjugate gradient (PCG) algorithm and of boundary-condition processing is discussed based on the domain decomposition method (DDM). The storage of the coefficient matrix for each sub-domain is studied based on an element-by-element (EBE) strategy. Coordinate-based division is used for task partitioning, considering the structural characteristics of the aluminum reduction cell. Subsequently, a parallel finite element analysis program based on DDM is developed using the C language and the MPI standard library, and then applied to numerical simulation of the electric field distribution in an aluminum reduction cell. The parallel performance of both the entire program and each of its parts is analyzed, and the parallel efficiency of the DDM and EBE methods is compared. Experimental results show that the method achieves significant acceleration and can greatly shorten the calculation time, which indicates the effectiveness of applying DDM to parallel finite element analysis of large-scale, complicated structures.
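The combination of a Jacobi preconditioner with element-by-element storage can be sketched serially as follows. This is an illustrative sketch under stated assumptions, not the program described in the abstract (which is written in C with MPI): the function names, the 1D bar test problem and the handling of fixed degrees of freedom are all assumptions made for the example. The global matrix is never assembled; the matrix-vector product and the diagonal preconditioner are both accumulated from element matrices.

```python
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ebe_matvec(elements, x, fixed):
    """Return y = K x, accumulated element by element (K is never assembled)."""
    xm = list(x)
    for i in fixed:              # fixed DOFs contribute nothing to other rows
        xm[i] = 0.0
    y = [0.0] * len(x)
    for nodes, ke in elements:
        for a, i in enumerate(nodes):
            for b, j in enumerate(nodes):
                y[i] += ke[a][b] * xm[j]
    for i in fixed:              # fixed rows act as identity rows
        y[i] = x[i]
    return y


def jacobi_pcg(elements, f, fixed, tol=1e-10, max_iter=1000):
    """Solve K u = f with homogeneous Dirichlet values at the `fixed` DOFs."""
    n = len(f)
    diag = [0.0] * n             # Jacobi preconditioner: diagonal of K
    for nodes, ke in elements:
        for a, i in enumerate(nodes):
            diag[i] += ke[a][a]
    for i in fixed:
        diag[i] = 1.0

    b = list(f)
    for i in fixed:
        b[i] = 0.0
    x = [0.0] * n
    r = list(b)                  # residual for the zero initial guess
    z = [ri / di for ri, di in zip(r, diag)]
    p = list(z)
    rz = dot(r, z)
    b_norm = sqrt(dot(b, b)) or 1.0
    for _ in range(max_iter):
        if sqrt(dot(r, r)) <= tol * b_norm:
            break
        q = ebe_matvec(elements, p, fixed)
        alpha = rz / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Hypothetical test problem: 4 unit-stiffness bar elements in a row,
# node 0 fixed, unit load at node 4. The exact solution is u[i] = i.
ke = [[1.0, -1.0], [-1.0, 1.0]]
elements = [((i, i + 1), ke) for i in range(4)]
u = jacobi_pcg(elements, f=[0.0, 0.0, 0.0, 0.0, 1.0], fixed=[0])
print(u)
```

In a distributed version, each process would typically loop over its own elements only and sum the contributions at interface nodes, with the dot products reduced across processes; those communication steps are omitted here.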
This paper studies and compares the domain partitioning algorithms presented by Farhat,(1) Al-Nasra and Nguyen,(2) Malone,(3) and Simon(4)/Hsieh et al.(5,6) for load balancing in parallel finite element analysis. Both the strengths and weaknesses of these algorithms are discussed. Some possible improvements to the partitioning algorithms are also suggested and studied. A new approach for evaluating domain partitioning algorithms is described. Direct numerical comparisons among the considered partitioning algorithms are then conducted using this suggested approach with both regular and irregular finite element meshes of different order and dimensionality. The test problems used in the comparative studies, along with the results obtained, provide a set of benchmark examples for other researchers to evaluate both new and existing partitioning algorithms. In addition, interactive graphics tools used in this work to facilitate the evaluation and comparative studies are presented.
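The abstract does not state which measures its evaluation approach uses. As a hedged illustration only, the sketch below computes two quantities that are commonly used to compare element-based partitions: the load imbalance (largest subdomain size divided by the average) and the set of interface nodes shared by more than one subdomain. The function name and the 2 x 2 quadrilateral test mesh are assumptions made for this example.

```python
from collections import defaultdict


def partition_quality(elements, part_of_element, n_parts):
    """Return (load imbalance, sorted interface nodes) for an element partition.

    elements[e] lists the node numbers of element e;
    part_of_element[e] is the subdomain that element e is assigned to.
    """
    counts = [0] * n_parts
    parts_of_node = defaultdict(set)
    for elem_id, nodes in enumerate(elements):
        part = part_of_element[elem_id]
        counts[part] += 1
        for node in nodes:
            parts_of_node[node].add(part)
    # 1.0 means perfectly balanced element counts.
    imbalance = max(counts) / (sum(counts) / n_parts)
    # A node touched by two or more subdomains lies on an interface.
    interface = sorted(n for n, ps in parts_of_node.items() if len(ps) > 1)
    return imbalance, interface


# Hypothetical 2 x 2 mesh of quadrilaterals; nodes 0..8 are numbered row by row.
mesh = [[0, 1, 4, 3], [1, 2, 5, 4], [3, 4, 7, 6], [4, 5, 8, 7]]

balanced = partition_quality(mesh, [0, 0, 1, 1], n_parts=2)
skewed = partition_quality(mesh, [0, 1, 1, 1], n_parts=2)
print(balanced)  # (1.0, [3, 4, 5])
print(skewed)    # (1.5, [1, 3, 4])
```

Both partitions here have three interface nodes, so the imbalance figure is what separates them; on irregular meshes the two measures often pull in different directions.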
Dynamic finite element analyses of a four-story steel building frame modeled as a fine mesh of solid elements are performed using E-Simulator, which is a parallel finite element analysis software package for precisely simulating collapse behaviors of civil and building structures. E-Simulator is under development at the National Research Institute for Earth Science and Disaster Prevention (NIED), Japan. A full-scale shake-table test for a four-story frame was conducted using E-Defense at NIED, which is the largest shaking table in the world. A mesh of the entire structure of a four-story frame with approximately 19 million degrees of freedom is constructed using solid elements. The density of the mesh is determined by referring to the results of elastic-plastic buckling analyses of a column of the frame using meshes of different densities. Therefore, the analysis model of the frame is well verified. Seismic response analyses under 60, 100, and 115% excitations of the JR Takatori record of the 1995 Hyogoken-Nanbu earthquake are performed. Note that the simulation does not reproduce the collapse under the 100% excitation of the Takatori record in the E-Defense test. Therefore, simulations for the 115% case are also performed. The results obtained by E-Simulator are compared with those obtained by the E-Defense full-scale test in order to validate the results obtained by E-Simulator. The shear forces and interstory drift angles of the first story obtained by the simulation and the test are in good agreement. Both the response of the entire frame and the local deformation as a result of elastic-plastic buckling are simulated simultaneously using E-Simulator. Copyright (c) 2014 John Wiley & Sons, Ltd.
This paper describes a fast parallel generation method of large-scale meshes for a hierarchical domain decomposition method implemented in the open source parallel finite element software ADVENTURE. Since large-scale meshes need to be generated in order to perform various analyses on Japan's petaflops supercomputer, nicknamed the "K computer", a mesh refinement function and a communication table generation function that itself requires no communication are newly developed and implemented for the hierarchical domain decomposition tool named ADVENTURE_Metis. The new version is named ADVENTURE_Metis Ver.2. Since the cost of generating a communication table for sending and receiving data among computational nodes becomes very high for a refined large-scale mesh, the present authors have developed a parallel algorithm in which the communication tables of vertices, edges and faces are generated for the initial mesh and then updated from one another during mesh refinement. As a result, a refined mesh model with billions of degrees of freedom (DOFs) can be generated from an initial medium-size mesh model of about a million DOFs on a parallel computer in a short time.
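The abstract does not give the update algorithm, so the following is only a sketch of the general idea that interface tables can be refined locally. It assumes a 2D setting in which subdomains share edges only (face tables are omitted), uniform edge bisection, and hypothetical names throughout; it is not ADVENTURE_Metis code. Each new interface vertex is identified by a canonical key derived from the endpoints of the shared edge it bisects, so two neighbouring subdomains produce identical, identically ordered tables without exchanging messages.

```python
def canonical(edge):
    """Orient an edge deterministically so both neighbours agree on it."""
    return tuple(sorted(edge, key=repr))


def refine_interface_tables(shared_vertices, shared_edges):
    """Update shared-vertex and shared-edge tables for one bisection level."""
    edges = sorted((canonical(e) for e in shared_edges), key=repr)
    new_vertices = list(shared_vertices)
    new_edges = []
    for a, b in edges:
        mid = ("mid", a, b)          # key derivable on both sides of the interface
        new_vertices.append(mid)     # the midpoint becomes a shared vertex
        new_edges.append((a, mid))   # each shared edge yields two shared edges
        new_edges.append((mid, b))
    return new_vertices, new_edges


# Two neighbouring subdomains list the same interface edges in a
# different order and orientation (hypothetical vertex numbers).
verts_a, edges_a = refine_interface_tables([3, 4, 5], [(3, 4), (4, 5)])
verts_b, edges_b = refine_interface_tables([3, 4, 5], [(5, 4), (4, 3)])
print(verts_a == verts_b, edges_a == edges_b)

# A second refinement level applied to the updated tables.
verts_a2, edges_a2 = refine_interface_tables(verts_a, edges_a)
print(len(verts_a2), len(edges_a2))
```

Because every step depends only on data already held locally, the table size grows with the refinement level but no table ever has to be rebuilt by matching vertices across processes.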