KAERI (Korea Atomic Energy Research Institute) has been developing a multi-dimensional two-phase flow code named CUPID for multi-physics and multi-scale thermal-hydraulics analysis of light water reactors (LWRs). The CUPID code has been validated against a set of conceptual problems and experimental data. In this work, the CUPID code has been parallelized using the domain decomposition method with the Message Passing Interface (MPI) library. For domain decomposition, the CUPID code provides both manual and automatic methods with the METIS library. For effective memory management, the compressed sparse row (CSR) format is adopted, which is one of the methods for representing a sparse asymmetric matrix. The CSR format stores only the non-zero values and their positions (row and column). Verification against the fundamental problem set confirmed that the parallelization of CUPID was successful. Since the scalability of a parallel simulation is generally known to be better for a fine mesh system, three mesh scales are considered: 40,000 meshes for the coarse mesh system, 320,000 meshes for the mid-size mesh system, and 2,560,000 meshes for the fine mesh system. In the given geometry, both single- and two-phase calculations were conducted. In addition, two types of preconditioners for the matrix solver were compared: diagonal and incomplete LU. To enhance parallel performance, hybrid OpenMP and MPI parallel computing for the pressure solver was examined. The results show that the hybrid calculation improved scalability for multi-core parallel computation.
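The CSR layout mentioned in this abstract can be illustrated with a minimal Python sketch. It is not taken from CUPID; the function names `dense_to_csr` and `csr_matvec` are hypothetical. Column positions are stored explicitly, while row positions are encoded through a compressed row-pointer array.

```python
def dense_to_csr(matrix):
    """Convert a dense 2-D list into CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)   # non-zero value
                col_idx.append(j)  # its column position
        # row_ptr[i + 1] marks where row i ends inside `values`
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix by a dense vector x."""
    y = []
    for i in range(len(row_ptr) - 1):
        total = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            total += values[k] * x[col_idx[k]]
        y.append(total)
    return y


# Small asymmetric example matrix with four zero entries
A = [[4, 0, 1],
     [0, 3, 0],
     [2, 0, 5]]
values, col_idx, row_ptr = dense_to_csr(A)
y = csr_matvec(values, col_idx, row_ptr, [1.0, 1.0, 1.0])
```

Only the five non-zero entries are stored, so memory grows with the number of non-zeros rather than with the full matrix size.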
A computational fluid dynamics (CFD) solver for a GPU/CPU heterogeneous architecture parallel computing platform is developed to simulate incompressible flows on billion-level grid *** solve the Poisson equation, the conjugate gradient method is used as a basic solver, and a Chebyshev method in combination with a Jacobi sub-preconditioner is used as a *** developed CFD solver shows good performance on parallel efficiency, which exceeds 90% in the weak-scalability test when the number of grid points allocated to each GPU card is greater than *** the acceleration test, it is found that running a simulation with 1040^3 grid points on 125 GPU cards accelerates by 203.6x over the same number of CPU *** developed solver is then tested in the context of a two-dimensional lid-driven cavity flow and three-dimensional Taylor-Green vortex *** results are consistent with previous results in the literature.
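A minimal Python sketch of a preconditioned conjugate gradient iteration for a Poisson-type system follows. It is a simplification, not the solver described above: it uses a plain Jacobi (diagonal) preconditioner instead of the Chebyshev method with a Jacobi sub-preconditioner, runs serially on a 1-D model problem, and all function names are hypothetical.

```python
def apply_poisson_1d(v):
    """Apply the 1-D Poisson matrix tridiag(-1, 2, -1) without storing it."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * v[i] - left - right)
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def jacobi_pcg(apply_A, diag, b, tol=1e-10, max_iter=1000):
    """Conjugate gradient with a Jacobi preconditioner (assumed SPD operator)."""
    x = [0.0] * len(b)
    r = list(b)                                # residual for x = 0
    z = [ri / di for ri, di in zip(r, diag)]   # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:             # stop once the residual is small
            break
        Ap = apply_A(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


n = 20
b = [1.0] * n
x = jacobi_pcg(apply_poisson_1d, [2.0] * n, b)
residual = max(abs(bi - ai) for bi, ai in zip(b, apply_poisson_1d(x)))
```

In a distributed setting, the operator application and the dot products are the steps that require communication between subdomains, which is why the choice of preconditioner affects scalability.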
The distributed basin model (DBM) has become one of the most effective tools in river basin studies. To overcome the efficiency bottleneck of DBM, an effective parallel-computing method, named the temporal-spatial discretization method (TSDM), is proposed. In space, TSDM partitions the river basin into sub-basins. Compared to the existing sub-basin-based parallel methods, more computable units can be supplied, organized and dispatched using TSDM. Through its temporal-spatial dual discretization, TSDM is capable of exploiting the degree of parallelism of the river basin to the maximum extent and obtaining higher computing performance. A mathematical formula assessing the maximum speedup ratio (MSR) of TSDM is provided as well. TSDM is independent of the implementation of any physical models and is preliminarily tested in the Lhasa River basin with a 1-year rainfall-runoff process simulated. The MSR acquired in the existing traditional way is 7.98. By comparison, the MSR using TSDM equals 15.04 under the present limited computing resources, and it appears to have potential to keep increasing. The final results demonstrate the effectiveness and applicability of TSDM. (c) 2013 Elsevier Ltd. All rights reserved.
Energy consumption is one of the top challenges for achieving the next generation of supercomputing. Codesign of hardware and software is critical for improving energy efficiency (EE) for future large-scale systems. Many architectural power-saving techniques have been developed, and most hardware components are approaching physical limits. Accordingly, parallel computing software, including both applications and systems, should exploit power-saving hardware innovations and manage efficient energy use. In addition, new power-aware parallel computing methods are essential to decrease energy usage further. This article surveys software-based methods that aim to improve EE for parallel computing. It reviews the methods that exploit the characteristics of parallel scientific applications, including load imbalance and mixed precision of floating-point (FP) calculations, to improve EE. In addition, this article summarizes widely used methods to improve power usage at different granularities, such as the whole system and per application. In particular, it describes the most important techniques to measure and to achieve energy-efficient usage of various parallel computing facilities, including processors, memories, and networks. Overall, this article reviews the state-of-the-art of energy-efficient methods for parallel computing to motivate researchers to achieve optimal parallel computing under a power budget constraint.
Direct numerical simulation (DNS) of complex flows requires solving the problem on parallel machines using high-accuracy schemes. Compact schemes provide very high spectral resolution, while satisfying the physical dispersion relation numerically. However, as shown here, compact schemes also display bias in the direction of convection, often producing numerical instability near the inflow and always severely damping the solution near the outflow. This prevents their use for parallel computing based on domain decomposition, in which the problem is solved in parallel in different sub-domains. To avoid this, in all reported parallel computations with compact schemes the full domain is treated integrally, while using the parallel Thomas algorithm (PTA) or the parallel diagonal dominant (PDD) algorithm on different processors, with resultant latencies and inefficiencies. For domain decomposition methods using a compact scheme in each sub-domain independently, a new class of compact schemes is proposed and specific strategies are developed to remove the remaining problems of parallel computing. This is calibrated here for parallel computing by solving the one-dimensional wave equation by the domain decomposition method. We also provide the error norm with respect to the wavelength of the propagated wave-packet. Next, the advantage of the new compact scheme, on a parallel framework, is shown by solving the three-dimensional unsteady Navier-Stokes equations for flow past a cone-cylinder configuration at a Mach number of 4. Additionally, a test case is conducted on the advection of a vortex for a subsonic case to provide an estimate of the error and parallel efficiency of the method using the proposed compact scheme on multiple processors. (c) 2006 Elsevier Inc. All rights reserved.
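For context, the serial Thomas algorithm that PTA parallelizes can be sketched in Python as below. This is a textbook version with a hypothetical function name, not the authors' implementation. Compact schemes lead to tridiagonal systems of this kind, and both sweeps are sequential recurrences along the grid line, which is one reason treating the full domain integrally incurs latency across processors.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system.

    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    Assumes no pivoting is needed (e.g. diagonally dominant systems).
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination: each step depends on the previous one
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    # Back substitution: again a sequential recurrence
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# Model system tridiag(-1, 2, -1) x = 1 with four unknowns
sol = thomas_solve([0.0, -1.0, -1.0, -1.0],
                   [2.0, 2.0, 2.0, 2.0],
                   [-1.0, -1.0, -1.0, 0.0],
                   [1.0, 1.0, 1.0, 1.0])
```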
Advanced scientific and engineering problems require massively parallel computing. Critical to the design, and ultimately the performance, of such computing systems is the interconnection network binding the computing elements, just as the cardiovascular network is to the human body. This paper develops a new interconnection network, "Tori connected mESHes (TESH)," consisting of a k-ary n-cube connection of supernodes that comprise meshes of lower-level nodes. Its key features are the following: it is hierarchical, thus allowing exploitation of computation locality as well as easy expansion (up to a million processors), and it appears to be well suited for 3-D VLSI implementation, for it requires far fewer vertical wires than almost all known multi-computer networks. Presented in the paper are the architecture of the new network, node addressing and message routing, 3-D VLSI/ULSI considerations, and application of the network to massively parallel computing. Specifically, we discuss the mapping onto the network of stack filtering, a hardware-oriented technique for order statistic image filtering.
The Top500 supercomputer ranking has been published twice a year according to Linpack performance for more than 20 years, which has greatly stimulated the development of high-performance computing. However, it is still not clear how to determine the scale limit of supercomputers. Building bigger and bigger supercomputers without regard to cost, energy, and reliability will undoubtedly waste resources. Thus, this paper analyses the scalability and scale limit of parallel computing under a reliability requirement. We use a Markov chain to model the state transition process of a parallel computing system, so that the probability of parallel tasks running on machines successfully, that is, the reliability of parallel computing, can be evaluated. When parallel computing carries out an iso-speed efficiency extension under specific reliability requirements, we present an approach to calculate the maximum number of processing nodes and the maximum workload of parallel tasks, which reveals the functional relation between the scale limit and the speed efficiency of parallel computing. Taking Tianhe-2, which is the current No. 1 supercomputer, as an example, we use our methods in a case study to predict its scale limit. Finally, a simulation experiment is conducted to verify our theory.
Emergencies in metro systems have become more frequent during rush hours, with significant consequences for metro planning, design, and operation, and even for passengers' daily travel. The motivation of this paper is to establish a hybrid metro simulation method with high efficiency and sufficient precision. To this end, a discrete-event simulation method based on a multi-agent model with parallel computing is proposed to estimate the effects of emergencies efficiently. Firstly, the trains' motion algorithms are developed to compute the train speed profile for normal operation and metro emergency operation, respectively. Moreover, three types of agents (passenger, station, and train agents) are classified for rescheduling calculation, and six types of events are defined to discretize the emergency simulation process. Furthermore, a parallel computing method is proposed to accelerate the simulation process. Finally, a case study of the Yizhuang Line in the Beijing metro is conducted to verify the effectiveness of the proposed simulation methodology. The results demonstrate the effectiveness and practicality of the proposed simulation method and show how the positions where emergencies occur and the emergency durations influence delays of trains and passengers. (C) 2021 Elsevier B.V. All rights reserved.
CubeSats, the class of small standardized satellites, are quickly becoming a prevalent scientific research tool. The desire to perform ambitious missions using multiple CubeSats will lead to innovations in thruster technology and will require new tools for the development of cooperative trajectory planning. To meet this need, a new software tool was created to compute propellant-minimizing maneuvers for two or more CubeSats. By including parallelization techniques, this tool is shown to run significantly faster than its serial counterpart. (C) 2014 IAA. Published by Elsevier Ltd. All rights reserved.
Speedup, a commercial software package for the dynamic modeling of chemical processes, has been coupled with the PVM software for parallel computation. As an initial application, a coarse distribution technique was applied to model a batch chemical plant containing 16 unit operations. Computation time for this problem was reduced by a factor of two using only three parallel processors in the UNIX computing environment. This acceleration was achieved from the reduction in computations required to both solve the smaller subprocesses and to reinitialize the subprocesses at discontinuities in the solution. The process was physically divided at points that naturally separated the overall plant into distinct subprocesses. This facilitated the computation by minimizing the interconnection between the parallel units. Techniques were developed to make efficient material and energy transfers between the modeled subprocesses based on actual material transfers used in plant operations. (C) 2004 Elsevier Ltd. All rights reserved.
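As a short worked example of the figures quoted in this abstract, the Python sketch below computes the parallel efficiency implied by a speedup of two on three processors. It uses the standard definition of efficiency as speedup divided by processor count; the helper name is hypothetical.

```python
def parallel_efficiency(speedup, processors):
    """Parallel efficiency = speedup / number of processors."""
    return speedup / processors


# Values reported in the abstract: factor-of-two reduction on three processors
efficiency = parallel_efficiency(2.0, 3)
print(f"efficiency = {efficiency:.3f}")
```

An efficiency of about two thirds means that, on average, each of the three processors contributes roughly 67% of the throughput of the single-processor run.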