ISBN: 9781510825949 (print)
The shared memory programming model, represented by OpenMP, has developed rapidly alongside multi-core technology. The method of characteristics (MOC) for solving the neutron transport equation converges slowly in the lattice calculations of nuclear design code systems. However, MOC is well suited to parallel calculation. In this paper, OpenMP parallel programming is applied in a new neutron transport lattice physics code, COSLATC, which is an essential component of the COSINE (Core and System Integrated Engine for design and analysis) software package. Based on an analysis of the OpenMP programming model and a study of the fork-join parallel programming model, energy group parallel calculation is adopted in the MOC module. After studying the cost model of OpenMP parallel programming and analyzing the factors that affect the performance of OpenMP parallel algorithms, this paper proposes a series of OpenMP programming optimization methods, including the expansion and merging of parallel regions and an optimal loop scheduling method. Moreover, to address the unordered migration of threads under operating system scheduling, the rationale of the thread affinity technique in the OpenMP standard is analyzed, and an implementation scheme for its interface in the computing component is designed. The numerical results show that the results of the parallel method agree well with those of the original serial calculation. Better speedup and parallel efficiency are achieved with the OpenMP programming optimization methods and the thread affinity technique.
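The fork-join pattern over energy groups described in this abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the COSLATC implementation: a thread pool stands in for an OpenMP parallel loop, and `sweep_group`, `sweep_all_groups`, and the toy per-group sources are hypothetical. The two helpers use the usual definitions of speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by thread count).

```python
from concurrent.futures import ThreadPoolExecutor


def sweep_group(group_source):
    """Hypothetical stand-in for one energy group's MOC sweep (here: a plain sum)."""
    return sum(group_source)


def sweep_all_groups(sources, workers=4):
    """Fork one task per energy group, then join and return results in group order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(sweep_group, sources))


def speedup(t_serial, t_parallel):
    """Speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, threads):
    """Efficiency E = S / number of threads."""
    return speedup(t_serial, t_parallel) / threads


if __name__ == "__main__":
    sources = [[1, 2], [3, 4], [5]]  # toy data, one list per energy group
    serial = [sweep_group(s) for s in sources]
    parallel = sweep_all_groups(sources)
    print(serial == parallel)
    print(speedup(8.0, 2.0), parallel_efficiency(8.0, 2.0, 4))
```

Because the results come back in input order, the parallel output can be compared directly with a serial loop over the same groups, which mirrors the paper's check against the serial calculation.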
NASA Technical Reports Server (NTRS) 20000108751: MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems by NASA Technical Reports Server (NTRS); NASA Technical Reports Server (NTRS); published by
NASA Technical Reports Server (NTRS) 20020063612: F-Nets and Software Cabling: Deriving a Formal Model and Language for Portable Parallel Programming by NASA Technical Reports Server (NTRS); NASA Technical Reports Server (NTRS); published by
While OpenMP is the de facto standard of shared memory parallel programming models, a number of alternative programming models and runtime systems have arisen in recent years. Fairly evaluating these programming syste...
ISBN: 9781450336185 (print)
GPU and multicore hardware architectures are commonly used in many different application areas to accelerate problem solutions relative to single CPU architectures. The typical approach to accessing these hardware architectures requires embedding logic into the programming language used to construct the application; the two primary forms of embedding are calls to API routines to access the concurrent functionality, or pragmas providing concurrency hints to a language compiler such that particular blocks of code are targeted to the concurrent functionality. The former approach is verbose and semantically bankrupt, while the success of the latter approach is restricted to simple, static uses of the functionality. This paper presents an extension to an existing actor-based programming model and runtime to support executing applications on parallel hardware architectures. Besides the glove-like fit of a kernel to the actor abstraction, quantitative code analysis shows that actor-based kernels are always significantly simpler than API-based coding, and generally simpler than pragma-based coding. Structuring applications in this manner enables the runtime to automate initialisation of, and interaction with, these parallel hardware platforms. Performance measurements show that the overheads of actor-based kernels are commensurate with those of API-based kernels, and range from equivalent to vastly improved relative to pragma-based annotations, both for sample and real-world applications.
ISBN: 9783000503375 (print)
To overcome the restriction of unbiased predictors in kriging interpolation, Bayesian kriging integrates prior distributions of variogram parameters, such as coefficients, data variance, range, and nugget, to be adopted as a qualified guess in the spatial estimation. The observation uncertainty is represented as a posterior distribution and a predictive parameter distribution, avoiding unrealistically small regions within the observations, to attain optimal unbiased linear interpolation through the Bayesian kriging algorithm. Before the predictive spatial distributions are estimated, the procedure performs multiple computations of an empirical variogram for the petrophysical properties, given the posterior distribution of the variogram parameters, to create many equiprobable stochastic reservoir images. Based on a statistical evaluation, these realizations are ranked to select three quantiles (P10, P50, and P90).
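Two of the steps named in this abstract, computing an empirical variogram and ranking realizations to pick P10, P50, and P90, can be illustrated with a short Python sketch. Everything here is an assumption made for illustration rather than the authors' procedure: the data lie on a regularly spaced 1D transect, the semivariance uses the classical estimator (half the mean squared difference of pairs at each lag), each realization is summarized by a single hypothetical score, and percentiles are taken in ascending score order with nearest-rank selection.

```python
def empirical_variogram(values, max_lag):
    """Classical semivariance for integer lags 1..max_lag on a regular 1D transect."""
    gamma = {}
    for lag in range(1, max_lag + 1):
        pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
        if pairs:  # skip lags longer than the transect
            gamma[lag] = sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))
    return gamma


def pick_p10_p50_p90(scores):
    """Return the index of the realization nearest each percentile rank.

    `scores` holds one (hypothetical) summary value per realization and must be
    non-empty. Ranks are ascending, so P10 is a low-score realization.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    last = len(order) - 1
    fractions = (("P10", 0.1), ("P50", 0.5), ("P90", 0.9))
    return {name: order[round(frac * last)] for name, frac in fractions}


if __name__ == "__main__":
    print(empirical_variogram([1.0, 2.0, 3.0, 4.0], max_lag=2))
    print(pick_p10_p50_p90([5.0, 1.0, 9.0, 3.0, 7.0]))
```

Note that P10 and P90 conventions differ between disciplines (ascending versus exceedance probability), so the ordering used above should be checked against the convention of the study at hand.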
Heterogeneous nodes composed of a multicore CPU and accelerators are today's norm in high-performance computing (HPC) platforms due to their superior performance and energy efficiency. Tools such as OpenCL and hybrid combinations such as OpenMP plus OpenACC are used for developing portable parallel programs for such nodes. However, these tools have some drawbacks, including a lack of compiler support for nested parallelism, performance portability, automatic heterogeneous workload distribution, user-friendly thread placement, and processor affinity essential to the portable performance of hybrid programs executing on such nodes. In this paper, we propose OpenH, a novel programming model and library API for developing portable parallel programs on heterogeneous hybrid servers composed of a multicore CPU and one or more different types of accelerators. OpenH integrates Pthreads, OpenMP, and OpenACC seamlessly to facilitate the development of hybrid parallel programs. An OpenH hybrid parallel program starts as a single main thread, creating a group of Pthreads called hosting Pthreads. A hosting Pthread then leads the execution of a software component of the program, either an OpenMP multithreaded component running on the CPU cores or an OpenACC (or OpenMP) component running on one of the accelerators of the server. The OpenH library provides API functions that allow programmers to get the configuration of the executing environment and bind the hosting Pthreads (and hence the execution of components) of the program to the CPU cores of the hybrid server to get the best performance. We illustrate the OpenH programming model and library API using two hybrid parallel applications based on matrix multiplication and 2D fast Fourier transform for the most general case of a hybrid hyperthreaded server comprising $p$ computing devices. Finally, we demonstrate the practical performance and energy consumption of OpenH for the hybrid parallel matrix multiplication application on a
ISBN: 9781467385954 (print)
Computed tomography (CT) is now widely used to examine problems in the human body, and it plays a very important role in diagnosing defects in patients. Computed tomography only became feasible with the development of computer signal processing capabilities. The technology has since advanced from capturing the inner parts of the human body in 2D to 3D, and from 3D to 4D. A tomographic image is a cross-sectional image, or slice, through the body. A radiologist has to analyze the slices one by one to detect any defect, which takes a long time when the number of slices is large. This paper presents a system that predicts the affected areas of human lungs from slices obtained from a CT scanner, using parallel image processing and enhancement algorithms, to assist radiologists in making their final decisions. The proposed model was tested on the human lung for the detection of cancer. The scanned images are stored in the Digital Imaging and Communications in Medicine (DICOM) format.
ISBN: 9781509036837 (print)
Although the OpenMP 4.0 standard has been available since 2013, support for GPUs has been absent up until very recently, with only a handful of experimental compilers available. In this work we evaluate the performance of Cray's new NVIDIA GPU targeting implementation of OpenMP 4.0, with the mini-apps TeaLeaf, CloverLeaf and BUDE. We successfully port each of the applications, using a simple and consistent design throughout, and achieve performance on an NVIDIA K20X that is comparable to Cray's OpenACC in all cases. BUDE, a compute bound code, required 2.2x the runtime of an equivalently optimised CUDA code, which we believe is caused by an inflated frequency of control flow operations and less efficient arithmetic optimisation. Impressively, both TeaLeaf and CloverLeaf, memory bandwidth bound codes, only required 1.3x the runtime of hand-optimised CUDA implementations. Overall, we find that OpenMP 4.0 is a highly usable open standard capable of performant heterogeneous execution, making it a promising option for scientific application developers.