ISBN:
(Print) 9781605588360
The proceedings contain 9 papers. The topics discussed include: a new vision for coarray Fortran; a comparative study and empirical evaluation of global view HPL program in X10; evaluating error detection capabilities of UPC run-time systems; a practical study of UPC with the NAS parallel benchmarks; UPC performance evaluation on a multicore system; evaluation of UPC programmability using classroom studies; ScaleUPC: a UPC compiler for multi-core systems; a simple parallel approximation algorithm for maximum weight matching; and fast PGAS connected components algorithms.
The High-Performance Linpack (HPL) benchmark is used to evaluate the performance of supercomputers. It implements blocked, right-looking Gaussian elimination with row partial pivoting. A block-cyclic distribution is used i...
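The elimination scheme the abstract names can be sketched in a few lines. This is a minimal, unblocked illustration of right-looking Gaussian elimination with row partial pivoting; HPL applies the same trailing-submatrix update panel-by-panel over a block-cyclic data layout, which this serial sketch does not model.

```python
# Minimal, unblocked sketch of right-looking Gaussian elimination with
# row partial pivoting -- the same update HPL performs panel-by-panel
# over a block-cyclic layout. Pure Python, for illustration only.

def lu_partial_pivot(a):
    """In-place LU factorization of a square matrix (list of lists).

    Returns the pivot row chosen at each step. Afterwards the strictly
    lower triangle of `a` holds L (unit diagonal implied) and the upper
    triangle holds U, with rows permuted by the recorded pivots.
    """
    n = len(a)
    piv = []
    for k in range(n):
        # Partial pivoting: pick the row with the largest |a[i][k]|.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        piv.append(p)
        a[k], a[p] = a[p], a[k]
        # Right-looking step: scale the pivot column, then immediately
        # update the entire trailing submatrix.
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return piv
```

A blocked variant applies the same two steps to a panel of columns at a time, turning the trailing update into a matrix-matrix multiply, which is what gives HPL its high floating-point efficiency.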
ISBN:
(Print) 9781509028238
A one-sided programming model separates communication from synchronization, and is the driving principle behind partitioned global address space (PGAS) libraries such as Global Arrays (GA) and SHMEM. PGAS models expose a rich set of functionality that a developer needs in order to implement mathematical algorithms that require frequent multidimensional array accesses. However, use of existing PGAS libraries in application codes often requires significant development effort in order to fully exploit these programming models. On the other hand, a vast majority of scientific codes use MPI either directly or indirectly via third-party scientific computation libraries, and need features to support application-specific communication requirements (e.g., asynchronous update of distributed sparse matrices, commonly arising in machine learning workloads). For such codes it is often impractical to completely shift programming models in favor of special one-sided communication middleware. Instead, an elegant and productive solution is to exploit the one-sided functionality already offered by MPI-3 RMA (Remote Memory Access). We designed a general one-sided interface using the MPI-3 passive RMA model for remote matrix operations in the linear algebra library Elemental; we call this interface RMAInterface. Elemental is an open-source library for distributed-memory dense and sparse linear algebra and optimization. We employ RMAInterface to construct a Global Arrays-like API and demonstrate that its performance scalability is competitive with that of the existing GA (with ARMCI-MPI) for a quantum chemistry application.
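The separation of communication from synchronization that the abstract describes can be modeled in-process. The sketch below is a schematic, single-process Python toy, not an MPI or GA API: `put` only initiates a transfer, and the data is guaranteed visible at the target only when `unlock` closes the access epoch, mirroring the passive-target discipline of MPI-3 RMA (`MPI_Win_lock`, `MPI_Put`, `MPI_Win_unlock`). All class and method names here are illustrative.

```python
# Schematic, single-process model of passive-target one-sided
# communication. A `put` only *initiates* a write; the values become
# visible at the target only after `unlock` closes the access epoch.
# Illustrative toy -- not an MPI, GA, or Elemental API.

class Window:
    def __init__(self, nranks, size):
        # One memory region ("window") per simulated rank.
        self.mem = [[0] * size for _ in range(nranks)]
        self.pending = {}  # target rank -> buffered puts

    def lock(self, target):
        # Open a passive access epoch at `target`; the target process
        # itself does not participate (no matching receive, no handshake).
        self.pending[target] = []

    def put(self, target, offset, values):
        # Initiate a one-sided write; completion is deferred.
        self.pending[target].append((offset, list(values)))

    def unlock(self, target):
        # Close the epoch: only now are all buffered puts made visible.
        for offset, values in self.pending.pop(target):
            self.mem[target][offset:offset + len(values)] = values
```

The key property the toy exposes is that the origin never coordinates with the target inside the epoch; in real MPI-3 passive RMA this is what lets a library such as the RMAInterface described above overlap remote matrix updates with computation.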
ISBN:
(Print) 9781467365987
Nekbone is a proxy application of Nek5000, a scalable Computational Fluid Dynamics (CFD) code used for modeling incompressible flows. The Nekbone mini-application is used by several international co-design centers to explore new concepts in computer science and to evaluate their performance. We present the design and implementation of a new communication kernel in the Nekbone mini-application with the goal of studying the performance of different parallel communication models. First, a new MPI blocking communication kernel has been developed to solve Nekbone problems in a three-dimensional Cartesian mesh and process topology. The new MPI implementation delivers a 13% performance improvement compared to the original implementation. The new MPI communication kernel consists of approximately 500 lines of code against the original 7,000 lines of code, allowing experimentation with new approaches in Nekbone parallel communication. Second, the MPI blocking communication in the new kernel was changed to MPI non-blocking communication. Third, we developed a new partitioned global address space (PGAS) communication kernel based on the GPI-2 library. This approach reduces the synchronization among neighboring processes; in our tests on 8,192 processes, the GPI-2 communication kernel is on average 3% faster than the new MPI non-blocking communication kernel. In addition, we have used OpenMP in all versions of the new communication kernel. Finally, we highlight the future steps for using the new communication kernel in the parent application Nek5000.
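The communication pattern such a kernel implements along each axis of the Cartesian topology is a nearest-neighbor halo exchange. The sketch below is a schematic, serial illustration of that data movement, with the "ranks" as plain lists; real code would post one `MPI_Irecv`/`MPI_Isend` pair per neighbor followed by `MPI_Waitall` (or, in the GPI-2 variant, one-sided writes with notifications).

```python
# Schematic, serial illustration of the nearest-neighbour halo
# exchange a mesh-based kernel performs along one axis of a Cartesian
# process topology. The "ranks" here are plain lists and the exchange
# is simulated in-process; no actual message passing is involved.

def halo_exchange(subdomains):
    """Return the (left_ghost, right_ghost) pair for each rank.

    Rank r receives its left ghost value from rank r-1's last element
    and its right ghost value from rank r+1's first element; ranks on
    the domain boundary get None where no neighbour exists.
    """
    n = len(subdomains)
    ghosts = []
    for r in range(n):
        left = subdomains[r - 1][-1] if r > 0 else None
        right = subdomains[r + 1][0] if r < n - 1 else None
        ghosts.append((left, right))
    return ghosts
```

In the non-blocking and GPI-2 variants the abstract compares, the benefit comes from initiating all of these per-neighbor transfers at once and synchronizing only once at the end, rather than serializing them as blocking send/receive pairs.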