Many scientific applications are written using hybrid programming models that combine message passing and shared memory, owing to the increasing prevalence of large-scale systems with multicore, multisocket nodes. Previous work has shown that energy efficiency can be improved using software-controlled execution schemes that consider both the programming model and the power-aware execution capabilities of the system. However, such approaches have focused on identifying optimal resource utilization for one programming model, either shared memory or message passing, in isolation. The potential solution space, and thus the challenge, grows substantially when optimizing hybrid models, since the number of possible resource configurations increases exponentially. Nonetheless, with the accelerating adoption of hybrid programming models, we increasingly need improved energy efficiency in hybrid parallel applications on large-scale systems. In this work, we present new software-controlled execution schemes that consider the effects of dynamic concurrency throttling (DCT) and dynamic voltage and frequency scaling (DVFS) in the context of hybrid programming models. Specifically, we present predictive models and novel algorithms based on statistical analysis that anticipate application power and time requirements under different concurrency and frequency configurations. We apply our models and methods to the NPB MZ benchmarks and selected applications from the ASC Sequoia codes. Overall, we achieve substantial energy savings (8.74 percent on average and up to 13.8 percent) with some performance gain (up to 7.5 percent) or negligible performance loss.
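The idea of choosing a concurrency and frequency configuration from predicted time and power can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name `pick_config`, the `max_slowdown` bound, and every number in `predictions` are hypothetical, and the predictions are simply given as a table rather than produced by a statistical model. The sketch picks the configuration with the lowest predicted energy (time multiplied by power) among those that stay within the allowed slowdown.

```python
from typing import Dict, Tuple

# A configuration is (OpenMP threads per MPI task, CPU frequency in GHz).
Config = Tuple[int, float]
# A prediction is (execution time in seconds, average power in watts).
Prediction = Tuple[float, float]


def pick_config(
    predictions: Dict[Config, Prediction],
    baseline: Config,
    max_slowdown: float = 0.05,
) -> Tuple[Config, float]:
    """Return the configuration with the lowest predicted energy.

    Configurations whose predicted time exceeds the baseline time by more
    than max_slowdown (a hypothetical tolerance) are skipped.
    """
    base_time, base_power = predictions[baseline]
    best, best_energy = baseline, base_time * base_power
    for config, (time_s, power_w) in predictions.items():
        if time_s > base_time * (1.0 + max_slowdown):
            continue  # too slow relative to the baseline
        energy = time_s * power_w  # joules
        if energy < best_energy:
            best, best_energy = config, energy
    return best, best_energy


# Illustrative, made-up predictions for one application phase.
predictions: Dict[Config, Prediction] = {
    (8, 2.4): (100.0, 200.0),  # baseline: all cores, highest frequency
    (8, 1.8): (104.0, 170.0),  # DVFS only
    (4, 2.4): (98.0, 180.0),   # DCT only
    (4, 1.8): (112.0, 150.0),  # DCT and DVFS, but too slow
}

best_config, best_energy = pick_config(predictions, baseline=(8, 2.4))
print(best_config, best_energy)
```

With these made-up numbers, the configuration with the lowest raw energy, (4, 1.8), is rejected because it exceeds the slowdown bound, so the sketch settles on a throttled-concurrency configuration that is also slightly faster than the baseline.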
Exascale computing systems (ECS) are anticipated to perform at exaflop speed (10^18 operations per second) while consuming less than 20 MW of power. This ultrascale performance requires a thousand-fold speedup over current petascale systems. For future high-performance computing (HPC), power consumption is one of the key obstacles to reaching exaflop performance through the traditional route of increasing clock speed. One standard way to attain such performance is through massive parallelism. At this early stage, it is hard to decide which parallel programming approach is most promising for providing the massive parallelism needed to attain exaflop performance. This article begins with a short description and implementation of the algorithms of various hybrid parallel programming models (PPMs) for homogeneous and heterogeneous cluster systems. Furthermore, the authors evaluated performance and power consumption in these hybrid models by implementing two HPC benchmarking applications: square matrix multiplication and a Jacobi iterative solver for the two-dimensional Laplace equation. The results demonstrated that the heterogeneous hybrid model (MPI + X) outperformed the homogeneous parallel programming model (MPI + OpenMP). This empirical investigation of hybrid PPMs is a leading step for researchers and development communities to select a promising model for emerging ECS.
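For readers unfamiliar with the second benchmark, the following is a minimal serial sketch of a Jacobi iteration for the two-dimensional Laplace equation. It is not the authors' implementation: the function name `jacobi_laplace`, the tolerance, and the 5 x 5 grid with one edge held at 100 are assumptions made for illustration. In a hybrid MPI + OpenMP version, the grid would typically be split across MPI ranks with the inner loops shared among threads, which this sketch does not attempt.

```python
from typing import List, Tuple

Grid = List[List[float]]


def jacobi_laplace(
    grid: Grid, tol: float = 1e-6, max_iters: int = 10_000
) -> Tuple[Grid, int]:
    """Relax the interior of `grid` toward a solution of Laplace's equation.

    Boundary cells are treated as fixed (Dirichlet) values. Each sweep
    replaces every interior cell with the average of its four neighbours,
    reading only values from the previous sweep (the Jacobi scheme).
    Returns the final grid and the number of sweeps performed.
    """
    rows, cols = len(grid), len(grid[0])
    current = [row[:] for row in grid]
    for iteration in range(1, max_iters + 1):
        nxt = [row[:] for row in current]  # boundary values carry over
        max_change = 0.0
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                nxt[i][j] = 0.25 * (
                    current[i - 1][j] + current[i + 1][j]
                    + current[i][j - 1] + current[i][j + 1]
                )
                max_change = max(max_change, abs(nxt[i][j] - current[i][j]))
        current = nxt
        if max_change < tol:
            return current, iteration
    return current, max_iters


# Hypothetical test problem: top edge held at 100, other edges at 0.
n = 5
grid: Grid = [[0.0] * n for _ in range(n)]
grid[0] = [100.0] * n

solution, iterations = jacobi_laplace(grid)
print(f"centre value after {iterations} sweeps: {solution[2][2]:.3f}")
```

Because each new value depends only on the previous sweep, all interior cells can be updated independently, which is what makes this solver a convenient benchmark for comparing parallel programming models.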