Increasing node and cores-per-node counts in supercomputers render scheduling and load balancing critical for exploiting parallelism. OpenMP applications can achieve high performance via careful selection of the scheduling kind and chunk parameters on a per-loop, per-application, and per-system basis from a portfolio of advanced scheduling algorithms (Korndorfer et al., 2022). This manual selection is time-consuming and challenging, and the best choice may change during execution. We propose Auto4OMP, a novel approach for automated load balancing of OpenMP applications. With Auto4OMP, we introduce three scheduling algorithm selection methods and an expert-defined chunk parameter for the kind and chunk arguments of OpenMP's schedule clause, respectively. Auto4OMP extends the OpenMP schedule(auto) and chunk parameter implementation in LLVM's OpenMP runtime library to automatically select a scheduling algorithm and calculate a chunk parameter during execution. Auto4OMP infers loop characteristics from the loop's execution over the application's time-steps. The experiments performed in this work show that Auto4OMP improves application performance by up to 11% compared to LLVM's schedule(auto) implementation and outperforms manual selection. Auto4OMP improves the performance of MPI+OpenMP applications by explicitly minimizing thread-level load imbalance and implicitly reducing process-level load imbalance.
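To make the contrast between manual and automated selection concrete, the following minimal sketch shows the same loop written twice with standard OpenMP syntax: once with a hand-picked scheduling kind and chunk, and once with schedule(auto), which delegates the choice to the runtime. The kernel `work`, the function names, and the chunk size 16 are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch does not reproduce Auto4OMP's selection methods.

```c
#include <stdio.h>

/* Hypothetical imbalanced kernel: iteration i performs O(i) work,
 * so later iterations are more expensive than earlier ones. */
static double work(int i) {
    double s = 0.0;
    for (int k = 0; k <= i; ++k) {
        s += (double)k;
    }
    return s;
}

/* Manual selection: the programmer fixes the scheduling kind (dynamic)
 * and the chunk size (16, an arbitrary illustrative value) for this loop. */
double sum_manual(int n) {
    double total = 0.0;
    #pragma omp parallel for schedule(dynamic, 16) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += work(i);
    }
    return total;
}

/* Automated selection: schedule(auto) leaves the scheduling decision
 * to the compiler and runtime library. */
double sum_auto(int n) {
    double total = 0.0;
    #pragma omp parallel for schedule(auto) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += work(i);
    }
    return total;
}
```

Both functions compute the same result; only the distribution of iterations across threads differs. The pragmas take effect when the file is compiled with OpenMP enabled (for example, with the `-fopenmp` flag in Clang or GCC) and are otherwise ignored, in which case the loops run sequentially.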