ISBN (print): 9781479989379
In recent years, heterogeneous HPC systems, which combine traditional processors with accelerator cards such as GPUs, have been shown to deliver superior performance and power efficiency. Since different scientific problems place different demands on the computer architecture, some general-purpose supercomputers consist of several types of nodes, each best suited to certain applications. Such clusters, which add inter-node heterogeneity (different types of nodes) on top of intra-node heterogeneity (different processors inside one node), consist of compute nodes with different compute performance. The standard implementation of the Linpack benchmark, HPL, distributes the workload evenly among all processes and thus cannot exploit the cluster's full potential if the nodes have unequal performance. This paper presents a new feature of our HPL-GPU implementation that allows a balanced, fine-tuned workload distribution among all compute nodes, taking into account their individual compute capabilities. We present results on nodes of different speed grades on the LOEWE-CSC cluster and demonstrate that our implementation can use all nodes of a heterogeneous configuration efficiently, with a granularity loss of only about 3%.
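To make the balancing problem concrete, here is a minimal Python sketch contrasting an even split with a speed-aware split. It is not the HPL-GPU algorithm. It assumes the matrix is cut into equal-sized blocks, that each node's relative speed is known in advance, and that a node's runtime is simply its block count divided by its speed. The function names, the speed values and the greedy rule are hypothetical illustrations. "Granularity loss" is taken here as one minus the ratio of the runtime with perfectly divisible work to the runtime of the slowest node; the abstract does not define the term precisely.

```python
import heapq


def even_blocks(n_nodes, total_blocks):
    """Even split (as in standard HPL): every node gets (almost) the same number of blocks."""
    base, extra = divmod(total_blocks, n_nodes)
    return [base + (1 if i < extra else 0) for i in range(n_nodes)]


def greedy_blocks(speeds, total_blocks):
    """Speed-aware split: give each block to the node that would finish it earliest.

    Under the assumed cost model (equal-sized blocks, runtime = blocks / speed)
    this minimizes the makespan max(c_i / s_i). Speeds must be positive.
    """
    counts = [0] * len(speeds)
    # Heap entries: (finish time if this node takes one more block, node index).
    heap = [(1.0 / s, i) for i, s in enumerate(speeds)]
    heapq.heapify(heap)
    for _ in range(total_blocks):
        _, i = heapq.heappop(heap)
        counts[i] += 1
        heapq.heappush(heap, ((counts[i] + 1) / speeds[i], i))
    return counts


def efficiency(counts, speeds):
    """Ideal runtime (work perfectly divisible) divided by actual runtime (slowest node)."""
    ideal = sum(counts) / sum(speeds)
    actual = max(c / s for c, s in zip(counts, speeds))
    return ideal / actual


if __name__ == "__main__":
    speeds = [1.0, 1.3]  # hypothetical relative speeds of two node types
    blocks = 10          # hypothetical number of equal-sized blocks
    for name, counts in [("even", even_blocks(len(speeds), blocks)),
                         ("greedy", greedy_blocks(speeds, blocks))]:
        eff = efficiency(counts, speeds)
        print(name, counts, "efficiency", round(eff, 3), "loss", round(1 - eff, 3))
```

In this toy case the even split ([5, 5]) loses about 13% because the faster node idles, while the greedy split ([4, 6]) loses about 6% only because blocks cannot be subdivided; with more blocks per node that residual loss shrinks. The model ignores communication and the 2D block-cyclic layout that HPL actually uses, so it illustrates the idea rather than the paper's implementation.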