Details
ISBN (digital): 9783031617638
ISBN (print): 9783031617621; 9783031617638
DLA-Future implements an efficient GPU-enabled distributed eigenvalue solver using a software architecture based on the C++ std::execution concurrency proposal. The state-of-the-art linear algebra implementations LAPACK and ScaLAPACK were designed for legacy systems and employ fork-join parallelism, which can be inefficient on modern architectures. Task-based linear algebra implementations offer significant benefits: fewer synchronization points and easier overlap of computation with communication are the two main ones, and both lead to improved performance. In specific cases, the ability to schedule multiple algorithms concurrently yields a noticeable reduction of time-to-solution. We present the implementation of DLA-Future and the results on different types of systems, starting from the Piz Daint multicore and GPU partitions and moving to more recent architectures available in ALPS. The benchmark results are divided into two categories. The first compares DLA-Future against widely used eigensolver implementations. The second showcases the performance of the eigensolver in real applications. We present results generated with CP2K, where DLA-Future support was easily added thanks to the provided C API, which is compatible with the ScaLAPACK interface.
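The contrast between fork-join and task-based scheduling described in the abstract can be illustrated with a small sketch. It is not DLA-Future code and does not use std::execution: Python's standard `concurrent.futures` module stands in for the scheduler, and `stage_one`, `stage_two`, `fork_join_sum`, and `task_based_sum` are hypothetical names introduced only for this illustration. The fork-join variant waits for every tile to finish the first stage before any tile starts the second, while the task-based variant gives each tile its own dependency chain and synchronizes only once, at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def stage_one(tile):
    # Hypothetical first kernel applied to a tile (assumption for illustration).
    return tile * 2


def stage_two(value):
    # Hypothetical second kernel that depends on the result of stage_one.
    return value + 1


def fork_join_sum(tiles, workers=4):
    # Fork-join style: all tiles finish stage one (a global join)
    # before any tile is allowed to start stage two.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        intermediate = list(pool.map(stage_one, tiles))  # synchronization point
        final = list(pool.map(stage_two, intermediate))  # second fork and join
    return sum(final)


def task_based_sum(tiles, workers=4):
    # Task-based style: each tile is one task carrying its own dependency
    # chain, so a tile can run stage two as soon as its stage one is done.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(lambda t=t: stage_two(stage_one(t))) for t in tiles]
        return sum(f.result() for f in futures)  # single synchronization at the end
```

Both functions compute the same result; they differ only in where the program waits. With real kernels of uneven cost, removing the intermediate join is what lets fast tiles proceed without waiting for slow ones.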