ISBN: (Print) 9781728109121
In high-performance computing, storage is a resource shared by all users, whose application requirements and knowledge of storage vary widely. Consequently, the optimal storage configuration depends on the I/O behavior of each application. While system logs are helpful in understanding storage behavior, it is nontrivial for each user to analyze the logs and adjust complex configurations. Even for experienced users, it is difficult to understand the full I/O stack and find the optimal configuration for a specific application. In this work, we analyzed the I/O activities of Cori, an HPC system at the National Energy Research Scientific Computing Center (NERSC). Our analysis shows that most users do not adjust storage configurations and instead use the default settings. It also shows that only a few applications are executed repeatedly in the HPC environment. Based on these results, we developed DCA-I/O, a dynamic distributed file system configuration adjustment algorithm, which uses system log information and widely adopted rules to adjust storage configurations automatically without any user intervention. DCA-I/O uses existing system logs and requires neither code modifications nor an additional library. To demonstrate the effectiveness of DCA-I/O, we performed experiments using I/O kernels of real applications in both a small isolated Lustre environment and Cori. Our experimental results show that our scheme can improve the performance of HPC applications by up to 75% in the isolated environment and by up to 50% in a real HPC environment without user intervention.
In this work, we analyzed the input/output (I/O) activities of Cori, a high-performance computing system at the National Energy Research Scientific Computing Center at Lawrence Berkeley National Laboratory. Our analysis indicates that most users do not adjust storage configurations but rather use the default settings. In addition, owing to interference from the many applications running simultaneously, performance varies with the system status. To configure file systems autonomously in such complex environments, we developed DCA-IO, a dynamic distributed file system configuration adjustment algorithm that uses system log information to adjust storage configurations automatically. Our scheme aims to improve application performance and avoid interference from other applications. Moreover, DCA-IO uses the existing system logs and does not require code modifications, an additional library, or user intervention. To demonstrate the effectiveness of DCA-IO, we performed experiments using I/O kernels of real applications in both a small isolated Lustre environment and Cori. Our experimental results show that our scheme can improve the performance of HPC applications by up to 263% with the default Lustre configuration.