The ever-increasing scale of modern high-performance computing platforms poses challenges for system architects and code developers alike. The increase in core-count densities and the associated cost of components is having a dramatic effect on the viability of high memory-per-core ratios. Whilst the available memory per core is decreasing, the growing scale of parallel jobs is testing the efficiency of MPI implementations with respect to memory overhead. Scalability issues have always plagued both hardware manufacturers and software developers, and their combined effects can be disabling. In this paper we address the issue of MPI memory consumption with regard to InfiniBand network communications. We reaffirm some widely held beliefs regarding the existence of scalability problems under certain conditions. Additionally, we present results from testing memory-optimised runtime configurations and vendor-provided optimisation libraries. Using Orthrus, a linear solver benchmark developed by AWE, we demonstrate these memory-centric optimisations and their performance implications. We show the growth of OpenMPI memory consumption (demonstrating poor scalability) on both Mellanox and QLogic InfiniBand platforms. With a default OpenMPI configuration on Mellanox, we demonstrate a 616× increase in MPI memory consumption for a 64× increase in core count. Through the use of the Mellanox MXM and QLogic PSM optimisation libraries we observe reductions of 117× and 115× in MPI memory at the application's memory high-water mark. This significantly improves the potential scalability of the code.
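As a rough illustration of the kind of measurement this abstract refers to (not the paper's actual instrumentation), the C sketch below reads the Linux VmHWM field from /proc/self/status to observe a process's memory high-water mark before MPI_Init, after it, and after an all-to-all exchange that forces connections to every peer; per-connection buffers and queue pairs are what drive the memory growth being measured.

```c
/*
 * A minimal sketch, assuming Linux /proc semantics, of observing a
 * per-rank memory high-water mark around MPI activity. This is an
 * illustrative measurement only, not the paper's methodology.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return VmHWM (peak resident set size, in kB) for this process. */
static long vm_hwm_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "VmHWM:", 6) == 0) {
            sscanf(line + 6, "%ld", &kb);
            break;
        }
    }
    fclose(f);
    return kb;
}

int main(int argc, char **argv)
{
    long before = vm_hwm_kb();

    MPI_Init(&argc, &argv);
    long after_init = vm_hwm_kb();

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Touch every peer so per-connection state gets allocated. */
    int *sendbuf = calloc(size, sizeof(int));
    int *recvbuf = calloc(size, sizeof(int));
    MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);
    long after_comm = vm_hwm_kb();

    if (rank == 0)
        printf("VmHWM kB: start=%ld after-init=%ld after-all-to-all=%ld\n",
               before, after_init, after_comm);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```

Comparing the printed high-water marks across increasing core counts, and across runtime configurations, is one simple way to expose the growth pattern the abstract describes.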
Server-side visualisation systems consist of a thin client connected to a rendering service located on the data server. We present a systematic and quantitative method for evaluating server-side visualisation systems. We evaluate the fundamental limitations of such systems in relation to network performance. Using these results we benchmark four common architectures by implementing clients and/or servers in each system. The suitability of each architecture is assessed for several applications in terms of scalability, performance and ease of development.
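The core quantity such an evaluation depends on is the round-trip cost of delivering a rendered frame over the network. The sketch below times that cycle for a hypothetical thin client: the one-byte request, port number, and raw-RGB reply are invented for illustration and do not correspond to any of the benchmarked systems' wire formats.

```c
/*
 * A minimal sketch of timing one frame request/response over TCP.
 * The protocol (send "R", receive WIDTH*HEIGHT*3 raw RGB bytes) is
 * hypothetical, chosen only to make the measurement concrete.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define WIDTH  1024
#define HEIGHT 768

int main(void)
{
    static unsigned char frame[WIDTH * HEIGHT * 3]; /* raw RGB frame */
    struct sockaddr_in srv = { 0 };
    struct timespec t0, t1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    srv.sin_family = AF_INET;
    srv.sin_port = htons(5600);            /* hypothetical render port */
    inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);
    if (fd < 0 || connect(fd, (struct sockaddr *)&srv, sizeof srv) < 0)
        return 1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    write(fd, "R", 1);                     /* request one frame */
    size_t got = 0;
    while (got < sizeof frame) {           /* read the full frame */
        ssize_t n = read(fd, frame + got, sizeof frame - got);
        if (n <= 0)
            return 1;
        got += (size_t)n;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                (t1.tv_nsec - t0.tv_nsec) / 1e6;
    printf("frame round trip: %.2f ms (%.1f MB/s)\n",
           ms, (sizeof frame / 1e6) / (ms / 1e3));
    close(fd);
    return 0;
}
```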
We present the development of a predictive performance model for the high-performance computing code Hydra, a hydrodynamics benchmark developed and maintained by the United Kingdom Atomic Weapons Establishment (AWE). ...
ISBN (Print): 9781467309745
Input/Output (I/O) operations can represent a significant proportion of run-time when large scientific applications are run in parallel and at scale. In order to address the growing divergence between processing speeds and I/O performance, the Parallel Log-structured File System (PLFS) has been developed by EMC Corporation and the Los Alamos National Laboratory (LANL) to improve the performance of parallel file activities. Currently, PLFS requires the use of either (i) the FUSE Linux kernel module, (ii) a modified MPI library with a customised ROMIO MPI-IO library, or (iii) an application rewrite to use the PLFS API directly. In this paper we present an alternative method of utilising PLFS in applications. This method employs a dynamic library to intercept the low-level POSIX operations and retarget them to the equivalents offered by PLFS. We demonstrate our implementation of this approach, named LDPLFS, on a set of standard UNIX tools, as well as on a set of standard parallel I/O-intensive mini-applications. The results demonstrate almost equivalent performance to a modified build of ROMIO and improvements over the FUSE-based approach. Furthermore, through our experiments we demonstrate decreased PLFS performance when run at scale on the Lustre file system.
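The interposition technique the abstract describes is the standard LD_PRELOAD shim pattern: a shared object defines the POSIX symbol, decides whether to retarget the call, and otherwise forwards to the real libc implementation via dlsym(RTLD_NEXT, ...). The sketch below shows the pattern for open(); the plfs_path_check() / plfs_open_to_fd() calls in the comment are placeholders, not the real PLFS API.

```c
/*
 * A minimal sketch of POSIX interposition via LD_PRELOAD, in the style
 * the paper describes for LDPLFS. The PLFS calls named in the comment
 * below are hypothetical placeholders.
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>

int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...) = NULL;
    mode_t mode = 0;

    if (flags & O_CREAT) {        /* mode argument only present with O_CREAT */
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }

    /*
     * Here a PLFS-aware shim would test whether 'path' lies on a PLFS
     * mount and, if so, hand the call to the PLFS library instead, e.g.
     * (hypothetical names):
     *
     *     if (plfs_path_check(path))
     *         return plfs_open_to_fd(path, flags, mode);
     */

    if (!real_open)               /* otherwise fall through to libc */
        real_open = (int (*)(const char *, int, ...))
                    dlsym(RTLD_NEXT, "open");
    return real_open(path, flags, mode);
}
```

Built with `gcc -shared -fPIC -o shim.so shim.c -ldl` and run as `LD_PRELOAD=./shim.so cat somefile`, the shim sees every open() the tool makes, which is why this approach needs no application rewrite, no modified MPI library, and no kernel module.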