ISBN: 9781728160269 (print)
The proceedings contain 10 papers. The topics discussed include: multi-level performance instrumentation for Kokkos applications using TAU; the case for a common instrumentation interface for HPC codes; in situ visualization of performance metrics in multiple domains; towards a programmable analysis and visualization framework for interactive performance analytics; designing efficient parallel software via compositional performance modeling; performance analysis of tile low-rank Cholesky factorization using PaRSEC instrumentation tools; and asvie: a timing-agnostic SVE optimization methodology.
ISBN: 9781728160269 (print)
The analysis of runtime performance is important during the development and throughout the life cycle of HPC applications. One important objective in performance analysis is to identify regions in the code that show a significant runtime increase with larger problem sizes or more processes. One approach to identifying such regions is empirical performance modeling, i.e., building performance models based on measurements. While the modeling itself has already been streamlined and automated, generating the required measurements is time-consuming and tedious. In this paper, we propose an approach that automatically adjusts the instrumentation to reduce overhead and focus the measurements on relevant regions, i.e., those that show increasing runtime with larger input parameters or an increasing number of MPI ranks. Our approach employs Extra-P to generate performance models, which it then uses to extrapolate runtime and, finally, to decide which functions should be kept for measurement. The analysis also expands the instrumentation by heuristically adding functions based on static source-code features. We evaluate our approach using benchmarks from SPEC CPU 2006, SU2, and parallel MILC. The evaluation shows that our approach can filter out functions of little interest and generate profiles that contain mostly relevant regions. For example, the overhead for SU2 is automatically reduced from 200% to 11% compared to filtered Score-P measurements.
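The model-then-filter idea described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: it does not use Extra-P or Score-P, it replaces Extra-P's model search with a plain power-law fit, and the function names, measurements, and the `min_share` threshold are hypothetical rather than taken from the paper.

```python
import math


def fit_power_law(xs, ts):
    """Least-squares fit of t = c * x**a in log-log space; returns (c, a).

    A simplified stand-in for a real empirical modeling tool (assumption).
    """
    lx = [math.log(x) for x in xs]
    lt = [math.log(t) for t in ts]
    n = len(xs)
    mean_x = sum(lx) / n
    mean_t = sum(lt) / n
    var_x = sum((v - mean_x) ** 2 for v in lx)
    a = sum((u - mean_x) * (v - mean_t) for u, v in zip(lx, lt)) / var_x
    c = math.exp(mean_t - a * mean_x)
    return c, a


def select_functions(measurements, target, min_share=0.01):
    """Keep functions whose extrapolated runtime at `target` is at least
    `min_share` of the extrapolated total (hypothetical selection rule)."""
    predicted = {}
    for name, (xs, ts) in measurements.items():
        c, a = fit_power_law(xs, ts)
        predicted[name] = c * target ** a
    total = sum(predicted.values())
    return sorted(name for name, t in predicted.items() if t / total >= min_share)


# Hypothetical per-function runtimes (seconds) at 2, 4, 8, and 16 MPI ranks.
measurements = {
    "solve": ([2, 4, 8, 16], [1.0, 4.0, 16.0, 64.0]),        # grows quadratically
    "halo_exchange": ([2, 4, 8, 16], [0.5, 1.0, 2.0, 4.0]),  # grows linearly
    "get_index": ([2, 4, 8, 16], [0.2, 0.2, 0.2, 0.2]),      # constant
}

kept = select_functions(measurements, target=64)
print(kept)
```

With these made-up numbers the constant-time helper `get_index` falls below the threshold at 64 ranks and would be dropped from the instrumentation, while the two functions whose runtime grows with the rank count are kept.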
Understanding the performance characteristics of applications in modern HPC environments is becoming more challenging due to the increase in the architectural and programming complexities. HPC software developers rely...
ISBN: 9781479970582 (print)
High-dimensional torus networks are becoming common in flagship HPC systems, with five of the top ten systems in June 2014 having networks with more than three dimensions. Although such networks combine performance with scalability at reasonable cost, the challenge of how to achieve optimal performance remains. Tools are needed to help understand how well the traffic is distributed among the many dimensions. This involves not only capturing network traffic but also visualizing it comprehensibly. However, visualizing such networks requires projecting multiple dimensions onto a two-dimensional screen, which is inherently challenging. To tackle this problem, in this position paper, we propose a visualization technique that can display traffic on torus networks with up to six dimensions. Our fundamental approach is to simultaneously present multiple views of the same network section, with each view visualizing different dimensions. Furthermore, we leverage the multiple-coordinate system concept and combine it with a customized polygon view to provide both a global and a zoomed-in perspective of the network. By interactively linking all the views, our technique makes it possible to analyze how the communication pattern of an application is mapped onto a network.
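The idea of showing the same network section through several views, each covering different dimensions, can be sketched as a simple aggregation. This is only an illustration under stated assumptions: the pairing of dimensions, the `project` helper, and the traffic values are hypothetical, and the sketch does not reproduce the multiple-coordinate system or polygon view the abstract mentions.

```python
from collections import defaultdict


def project(traffic, dims):
    """Sum per-node traffic over all dimensions except those in `dims`.

    `traffic` maps a 6-D torus coordinate to a traffic volume; the result
    maps the reduced coordinate to the aggregated volume for one 2-D view.
    """
    view = defaultdict(int)
    for coord, load in traffic.items():
        key = tuple(coord[d] for d in dims)
        view[key] += load
    return dict(view)


# Hypothetical traffic volumes (e.g., bytes sent) per node of a 6-D torus.
traffic = {
    (0, 0, 0, 0, 0, 0): 10,
    (0, 1, 0, 0, 1, 0): 5,
    (0, 1, 1, 0, 0, 0): 7,
    (1, 0, 0, 0, 0, 1): 3,
}

# Three 2-D views that together cover all six dimensions (assumed pairing).
views = {pair: project(traffic, pair) for pair in [(0, 1), (2, 3), (4, 5)]}

for pair, view in views.items():
    print(pair, view)
```

Each view preserves the total traffic, so a hotspot that shows up in one view can be cross-checked against the other views of the same nodes.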