ISBN (digital): 9798331527891
ISBN (print): 9798331527907
Performance modeling of parallel applications is essential for optimizing resource usage in high-performance computing (HPC) systems. However, some scientific applications exhibit irregular performance behaviors, which complicates creating accurate characteristic models. This irregularity is mainly due to these applications' nondeterministic computational and communication patterns. Tools such as PAS2P (Parallel Application Signatures for Performance Prediction) are used to extract detailed information about parallel applications. PAS2P exploits the repetitive behavior of the application to analyze and predict its performance, using the same resources on which the parallel application executes. This paper presents a characterization model, based on the PAS2P methodology, for irregular applications that groups the repeatability patterns of all the processes running the application into a single characteristic model. To achieve this, we consolidate the characterizations performed independently by each process, using metrics such as the number of instructions, the execution time of relevant sections, and the topological characteristics of the application. By grouping the repeatability patterns of all processes, we obtain a concise and accurate representation of the behavior of irregular applications, thus improving predictability and performance optimization in HPC systems.
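A minimal sketch of the kind of per-process consolidation described above: each process contributes a small feature vector (instruction count, execution time of relevant sections, and a simple topology descriptor), and processes whose vectors are similar are merged into one group of the characteristic model. The field names, tolerance, and greedy grouping rule below are illustrative assumptions, not PAS2P's actual data structures.

    from dataclasses import dataclass

    @dataclass
    class ProcessProfile:
        """Per-process characterization (hypothetical fields, not PAS2P's real schema)."""
        rank: int
        instructions: float      # instructions executed in relevant sections
        exec_time: float         # execution time of relevant sections (seconds)
        neighbors: frozenset     # relative ranks this process communicates with

    def similar(a, b, tol=0.05):
        """Group two processes if their metrics differ by less than tol (relative)
        and they share the same communication topology."""
        close = lambda x, y: abs(x - y) <= tol * max(abs(x), abs(y), 1e-12)
        return (close(a.instructions, b.instructions)
                and close(a.exec_time, b.exec_time)
                and a.neighbors == b.neighbors)

    def group_profiles(profiles):
        """Greedily merge per-process profiles into behavior groups;
        returns (representative_profile, member_ranks) pairs."""
        groups = []
        for p in profiles:
            for rep, members in groups:
                if similar(rep, p):
                    members.append(p.rank)
                    break
            else:
                groups.append((p, [p.rank]))
        return groups

    if __name__ == "__main__":
        ring = frozenset({-1, 1})        # nearest-neighbor (ring) pattern
        profiles = [
            ProcessProfile(0, 1.00e9, 2.01, ring),
            ProcessProfile(1, 1.02e9, 2.03, ring),
            ProcessProfile(2, 3.50e9, 7.80, ring),   # heavier, irregular processes
            ProcessProfile(3, 3.47e9, 7.75, ring),
        ]
        for rep, members in group_profiles(profiles):
            print(f"ranks {members}: ~{rep.instructions:.2e} instr, ~{rep.exec_time:.2f} s")

With these invented numbers, ranks 0-1 and ranks 2-3 collapse into two groups instead of four independent characterizations.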
The performance modeling of a parallel application is crucial for making better use of HPC resources. However, certain scientific applications exhibit irregular performance characteristics, posing challenges in accurately modeling their behavior. This irregularity primarily arises from these applications' non-deterministic computation and communication patterns. This article introduces a performance modeling methodology for irregular parallel applications based on the PAS2P methodology. The PAS2P tool generates an application signature and uses it to analyze and predict performance. Our approach relies on process-based data analysis to characterize these applications according to the behavior of individual processes, and proposes a model that groups processes at signature-construction time. This model yields a reduced number of phases and weights in a limited time, allowing us to characterize the application.
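To make the "phases and weights" concrete: in PAS2P-style prediction the signature measures the execution time of each representative phase on the target machine, and the predicted application time is the weight-scaled sum of those phase times, the weight counting how often a phase repeats. A tiny worked sketch with invented numbers follows; the actual phases and weights come from PAS2P's signature construction.

    # Hypothetical phase table: (phase_id, measured_time_seconds, weight).
    phases = [
        ("phase_0", 0.42, 250),   # dominant compute/communication pattern
        ("phase_1", 0.10, 250),
        ("phase_2", 1.30,  12),   # irregular phase with few repetitions
    ]

    # Predicted execution time = sum over phases of (phase time x weight).
    predicted_time = sum(t * w for _, t, w in phases)
    print(f"predicted execution time: {predicted_time:.1f} s")   # 145.6 s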
Currently, there are benchmark sets that measure the performance of HPC systems under specific computing and communication properties. These benchmarks represent the kernels of applications that measure specific hardw...
ISBN (digital): 9781728144849
ISBN (print): 9781728144856
When performance tools are used to analyze an application with thousands of processes, the data generated can be larger than the memory of the cluster node, forcing the data into swap. In HPC systems, moving data to swap is not always an option. This problem causes scalability limitations that affect the user experience and imposes serious restrictions on large-scale executions. To obtain knowledge about the application's performance, performance tools usually instrument the application to generate the data. When the instrumented parallel application is executed with thousands of processes, the data generated may exceed the memory of the compute node used to analyze it. Performance tools such as PAS2P predict the execution time on target machines. To do so, PAS2P analyzes the data produced by each application process. This data is analyzed sequentially, which results in an inefficient use of system resources. To solve this, we propose a parallel method for handling high volumes of data, decreasing the analysis time and increasing scalability, thus improving the PAS2P toolkit's ability to generate performance knowledge defined by the application's behavior phases.
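A minimal sketch of the data-parallel analysis idea: each per-process trace is reduced independently by a worker, so only compact summaries, never all of the raw data, stay resident on the analysis node. The file naming, line format, and pool size are assumptions for illustration; PAS2P's actual analysis operates on its own trace and signature structures.

    import glob
    from multiprocessing import Pool

    def analyze_trace(path):
        """Reduce one per-process trace file to a small summary
        (placeholder logic: count events and accumulate their durations)."""
        events, total_time = 0, 0.0
        with open(path) as f:
            for line in f:
                parts = line.split()          # assumed format: "<event> <seconds>"
                if len(parts) == 2:
                    events += 1
                    total_time += float(parts[1])
        return path, events, total_time

    if __name__ == "__main__":
        trace_files = sorted(glob.glob("traces/rank_*.trc"))   # one file per rank
        with Pool(processes=8) as pool:
            for path, events, total_time in pool.imap_unordered(analyze_trace, trace_files):
                print(f"{path}: {events} events, {total_time:.2f} s")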
Nowadays, rapid progress in next generation sequencing (NGS) technologies has drastically decreased the cost and time required to obtain genome sequences. A series of powerful computing accelerators, such as GPUs and the Xeon Phi MIC, are becoming a common platform for reducing the computational cost of the most demanding steps of genomic data analysis. GPUs have received more attention in the literature so far. However, the Xeon Phi constitutes a very attractive approach to improving performance because applications do not need to be rewritten in a programming language specific to the accelerator. Sequence alignment is a fundamental step in any variant analysis study, and many tools address this problem. We have selected BWA, one of the most popular sequence aligners, and studied different data management strategies to improve its execution time on hybrid systems made of multicore CPUs and Xeon Phi accelerators. Our main contributions focus on new strategies that combine data splitting and index replication in order to achieve a better balance in the use of system memory and reduce latency penalties. Our experimental results show significant speed-up improvements when such strategies are executed on our hybrid platform, taking advantage of the combined computing power of a standard multicore CPU and a Xeon Phi accelerator.
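One way to picture the data-splitting side of such strategies (a sketch under stated assumptions, not the strategies evaluated in the paper): with the aligner index replicated on each device, a batch of reads can be split proportionally to each device's measured alignment throughput so that CPU and accelerator finish at roughly the same time. Device names and throughput figures below are hypothetical.

    def split_reads(total_reads, device_throughput):
        """Split a batch of reads across devices proportionally to their
        alignment throughput (reads/second); each device holds its own
        replica of the index, so chunks are aligned independently."""
        total_tp = sum(device_throughput.values())
        devices = list(device_throughput)
        chunks, assigned = {}, 0
        for dev in devices[:-1]:
            n = int(total_reads * device_throughput[dev] / total_tp)
            chunks[dev], assigned = n, assigned + n
        chunks[devices[-1]] = total_reads - assigned   # remainder to last device
        return chunks

    # Hypothetical throughputs for a host CPU and one Xeon Phi card.
    print(split_reads(10_000_000, {"cpu": 1500.0, "xeon_phi": 900.0}))
    # -> {'cpu': 6250000, 'xeon_phi': 3750000}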
Wind field calculation is a common problem in environmental applications ranging from the design of wind farms to forest fire propagation prediction. Calculating the wind field is a complex problem that involves solving huge linear systems. Solving such systems requires iterative methods, such as the Preconditioned Conjugate Gradient (PCG), which in most cases require long execution times. The PCG solver with different preconditioners has been analyzed, and the performance and scalability of this solver have been determined. The most time-consuming operations have been identified, and a new method has been developed to improve the parallelization, reducing the execution time and increasing the scalability. The new method has been applied to a wind field simulator, called WindNinja, usually coupled to forest fire propagation models. The results are very promising, and the new parallelization method appears to be a key component for integration into other approaches.
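For reference, the solver under analysis is standard: below is a compact, generic Preconditioned Conjugate Gradient with a Jacobi (diagonal) preconditioner, the simplest of the preconditioners usually compared. This is a textbook sketch on a synthetic symmetric positive-definite matrix, not WindNinja's parallel implementation.

    import numpy as np

    def pcg(A, b, M_inv, tol=1e-8, max_iter=1000):
        """Preconditioned Conjugate Gradient for a symmetric positive-definite A;
        M_inv applies the inverse of the preconditioner to a vector."""
        x = np.zeros_like(b)
        r = b - A @ x
        z = M_inv(r)
        p = z.copy()
        rz = r @ z
        for _ in range(max_iter):
            Ap = A @ p
            alpha = rz / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            if np.linalg.norm(r) < tol:
                break
            z = M_inv(r)
            rz_new = r @ z
            p = z + (rz_new / rz) * p
            rz = rz_new
        return x

    if __name__ == "__main__":
        n = 200
        A = np.random.rand(n, n)
        A = A @ A.T + n * np.eye(n)          # diagonally dominant SPD test matrix
        b = np.random.rand(n)
        diag = np.diag(A)
        x = pcg(A, b, lambda v: v / diag)    # Jacobi preconditioner: M = diag(A)
        print("residual norm:", np.linalg.norm(b - A @ x))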
In this paper we propose a methodology that allows us to predict the application's scalability behavior on a specific system, providing information to select the most appropriate resources to run the application. We explain the general methodology, focusing on a novel method to model the logical application trace for a large number of processes. This method is based on the projection of a set of executions of the application signature for a small number of processes. The generated traces are validated by comparing them with the real traces obtained with the PAS2P tool. We present the experimental validation for the BT NAS Parallel Benchmark. The signatures for 16, 36, 64, 81 and 100 processes were executed and used to model and project the logical trace for 1024 processes. The results obtained show the accuracy of the method: the communication pattern was predicted without error, while the prediction error is less than 10% for the communication volume and less than 5% for the number of instructions.
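One simple way to picture a projection from small-scale signature executions to a larger process count (an illustration only; the paper's method models the logical trace itself rather than fitting a single metric): measure a per-process metric at the small process counts, fit how it scales with the number of processes, and evaluate the fit at the target count. The measurements below are invented.

    import numpy as np

    # Per-process instruction counts measured with the small-scale signatures
    # (hypothetical values, roughly following strong scaling ~ 1/P).
    procs  = np.array([16, 36, 64, 81, 100], dtype=float)
    instrs = np.array([8.0e9, 3.6e9, 2.1e9, 1.6e9, 1.3e9])

    # Fit a power law instr(P) = c * P**a in log-log space.
    a, log_c = np.polyfit(np.log(procs), np.log(instrs), 1)
    c = np.exp(log_c)

    # Project the per-process instruction count to 1024 processes.
    projected = c * 1024.0 ** a
    print(f"exponent a = {a:.2f}, projected instructions at P = 1024: {projected:.2e}")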