Developing and optimizing software applications for high performance and energy efficiency is a challenging task, even when targeting a single machine. For instance, optimizing for multicore-based computing systems requires in-depth knowledge of programming languages, application programming interfaces (APIs), compilers, performance tuning tools, and computer architecture and organization. Many tasks in performance engineering methodologies require manual effort and the use of different tools that are not always part of an integrated toolchain. This paper presents Pegasus, a performance engineering approach supported by a framework that consists of a source-to-source compiler, controlled and guided by strategies programmed in a Domain-Specific Language, and an autotuner. Pegasus is a holistic and versatile approach that spans the decision layers composing the software stack and exploits system capabilities and workloads effectively through runtime autotuning. The Pegasus approach helps developers by automating tasks related to the efficient implementation of software applications on multicore computing systems, focusing on application analysis, profiling, code transformations, and the integration of runtime autotuning. Pegasus allows developers to program their own strategies, or to apply existing strategies automatically to software applications, in order to ensure compliance with non-functional requirements such as performance and energy efficiency. We show how to apply Pegasus and demonstrate its applicability and effectiveness in a complex case study that includes tasks from a smart navigation system.
This paper describes MATISSE, a compiler that translates a MATLAB subset to C, targeting embedded systems. MATISSE uses LARA, an aspect-oriented programming language, to specify additional information and transformations for the input MATLAB code, for example, insertion of code for variable initialization and specification of the types and shapes of variables. The compiler is being developed with flexibility and multitarget, multitoolchain support in mind, allowing the generation of several C implementations from the same MATLAB reference code. In this paper, we also present a number of techniques employed in MATLAB-to-C compilation, such as element-wise mapping operations, matrix views, weak types, and intrinsics. We validate these techniques using MATISSE and a set of representative benchmarks. More specifically, we evaluate the compiler with a set of 31 benchmarks on an embedded system board and a desktop computer. The results show speedups of up to 1.8x when employing information provided by LARA aspects, compared with C code generated without additional user information. Compared with the execution time of the original code running on MATLAB, the generated C code achieved a geometric mean speedup of 13x.
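To make the element-wise mapping technique mentioned above concrete, the following is a minimal, hypothetical sketch (not actual MATISSE output) of how a MATLAB element-wise expression such as C = A .* B + 1.0 could be lowered to a single C loop over flat row-major buffers; the function name and data layout are illustrative assumptions.

```c
#include <stddef.h>

/* Hypothetical lowering (not MATISSE output): the MATLAB element-wise
 * expression  C = A .* B + 1.0  over n-element matrices becomes one
 * fused loop over row-major buffers. */
void elementwise_mul_add(const double *A, const double *B, double *C, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        C[i] = A[i] * B[i] + 1.0;   /* one element-wise multiply-add per element */
    }
}
```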
ISBN (print): 9781450351300
The computational resources at open-science supercomputing centers are shared among multiple users at any given time, and hence are governed by policies that ensure their fair and optimal usage. Such policies can impose upper limits on (1) the number of compute nodes and (2) the wall-clock time that can be requested per computational job. Given these limits, several applications may not run to completion in a single session. Therefore, as a workaround, users are advised to take advantage of the checkpoint-and-restart technique and spread their computations across multiple interdependent computational jobs. The checkpoint-and-restart technique saves the execution state of an application periodically; a saved state is known as a checkpoint. When a computational job times out after running for the maximum wall-clock time, leaving its computation incomplete, the user can submit a new job that resumes the computation from the checkpoints saved during the previous run. The checkpoint-and-restart technique can also make applications tolerant to certain types of faults, viz., network and compute-node failures. When this technique is built into an application itself, it is called Application-Level Checkpointing (ALC). We are developing an interactive tool to assist users in semi-automatically inserting the ALC mechanism into their existing applications without manual reengineering. Compared with other checkpointing approaches, the checkpoints written with our tool have a smaller memory footprint and thus incur a smaller I/O overhead.
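The following is a minimal C sketch of the generic application-level checkpoint-and-restart pattern described above: the loop state is written to disk periodically and restored when the program restarts. The file name, checkpoint interval, and computation are illustrative assumptions and do not reflect the code generated by the tool.

```c
#include <stdio.h>
#include <stdlib.h>

#define N 1000000
#define CKPT_FILE "state.ckpt"
#define CKPT_INTERVAL 10000   /* iterations between checkpoints */

/* Write the loop index and the working array to disk. */
static void save_checkpoint(long i, const double *data, size_t n)
{
    FILE *f = fopen(CKPT_FILE, "wb");
    if (!f) return;
    fwrite(&i, sizeof i, 1, f);
    fwrite(data, sizeof *data, n, f);
    fclose(f);
}

/* Return the iteration to resume from (0 if no usable checkpoint exists). */
static long load_checkpoint(double *data, size_t n)
{
    FILE *f = fopen(CKPT_FILE, "rb");
    long i = 0;
    if (!f) return 0;
    if (fread(&i, sizeof i, 1, f) != 1 || fread(data, sizeof *data, n, f) != n)
        i = 0;
    fclose(f);
    return i;
}

int main(void)
{
    double *data = calloc(N, sizeof *data);
    if (!data) return 1;

    long start = load_checkpoint(data, N);    /* resume if a checkpoint exists */
    for (long i = start; i < N; ++i) {
        data[i] += 1.0;                        /* stand-in for real computation */
        if (i % CKPT_INTERVAL == 0)
            save_checkpoint(i, data, N);       /* periodic application-level checkpoint */
    }
    free(data);
    return 0;
}
```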
The demise of frequency scaling, the easiest way to improve computing performance, together with the growing gap between CPU and memory speeds and the increase in arithmetic intensity of current problems, has given rise to a new range of devices created to improve performance. Heterogeneous Computing (HC) and many-core processors are examples of this new range of devices. However, the complexity of these new hardware architectures is not easily hidden from the programmer. In this thesis, I propose a set of tools that exploit, through source-to-source (S2S) compilers, the capabilities and peculiarities of parallel computing and HC to speed up originally sequential source code and increase its energy efficiency. The proposal is implemented as a set of modular tools that help port sequential source code to OpenMP, MPI, and HMPP, demonstrating how the input code can be translated effectively and automatically. Through a real-life example, I show how the proposed dependency analysis tool trivializes the task of parallelizing sequential code, breaking the first performance barrier. The OMP2MPI experiments generate code that is more than 60× faster than its sequential version and also faster than the original OpenMP code. The OMP2HMPP experiments obtain an average speedup of 31× and an average increase in energy efficiency of 5.86×. Both tools were tested with OpenMP code, obtaining successful results that demonstrate the feasibility of using this set of tools for exploring HC.
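To illustrate the kind of translation such source-to-source tools automate, the sketch below contrasts an OpenMP reduction loop with a hand-written MPI counterpart that distributes the iteration space across ranks; this is an illustrative example under assumed names, not actual OMP2MPI output.

```c
#include <mpi.h>
#include <stdio.h>

#define N 1000000

/* OpenMP version (the kind of input an S2S tool such as OMP2MPI starts from):
 *
 *   double sum = 0.0;
 *   #pragma omp parallel for reduction(+:sum)
 *   for (long i = 0; i < N; ++i)
 *       sum += (double)i * 0.5;
 *
 * Below is a hand-written MPI counterpart that distributes the iteration
 * space cyclically across ranks and combines partial sums with MPI_Reduce. */
int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = 0.0, total = 0.0;
    for (long i = rank; i < N; i += size)   /* cyclic distribution of iterations */
        local += (double)i * 0.5;

    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum = %f\n", total);

    MPI_Finalize();
    return 0;
}
```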
This article presents Kadabra, a Java source-to-source compiler that allows users to perform code queries, code analyses, and code transformations, all user-programmable using the domain-specific language LARA. We show how Kadabra can be used as the basis for a runtime autotuning and adaptivity framework, able to adapt existing Java source code to take advantage of runtime autotuning. Specifically, this article presents the framework, consisting of Kadabra and an API for runtime adaptivity. We show how the framework is used to extend Java applications with autotuning and runtime adaptivity mechanisms targeting performance improvement and/or energy saving goals.
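The following is a minimal sketch of the runtime-autotuning pattern described above, shown in C with hypothetical names rather than Kadabra's Java API: candidate configurations of a tunable parameter are timed online, and the fastest one is kept for subsequent invocations.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical tunable knob: the block size used by the kernel below. */
static void kernel(int block, double *data, int n)
{
    for (int b = 0; b < n; b += block)
        for (int i = b; i < b + block && i < n; ++i)
            data[i] = data[i] * 1.0001 + 0.5;
}

/* Runtime autotuning sketch (not Kadabra's API): time each candidate
 * configuration once, then keep the fastest for the remaining calls. */
int main(void)
{
    static double data[1 << 20];
    const int candidates[] = { 64, 256, 1024, 4096 };
    int best = candidates[0];
    double best_time = 1e30;

    for (size_t c = 0; c < sizeof candidates / sizeof candidates[0]; ++c) {
        clock_t t0 = clock();
        kernel(candidates[c], data, 1 << 20);
        double elapsed = (double)(clock() - t0) / CLOCKS_PER_SEC;
        if (elapsed < best_time) { best_time = elapsed; best = candidates[c]; }
    }

    for (int run = 0; run < 100; ++run)      /* steady state: use the selected knob */
        kernel(best, data, 1 << 20);

    printf("selected block size: %d\n", best);
    return 0;
}
```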