An adaptation of the classic register allocation algorithm to the problem of array storage optimization in MATLAB is presented. The method involves the decomposition of an interference graph's color classes using inferred type information. A key trait is the use of symbolic types, along with control flow, in performing the decomposition. On a benchmark suite spanning the published test suites of some recent research MATLAB compilers, our implementation produces savings in the average virtual memory size, with respect to code generated by a commercial MATLAB compiler, of between 51% and 139% in 6 out of 11 programs, and savings of between 0.7% and 47% in the remaining programs. In absolute terms, this ranged from 123KB to over 9MB. Substantial improvements in other categories of memory, such as resident sets and dynamic data (stack plus heap), were also observed and are reported. Speedups in execution time of at least an order of magnitude in 4 programs, of over 100% in 4 of the remaining, and of 10% or more in the rest are also reported.
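To make the storage-sharing idea concrete, here is a minimal sketch (not the paper's algorithm): a greedy coloring of an array interference graph in which a storage slot is reused only when it is both interference-free and type-compatible, mirroring the decomposition of color classes by inferred type. The interference graph, type map, and variable names below are hypothetical.

```python
# Sketch only: greedy coloring of an array interference graph, with color
# classes split by inferred array type so that only type-compatible arrays
# share storage.

def color_interference_graph(interference, types):
    """interference: dict var -> set of vars simultaneously live.
    types: dict var -> inferred symbolic type (e.g. ('double', 'NxN')).
    Returns dict var -> storage slot id; vars sharing a slot reuse memory."""
    slots = {}                      # var -> slot id
    slot_type = {}                  # slot id -> type of arrays it holds
    next_slot = 0
    # Color in order of decreasing degree (a common greedy heuristic).
    for v in sorted(interference, key=lambda v: -len(interference[v])):
        forbidden = {slots[u] for u in interference[v] if u in slots}
        # Reuse a slot only if it is interference-free *and* type-compatible:
        # this is the "decompose color classes by type" idea in miniature.
        candidates = [s for s, t in slot_type.items()
                      if s not in forbidden and t == types[v]]
        if candidates:
            slots[v] = candidates[0]
        else:
            slots[v] = next_slot
            slot_type[next_slot] = types[v]
            next_slot += 1
    return slots

# Hypothetical example: A and B interfere, C interferes with neither.
interference = {'A': {'B'}, 'B': {'A'}, 'C': set()}
types = {'A': ('double', 'NxN'), 'B': ('double', 'NxN'), 'C': ('double', 'NxN')}
print(color_interference_graph(interference, types))  # C shares A's slot
```

In the example, C reuses A's storage because the two never interfere and have the same inferred type, while B, which interferes with A, receives its own slot.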
As more complex DSP algorithms are realized in practice, there is an increasing need for high-level stream abstractions that can be compiled without sacrificing efficiency. Toward this end, we present a set of aggressive optimizations that target linear sections of a stream program. Our input language is StreamIt, which represents programs as a hierarchical graph of autonomous filters. A filter is linear if each of its outputs can be represented as an affine combination of its inputs. Linearity is common in DSP components; examples include FIR filters, expanders, compressors, FFTs and DCTs. We demonstrate that several algorithmic transformations, traditionally hand-tuned by DSP experts, can be completely automated by the compiler. First, we present a linear extraction analysis that automatically detects linear filters from the C-like code in their work function. Then, we give a procedure for combining adjacent linear filters into a single filter, as well as for translating a linear filter to operate in the frequency domain. We also present an optimization selection algorithm, which finds the sequence of combination and frequency transformations that will give the maximal benefit. We have completed a fully automatic implementation of the above techniques as part of the StreamIt compiler, and we demonstrate a 450% performance improvement across our benchmark suite.
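The combination of adjacent linear filters can be illustrated with a small sketch, assuming each filter consumes and produces fixed-size vectors with matching rates (the real StreamIt analysis also handles peek/pop/push rates and the frequency-domain translation): composing y = A1·x + b1 with y = A2·x + b2 yields one filter with matrix A2·A1 and offset A2·b1 + b2.

```python
# Sketch of the filter-combination idea, not the StreamIt compiler's
# implementation: the pipeline of two affine filters is itself affine.
import numpy as np

class LinearFilter:
    def __init__(self, A, b):
        self.A = np.asarray(A, dtype=float)   # each output is an affine
        self.b = np.asarray(b, dtype=float)   # combination of the inputs
    def work(self, x):
        return self.A @ x + self.b

def combine(f1, f2):
    """Collapse the pipeline f1 -> f2 into one equivalent linear filter."""
    return LinearFilter(f2.A @ f1.A, f2.A @ f1.b + f2.b)

# Hypothetical 3-tap FIR followed by a gain stage.
fir  = LinearFilter([[0.25, 0.5, 0.25]], [0.0])      # 3 inputs -> 1 output
gain = LinearFilter([[2.0]], [0.0])                  # 1 input  -> 1 output
both = combine(fir, gain)

x = np.array([1.0, 2.0, 3.0])
assert np.allclose(gain.work(fir.work(x)), both.work(x))
print(both.A, both.b)   # [[0.5 1.  0.5]] [0.]
```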
Typed assembly languages provide a way to generate machine-checkable safety proofs for machine-language programs. But the soundness proofs of most existing typed assembly languages are hand-written and cannot be machine-checked, which is worrisome for such large calculi. We have designed and implemented a low-level typed assembly language (LTAL) with a semantic model and established its soundness from the model. Compared to existing typed assembly languages, LTAL is more scalable and more secure; it has no macro instructions that hinder low-level optimizations such as instruction scheduling; its type constructors are expressive enough to capture dataflow information, support the compiler's choice of data representations and permit typed position-independent code; and its type-checking algorithm is completely syntax-directed. We have built a prototype system, based on Standard ML of New Jersey, that compiles most of core ML to Sparc code. We explain how we were able to make the untyped back end in SML/NJ preserve types during instruction selection and register allocation, without restricting low-level optimizations and without knowledge of any type system pervading the instruction selector and register allocator.
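The sense in which type checking is "completely syntax-directed" can be illustrated with a toy checker over a register typing environment: each instruction form matches exactly one rule, so checking never searches or backtracks. This is only an illustration of the checking style; LTAL's actual type constructors, which track dataflow facts and data representations, are far richer.

```python
# Toy illustration of syntax-directed checking for an assembly-like language.
# Not LTAL's type system.

def check(instrs, regtypes):
    """instrs: list of ('mov', rd, rs) or ('add', rd, rs1, rs2) tuples.
    regtypes: dict register -> type name, updated as instructions are checked."""
    for ins in instrs:
        op = ins[0]
        if op == 'mov':                       # rd gets the type of rs
            _, rd, rs = ins
            regtypes[rd] = regtypes[rs]
        elif op == 'add':                     # both operands must be ints
            _, rd, rs1, rs2 = ins
            if regtypes[rs1] != 'int' or regtypes[rs2] != 'int':
                raise TypeError(f"add on non-int registers in {ins}")
            regtypes[rd] = 'int'
        else:
            raise TypeError(f"no typing rule for {ins}")
    return regtypes

# Hypothetical two-instruction program with r1 known to hold an int.
print(check([('mov', 'r2', 'r1'), ('add', 'r3', 'r1', 'r2')], {'r1': 'int'}))
```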
Software prefetching is a promising technique to hide cache miss latencies, but it remains challenging to effectively prefetch pointer-based data structures because obtaining the memory address to be prefetched requires pointer dereferences. The recently proposed stride prefetching overcomes this problem, but it only exploits inter-iteration stride patterns and relies on an off-line profiling method. We propose a new algorithm for stride prefetching which is intended for use in a dynamic compiler. We exploit both inter- and intra-iteration stride patterns, which we discover using an ultra-lightweight profiling technique, called object inspection. This is a kind of partial interpretation that only a dynamic compiler can perform. During the compilation of a method, the dynamic compiler gathers the profile information by partially interpreting the method using the actual values of parameters and causing no side effects. We evaluated an implementation of our prefetching algorithm in a production-level Java just-in-time compiler. The results show that the algorithm achieved speedups of up to 18.9% and 25.1% in industry-standard benchmarks on the Pentium 4 and the Athlon MP, respectively, while it increased the compilation time by less than 3.0%.
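A rough sketch of the stride-detection step, under simplified assumptions (regular strides, a fixed per-iteration cycle estimate) and not the paper's JIT implementation, might look as follows: from a few addresses gathered by lightweight inspection, infer the stride and a prefetch distance large enough to hide the miss latency.

```python
# Sketch only: infer a stride from sampled addresses and plan a prefetch.

def detect_stride(addresses):
    """Return the common difference between consecutive addresses, or None."""
    diffs = {b - a for a, b in zip(addresses, addresses[1:])}
    return diffs.pop() if len(diffs) == 1 else None

def plan_prefetch(addresses, miss_latency_cycles=200, cycles_per_iter=40):
    stride = detect_stride(addresses)
    if stride is None:
        return None                                  # irregular: don't prefetch
    # Prefetch far enough ahead to cover the miss latency (ceiling division).
    distance = max(1, -(-miss_latency_cycles // cycles_per_iter))
    return ('prefetch', f'addr + {distance * stride}')

# Hypothetical profile: objects laid out 24 bytes apart.
sampled = [0x1000, 0x1018, 0x1030, 0x1048]
print(plan_prefetch(sampled))                        # ('prefetch', 'addr + 120')
```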
We show that abstract interpretation-based static program analysis can be made efficient and precise enough to formally verify a class of properties for a family of large programs with few or no false alarms. This is achieved by refinement of a general-purpose static analyzer and later adaptation to particular programs of the family by the end-user through parametrization. This is applied to the proof of soundness of data manipulation operations at the machine level for periodic synchronous safety-critical embedded software. The main novelties are the design principle of static analyzers by refinement and adaptation through parametrization (Sect. 3 and 7), the symbolic manipulation of expressions to improve the precision of abstract transfer functions (Sect. 6.3), the octagon (Sect. 6.2.2), ellipsoid (Sect. 6.2.3), and decision tree (Sect. 6.2.4) abstract domains, all with sound handling of rounding errors in floating point computations, widening strategies (with thresholds: Sect. 7.1.2, delayed: Sect. 7.1.3) and the automatic determination of the parameters (parametrized packing: Sect. 7.2).
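As a small illustration of one ingredient, widening with thresholds, here is a sketch on a plain interval domain; the analyzer described in the paper uses much richer domains (octagons, ellipsoids, decision trees) with sound handling of floating-point rounding, none of which is modeled here. The threshold set and the analyzed loop are hypothetical.

```python
# Sketch only: interval widening with thresholds, plus delayed widening.

THRESHOLDS = [0, 1, 10, 100, 1000, float('inf')]   # hypothetical threshold set

def widen(old, new):
    """Widen interval 'old' by 'new', jumping to the next threshold instead of
    straight to infinity, which preserves more precision."""
    lo = old[0] if new[0] >= old[0] else max(-t for t in THRESHOLDS if -t <= new[0])
    hi = old[1] if new[1] <= old[1] else min(t for t in THRESHOLDS if t >= new[1])
    return (lo, hi)

# Analyze: i = 0; while i < 10: i += 1
state = (0, 0)                                       # abstract value of i at loop head
for it in range(6):
    body_in  = (state[0], min(state[1], 9))          # guard: i < 10
    body_out = (body_in[0] + 1, body_in[1] + 1)      # i += 1
    joined   = (min(0, body_out[0]), max(0, body_out[1]))   # join with entry value 0
    state    = widen(state, joined) if it >= 2 else joined  # delay widening 2 iterations
print(state)    # (0, 10): the threshold catches the loop bound instead of +inf
```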
Compiler writers have crafted many heuristics over the years to approximately solve NP-hard problems efficiently. Finding a heuristic that performs well on a broad range of applications is a tedious and difficult process. This paper introduces Meta Optimization, a methodology for automatically fine-tuning compiler heuristics. Meta Optimization uses machine-learning techniques to automatically search the space of compiler heuristics. Our techniques reduce compiler design complexity by relieving compiler writers of the tedium of heuristic tuning. Our machine-learning system uses an evolutionary algorithm to automatically find effective compiler heuristics. We present promising experimental results. In one mode of operation, Meta Optimization creates application-specific heuristics, which often result in impressive speedups. For hyperblock formation, one optimization we present in this paper, we obtain an average speedup of 23% (up to 73%) for the applications in our suite. Furthermore, by evolving a compiler's heuristic over several benchmarks, we can create effective, general-purpose heuristics. The best general-purpose heuristic our system found for hyperblock formation improved performance by an average of 25% on our training set, and 9% on a completely unrelated test set. We demonstrate the efficacy of our techniques on three different optimizations in this paper: hyperblock formation, register allocation, and data prefetching.
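The flavor of the search can be conveyed by a minimal evolutionary loop over the weights of a priority function. This is a sketch rather than the paper's system: the fitness function, which in Meta Optimization comes from compiling and running a benchmark, is stubbed out here with a synthetic target.

```python
# Sketch only: evolutionary tuning of heuristic weights.
import random

random.seed(0)

def fitness(weights):
    # Stand-in for "compile with these heuristic weights, run, return speedup".
    target = [0.6, -0.2, 0.8, 0.1]           # hypothetical ideal weights
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

def evolve(pop_size=20, generations=30, n_weights=4):
    pop = [[random.uniform(-1, 1) for _ in range(n_weights)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]               # keep the fittest half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            child = [random.choice(pair) for pair in zip(a, b)]       # crossover
            if random.random() < 0.3:                                  # mutation
                i = random.randrange(n_weights)
                child[i] += random.gauss(0, 0.1)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print([round(w, 2) for w in best])   # best vector found; improves over generations
```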
Method inlining and data flow analysis are two major optimization components for effective program transformations; however, they often suffer from the existence of rarely or never executed code contained in the target method. One major problem lies in the assumption that the compilation unit is partitioned at method boundaries. This paper describes the design and implementation of a region-based compilation technique in our dynamic compilation system, in which the compiled regions are selected as code portions without rarely executed code. The key parts of this technique are region selection, partial inlining, and region exit handling. For region selection, we employ both static heuristics and dynamic profiles to identify rare sections of code. The region selection process and method inlining decision are interwoven, so that method inlining exposes other targets for region selection, while the region selection in the inline target conserves the inlining budget, leading to more method inlining. Thus the inlining process can be performed for parts of a method, not for the entire body of the method. When the program attempts to exit from a region boundary, we trigger recompilation and then rely on on-stack replacement to continue the execution from the corresponding entry point in the recompiled code. We have implemented these techniques in our Java JIT compiler, and conducted a comprehensive evaluation. The experimental results show that the approach of region-based compilation achieves approximately 5% performance improvement on average, while reducing the compilation overhead by 20 to 30%, in comparison to the traditional function-based compilation techniques.
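The region-selection step can be sketched in simplified form, assuming only block-level profile counts: blocks executed rarely relative to the method entry are excluded from the region, and edges into them become region exits that would trigger recompilation and on-stack replacement. Block names, counts, and the threshold are hypothetical; partial inlining is not modeled.

```python
# Sketch only: profile-driven region selection with rare-path exclusion.

def select_region(blocks, edges, entry, rare_ratio=0.01):
    """blocks: dict name -> profile count; edges: list of (src, dst) pairs."""
    threshold = blocks[entry] * rare_ratio
    region = {b for b, count in blocks.items() if count >= threshold}
    exits = [(src, dst) for src, dst in edges
             if src in region and dst not in region]   # recompile + OSR if taken
    return region, exits

blocks = {'entry': 10000, 'loop': 9800, 'hot_body': 9700, 'exc_handler': 3}
edges  = [('entry', 'loop'), ('loop', 'hot_body'),
          ('hot_body', 'loop'), ('hot_body', 'exc_handler')]
region, exits = select_region(blocks, edges, 'entry')
print(sorted(region))   # ['entry', 'hot_body', 'loop']
print(exits)            # [('hot_body', 'exc_handler')]
```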
Speculative execution, such as control speculation and data speculation, is an effective way to improve program performance. Using edge/path profile information or simple heuristic rules, existing compiler frameworks can adequately incorporate and exploit control speculation. However, very little has been done so far to allow existing compiler frameworks to incorporate and exploit data speculation effectively in various program transformations beyond instruction scheduling. This paper proposes a speculative SSA form to incorporate information from alias profiling and/or heuristic rules for data speculation, thus allowing existing program analysis frameworks to be easily extended to support both control and data speculation. Such a general framework is very useful for EPIC architectures that provide checking (such as the advanced load address table (ALAT) [10]) on data speculation to guarantee the correctness of program execution. We use SSAPRE [21] as one example to illustrate how to incorporate data speculation in important compiler optimizations such as partial redundancy elimination (PRE), register promotion, strength reduction and linear function test replacement. Our extended framework allows both control and data speculation to be performed on top of SSAPRE and, thus, enables more aggressive speculative optimizations. The proposed framework has been implemented on Intel's Open Research Compiler (ORC). We present experimental data on some SPEC2000 benchmark programs to demonstrate the usefulness of this framework and how data speculation benefits partial redundancy elimination.
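The core decision behind data speculation can be illustrated schematically, without any of the SSAPRE machinery: if alias profiling reports that a store rarely aliases a later load, the load is hoisted above the store as a speculative load plus a check, in the spirit of IA-64's ld.a/chk.a and the ALAT. The IR strings, threshold, and probabilities below are hypothetical.

```python
# Sketch only: decide whether to apply data speculation from an alias profile.

def speculate_load(store, load, alias_prob, threshold=0.05):
    """store/load: textual IR for illustration; alias_prob from alias profiling."""
    if alias_prob >= threshold:
        return [store, load]                 # too likely to alias: keep original order
    return [
        f'ld.a  r1 = {load}      // speculative load, hoisted above the store',
        store,
        f'chk.a r1, recover      // redo the load if the store aliased it',
    ]

for prob in (0.001, 0.30):
    print(f'alias probability {prob}:')
    for line in speculate_load('st [p] = r2', '[q]', prob):
        print('   ', line)
```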
With power-related concerns becoming dominant aspects of hardware and software design, significant research effort has been devoted towards system power minimization. Among run-time power-management techniques, dynamic voltage scaling (DVS) has emerged as an important approach, with the ability to provide significant power savings. DVS exploits the ability to control the power consumption by varying a processor's supply voltage (V) and clock frequency (f). DVS controls energy by scheduling different parts of the computation to different (V, f) pairs; the goal is to minimize energy while meeting performance needs. Although processors like the Intel XScale and Transmeta Crusoe allow software DVS control, such control has thus far largely been used at the process/task level under operating system control. This is mainly because the energy and time overhead for switching DVS modes is considered too large and difficult to manage within a single program. In this paper we explore the opportunities and limits of compile-time DVS scheduling. We derive an analytical model for the maximum energy savings that can be obtained using DVS given a few known program and processor parameters. We use this model to determine scenarios where energy consumption benefits from compile-time DVS and those where there is no benefit. The model helps us extrapolate the benefits of compile-time DVS into the future as processor parameters change. We then examine how much of these predicted benefits can actually be achieved through optimal settings of DVS modes. This is done by extending the existing Mixed-integer Linear Program (MILP) formulation for this problem by accurately accounting for DVS energy switching overhead, by providing finer-grained control on settings, and by considering multiple data categories in the optimization. Overall, this research provides a comprehensive view of compile-time DVS management, providing both practical techniques for its immediate deployment as well as theoretical insight into its limits.
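The kind of back-of-the-envelope reasoning behind such an analytical model can be sketched as follows (this is not the paper's model or its MILP formulation): dynamic energy per cycle scales with V^2, so slack-limited code can run at a lower (V, f) pair for less energy, at the cost of a fixed mode-switch overhead. All voltages, frequencies, and overheads below are hypothetical.

```python
# Sketch only: energy comparison between the fastest mode and the slowest
# (V, f) mode that still meets the deadline, charging one mode switch.

def energy(cycles, volts, cap_nf=1.0):
    return cap_nf * 1e-9 * volts ** 2 * cycles        # joules, E ~ C * V^2 * N

def dvs_savings(cycles, deadline_s, modes, switch_overhead_j=50e-6):
    """modes: list of (volts, freq_hz) pairs the processor supports."""
    feasible = [(v, f) for v, f in modes if cycles / f <= deadline_s]
    v_hi, f_hi = max(modes, key=lambda m: m[1])       # baseline: fastest mode
    v_lo, f_lo = min(feasible, key=lambda m: m[1])    # chosen: slowest feasible mode
    baseline = energy(cycles, v_hi)
    scaled = energy(cycles, v_lo) + switch_overhead_j # one mode switch charged
    return baseline, scaled

modes = [(1.5, 1.0e9), (1.1, 6.0e8), (0.8, 3.0e8)]    # hypothetical (V, f) pairs
base, scaled = dvs_savings(cycles=5e8, deadline_s=1.0, modes=modes)
print(f'baseline {base:.4f} J, with DVS {scaled:.4f} J')  # lower V wins despite overhead
```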
Software writing is difficult for many reasons. One important reason is the interplay between the formal world of the computer and its programming language and the informal world where the problem to be solved is located. In this paper some of the direct and indirect consequences of this interplay are briefly discussed, both for software specification and design generally and for the composition of independently identified and specified subproblems. (C) 2003 Elsevier B.V. All rights reserved.