This paper presents a simple, powerful and flexible technique for reasoning about and translating the goal-directed evaluation of programminglanguage constructs that either succeed (and generate sequences of values) ...
详细信息
This paper presents a simple, powerful and flexible technique for reasoning about and translating the goal-directed evaluation of programminglanguage constructs that either succeed (and generate sequences of values) or fail. The technique generalizes the Byrd Box, a well-known device for describing Prolog backtracking.
Object-oriented languages like Java and Smalltalk provide a uniform object model that simplifies programming by providing a consistent, abstract model of object behavior. But direct implementations introduce overhead,...
详细信息
ISBN:
(纸本)9780897919074
Object-oriented languages like Java and Smalltalk provide a uniform object model that simplifies programming by providing a consistent, abstract model of object behavior. But direct implementations introduce overhead, removal of which requires aggressive implementation techniques (e.g. type inference, function specialization);in this paper, we introduce object inlining, an optimization that automatically inline allocates objects within containers (as is done by hand in CS++) within a uniform model. We present our technique, which includes novel program analyses that track how inlinable objects are used throughout the program. We evaluated object inlining on several object-oriented benchmarks. It produces performance up to three times as fast as a dynamic model without inlining and roughly equal to that of manually-inlined codes.
Existing research understates the benefits that can be obtained from inlining and cloning, especially when guided by profile information. Our implementation of inlining and cloning yields excellent results on average ...
详细信息
Existing research understates the benefits that can be obtained from inlining and cloning, especially when guided by profile information. Our implementation of inlining and cloning yields excellent results on average and very rarely lowers performance. We believe our good results can be explained by a number of factors: inlining at the intermediate-code level removes most technical restrictions on what can be inlined;the ability to inline across files and incorporate profile information enables us to choose better inline candidates;a high-quality back end can exploit the scheduling and register allocation opportunities presented by larger subroutines;an aggressive processor architecture benefits from more predictable branch behavior;and a large instruction cache mitigates the impact of code expansion. We describe the often dramatic impact of our inlining and cloning on performance: for example, the implementations of our inlining and cloning algorithms in the HP-UX 10.20 compilers boost SPECint95 performance on a PA8000-based workstation by a factor of 1.32.
The combination of pointers and pointer arithmetic in C makes the task of improving C programs somewhat more difficult than improving programs written in simpler languages like Fortran. While much work has been publis...
详细信息
The combination of pointers and pointer arithmetic in C makes the task of improving C programs somewhat more difficult than improving programs written in simpler languages like Fortran. While much work has been published that focuses on the analysis of pointers, little has appeared that uses the results of such analysis to improve the code compiled for C. This paper examines the problem of register promotion in C and presents experimental results showing that it can have dramatic effects on memory traffic.
It is difficult to map the execution model of multithreading languages (languages which support fine-grain dynamic thread creation) onto the single stack execution model of C. Consequently, previous work on efficient ...
详细信息
It is difficult to map the execution model of multithreading languages (languages which support fine-grain dynamic thread creation) onto the single stack execution model of C. Consequently, previous work on efficient multithreading uses elaborate frame formats and allocation strategy, with compilers customized for them. This paper presents an alternative cost-effective implementation strategy for multithreading languages which can maximally exploit current sequential C compilers. We identify a set of primitives whereby efficient dynamic thread creation and switch can be achieved and clarify implementation issues and solutions which work under the stack frame layout and calling conventions of current C compilers. The primitives are implemented as a C library and named StackThreads. In StackThreads, a thread creation is done just by a C procedure call, maximizing thread creation performance. When a procedure suspends an execution, the context of the procedure, which is roughly a stack frame of the procedure, is saved into heap and resumed later. With StackThreads, the compiler writer can straightforwardly translate sequential constructs of the source language into corresponding C statements or expressions, while using StackThreads primitives as a blackbox mechanism which switches execution between C procedures.
The member lookup problem in C++ is the problem of resolving a specified member name in the context of a specified class. Member lookup in C++ is complicated by the presence of virtual inheritance and multiple inherit...
详细信息
The member lookup problem in C++ is the problem of resolving a specified member name in the context of a specified class. Member lookup in C++ is complicated by the presence of virtual inheritance and multiple inheritance. In this paper, we present an efficient algorithm for member lookup in C++. We also present a formalism for the multiple inheritance mechanism of C++, which we use as the basis for deriving our algorithm. The formalism may also be of use as a formal basis for deriving other C++ compiler algorithms.
Two important problems of object-oriented reuse are the propagation of design and implementation specifics of the base software to the inheritors, and the protection of the inheritors against changes in the base softw...
详细信息
Two important problems of object-oriented reuse are the propagation of design and implementation specifics of the base software to the inheritors, and the protection of the inheritors against changes in the base software. In this paper, we argue that the simple inheritance rules of existing object-oriented languages are not sufficient for properly dealing with these problems. In the proposal presented in this paper, programmers are enabled to make metalevel declarations of the internal protocols and dependencies of their classes. Additionally, changes of the base module are automatically monitored to filter out information about the alterations that may invalidate already existing inheritors. Based on these informations, the subclassing semantics is adjusted such that the maintenance of the base module properties and the protection of the inheritor is ensured during their integration. In this way, language support is provided for keeping the behavior of reusable software consistent during its evolution.
We present an approach for specializing large programs, such as programs consisting of several modules, or libraries. This approach is based on the idea of using a compiler generator (cogen) for creating generating ex...
详细信息
We present an approach for specializing large programs, such as programs consisting of several modules, or libraries. This approach is based on the idea of using a compiler generator (cogen) for creating generating extensions. Generating extensions are specializers specialized with respect to some input program. When run on some input data the generating extension produces a specialized version of the input program. Here we use the cogen to tailor modules for specialization. This happens once and for all, independently of all other modules. The resulting module can then be used as a building block for generating extensions for complete programs, in much the same way as the original modules can be put together into complete programs. The result of running the final generating extension is a collection of residual modules, with a module structure derived from the original program.
If a fixed exponentially decreasing probability distribution function is used to model every object's lifetime, then the age of an object gives no information about its future life expectancy. This radioactive dec...
详细信息
ISBN:
(纸本)9780897919074
If a fixed exponentially decreasing probability distribution function is used to model every object's lifetime, then the age of an object gives no information about its future life expectancy. This radioactive decay model implies there can be no rational basis for deciding which live objects should be promoted to another generation. Yet there remains a rational basis for deciding how many objects to promote, when to collect garbage, and which generations to collect. Analysis of the model leads to a new kind of generational garbage collector whose effectiveness does not depend upon heuristics that predict which objects will live longer than others. This result provides insight into the computational advantages of generational garbage collection, with implications for the management of objects whose life expectancies are difficult to predict.
Modulo scheduling algorithms based on optimal solvers have been proposed to investigate and tune the performance of modulo scheduling heuristics. While recent advances have broadened the scope for which the optimal ap...
详细信息
Modulo scheduling algorithms based on optimal solvers have been proposed to investigate and tune the performance of modulo scheduling heuristics. While recent advances have broadened the scope for which the optimal approach is applicable, this approach increasingly suffers from large execution times. In this paper, we propose a more efficient formulation of the modulo scheduling space that significantly decreases the execution time of solvers based on integer linear programs. For example, the total execution time is reduced by a factor of 8.6 when 782 loops from the Perfect Club, SPEC, and Livermore Fortran Kernels are scheduled for minimum register requirements using the more efficient formulation instead of the traditional formulation. Experimental evidence further indicates that significantly larger loops can be scheduled under realistic machine constraints.
暂无评论