Ada 95 is being used as the implementation language for a senior-level compiler design course at the United States Military Academy. This paper describes experiences and lessons learned as well as the scenario-based a...
This paper presents dynamic feedback, a technique that enables computations to adapt dynamically to different execution environments. A compiler that uses dynamic feedback produces several different versions of the same source code; each version uses a different optimization policy. The generated code alternately performs sampling phases and production phases. Each sampling phase measures the overhead of each version in the current environment. Each production phase uses the version with the least overhead in the previous sampling phase. The computation periodically resamples to adjust dynamically to changes in the environment. We have implemented dynamic feedback in the context of a parallelizing compiler for object-based programs. The generated code uses dynamic feedback to automatically choose the best synchronization optimization policy. Our experimental results show that the synchronization optimization policy has a significant impact on the overall performance of the computation, that the best policy varies from program to program, that the compiler is unable to statically choose the best policy, and that dynamic feedback enables the generated code to exhibit performance that is comparable to that of code that has been manually tuned to use the best policy. We have also performed a theoretical analysis which provides, under certain assumptions, a guaranteed optimality bound for dynamic feedback relative to a hypothetical (and unrealizable) optimal algorithm that uses the best policy at every point during the execution.
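To make the sampling/production alternation concrete, here is a minimal, hypothetical driver in Java. The paper's code is emitted by a parallelizing compiler rather than written by hand; the class name, fields, and phase lengths below are assumptions for illustration only.

```java
/** Illustrative sketch of dynamic feedback: alternate sampling and production phases. */
final class DynamicFeedbackDriver {
    private final Runnable[] versions;   // one compiled version per optimization policy
    private final long sampleIters;      // iterations measured per version in a sampling phase
    private final long productionIters;  // iterations run before resampling

    DynamicFeedbackDriver(Runnable[] versions, long sampleIters, long productionIters) {
        this.versions = versions;
        this.sampleIters = sampleIters;
        this.productionIters = productionIters;
    }

    void run(long totalIters) {
        long done = 0;
        while (done < totalIters) {
            // Sampling phase: time each version on a small slice of the work.
            int best = 0;
            long bestNanos = Long.MAX_VALUE;
            for (int v = 0; v < versions.length && done < totalIters; v++) {
                long start = System.nanoTime();
                for (long i = 0; i < sampleIters && done < totalIters; i++, done++) {
                    versions[v].run();
                }
                long elapsed = System.nanoTime() - start;
                if (elapsed < bestNanos) { bestNanos = elapsed; best = v; }
            }
            // Production phase: use the winner, then fall back into sampling
            // so the computation adapts if the environment changes.
            for (long i = 0; i < productionIters && done < totalIters; i++, done++) {
                versions[best].run();
            }
        }
    }
}
```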
ISBN (print): 9780897919074
A program profile attributes run-time costs to portions of a program's execution. Most profiling systems suffer from two major deficiencies: first, they only apportion simple metrics, such as execution frequency or elapsed time, to static, syntactic units, such as procedures or statements; second, they aggressively reduce the volume of information collected and reported, although aggregation can hide striking differences in program behavior. This paper addresses both concerns by exploiting the hardware counters available in most modern processors and by incorporating two concepts from data flow analysis (flow and context sensitivity) to report more context for measurements. This paper extends our previous work on efficient path profiling to flow sensitive profiling, which associates hardware performance metrics with a path through a procedure. In addition, it describes a data structure, the calling context tree, that efficiently captures calling contexts for procedure-level measurements. Our measurements show that the SPEC95 benchmarks execute a small number (3-28) of hot paths that account for 9-98% of their L1 data cache misses. Moreover, these hot paths are concentrated in a few routines, which have complex dynamic behavior.
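As a rough illustration of a calling context tree, the sketch below keeps one node per procedure per calling context, so metrics for the same procedure reached along different call chains stay separate. The String-keyed representation and class names are assumptions for illustration, not the paper's data structure.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal calling-context-tree node: one node per procedure per calling context. */
final class CctNode {
    final String procedure;
    final Map<String, CctNode> children = new HashMap<>();
    long metric;  // e.g. cycles or cache misses attributed to this context

    CctNode(String procedure) { this.procedure = procedure; }

    /** Child node for `callee`, created on the first call from this context. */
    CctNode call(String callee) {
        return children.computeIfAbsent(callee, CctNode::new);
    }
}

final class CctDemo {
    public static void main(String[] args) {
        CctNode main = new CctNode("main");
        // main -> a -> c and main -> b -> c get distinct nodes for c,
        // so c's measurements are reported per calling context.
        main.call("a").call("c").metric += 120;
        main.call("b").call("c").metric += 45;
        System.out.println(main.children.keySet());  // [a, b]
    }
}
```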
A major research goal for compilers and environments is the automatic derivation of tools from formal specifications. However, the formal model of the language is often inadequate; in particular, LR(k) grammars are unable to describe the natural syntax of many languages, such as C++ and Fortran, which are inherently non-deterministic. Designers of batch compilers work around such limitations by combining generated components with ad hoc techniques (for instance, performing partial type and scope analysis in tandem with parsing). Unfortunately, the complexity of incremental systems precludes the use of batch solutions. The inability to generate incremental tools for important languages inhibits the widespread use of language-rich interactive environments. We address this problem by extending the language model itself, introducing a program representation based on parse dags that is suitable for both batch and incremental analysis. Ambiguities unresolved by one stage are retained in this representation until further stages can complete the analysis, even if the resolution depends on further actions by the user. Representing ambiguity explicitly increases the number and variety of languages that can be analyzed incrementally using existing methods. To create this representation, we have developed an efficient incremental parser for general context-free grammars. Our algorithm combines Tomita's generalized LR parser with reuse of entire subtrees via state-matching. Disambiguation can occur statically, during or after parsing, or during semantic analysis (using existing incremental techniques); program errors that preclude disambiguation retain multiple interpretations indefinitely. Our representation and analyses gain efficiency by exploiting the local nature of ambiguities: for the SPEC95 C programs, the explicit representation of ambiguity requires only 0.5% additional space and less than 1% additional time during reconstruction.
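A minimal sketch of the core idea of retaining ambiguity in the program representation follows; the node type and method names are invented for illustration, and the paper's parse dag and incremental GLR machinery are far richer than this.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a parse-dag node: a symbol with one or more alternative derivations. */
final class DagNode {
    final String symbol;
    // Each entry is one possible sequence of children (one derivation).
    final List<List<DagNode>> alternatives = new ArrayList<>();

    DagNode(String symbol) { this.symbol = symbol; }

    void addAlternative(List<DagNode> children) { alternatives.add(children); }

    /** True while parsing (or a later stage) has not yet settled on a single reading. */
    boolean isAmbiguous() { return alternatives.size() > 1; }

    /** A later stage (e.g. type or scope analysis) discards all but one derivation. */
    void resolveTo(int keep) {
        List<DagNode> chosen = alternatives.get(keep);
        alternatives.clear();
        alternatives.add(chosen);
    }
}
```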
The proceedings contain 28 papers. Topics discussed include software pipelining, reduced multipipeline machine description in scheduling constraints, program debugging, program compilers, parallel processing systems, logic programming, run time code generation, language support for memory coherence protocols, and data flow analysis.
Java offers the real possibility that most programs can be written in a type-safe language. However, for Java to be broadly useful, it needs additional expressive power. This paper extends Java in one area where more power is needed: support for parametric polymorphism, which allows the definition and implementation of generic abstractions. We discuss both the rationale for our design decisions and the impact of the extension on other parts of Java, including arrays and the class library. We also describe optional extensions to the Java virtual machine to allow parameterized bytecodes, and how to verify them efficiently. We have extended the Java bytecode interpreter to provide good performance for parameterized code in both execution speed and code size, without slowing down non-parameterized code.
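For a sense of the kind of generic abstraction such an extension enables, here is an illustrative parameterized stack written in today's Java generics syntax. The paper's proposed surface syntax, constraint mechanism, and virtual-machine support differ from what later shipped in Java; this is only a sketch of the programming style.

```java
/** A generic abstraction: one stack definition usable at any element type. */
final class Stack<T> {
    private Object[] items = new Object[8];
    private int size;

    void push(T item) {
        if (size == items.length) {              // grow the backing array when full
            Object[] bigger = new Object[size * 2];
            System.arraycopy(items, 0, bigger, 0, size);
            items = bigger;
        }
        items[size++] = item;
    }

    @SuppressWarnings("unchecked")
    T pop() {
        if (size == 0) throw new java.util.EmptyStackException();
        T top = (T) items[--size];
        items[size] = null;                      // let the element be collected
        return top;
    }
}
```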
ISBN (print): 9780897919913
We present the design and implementation of a new programming language for media-intensive applications called Flavor (Formal Language for Audio-Visual Object Representation). It is an extension of C++ and Java in which the typing system is extended to incorporate bitstream representation semantics. This makes it possible to describe in a single place both the in-memory representation of data and their bitstream-level (compressed) representation. We have developed software tools (***. ***/flavor) that automatically generate standard C++ and Java code from the Flavor source code, so that application developers can gain direct access to compressed multimedia information with essentially zero programming.
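The kind of hand-written bit-level decoding that such generated code is meant to replace looks roughly like the Java sketch below; the field widths, class names, and header layout are invented for illustration and are not Flavor syntax or output.

```java
/** Hand-written bit-level reader of the kind a bitstream-aware language can generate. */
final class BitReader {
    private final byte[] data;
    private int bitPos;

    BitReader(byte[] data) { this.data = data; }

    /** Reads the next n bits (n <= 32) as an unsigned value, most significant bit first. */
    int read(int n) {
        int value = 0;
        for (int i = 0; i < n; i++, bitPos++) {
            int bit = ((data[bitPos >> 3] & 0xFF) >> (7 - (bitPos & 7))) & 1;
            value = (value << 1) | bit;
        }
        return value;
    }
}

/** Example: manually decoding a small made-up header with a 3-bit and a 5-bit field. */
final class HeaderDemo {
    public static void main(String[] args) {
        BitReader in = new BitReader(new byte[] { (byte) 0b1011_0010, 0x3C });
        int version = in.read(3);   // 3-bit version field
        int flags   = in.read(5);   // 5-bit flags field
        System.out.println("version=" + version + " flags=" + flags);
    }
}
```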
ISBN (print): 0897919149
PROGRES is a partly rule-oriented, partly object-oriented language which supports the design of graph structures and the implementation of graph manipulating tools. It has a formal definition based on graph rewriting systems. Its integrated programming environment offers means for syntax-directed editing, type checking, interactive debugging, graph browsing, and rapid prototyping activities.
ISBN (print): 9780897919081
Open Implementation Analysis and Design (OIA/D) has been introduced as a design methodology for object-oriented software systems, and in particular for substrate software. In this paper we detail our experiences with using OIA/D to design and implement a common substrate component for parallel language runtime systems: a lightweight threads package. We show how existing thread packages employ a "black-box" design, hiding crucial design decisions that drastically reduce their ability to be re-used. We detail these design decisions (dilemmas) and show how an implementation based on OIA/D principles results in a thread package that is flexible, efficient, portable, and re-usable.
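As a rough sketch of the open-implementation idea, the fragment below exposes one such decision, the ready-queue ordering policy, to the client instead of hiding it inside the package. The interface and class names are illustrative assumptions, not the paper's thread package API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** A decision a black-box package would bury internally; here it is a client-supplied policy. */
interface SchedulingPolicy {
    void enqueue(Deque<Runnable> readyQueue, Runnable task);
}

/** Minimal "open" run queue: the substrate provides the mechanism, the client picks the policy. */
final class OpenRunQueue {
    private final Deque<Runnable> ready = new ArrayDeque<>();
    private final SchedulingPolicy policy;

    OpenRunQueue(SchedulingPolicy policy) { this.policy = policy; }

    void makeReady(Runnable task) { policy.enqueue(ready, task); }

    void runAll() {
        while (!ready.isEmpty()) ready.pollFirst().run();
    }

    public static void main(String[] args) {
        // FIFO for fairness (Deque::addLast) or LIFO to favor cache locality
        // (Deque::addFirst); the choice stays with the client, not the package.
        OpenRunQueue fifo = new OpenRunQueue(Deque::addLast);
        fifo.makeReady(() -> System.out.println("first"));
        fifo.makeReady(() -> System.out.println("second"));
        fifo.runAll();
    }
}
```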
ISBN (print): 9780897919074
Existing research understates the benefits that can be obtained from inlining and cloning, especially when guided by profile information. Our implementation of inlining and cloning yields excellent results on average and very rarely lowers performance. We believe our good results can be explained by a number of factors: inlining at the intermediate-code level removes most technical restrictions on what can be inlined; the ability to inline across files and incorporate profile information enables us to choose better inline candidates; a high-quality back end can exploit the scheduling and register allocation opportunities presented by larger subroutines; an aggressive processor architecture benefits from more predictable branch behavior; and a large instruction cache mitigates the impact of code expansion. We describe the often dramatic impact of our inlining and cloning on performance: for example, the implementations of our inlining and cloning algorithms in the HP-UX 10.20 compilers boost SPECint95 performance on a PA8000-based workstation by a factor of 1.32.
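A source-level illustration of the two transformations (the paper performs them at the intermediate-code level, guided by profile data) might look like the Java sketch below; the functions and the profiled common case are invented for illustration.

```java
/** Before: a small callee invoked from a hot loop. */
final class Before {
    static int scale(int x, int factor) { return x * factor; }

    static int sum(int[] a, int factor) {
        int total = 0;
        for (int x : a) total += scale(x, factor);  // hot call site in the profile
        return total;
    }
}

/**
 * After: scale() is inlined into the loop, and sum() is cloned for the profiled
 * common case factor == 1, so the multiply disappears entirely in that clone.
 */
final class After {
    static int sum(int[] a, int factor) {
        if (factor == 1) return sumFactor1(a);   // dispatch to the specialized clone
        int total = 0;
        for (int x : a) total += x * factor;     // scale() inlined: no call overhead
        return total;
    }

    private static int sumFactor1(int[] a) {     // clone specialized for factor == 1
        int total = 0;
        for (int x : a) total += x;
        return total;
    }
}
```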