We describe a framework for adding type qualifiers to a language. Type qualifiers encode a simple but highly useful form of subtyping. Our framework extends standard type rules to model the flow of qualifiers through a program, where each qualifier or set of qualifiers comes with additional rules that capture its semantics. Our framework allows types to be polymorphic in the type qualifiers. We present a const-inference system for C as an example application of the framework. We show that for a set of real C programs, many more consts can be used than are actually present in the original code.
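As a minimal sketch of the subtyping such qualifiers encode (illustrative only, not taken from the paper): an unqualified pointer may flow to a const-qualified one but not the reverse, and an inference tool can mark as const any parameter through which no store flows.

    /* Illustration only: const as a qualifier with the subtyping rule
     * char*  <=  const char*  (the reverse direction is rejected). */
    #include <string.h>

    static size_t count_chars(const char *s, char c)  /* never writes through s, */
    {                                                  /* so const is inferable   */
        size_t n = 0;
        while (*s)
            n += (*s++ == c);
        return n;
    }

    void demo(void)
    {
        char buf[16];
        strcpy(buf, "hello");
        count_chars(buf, 'l');   /* ok: unqualified buf flows to a const parameter */
    }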
This paper presents a novel interprocedural, flow-sensitive, and context-sensitive pointer analysis algorithm for multi-threaded programs that may concurrently update shared pointers. For each pointer and each program point, the algorithm computes a conservative approximation of the memory locations to which that pointer may point. The algorithm correctly handles a full range of constructs in multi-threaded programs, including recursive functions, function pointers, structures, arrays, nested structures and arrays, pointer arithmetic, casts between pointer variables of different types, heap and stack allocated memory, shared global variables, and thread-private global variables. We have implemented the algorithm in the SUIF compiler system and used the implementation to analyze a sizable set of multithreaded programs written in the Cilk multi-threaded programming language. Our experimental results show that the analysis has good precision and converges quickly for our set of Cilk programs.
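A hedged illustration (plain C with POSIX threads rather than Cilk, and not the paper's algorithm) of why concurrent updates to a shared pointer force a conservative points-to result:

    #include <pthread.h>

    int x, y;
    int *shared = &x;                 /* shared global pointer */

    static void *writer(void *arg)
    {
        (void)arg;
        shared = &y;                  /* concurrent update from another thread */
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, writer, NULL);
        /* At this program point a correct analysis must report that
         * `shared' may point to either x or y, i.e. the set { x, y }. */
        *shared = 42;
        pthread_join(t, NULL);
        return 0;
    }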
Typical class-based languages, such as C++ and JAVA, provide complex class mechanisms but only weak module systems. In fact, classes in these languages incorporate many of the features found in richer module mechanisms. In this paper, we describe an alternative approach to designing a language that has both classes and modules. In our design, we rely on a rich ML-style module system to provide features such as visibility control and parameterization, while providing a minimal class mechanism that includes only those features needed to support inheritance. Programmers can then use the combination of modules and classes to implement the full range of class-based features and idioms. Our approach has the advantage that it provides a full-featured module system (useful in its own right), while keeping the class mechanism quite simple. We have incorporated this design in MOBY, which is an ML-style language that supports class-based object-oriented programming. In this paper, we describe our design via a series of simple examples, show how various class-based features and idioms are realized in MOBY, compare our design with others, and sketch its formal semantics.
A hierarchical module system is an effective tool for structuring large programs. Strictly hierarchical module systems impose an acyclic ordering on import dependencies among program units. This can impede modular programming by forcing mutually-dependent components to be consolidated into a single module. Recently there have been several proposals for module systems that admit cyclic dependencies, but it is not clear how these proposals relate to one another, nor how one might integrate them into an expressive module system such as that of ML. To address this question we provide a type-theoretic analysis of the notion of a recursive module in the context of a `phase-distinction' formalism for higher-order module systems. We extend this calculus with a recursive module mechanism and a new form of signature, called a recursively dependent signature, to support the definition of recursive modules. These extensions are justified by an interpretation in terms of more primitive language constructs. This interpretation may also serve as a guide for implementation.
The standard implementation technique for lazy functional languages is call-by-need, which ensures that an argument to a function in any given call is evaluated at most once. A significant problem with call-by-need is that it is difficult - even for compiler writers - to predict the effects of program transformations. The traditional theories for lazy functional languages are based on call-by-name models, and offer no help in determining which transformations do indeed optimize a program. In this article we present an operational theory for call-by-need, based upon an improvement ordering on programs: M is improved by N if in all program-contexts C, when C[M] terminates then C[N] terminates at least as cheaply. We show that this improvement relation satisfies a `context lemma', and supports a rich inequational theory, subsuming the call-by-need lambda calculi of Ariola et al. [AFM+95]. The reduction-based call-by-need calculi are inadequate as a theory of lazy-program transformation since they only permit transformations which speed up programs by at most a constant factor (a claim we substantiate); we go beyond the various reduction-based calculi for call-by-need by providing powerful proof rules for recursion, including syntactic continuity - the basis of fixed-point-induction style reasoning, and an improvement theorem, suitable for arguing the correctness and safety of recursion-based program transformations.
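Stated symbolically (our notation, not the paper's: the judgement C[M] ⇓ n abbreviates "C[M] terminates with cost n"), the improvement ordering quoted above reads:

    % Improvement ordering, restated from the abstract in assumed notation:
    % N improves M iff, in every program context C, whenever C[M] terminates
    % with some cost, C[N] terminates with a cost that is no greater.
    M \;\mbox{is improved by}\; N
      \quad\Longleftrightarrow\quad
      \forall C.\ \forall n.\ \bigl( C[M] \Downarrow^{n}
        \;\Rightarrow\; \exists\, m \le n.\ C[N] \Downarrow^{m} \bigr)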
The challenge of exploiting high degrees of instruction-level parallelism is often hampered by frequent branching. Both exposed branch latency and low branch throughput can restrict parallelism. Control critical path reduction (control CPR) is a compilation technique to address these problems. Control CPR can reduce the dependence height of critical paths through branch operations as well as decrease the number of executed branches. In this paper, we present an approach to control CPR that recognizes sequences of branches using profiling statistics. The control CPR transformation is applied to the predominant path through this sequence. Our approach, its implementation, and experimental results are presented. This work demonstrates that control CPR enhances instruction-level parallelism for a variety of application programs and improves their performance across a range of processors.
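A source-level caricature of the idea (the actual transformation operates on machine-level branch sequences chosen from profile data; the function names below are ours): the chain of branches on the profiled predominant path is replaced by a single combined test, with the original tests surviving only as off-path compensation code.

    /* Before: a chain of dependent branches on the hot path. */
    int process_before(int a, int b)
    {
        if (a < 0)  return -1;       /* rarely taken (per profile) */
        if (b == 0) return -2;       /* rarely taken (per profile) */
        return a / b;
    }

    /* After: one test decides "on the predominant path or not". */
    int process_after(int a, int b)
    {
        if (a >= 0 && b != 0)        /* single combined test for the common case */
            return a / b;
        return (a < 0) ? -1 : -2;    /* off-path: recover the original results */
    }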
Static Single Assignment (SSA) is an effective intermediate representation in optimizing compilers. However, traditional SSA form and optimizations are not applicable to programs represented as native machine instructions because the use of dedicated registers imposed by calling conventions, the runtime system, and the target architecture must be made explicit. We present a simple scheme for converting between programs in machine code and in SSA, such that references to dedicated physical registers in machine code are preserved. Our scheme ignores all output- and anti-dependences imposed by physical registers while a program is in SSA form, but inserts compensation code during machine code reconstruction if any naming requirements have been violated. By resolving all mismatches between the two representations in separate phases, we are able to utilize existing SSA algorithms unaltered to perform machine code optimizations.
Type casting allows a program to access an object as if it had a type different from its declared type. This complicates the design of a pointer-analysis algorithm that treats structure fields as separate objects; therefore, some previous pointer-analysis algorithms `collapse' a structure into a single variable. The disadvantage of this approach is that it can lead to very imprecise points-to information. Other algorithms treat each field as a separate object based on its offset and size. While this approach leads to more precise results, the results are not portable because the memory layout of structures is implementation dependent. This paper first describes the complications introduced by type casting, then presents a tunable pointer-analysis framework for handling structures in the presence of casting. Different instances of this framework produce algorithms with different levels of precision, portability, and efficiency. Experimental results from running our implementations of four instances of this framework show that (i) it is important to distinguish fields of structures in pointer analysis, but (ii) making conservative approximations when casting is involved usually does not cost much in terms of time, space, or the precision of the results.
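A small, hypothetical C fragment showing why casts complicate field-sensitive analysis: the same object is viewed under two struct layouts, so which "field" a dereference touches depends on the implementation's memory layout.

    /* Illustration only: a cast blurs field boundaries for a
     * field-sensitive pointer analysis. */
    struct Pair   { int *first;  int *second; };
    struct Single { long tag;    int *payload; };

    int g;

    void demo(void)
    {
        struct Pair p = { &g, &g };
        /* Viewing p through a different struct type: which field does
         * s->payload correspond to in p?  The answer depends on layout,
         * which is why offset-based algorithms are precise but not portable. */
        struct Single *s = (struct Single *)&p;
        int *q = s->payload;    /* may alias p.first or p.second */
        (void)q;
    }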
Load-reuse analysis finds instructions that repeatedly access the same memory location. This location can be promoted to a register, eliminating redundant loads by reusing the results of prior memory accesses. This paper develops a load-reuse analysis and designs a method for evaluating its precision. In designing the analysis, we aspire for completeness - the goal of exposing all reuse that can be harvested by a subsequent program transformation. For register promotion, a suitable transformation is partial redundancy elimination (PRE). To approach the ideal goal of PRE-completeness, the load-reuse analysis is phrased as a data-flow problem on a program representation that is path-sensitive, as it detects reuse even when it originates in a different instruction along each control flow path. Furthermore, the analysis is comprehensive, as it treats scalar, array and pointer-based loads uniformly. In evaluating the analysis, we compare it with an ideal analysis. By observing the run-time stream of memory references, we collect all PRE-exploitable reuse and treat it as the ideal analysis performance. To compare the (static) load-reuse analysis with the (dynamic) ideal reuse, we use an estimator algorithm that computes, given a data-flow solution and a program profile, the dynamic amount of reuse detected by the analysis. We developed a family of estimators that differ in how well they bound the profiling error inherent in the edge profile. By bounding the error, the estimators offer a precise and practical method for determining the run-time optimization benefit. Our experiments show that about 55% of loads executed in Spec95 exhibit reuse. Of those, our analysis exposes about 80%.
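A source-level sketch of the kind of reuse involved (illustrative only; the analysis itself treats scalar, array, and pointer-based loads uniformly): the second load of a[n] is partially redundant, and PRE can promote the value to a register so that every path loads it at most once.

    int sum_twice(int *a, int n, int flag)
    {
        int s = 0;
        if (flag)
            s += a[n];      /* load of a[n] on one path ...              */
        s += a[n];          /* ... is reused here; PRE can promote a[n]  */
                            /* to a register so each path loads it once  */
        return s;
    }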
A high-performance implementation of a Java Virtual Machine requires a compiler to translate Java bytecodes into native instructions, as well as an advanced garbage collector (e.g., copying or generational). When the Java heap is exhausted and the garbage collector executes, the compiler must report to the garbage collector all live object references contained in physical registers and stack locations. Typical compilers only allow certain instructions (e.g., call instructions and backward branches) to be GC-safe; if GC happens at some other instruction, the compiler may need to advance execution to the next GC-safe point. Until now, no one has ever attempted to make every compiler-generated instruction GC-safe, due to the perception that recording this information would require too much space. This kind of support could improve the GC performance in multithreaded applications. We show how to use simple compression techniques to reduce the size of the GC map to about 20% of the generated code size, a result that is competitive with the best previously published results. In addition, we extend the work of Agesen, Detlefs, and Moss, regarding the so-called `JSR Problem' (the single exception to Java's type safety property), in a way that eliminates the need for extra runtime overhead in the generated code.
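As a rough, hypothetical sketch (not the paper's encoding) of what a GC map records before compression: for each instruction in a compiled method, which registers and stack slots hold live object references at that point.

    #include <stdint.h>

    /* One record per GC-safe instruction; when every instruction is GC-safe,
     * that means one record per generated instruction, which is why a compact
     * encoding matters. */
    struct gc_map_entry {
        uint32_t code_offset;    /* instruction offset within the method       */
        uint32_t live_ref_bits;  /* bit i set => register/stack slot i holds   */
    };                           /* a live object reference at that offset     */

    struct gc_map {
        uint32_t            count;
        struct gc_map_entry entry[];  /* sorted by code_offset for lookup */
    };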