Object-oriented applications may contain data members that can be removed from the application without affecting program behavior. Such `dead' data members may occur due to unused functionality in class libraries, or due to the programmer losing track of member usage as the application changes over time. We present a simple and efficient algorithm for detecting dead data members in C++ applications. This algorithm has been implemented using a prototype version of the IBM VisualAge C++ compiler, and applied to a number of realistic benchmark programs ranging from 600 to 58,000 lines of code. For the non-trivial benchmarks, we found that up to 27.3% of the data members in the benchmarks are dead (average 12.5%), and that up to 11.6% of the object space of these applications may be occupied by dead data members at run-time (average 4.4%).
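The abstract does not reproduce the detection algorithm; as a rough illustration of the underlying idea (report data members that are declared but never read or written anywhere in the program), here is a minimal Python sketch over a toy program representation. The class names and data structures are hypothetical and are not the VisualAge C++ implementation; a real C++ analysis must also account for inheritance, address-taking, friends, and separate compilation, which this toy ignores.

```python
# Hypothetical sketch of dead-data-member detection: a member is "dead"
# if no expression in the whole program reads or writes it.
from dataclasses import dataclass, field

@dataclass
class ClassInfo:
    name: str
    data_members: set[str] = field(default_factory=set)

def find_dead_members(classes, member_accesses):
    """classes: list of ClassInfo; member_accesses: iterable of
    (class_name, member_name) pairs collected from every expression
    in the program."""
    used = set(member_accesses)
    dead = []
    for c in classes:
        for m in c.data_members:
            if (c.name, m) not in used:
                dead.append((c.name, m))
    return dead

# Toy usage: 'pad' is never accessed, so it is reported as dead.
classes = [ClassInfo("Point", {"x", "y", "pad"})]
accesses = [("Point", "x"), ("Point", "y")]
print(find_dead_members(classes, accesses))   # [('Point', 'pad')]
```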
ISBN: 9780897919876 (print)
The IEEE/ANSI standard for Scheme requires implementations to be properly tail recursive. This ensures that portable code can rely upon the space efficiency of continuation-passing style and other idioms. On its face, proper tail recursion concerns the efficiency of procedure calls that occur within a tail context. When examined closely, proper tail recursion also depends upon the fact that garbage collection can be asymptotically more space-efficient than Algol-like stack allocation. Proper tail recursion is not the same as ad hoc tail call optimization in stack-based languages. Proper tail recursion often precludes stack allocation of variables, but yields a well-defined asymptotic space complexity that can be relied upon by portable programs. This paper offers a formal and implementation-independent definition of proper tail recursion for Scheme. It also shows how an entire family of reference implementations can be used to characterize related safe-for-space properties, and proves the asymptotic inequalities that hold between them.
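To make the space issue concrete: a properly tail-recursive implementation runs a tail-recursive loop in constant control space, whereas a naive stack implementation grows without bound. The Python sketch below (Python itself is not properly tail recursive) uses a hypothetical trampoline to mimic the constant-space behavior the Scheme standard requires; it is an analogy, not the paper's formal definition.

```python
# Python does not perform tail-call optimization, so a deeply tail-recursive
# function overflows the stack. A trampoline mimics the constant control
# space that proper tail recursion guarantees in Scheme.
import sys

def count_down_naive(n):
    if n == 0:
        return "done"
    return count_down_naive(n - 1)      # tail call, but Python grows the stack

def count_down_trampolined(n):
    # Return either a result or a thunk representing the pending tail call.
    def step(k):
        if k == 0:
            return "done"
        return lambda: step(k - 1)      # the "tail call" reified as a thunk
    result = step(n)
    while callable(result):             # constant-space driver loop
        result = result()
    return result

print(count_down_trampolined(10 * sys.getrecursionlimit()))   # completes
# count_down_naive(10 * sys.getrecursionlimit())  # would raise RecursionError
```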
ISBN: 9780897919876 (print)
This paper presents a typed programming language and compiler for run-time code generation. The language, called ML□, extends ML with modal operators in the style of the Mini-ML□ language of Davies and Pfenning. ML□ allows programmers to use types to specify precisely the stages of computation in a program. The types also guide the compiler in generating target code that exploits the staging information through the use of run-time code generation. The target machine is currently a version of the Categorical Abstract Machine, called the CCAM, which we have extended with facilities for run-time code generation. This approach allows the programmer to express the staging that he wants directly to the compiler. It also provides a typed framework in which to verify the correctness of his staging intentions, and to discuss his staging decisions with other programmers. Finally, it supports in a natural way multiple stages of run-time specialization, so that dynamically generated code can be used in the generation of yet further specialized code. This paper presents an overview of the language, with several examples of programs that illustrate key concepts and programming techniques. Then, it discusses the CCAM and the compilation of ML□ programs into CCAM code. Finally, the results of some experiments are shown, to demonstrate the benefits of this style of run-time code generation for some applications.
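The modal type system and the CCAM are not shown in this abstract; as a loose analogy in Python (not the paper's language or target machine), run-time specialization can be pictured as generating and compiling code for a later stage once the early-stage inputs are known. The helper below is hypothetical and only illustrates the staging idea, not modal types.

```python
# Loose illustration of multi-stage run-time code generation:
# once the exponent is known (stage 1), emit and compile code specialized
# to it, and reuse that generated code for many bases (stage 2).
def specialize_power(n):
    # Stage 1: generate source text with the multiplication fully unrolled.
    body = "1" if n == 0 else " * ".join(["x"] * n)
    src = f"def power_{n}(x):\n    return {body}\n"
    namespace = {}
    exec(compile(src, f"<power_{n}>", "exec"), namespace)   # run-time codegen
    return namespace[f"power_{n}"]

cube = specialize_power(3)      # dynamically generated, specialized code
print(cube(2), cube(5))         # 8 125

# Generated code can itself feed further specialization (multiple stages),
# e.g. building a table of already-specialized functions.
powers_up_to_4 = [specialize_power(k) for k in range(5)]
print([f(2) for f in powers_up_to_4])   # [1, 2, 4, 8, 16]
```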
In this paper, we describe our experience with using an abstract integer-set framework to develop the Rice dHPF compiler, a compiler for High Performance Fortran. We present simple, yet general formulations of the major computation partitioning and communication analysis tasks as well as a number of important optimizations in terms of abstract operations on sets of integer tuples. This approach has made it possible to implement a comprehensive collection of advanced optimizations in dHPF, and to do so in the context of a more general computation partitioning model than previous compilers. One potential limitation of the approach is that the underlying class of integer set problems is fundamentally unable to represent HPF data distributions on a symbolic number of processors. We describe how we extend the approach to compile codes for a symbolic number of processors, without requiring any changes to the set formulations for the above optimizations. We show experimentally that the set representation is not a dominant factor in compile times on both small and large codes. Finally, we present preliminary performance measurements to show that the generated code achieves good speedups for a few benchmarks. Overall, we believe we are the first to demonstrate by implementation experience that it is practical to build a compiler for HPF using a general and powerful integer-set framework.
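As a rough sketch of the "sets of integer tuples" style of reasoning, the following Python toy writes a computation partition and a communication set for a 1-D block-distributed array as set operations. It enumerates small explicit sets with made-up distribution parameters, whereas a real HPF compiler like dHPF manipulates these sets symbolically.

```python
# Toy model of integer-set-based communication analysis for a 1-D
# BLOCK-distributed array A(0..N-1) on P processors, computing
# B(i) = A(i) + A(i+1). Explicit enumeration is only for illustration.
N, P = 16, 4
block = N // P

def owned(p):
    """Iterations/elements owned by processor p under BLOCK distribution."""
    return set(range(p * block, (p + 1) * block))

def referenced(p):
    """Elements read when p executes its owned iterations of B(i) = A(i) + A(i+1)."""
    iters = {i for i in owned(p) if i + 1 < N}
    return iters | {i + 1 for i in iters}

def recv_set(p):
    """Elements p must receive: referenced but not owned (a set difference)."""
    return referenced(p) - owned(p)

for p in range(P):
    print(p, sorted(recv_set(p)))   # each non-last processor receives one boundary element
```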
ISBN: 9780897919876 (print)
The conditional branch has long been considered an expensive operation. The relative cost of conditional branches has increased as recently designed machines are now relying on deeper pipelines and higher multiple issue. Reducing the number of conditional branches executed can often result in a substantial performance benefit. This paper describes a code-improving transformation to reorder sequences of conditional branches. First, sequences of branches that can be reordered are detected in the control flow. Second, profiling information is collected to predict the probability that each branch will transfer control out of the sequence. Third, the cost of performing each conditional branch is estimated. Fourth, the most beneficial ordering of the branches based on the estimated probability and cost is selected. The most beneficial ordering often included the insertion of additional conditional branches that did not previously exist in the sequence. Finally, the control flow is restructured to reflect the new ordering. The results of applying the transformation were significant reductions in the dynamic number of instructions and branches, as well as decreases in execution time.
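The step of selecting the most beneficial ordering from estimated probabilities and costs can be illustrated with a small, hypothetical expected-cost model in Python. It exhaustively evaluates every permutation of an independent branch sequence, which is only feasible for the short sequences such a transformation targets; it is not the paper's actual selection algorithm and ignores the insertion of extra branches.

```python
# Hypothetical expected-cost model for reordering a sequence of branches.
# Each branch i has an estimated cost c[i] and a profiled probability p[i]
# of transferring control out of the sequence. Assuming independence,
# a branch placed later executes only if no earlier branch exited.
from itertools import permutations

def expected_cost(order, p, c):
    stay = 1.0        # probability control reaches the current branch
    total = 0.0
    for i in order:
        total += stay * c[i]
        stay *= (1.0 - p[i])
    return total

def best_order(p, c):
    # Exhaustive search; fine for the short sequences this targets.
    return min(permutations(range(len(p))), key=lambda o: expected_cost(o, p, c))

p = [0.10, 0.70, 0.30]          # exit probabilities from profiling
c = [1.0, 2.0, 1.0]             # estimated per-branch costs
print(best_order(p, c))                                   # (1, 2, 0)
print(round(expected_cost((0, 1, 2), p, c), 3),           # 3.07 for source order
      round(expected_cost(best_order(p, c), p, c), 3))    # 2.51 after reordering
```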
Large applications are typically partitioned into separately compiled modules. Large performance gains in these applications are available by optimizing across module boundaries. One barrier to applying cross-module optimization (CMO) to large applications is the potentially enormous amount of time and space consumed by the optimization process. We describe a framework for scalable CMO that provides large gains in performance on applications that contain millions of lines of code. Two major techniques are described. First, careful management of in-memory data structures results in sub-linear memory occupancy when compared to the number of lines of code being optimized. Second, profile data is used to focus optimization effort on the performance-critical portions of applications. We also present practical issues that arise in deploying this framework in a production environment. These issues include debuggability and compatibility with existing development tools, such as make. Our framework is deployed in Hewlett-Packard's (HP) UNIX compiler products and speeds up shipped independent software vendors' applications by as much as 71%.
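As a toy illustration of the "focus effort on performance-critical portions" idea (not HP's actual implementation), a profile-driven selector might keep only the functions that account for most of the execution profile and restrict expensive cross-module analysis to that hot set:

```python
# Hypothetical profile-driven selection of a "hot set" of functions, so that
# expensive cross-module optimization work scales with the hot code rather
# than with the total size of the application.
def hot_set(profile_counts, coverage=0.95):
    """profile_counts: {function_name: sample_count}. Returns a small set of
    functions covering roughly `coverage` of all profile samples."""
    total = sum(profile_counts.values())
    hot, covered = set(), 0
    for fn, count in sorted(profile_counts.items(), key=lambda kv: -kv[1]):
        if covered >= coverage * total:
            break
        hot.add(fn)
        covered += count
    return hot

profile = {"parse": 9200, "codegen": 4100, "log_debug": 30, "usage": 3}
print(hot_set(profile))   # {'parse', 'codegen'} -- cold helpers are skipped
```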
ISBN: 9780897919876 (print)
This paper evaluates three alias analyses based on programming language types. The first analysis uses type compatibility to determine aliases. The second extends the first by using additional high-level information such as field names. The third extends the second with a flow-insensitive analysis. Although other researchers suggest using types to disambiguate memory references, none evaluates its effectiveness. We perform both static and dynamic evaluations of type-based alias analyses for Modula-3, a statically typed, type-safe language. The static analysis reveals that type compatibility alone yields a very imprecise alias analysis, but the other two analyses significantly improve alias precision. We use redundant load elimination (RLE) to demonstrate the effectiveness of the three alias algorithms in terms of the opportunities for optimization, the impact on simulated execution times, and to compute an upper bound on what a perfect alias analysis would yield. We show modest dynamic improvements for RLE, and more surprisingly, that on average our alias analysis is within 2.5% of a perfect alias analysis with respect to RLE on 8 Modula-3 programs. These results illustrate that to explore thoroughly the effectiveness of alias analyses, researchers need static, dynamic, and upper-bound analysis. In addition, we show that for type-safe languages like Modula-3 and Java, a fast and simple alias analysis may be sufficient for many applications.
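A minimal Python sketch of the first two precision levels (type compatibility, then field names) over a toy type model may help fix the idea; it is only illustrative and is not the paper's Modula-3 implementation, and it omits the third, flow-insensitive level.

```python
# Toy type-based alias analysis. Level 1: two references may alias only if
# their declared types are compatible (here: one is a subtype of the other).
# Level 2: accesses through distinct field names cannot alias.
subtype_of = {"Student": "Person", "Person": "object", "Shape": "object"}

def compatible(t1, t2):
    def ancestors(t):
        seen = {t}
        while t in subtype_of:
            t = subtype_of[t]
            seen.add(t)
        return seen
    return t1 in ancestors(t2) or t2 in ancestors(t1)

def may_alias(ref1, ref2):
    """ref = (declared_type, field_name or None for a whole-object reference)."""
    (t1, f1), (t2, f2) = ref1, ref2
    if not compatible(t1, t2):          # level 1: type compatibility
        return False
    if f1 is not None and f2 is not None and f1 != f2:
        return False                    # level 2: distinct fields do not alias
    return True

print(may_alias(("Person", "name"), ("Student", "name")))   # True
print(may_alias(("Person", "name"), ("Person", "age")))     # False (fields differ)
print(may_alias(("Person", None), ("Shape", None)))         # False (incompatible types)
```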
Full precision in garbage collection implies retaining only those heap allocated objects that will actually be used in the future. Since full precision is not computable in general, garbage collectors use safe (i.e., conservative) approximations such as reachability from a set of root references. Ambiguous roots collectors (commonly called `conservative') can be overly conservative because they overestimate the root set, and thereby retain unexpectedly large amounts of garbage. We consider two more precise collection schemes for Java virtual machines (JVMs). One uses a type analysis to obtain a type-precise root set (only those variables that contain references); the other adds a live variable analysis to reduce the root set to only the live reference variables. Even with the Java programming language's strong typing, it turns out that the JVM specification has a feature that makes type-precise root sets difficult to compute. We explain the problem and ways in which it can be solved. Our experimental results include measurements of the costs of the type and liveness analyses at load time, of the incremental benefits at run time of the liveness analysis over the type analysis alone, and of various map sizes and counts. We find that the liveness analysis often produces little or no improvement in heap size, sometimes modest improvements, and occasionally the improvement is dramatic. While further study is in order, we conclude that the main benefit of the liveness analysis is preventing bad surprises.
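As a rough picture of the difference between a type-precise root set and a liveness-refined root set, the Python sketch below uses a hypothetical per-frame map, not the JVM data structures the paper works with; it only shows why liveness can shrink the set of roots handed to the collector.

```python
# Hypothetical stack-frame maps contrasting two root-set precisions:
#  - type-precise: every local/stack slot whose static type is a reference
#  - live-precise: only those reference slots that are live at this GC point
frame_maps = {
    ("Example.run", 17): {                 # (method, offset of the GC point)
        "reference_slots": {0, 2, 3},      # slots holding references, by type
        "live_slots":      {0, 3, 5},      # slots live here, per liveness analysis
    },
}

def type_precise_roots(frame, site):
    return frame_maps[site]["reference_slots"] & set(frame)

def live_precise_roots(frame, site):
    m = frame_maps[site]
    return m["reference_slots"] & m["live_slots"] & set(frame)

frame = {0: "objA", 2: "objB", 3: "objC", 5: 42}     # slot -> current contents
site = ("Example.run", 17)
print(type_precise_roots(frame, site))   # {0, 2, 3}: objB kept even though dead
print(live_precise_roots(frame, site))   # {0, 3}: the dead reference in slot 2 is dropped
```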
ISBN: 1581130333 (print)
Optimizing the design of complex weapons systems is an important task in developing and deploying such systems efficiently and effectively. Often architectural tradeoffs between hardware and software implementation must be performed early in the design cycle, resulting in potentially inefficient systems or subsystems. As technologies and costs in hardware and software implementation change over time, the best partitioning of system functionality into hardware and software components will also change. Currently, recasting a component from hardware to software (or vice versa) is a difficult and error-prone activity. This paper explores a new approach to ease the hardware/software co-design and repartitioning activities by providing a mechanism to exchange software written in Ada 95 with behavioral VHDL.
MAP has been developed to address the need for an analysis and design method which directly supports software development using a functional language. MAP breaks down software development into a set of manageable and linked techniques which are combined to make a productive whole. In addition, it can be used as a natural communication medium which avoids the idiosyncrasies of particular implementation languages, and through CASE tool support can act as an effective and efficient consistency checker.