ISBN:
(Print) 0897914759
This study evaluates a global optimization technique that avoids unconditional jumps by replicating code. When implemented in the back-end of an optimizing compiler, this technique can be generalized to work on almost all instances of unconditional jumps, including those generated from conditional statements and unstructured loops. The replication method is based on the idea of finding a replacement for each unconditional jump that minimizes the growth in code size. This is achieved by choosing the shortest sequence of instructions as a replacement. Measurements taken from a variety of programs showed not only that the number of executed instructions decreased, but also that the total cache work was reduced (except for small caches), despite increases in code size. Pipelined and superscalar machines may also benefit from the resulting increase in the average basic block size.
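To make the transformation concrete, here is a much-simplified sketch of replacing an unconditional jump by a copy of its target block. The toy block representation, the instruction tuples, and the max_growth cutoff are illustrative assumptions; they only approximate the paper's criterion of choosing the shortest replacement sequence.

# A much-simplified sketch of jump elimination by code replication.  Each
# basic block is a list of instruction tuples ending in ('ret',),
# ('jump', target) or ('branch', cond, target); this toy IR and the
# max_growth cutoff are assumptions, not the paper's cost model.

def replicate_jumps(blocks, max_growth=4):
    """Replace a block-ending unconditional jump by a copy of the target
    block when the duplicated sequence is short enough."""
    out = {}
    for label, body in blocks.items():
        new_body, last = list(body[:-1]), body[-1]
        target = blocks.get(last[1]) if last[0] == "jump" else None
        if target is not None and len(target) <= max_growth:
            new_body.extend(target)      # duplicate target code, drop the jump
        else:
            new_body.append(last)        # keep the original terminator
        out[label] = new_body
    return out

# A loop whose body jumps back to the test block: after replication the
# body ends with its own copy of the (short) loop test instead of a jump.
blocks = {
    "test": [("cmp", "i", "n"), ("branch", "lt", "body"), ("ret",)],
    "body": [("add", "i", "1"), ("jump", "test")],
}
for label, body in replicate_jumps(blocks).items():
    print(label, body)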
ISBN:
(Print) 0201533723
These are the highlights of a successfully completed application of object-oriented software development for a new product. The project was of medium size, the duration was less than 24 months (from the end of the requirements specification to product shipment), and the average team size was 8-10 software engineers. We discuss how the team dealt with major new aspects: a different paradigm, a different programming language, a different user interface environment, and a different development environment. In spite of all these novelties, and in spite of the fact that almost twice as much code was produced as was predicted, the project schedule slipped by only 20%. We touch upon all phases of the development life cycle: requirements capture, OO analysis, OO design, OO implementation, and the verification phase. Some management perspectives are addressed as well.
ISBN:
(Print) 0897914759
This paper examines a problem that arises during global register allocation: rematerialization. If a value cannot be kept in a register, the allocator should recognize when it is cheaper to recompute the value (rematerialize it) than to store and reload it. Chaitin's original graph-coloring allocator handled simple instances of this problem correctly. This paper details a general solution to the problem and presents experimental evidence that shows its importance. Our approach is to tag individual values in the procedure's SSA graph with information specifying how each should be spilled. We use a variant of Wegman and Zadeck's sparse simple constant algorithm to propagate tags throughout the graph. The allocator then splits live ranges into values with different tags. This isolates those values that can be easily rematerialized from values that require general spilling. We modify the base allocator to use this information when estimating spill costs and introducing spill code. Our presentation focuses on rematerialization in the context of Chaitin's allocator; however, the problem arises in any global allocator. We believe that our approach will work in other allocators; while the details of implementation will vary, the key insights should carry over directly.
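As a rough illustration of tag propagation, the following sketch mimics a sparse-constant-propagation style lattice over SSA names: each name is either tagged with a "never-killed" defining instruction (rematerializable) or falls to a spill tag. The defs/phis encoding, the opcode names loadimm and frameaddr, and the propagate_tags helper are all assumptions, not the paper's formulation.

# Hedged sketch of tag propagation for rematerialization.  SPILL is the
# lattice bottom: the value is spilled and reloaded the usual way.
SPILL = None

def propagate_tags(defs, phis):
    """defs maps an SSA name to its defining instruction (a tuple);
    phis maps an SSA name to the list of names merged by its phi node."""
    remat_ops = {"loadimm", "frameaddr"}   # assumed never-killed operations
    tags = {}
    for name, instr in defs.items():
        tags[name] = instr if instr[0] in remat_ops else SPILL
    changed = True
    while changed:                          # simple fixed-point iteration
        changed = False
        for name, args in phis.items():
            in_tags = {tags.get(a, SPILL) for a in args}
            new = in_tags.pop() if len(in_tags) == 1 else SPILL
            if tags.get(name, "unset") != new:
                tags[name] = new
                changed = True
    return tags

defs = {"x1": ("loadimm", 4096), "x2": ("loadimm", 4096),
        "y1": ("add", "a", "b")}
phis = {"x3": ["x1", "x2"], "z1": ["x3", "y1"]}
print(propagate_tags(defs, phis))
# x3 keeps the loadimm tag (rematerializable); z1 falls to SPILL, so the
# allocator would treat them as separate values with different spill costs.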
ISBN:
(Print) 0897914759
Even though impressive progress has been made in the area of optimizing and parallelizing programs with arrays, the application of similar techniques to programs with pointer data structures has remained difficult. In this paper we introduce a new approach that leads to improved analysis and transformation of programs with recursively-defined pointer data structures. Our approach is based on a mechanism for the Abstract Description of Data Structures (ADDS), which makes explicit the important properties, such as dimensionality, of pointer data structures. Numerous examples demonstrate that ADDS definitions are both natural to specify and flexible enough to describe complex, cyclic pointer data structures. We discuss how an abstract data structure description can improve program analysis by presenting an analysis approach that combines an alias analysis technique, path matrix analysis, with information available from an ADDS declaration. Given this improved alias analysis technique, we provide a concrete example of applying a software pipelining transformation to loops involving pointer data structures.
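The flavour of an ADDS description can be suggested with a small sketch. The paper's declarations extend a C-like language; the Python metadata below, the dimension and field names, and the may_alias check are loose illustrative stand-ins, not the ADDS syntax or its analysis rules.

from dataclasses import dataclass, field

@dataclass
class ADDS:
    name: str
    # dimension name -> pointer fields that traverse that dimension
    dimensions: dict
    # dimensions whose traversals never revisit a node (recorded for use by
    # later analyses; not exercised in this sketch)
    acyclic: set = field(default_factory=set)

    def may_alias(self, field_a: str, field_b: str) -> bool:
        """Fields that traverse different dimensions are assumed to reach
        disjoint sets of nodes; fields on the same dimension may alias."""
        def dim(f):
            return next(d for d, fs in self.dimensions.items() if f in fs)
        return dim(field_a) == dim(field_b)

# A list of trees: 'next' walks the spine, 'left'/'right' descend each tree.
list_of_trees = ADDS(
    name="list_of_trees",
    dimensions={"spine": ["next"], "tree": ["left", "right"]},
    acyclic={"spine", "tree"},
)
print(list_of_trees.may_alias("next", "left"))   # False: different dimensions
print(list_of_trees.may_alias("left", "right"))  # True: same dimension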
ISBN:
(Print) 0818625856
The practical value of research involving the abstract interpretation of Prolog programs was examined experimentally. The design and implementation of the generic abstract interpretation algorithm originally proposed by B. Le Charlier (1991), its instantiation in a sophisticated abstract domain containing modes, types, sharing, and aliasing, and its evaluation in terms of performance and accuracy are described. The overall implementation (over 5000 lines of Pascal) was systematically analyzed on a variety of programs. The experimental results, given the abstract domain and the programs analyzed, indicate that (1) the number of iterations of the algorithm is bounded by 7.5 × N and is in most cases smaller than 3 × N, where N is the size of the analyzed program (e.g., the number of program points); (2) the CPU time in seconds is bounded by N and is in most cases smaller than 0.6 × N; (3) the algorithm explores few elements (less than 11% and often none) outside the subset of the fixpoint required to answer the query and hence is close to optimality; and (4) the results are quite accurate and could be used in a Prolog compiler.
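For readers unfamiliar with query-directed analysis, the following schematic sketch shows the general shape of a fixpoint computation that only explores the part of the fixpoint a query needs. It is not Le Charlier's algorithm, and the two-element ground/any mode lattice is a deliberately tiny stand-in for the paper's domain of modes, types, sharing, and aliasing; the clause encoding and the analyse helper are assumptions.

GROUND, ANY = "ground", "any"

def lub(a, b):                      # least upper bound in the mode lattice
    return GROUND if a == b == GROUND else ANY

def analyse(clauses, query, memo=None):
    """clauses: predicate -> function from an input mode (and a callback for
    analysing body goals) to an output mode.  Only the part of the fixpoint
    reachable from the query is ever computed."""
    memo = {} if memo is None else memo
    pred, mode = query
    if query in memo:
        return memo[query]
    memo[query] = GROUND                     # optimistic initial value
    while True:
        result = clauses[pred](mode, lambda q: analyse(clauses, q, memo))
        if result == memo[query]:
            return result
        memo[query] = result                 # iterate until stable

# A self-recursive predicate analysed with a ground call mode: the sketch
# concludes the answer mode is ground as well.
clauses = {
    "append": lambda mode, call: lub(mode, call(("append", mode))),
}
print(analyse(clauses, ("append", GROUND)))   # -> ground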
ISBN:
(Print) 0897914759
Many loop nests in scientific codes contain a parallelizable outer loop but have an inner loop for which the number of iterations varies between different iterations of the outer loop. When running this kind of loop nest on a SIMD machine, the SIMD-inherent restriction to a single program counter common to all processors will cause a performance degradation relative to comparable MIMD implementations. This problem is not due to limited parallelism or bad load balance; it is merely a problem of control flow. This paper presents a loop transformation, which we call loop flattening, that overcomes this limitation by letting each processor advance to the next loop iteration containing useful computation, if there is such an iteration for the given processor. We study a concrete example derived from a molecular dynamics code and compare performance results for flattened and unflattened versions of this kernel on two SIMD machines, the CM-2 and the DECmpp 12000. We then evaluate loop flattening from the compiler's perspective in terms of applicability, cost, profitability, and safety. We conclude by arguing that loop flattening, whether performed by the programmer or by the compiler, introduces negligible overhead and can significantly improve the performance of scientific codes for solving irregular problems.
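The shape of the transformation can be shown with a small host-side sketch over a ragged iteration space described by per-row inner trip counts (as in a molecular-dynamics neighbour list). The SIMD lock-step behaviour itself is not modelled; the counts array, the prefix-sum layout, and the flat_index helper are illustrative assumptions.

counts = [3, 0, 5, 1]                       # inner iterations per outer index

# Unflattened form: the inner trip count varies with i, so SIMD processors
# assigned to short rows sit idle while the longest row finishes.
work_nested = [(i, j) for i in range(len(counts)) for j in range(counts[i])]

# Flattened form: a single loop over all (i, j) pairs; every flat iteration
# holds useful computation for whichever processor executes it.
starts = [0]
for c in counts:
    starts.append(starts[-1] + c)           # prefix sums, CSR-style

def flat_index(k):
    """Recover (outer, inner) indices from the flat iteration number k."""
    i = max(r for r in range(len(counts)) if starts[r] <= k)
    return i, k - starts[i]

work_flat = [flat_index(k) for k in range(starts[-1])]
assert work_flat == work_nested             # same work, single flat loop
print(work_flat)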
ISBN:
(Print) 0897914759
Higher order functional programs constantly allocate objects dynamically. These objects are typically cons cells, closures, and records and are generally allocated in the heap and reclaimed later by some garbage collection process. This paper describes a compile time analysis, called escape analysis, for determining the lifetime of dynamically created objects in higher order functional programs, and describes optimizations that can be performed, based on the analysis, to improve storage allocation and reclamation of such objects. In particular, our analysis can be applied to programs manipulating lists, in which case optimizations can be performed to allow cons cells in spines of lists to be either reclaimed immediately or reused without incurring any garbage collection overhead. In a previous paper on escape analysis, we had left open the problem of performing escape analysis on lists. Escape analysis simply determines when the argument (or some part of the argument) to a function call is returned by that call. This simple piece of information turns out to be sufficiently powerful to allow stack allocation of objects, compile-time garbage collection, reduction of run-time storage reclamation overhead, and other optimizations that are possible when the lifetimes of objects can be computed statically. Our approach is to define a high-level non-standard semantics that, in many ways, is similar to the standard semantics and captures the escape behavior caused by the constructs in a functional language. The advantage of our analysis lies in its conceptual simplicity and portability (i.e. no assumption is made about an underlying abstract machine).
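The core question (does the argument, or some part of it, flow to a call's result?) can be illustrated with a toy sketch over a minimal expression language with variables, cons, car/cdr, and if. The AST encoding and the purely syntactic flow rules below are illustrative assumptions, far simpler than the paper's semantics-based analysis.

def escapes(expr, param):
    """True if the value bound to `param` (or a piece of it) may be part of
    the value returned by `expr`."""
    kind = expr[0]
    if kind == "var":
        return expr[1] == param
    if kind == "cons":                       # either field may be returned
        return escapes(expr[1], param) or escapes(expr[2], param)
    if kind in ("car", "cdr"):               # a part of the argument escapes
        return escapes(expr[1], param)
    if kind == "if":                         # the tested value is not returned
        return escapes(expr[2], param) or escapes(expr[3], param)
    return False                             # constants, etc.

# length returns an integer, so no part of xs escapes; the recursive call is
# elided here because its integer result carries no cells of xs.
length_body = ("if", ("null?", ("var", "xs")), ("const", 0), ("const", "1+len"))
# copy_head returns a fresh cons whose car is part of the argument.
copy_head_body = ("cons", ("car", ("var", "xs")), ("const", "nil"))

print(escapes(length_body, "xs"))     # False: spine cells could be reclaimed
print(escapes(copy_head_body, "xs"))  # True: part of the argument is returned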