During execution, when two or more names exist for the same location at some program point, we call them aliases. In a language which allows arbitrary pointers, the problem of determining aliases at a program point is...
详细信息
ISBN:
(纸本)0897914759
During execution, when two or more names exist for the same location at some program point, we call them aliases. In a language which allows arbitrary pointers, the problem of determining aliases at a program point is P-space-hard [Lan92]. We present an algorithm for the Conditional May Alias problem, which can be used to safely approximate Interprocedural May Alias in the presence of pointers. This algorithm is as precise as possible in the worst case and has been implemented in a prototype analysis tool for C programs. Preliminary speed and precision results are presented.
This paper describes a general framework for representing iteration-reordering transformations. These transformations can be both matrix-based and non-matrix-based. Transformations are defined by rules for mapping dep...
详细信息
ISBN:
(纸本)0897914759
This paper describes a general framework for representing iteration-reordering transformations. These transformations can be both matrix-based and non-matrix-based. Transformations are defined by rules for mapping dependence vectors, rules for mapping loop bound expressions, and rules for creating new initialization statements. The framework is extensible, and can be used to represent any iteration-reordering transformation. Mapping rules for several common transformations are included in the paper.
A new global register allocation technique, probabilistic register allocation, is described. Probabilistic register allocation quantifies the costs and benefits of allocating variables to registers over live ranges so...
详细信息
ISBN:
(纸本)0897914759
A new global register allocation technique, probabilistic register allocation, is described. Probabilistic register allocation quantifies the costs and benefits of allocating variables to registers over live ranges so that excellent allocation choices can be made. Local allocation is done first, and then global allocation is done iteratively beginning in the the most deeply nested loops. Because local allocation precedes global allocation, probabilistic allocation does not interfere with the use of well-known, high-quality local register allocation and instruction scheduling techniques.
HoME is a version of Smalltalk which can be efficiently executed on a multiprocessor and can be executed in parallel by combining a Smalltalk process with a Mach thread and executing the process on the thread. HoME is...
详细信息
ISBN:
(纸本)0897914759
HoME is a version of Smalltalk which can be efficiently executed on a multiprocessor and can be executed in parallel by combining a Smalltalk process with a Mach thread and executing the process on the thread. HoME is nearly the same as ordinary Smalltalk except that multiple processes may execute in parallel. Thus, almost all applications running on ordinary Smalltalk can be executed on HoME without changes in their code. HoME was designed and implemented based on the following fundamental policies: (1) theoretically, an infinite number of processes can become active;(2) the moment a process is scheduled, it becomes active;(3) no process switching occurs;(4) HoME is equivalent to ordinary Smalltalk except for the previous three policies. The performance of the current implementation of HoME running on OMRON LUNA-88K, which had four processors, was measured by benchmarks which execute in parallel with multiple processes. In all benchmarks, the results showed that HoME's performance is much better than HPS on the same workstation.
We present a bit-vector algorithm for the optimal and economical placement of computations within flow graphs, which is as efficient as standard uni-directional analyses. The point of our algorithm is the decompositio...
详细信息
ISBN:
(纸本)0897914759
We present a bit-vector algorithm for the optimal and economical placement of computations within flow graphs, which is as efficient as standard uni-directional analyses. The point of our algorithm is the decomposition of the bi-directional structure of the known placement algorithms into a sequence of a backward and a forward analysis, which directly implies the efficiency result. Moreover, the new compositional structure opens the algorithm for modification: two further uni-directional analysis components exclude any unnecessary code motion. This laziness of our algorithm minimizes the register pressure, which has drastic effects on the run-time behaviour of the optimized programs in practice, where an economical use of registers is essential.
We describe an approach to implementing a wide-range of concurrency paradigms in high-level (symbolic) programminglanguages. The focus of our discussion is STING, a dialect of Scheme, that supports lightweight thread...
详细信息
ISBN:
(纸本)0897914759
We describe an approach to implementing a wide-range of concurrency paradigms in high-level (symbolic) programminglanguages. The focus of our discussion is STING, a dialect of Scheme, that supports lightweight threads of control and virtual processors as first-class objects. Given the significant degree to which the behavior of these objects may be customized, we can easily express a variety of concurrency paradigms and linguistic structures within a common framework without loss of efficiency. Unlike parallel systems that rely on operating system services for managing concurrency, STING implements concurrency management entirely in terms of Scheme objects and procedures. It, therefore, permits users to optimize the runtime behavior of their applications without requiring knowledge of the underlying runtime system. This paper concentrates on (a) the implications of the design for building asynchronous concurrency structures, (b) organizing large-scale concurrent computations, and (c) implementing robust programming environments for symbolic computing.
This study evaluates a global optimization technique that avoids unconditional jumps by replicating code. When implemented in the back-end of an optimizing compiler, this technique can be generalized to work on almost...
详细信息
ISBN:
(纸本)0897914759
This study evaluates a global optimization technique that avoids unconditional jumps by replicating code. When implemented in the back-end of an optimizing compiler, this technique can be generalized to work on almost all instances of unconditional jumps, including those generated from conditional statements and unstructured loops. The replication method is based on the idea of finding a replacement for each unconditional jump which minimizes the growth in code size. This is achieved by choosing the shortest sequence of instructions as a replacement. Measurements taken from a variety of programs showed that not only the number of executed instructions decreased, but also that the total cache work was reduced (except for small caches) despite increases in code size. Pipelined and superscalar machines may also benefit from an increase in the average basic block size.
This paper examines a problem that arises during global register allocation - rematerialization. If a value cannot be kept in a register, the allocator should recognize when it is cheaper to recompute the value (remat...
详细信息
ISBN:
(纸本)0897914759
This paper examines a problem that arises during global register allocation - rematerialization. If a value cannot be kept in a register, the allocator should recognize when it is cheaper to recompute the value (rematerialize it) than to store and reload it. Chaitin's original graph-coloring allocator handled simple instances of this problem correctly. This paper details a general solution to the problem and presents experimental evidence that shows its importance. Our approach is to tag individual values in the procedure's SSA graph with information specifying how it should be spilled. We use a variant of Wegman and Zadeck's sparse simple constant algorithm to propagate tags throughout the graph. The allocator then splits live ranges into values with different tags. This isolates those values that can be easily rematerialized from values that require general spilling. We modify the base allocator to use this information when estimating spill costs and introducing spill code. Our presentation focuses on rematerialization in the context of Chaitin's allocator;however, the problem arises in any global allocator. We believe that our approach will work in other allocators - while the details of implementation will vary, the key insights should carry over directly.
Even though impressive progress has been made in the area of optimizing and parallelizing programs with arrays, the application of similar techniques to programs with pointer data structures has remained difficult. In...
详细信息
ISBN:
(纸本)0897914759
Even though impressive progress has been made in the area of optimizing and parallelizing programs with arrays, the application of similar techniques to programs with pointer data structures has remained difficult. In this paper we introduce a new approach that leads to improved analysis and transformation of programs with recursively-defined pointer data structures. Our approach is based on a mechanism for the Abstract Description of Data Structures (ADDS), which makes explicit the important properties, such as dimensionality, of pointer data structures. Numerous examples demonstrate that ADDS definitions are both natural to specify and flexible enough to describe complex, cyclic pointer data structures. We discuss how an abstract data structure description can improve program analysis by presenting an analysis approach that combines an alias analysis technique, path matrix analysis, with information available from an ADDS declaration. Given this improved alias analysis technique, we provide a concrete example of applying a software pipelining transformation to loops involving pointer data structures.
Many loop nests in scientific codes contain a parallelizable outer loop but have an inner loop for which the number of iterations varies between different iterations of the outer loop. When running this kind of loop n...
详细信息
ISBN:
(纸本)0897914759
Many loop nests in scientific codes contain a parallelizable outer loop but have an inner loop for which the number of iterations varies between different iterations of the outer loop. When running this kind of loop nest on a SIMD machine, the SIMD-inherent restriction to single program counter common to all processors will cause a performance degradation relative to comparable MIMD implementations. This problem is not due to limited parallelism or bad load balance, it is merely a problem of control flow. This paper presents a loop transformation, which we call loop flattening, that overcomes this limitation by letting each processor advance to the next loop iteration containing useful computation, if there is such an iteration for the given processor. We study a concrete example derived from a molecular dynamics code and compare performance results for flattened and unflattened versions of this kernel on two SIMD machines, the CM-2 and the DECmpp 12000. We then evaluate loop flattening from the compiler's perspective in terms of applicability, cost, profitability, and safety. We conclude with arguing that loop flattening, whether performed by the programmer or by the compiler, introduces negligible overhead and can significantly improve the performance of scientific codes for solving irregular problems.
暂无评论