Architectural simulators are essential tools for computer architecture and systems research and development. Simulators, however, are becoming frustratingly slow, because they must now model increasingly complex micro...
详细信息
ISBN:
(纸本)9781581134148
Architectural simulators are essential tools for computer architecture and systems research and development. Simulators, however, are becoming frustratingly slow, because they must now model increasingly complex micro-architectures running realistic workloads. Previously, we developed a technique called fast-forwarding, which applied partial evaluation and memoization to improve the performance of detailed architectural simulations by as much as an order of magnitude [14]. While writing a detailed processor simulator is difficult, implementing fast-forwarding is even more complex. This paper describes Facile, a domain-specific language for writing detailed, accurate micro-architecture simulators. Architectural descriptions written in Facile can be compiled, using partial evaluation techniques, into fast-forwarding simulators that achieve significant performance improvements with far less programmer effort. Facile and its compiler make this performance-enhancing technique accessible to computer architects.
The deployment of Java as a concurrent programminglanguage has created a critical need for high-performance, concurrent, and incremental multiprocessor garbage collection. We present the Recycler, a fully concurrent ...
详细信息
ISBN:
(纸本)9781581134148
The deployment of Java as a concurrent programminglanguage has created a critical need for high-performance, concurrent, and incremental multiprocessor garbage collection. We present the Recycler, a fully concurrent pure reference counting garbage collector that we have implemented in the Jalapeno Java virtual machine running on shared memory multiprocessors. While a variety of multiprocessor collectors have been proposed and some have been implemented, experimental data is limited and there is little quantitative basis for comparison between different algorithms. We present measurements of the Recycler and compare it against a non-concurrent but parallel load-balancing mark-and-sweep collector (that we also implemented in Jalapeno), and evaluate the classical tradeoff between response time and throughput. When processor or memory resources are limited, the Recycler runs at about 90% of the speed of the mark-and-sweep collector. However, with an extra processor to run collection and with a moderate amount of memory headroom, the Recycler is able to operate without ever blocking the mutators and achieves a maximum measured mutator delay of only 2.6 milliseconds for our benchmarks. End-to-end execution time is usually within 5%.
The proceedings contains 30 papers. Topics discussed include runtime techniques, pointer analysis, program correctness, compilation for parallel hardware, high level transforms, java programs and java optimization.
The proceedings contains 30 papers. Topics discussed include runtime techniques, pointer analysis, program correctness, compilation for parallel hardware, high level transforms, java programs and java optimization.
Known algorithms for pointer analysis are "global" in the sense that-they perform an exhaustive analysis of a program or program component. In this paper we introduce a demand-driven approach for pointer ana...
详细信息
ISBN:
(纸本)9781581134148
Known algorithms for pointer analysis are "global" in the sense that-they perform an exhaustive analysis of a program or program component. In this paper we introduce a demand-driven approach for pointer analysis. Specifically, we describe a demand-driven flow-insensitive, subset-based, context-insensitive points-to analysis. Given a list of pointer variables (a query), our analysis performs just enough computation to determine the points-to sets for these query variables. Using deductive reachability formulations of both the exhaustive and the demand-driven analyses, we prove that our algorithm is correct. We also show that our analysis is optimal in the sense that it does not do more work than necessary. We illustrate the feasibility and efficiency of our analysis with an implementation of demand-driven points-to analysis for computing the call-graphs of C programs with function pointers. The performance of our system varies substantially across benchmarks - the main factor is how much of the points-to graph must be computed to determine the call-graph. For some benchmarks, only a small part of the points-to graph is needed (e.g povray, emacs and gee), and here we see more than a 10x speedup. For other benchmarks (e.g. burlap and gimp), we need to compute most (> 95%) of the points-to graph, and here the demand-driven algorithm is considerably slower, because using the demand-driven algorithm is a slow method of computing the full points-to graph.
In this paper, we evaluate the benefits achievable from pointer analysis and other memory disambiguation techniques for C/C++ programs, using the framework of the production compiler for the Intel ® Itanium™ pr...
详细信息
ISBN:
(纸本)9781581134148
In this paper, we evaluate the benefits achievable from pointer analysis and other memory disambiguation techniques for C/C++ programs, using the framework of the production compiler for the Intel ® Itanium™ processor. Most of the prior work on memory disambiguation has primarily focused on pointer analysis, and either presents only static estimates of the accuracy of the analysis (such as average points-to set size), or provides performance data in the context of certain individual optimizations. In contrast, our study is based on a complete memory disambiguation framework that uses a whole set of techniques including pointer analysis. Further, it presents how various compiler analyses and optimizations interact with the memory disambiguator, evaluates how much they benefit from disambiguation, and measures the eventual impact on the performance of the program. The paper also analyzes the types of disambiguation queries that are typically received by the disambiguator, which disambiguation tec hniques prove most effective in resolving them, and what type of queries prove difficult to be resolved. The study is based on empirical data collected for the SPEC CINT2000 C/C++ programs, running on the Itanium processor.
A major problem for writing extensible software arises when recursively defined datatypes and operations on these types have to be extended simultaneously without modifying existing code. This paper introduces Extensi...
详细信息
ISBN:
(纸本)9781581134155
A major problem for writing extensible software arises when recursively defined datatypes and operations on these types have to be extended simultaneously without modifying existing code. This paper introduces Extensible Algebraic Datatypes with Defaults which promote a simple programming pattern to solve this well known problem. We show that it is possible to encode extensible algebraic datatypes in an object-oriented language, using a new design pattern for extensible visitors. Extensible algebraic datatypes have been successfully applied in the implementation of an extensible Java compiler. Our technique allows for the reuse of existing components in compiler extensions without the need for any adaptations.
Lula is a system for computer-assisted stage lighting design and control. Whereas other systems for the same purpose are usually the results of long chains of incremental improvements of historic concepts. Lula repres...
详细信息
Lula is a system for computer-assisted stage lighting design and control. Whereas other systems for the same purpose are usually the results of long chains of incremental improvements of historic concepts. Lula represents a complete redesign. Whereas other systems focus on control aspects of lighting. Lula focuses on design and generates control information from it. This approach gives significantly more flexibility to the lighting designer and shortens the design process itself. Lula's design and implementation draw from a number of disciplines in advanced programming. It is written in Scheme and runs atop PLT Scheme, and benefits from its high-level GUI library. Lula uses an algebraic model for lighting looks based on just three combinators. It employs Functional Reactive programming for all dynamic aspects of lighting, and is programmable via a functional-reactive domain-specific language. Lula is an actual product and has users who have neither interest in nor knowledge of functional programming.
The high performance implementation of Java Virtual Machines (JVM) and just-in-time (JIT) compilers is directed toward adaptive compilation optimizations on the basis of online runtime profile information. This paper ...
详细信息
ISBN:
(纸本)1581133359
The high performance implementation of Java Virtual Machines (JVM) and just-in-time (JIT) compilers is directed toward adaptive compilation optimizations on the basis of online runtime profile information. This paper describes the design and implementation of a dynamic optimization framework in a production-level Java JIT compiler. Our approach is to employ a mixed mode interpreter and a three level optimizing compiler, supporting quick, full, and special optimization, each of which has a different set of tradeoffs between compilation overhead and execution speed. a lightweight sampling profiler operates continuously during the entire program's exectuion. When necessary, detailed information on runtime behavior is collected by dynmiacally generating instrumentation code which can be installed to and uninstalled from the specified recompilation target code. Value profiling with this instrumentation mechanism allows fully automatic code specialization to be performed on the basis of specific parameter values or global data at the highest optimization level. The experimental results show that our approach offers high performance and a low code expansion ratio in both program startup and steady state measurements in comparison to the compile-only approach, and that the code specialization can also contribute modest performance improvement
We present a new technique for removing unnecessary synchronization operations from statically compiled Java programs. Our approach improves upon current efforts based on escape analysis, as it can eliminate synchroni...
详细信息
ISBN:
(纸本)9781581131994
We present a new technique for removing unnecessary synchronization operations from statically compiled Java programs. Our approach improves upon current efforts based on escape analysis, as it can eliminate synchronization opt rations even on objects that escape their allocating threads. It makes use of a compact, equivalence-class-based representation that eliminates the need for fixed point operations during the analysis. We describe and evaluate the performance of an implementation in the Marmot native Java compiler. For the benchmark programs examined, the optimization removes 100% of the dynamic synchronization operations in single-threaded programs, and 0-99% in multi-threaded programs, at a low cost in additional compilation time and code growth.
Functional Reactive programming, or FRP. is a general framework for programming hybrid systems in a high-level, declarative manner. The key ideas in FRP are its notions of behaviors and events. Behaviors are time-vary...
详细信息
ISBN:
(纸本)9781581131994
Functional Reactive programming, or FRP. is a general framework for programming hybrid systems in a high-level, declarative manner. The key ideas in FRP are its notions of behaviors and events. Behaviors are time-varying, reactive value while events are time-ordered sequences of discrete-time event occurrences. FRP is the essence of Fran, a domain-specific language embedded in Haskell for programming reactive animations, but FRP is now also being used in vision, robotics and other control systems applications. In this paper we explore the formal semantics of FRP and how it relates to an implementation based on streams that represent (and therefore only approximate) continuous behaviors. We show that, in the limit as the sampling interval goes to zero, the implementation is faithful to the formal, continuous semantics, but only when certain constraints on behaviors are observed. We explore the nature of these constraints, which vary amongst the FRP primitives. Our results show both the pou er and limitations of this approach to languagedesign and implementation. As an example of a limitation, we show that streams are incapable of representing instantaneous predicate events over behaviors.
暂无评论