Existing research understates the benefits that can be obtained from inlining and cloning, especially when guided by profile information. Our implementation of inlining and cloning yields excellent results on average ...
ISBN:
(纸本)9780897919074
Existing research understates the benefits that can be obtained from inlining and cloning, especially when guided by profile information. Our implementation of inlining and cloning yields excellent results on average and very rarely lowers performance. We believe our good results can be explained by a number of factors: inlining at the intermediate-code level removes most technical restrictions on what can be inlined; the ability to inline across files and incorporate profile information enables us to choose better inline candidates; a high-quality back end can exploit the scheduling and register allocation opportunities presented by larger subroutines; an aggressive processor architecture benefits from more predictable branch behavior; and a large instruction cache mitigates the impact of code expansion. We describe the often dramatic impact of our inlining and cloning on performance: for example, the implementations of our inlining and cloning algorithms in the HP-UX 10.20 compilers boost SPECint95 performance on a PA8000-based workstation by a factor of 1.32.
This paper describes the implementation of a prototype expert system that provides standard cell design advice in the areas of performance, manufacturability, testability, and overall design quality, from a netlist de...
ISBN:
(纸本)9780818607813
This paper describes the implementation of a prototype expert system that provides standard cell design advice in the areas of performance, manufacturability, testability, and overall design quality, from a netlist description of an application specific integrated circuit (ASIC) *** system is currently being developed on a Symbolics Lisp Machine using a hybrid AI programminglanguage, called Proteus. The language integrates both known and novel AI programming techniques into a robust vehicle for the implementation of knowledge-based *** representation schemes, inference mechanisms, and knowledge acquisition techniques will be presented, including representative examples from within the knowledge base.
The aim of ÆMINIUM is to study the implications of having a concurrent-by-default programminglanguage. This includes languagedesign, runtime system, performance and software engineering *** conduct our study th...
详细信息
ISBN:
(纸本)9781450327848
The aim of ÆMINIUM is to study the implications of having a concurrent-by-default programminglanguage. This includes languagedesign, runtime system, performance and software engineering *** conduct our study through the design of the concurrent-by-default ÆMINIUM programminglanguage. ÆMINIUM leverages the permission flow of object and group permissions through the program to validate the program's correctness and to automatically infer a possible parallelization strategy via a dataflow graph. ÆMINIUM supports not only fork-join parallelism but more general dataflow patterns of *** this paper we present a formal system, called μÆMINIUM, modeling the core concepts of ÆMINIUM. μÆMINIUM's static type system is based on Featherweight Java with ÆMINIUM-specific extensions. Besides checking for correctness ÆMINIUM's type system it also uses the permission flow to compute a potential parallel execution strategy for the program. μÆMINIUM's dynamic semantics use a concurrent-by-default evaluation approach. Along with the formal system we present its soundness *** provide a full description of the implementation along with the description of various optimization techniques we used. We implemented ÆMINIUM as an extension of the Plaid programminglanguage, which has first-class support for permissions built-in. The ÆMINIUM implementation and all case studies are publicly available under the General Public *** use various case studies to evaluate ÆMINIUM's applicability and to demonstrate that ÆMINIUM parallelized code has performance improvements compared to its sequential counterpart. We chose to use case studies from common domains or problems that are known to benefit from parallelization, to show that ÆMINIUM is powerful enough to encode them. We demonstrate through a webserver application, which evaluates ÆMINIUM's impact on latency-bound applications, that ÆMINIUM can achieve a 70% performance improvement over the sequential counterp
Object-oriented programminglanguages allow inter-object aliasing. Although necessary to construct linked data structures and networks of interacting objects, aliasing is problematic in that an aggregate object's ...
详细信息
ISBN:
(纸本)9781581130058
Object-oriented programminglanguages allow inter-object aliasing. Although necessary to construct linked data structures and networks of interacting objects, aliasing is problematic in that an aggregate object's state can change via an alias to one of its components, without the aggregate being aware of any *** types form a static type system that indicates object ownership. This provides a flexible mechanism to limit the visibility of object references and restrict access paths to objects, thus controlling a system's dynamic topology. The type system is shown to be sound, and the specific aliasing properties that a system's object graph satisfies are formulated and proven invariant for well-typed programs.
We present a novel approach to dynamic datarace detection for multithreaded object-oriented programs. Past techniques for on-the-fly datarace detection either sacrificed precision for performance, leading to many fals...
详细信息
ISBN:
(纸本)9781581134636
We present a novel approach to dynamic datarace detection for multithreaded object-oriented programs. Past techniques for on-the-fly datarace detection either sacrificed precision for performance, leading to many false positive datarace reports, or maintained precision but incurred significant overheads in the range of 3x to 30x. In contrast, our approach results in very few false positives and runtime overhead in the 13% to 42% range, making it both efficient and precise. This performance improvement is the result of a unique combination of complementary static and dynamic optimization techniques.
Recent shared-memory parallel computer systems offer the exciting possibility of customizing memory coherence protocols to fit an application's semantics and sharing patterns. Custom protocols have been used to ac...
ISBN:
(纸本)9780897917957
Recent shared-memory parallel computer systems offer the exciting possibility of customizing memory coherence protocols to fit an application's semantics and sharing patterns. Custom protocols have been used to achieve message-passing performance---while retaining the convenient programming model of a global address space---and to implement high-level language constructs. Unfortunately, coherence protocols written in a conventional language such as C are difficult to write, debug, understand, or modify. This paper describes Teapot, a small, domain-specific language for writing coherence protocols. Teapot uses continuations to help reduce the complexity of writing protocols. Simple static analysis in the Teapot compiler eliminates much of the overhead of continuations and results in protocols that run nearly as fast as hand-written C code. A Teapot specification can be compiled both to an executable coherence protocol and to input for a model checking system, which permits the specification to be verified. We report our experiences coding and verifying several protocols written in Teapot, along with measurements of the overhead incurred by writing a protocol in a higher-level language.
This paper describes a lightweight yet powerful approach for writing distributed applications using shared variables. Our approach, called SHAREHOLDER, is inspired by the flexible and intuitive model of information ac...
详细信息
ISBN:
(纸本)9781581130058
This paper describes a lightweight yet powerful approach for writing distributed applications using shared variables. Our approach, called SHAREHOLDER, is inspired by the flexible and intuitive model of information access common to the World Wide Web. The distributed applications targeted by our approach all share a weak consistency model and loose transaction semantics, similar to a user's model of accessing email, bulletin boards, chat rooms, etc. on the Internet. The SHAREHOLDER infrastructure has several advantages. Its highly object-oriented view of shared variables simplifies their initialization and configuration. A shared variable's distribution mechanism is specified through an associated configuration object, and the programmer does not need to write any extra code to implement the sharing mechanism. These configuration objects can be initialized at run-time, allowing tremendous flexibility in dynamic control of distribution of shared variables. Finally, the programmer can treat shared variables and local variables interchangeably, thus simplifying conversion of a serial application into a distributed application.
This paper describes the distributed memory implementation of a shared memory parallel functional language. The language is Id, an implicitly parallel, mostly functional language that is currently evolving into a dial...
详细信息
ISBN:
(纸本)9780897917704
This paper describes the distributed memory implementation of a shared memory parallel functional language. The language is Id, an implicitly parallel, mostly functional language that is currently evolving into a dialect of Haskell. The target is a distributed memory machine, because we expect these to be the most widely available parallel platforms in the future. The difficult problem is to bridge the gap between the shared memory language model and the distributed memory machine model. The language model assumes that all data is uniformly accessible, whereas the machine has a severe memory hierarchy: a processor's access to remote memory (using explicit communication) is orders of magnitude slower than its access to local memory. Thus, avoiding communication is crucial for good performance. The Id language, and its general dataflow-inspierd compilation to multithreaded code are described elsewhere. In this paper, we focus on our new parallel runtime system and its features for avoiding communication and for tolerating its latency when necessary: multithreading, scheduling and load balancing; the distributed heap model and distributed coherent cacheing, and parallel garbage collection. We have completed the first implementation, and we present some preliminary performance mearsurements.
We consider the problem of supporting compacting garbage collection in the presence of modern compiler optimizations. Since our collector may move any heap object, it must accurately locate, follow, and update all poi...
ISBN:
(纸本)9780897914758
We consider the problem of supporting compacting garbage collection in the presence of modern compiler optimizations. Since our collector may move any heap object, it must accurately locate, follow, and update all pointers and values derived from pointers. To assist the collector, we extend the compiler to emit tables describing live pointers, and values derived from pointers, at each program location where collection may occur. Significant results include identification of a number of problems posed by optimizations, solutions to those problems, a working compiler, and experimental data concerning table sizes, table compression, and time overhead of decoding tables during collection. While gc support can affect the code produced, our sample programs show no significant changes, the table sizes are a modest fraction of the size of the optimized code, and stack tracing is a small fraction of total gc time. Since the compiler enhancements are also modest, we conclude that the approach is practical.
Introducing a geometrical approach to propositional divalent logic with the truth-tables of the Boolean connectives as the geometrical objects, we can make a simple operational implementation in APL of basic inference...
ISBN:
(纸本)9780901865359
Introducing a geometrical approach to propositional divalent logic with the truth-tables of the Boolean connectives as the geometrical objects, we can make a simple operational implementation in APL of basic inference methods. This is the basis for a logic programming tool used for preliminary product design at the Danish company Bang&Olufsen.
暂无评论