We present a novel approach to dynamic datarace detection for multithreaded object-oriented programs. Past techniques for on-the-fly datarace detection either sacrificed precision for performance, leading to many fals...
详细信息
ISBN:
(纸本)9781581134636
We present a novel approach to dynamic datarace detection for multithreaded object-oriented programs. Past techniques for on-the-fly datarace detection either sacrificed precision for performance, leading to many false positive datarace reports, or maintained precision but incurred significant overheads in the range of 3x to 30x. In contrast, our approach results in very few false positives and runtime overhead in the 13% to 42% range, making it both efficient and precise. This performance improvement is the result of a unique combination of complementary static and dynamic optimization techniques.
The current practice of mapping computations to custom hardware implementations requires programmers to assume the role of hardware designers. In tuning the performance of their hardware implementation, designers manu...
详细信息
ISBN:
(纸本)9781581134636
The current practice of mapping computations to custom hardware implementations requires programmers to assume the role of hardware designers. In tuning the performance of their hardware implementation, designers manually apply loop transformations such as loop unrolling. designers manually apply loop transformations. For example, loop unrolling is used to expose instruction-level parallelism at the expense of more hardware resources for concurrent operator evaluation. Because unrolling also increases the amount of data a computation requires, too much unrolling can lead to a memory bound implementation where resources are idle. To negotiate inherent hardware space-time trade-offs, designers must engage in an iterative refinement cycle, at each step manually applying transformations and evaluating their impact. This process is not only error-prone and tedious but also prohibitively expensive given the large search spaces and with long synthesis times. This paper describes an automated approach to hardware design space exploration, through a collaboration between parallelizing compiler technology and high-level synthesis tools. We present a compiler algorithm that automatically explores the large design spaces resulting from the application of several program transformations commonly used in application-specific hardware designs. Our approach uses synthesis estimation techniques to quantitatively evaluate alternate designs for a loop nest computation. We have implemented this design space exploration algorithm in the context of a compilation and synthesis system called DEFACTO, and present results of this implementation on five multimedia kernels. Our algorithm derives an implementation that closely matches the performance of the fastest design in the design space, and among implementations with comparable performance, selects the smallest design. We search on average only 0.3% of the design space. This technology thus significantly raises the level of abstraction fo
The Microsoft .NET Common language Runtime provides a shared type system, intermediate language and dynamic execution environment for the implementation and inter-operation of multiple source languages. In this paper ...
详细信息
ISBN:
(纸本)9781581134148
The Microsoft .NET Common language Runtime provides a shared type system, intermediate language and dynamic execution environment for the implementation and inter-operation of multiple source languages. In this paper we extend it with direct support for parametric polymorphism (also known as generics), describing the design through examples written in an extended version of the C# programminglanguage, and explaining aspects of implementation by reference to a prototype extension to the runtime. Our design is very expressive, supporting parameterized types, polymorphic static, instance and virtual methods, "F-bounded" type parameters, instantiation at pointer and value types, polymorphic recursion, and exact run-time types. The implementation takes advantage of the dynamic nature of the runtime, performing justin-time type specialization, representation-based code sharing and novel techniques for efficient creation and use of Nn-time types. Early performance results are encouraging and suggest that programmers will not need to pay an overhead for using generics, achieving performance almost matching hand-specialized code.
We discuss the design and implementation of a compiler that translates formulas representing signal processing transforms into efficient C or Fortran programs. The formulas are represented in a language that we call S...
详细信息
ISBN:
(纸本)9781581134148
We discuss the design and implementation of a compiler that translates formulas representing signal processing transforms into efficient C or Fortran programs. The formulas are represented in a language that we call SPL, an acronym from Signal Processing language. The compiler is a component of the SPIRAL system which makes use of formula transformations and intelligent search strategies to automatically generate optimized digital signal processing (DSP) libraries. After a discussion of the translation and optimization techniques implemented in the compiler, we use SPL formulations of the fast Fourier transform (FFT) to evaluate the compiler. Our results show that SPIRAL, which can be used to implement many classes of algorithms, produces programs that perform as well as "hard-wired" systems like FFTW.
We present an extension of field analysis (see [4]) called related field analysis which is a general technique for proving relationships between two or more fields of an object. We demonstrate the feasibility and appl...
详细信息
ISBN:
(纸本)9781581134148
We present an extension of field analysis (see [4]) called related field analysis which is a general technique for proving relationships between two or more fields of an object. We demonstrate the feasibility and applicability of related field analysis by applying it to the problem of removing array bounds checks. For array bounds check removal, we define a pair of related fields to be an integer field and an array field for which the integer field has a known relationship to the length of the array. This related field information can then be used to remove array bounds checks from accesses to the array field. Our results show that related field analysis can remove an average of 50% of the dynamic array bounds checks on-a wide range of applications. We describe the implementation of related field analysis in the Swift optimizing compiler for Java, as well as the optimizations that exploit the results of related field analysis.
Most:programminglanguages use static scope rules for associating uses of identifiers with their declarations. Static scope helps catch errors at compile time, and it can be implemented efficiently. Some popular langu...
详细信息
ISBN:
(纸本)9781581134148
Most:programminglanguages use static scope rules for associating uses of identifiers with their declarations. Static scope helps catch errors at compile time, and it can be implemented efficiently. Some popular languages - Perl, Tcl, TeX, and Postscript - offer dynamic scope, because dynamic scope works well for variables that "customize" the execution environment, for example. Programmers must simulate dynamic scope to implement this kind of usage in statically scoped languages. This paper describes the design and implementation of imperative language constructs for introducing and referencing dynamically scoped variables-dynamic variables for short. The design is a minimalist one, because dynamic variables are best used sparingly, much like exceptions. The facility does, however, cater to the typical uses for dynamic scope, and it provides a cleaner mechanism for so-called thread-local variables. A particularly simple implementation suffices for languages without exception handling. For languages with exception handling, a more efficient implementation builds on existing compiler infrastructure. Exception handling can be viewed as a control construct with dynamic scope. Likewise, dynamic variables are a data construct with dynamic scope.
Region-based memory management systems structure memory by grouping objects in regions under program control. Memory is reclaimed by deleting regions, freeing all objects stored therein. Our compiler for C with region...
详细信息
ISBN:
(纸本)9781581134148
Region-based memory management systems structure memory by grouping objects in regions under program control. Memory is reclaimed by deleting regions, freeing all objects stored therein. Our compiler for C with regions, RC, prevents unsafe region deletions by keeping a count of references to each region. Using type annotations that make the structure of a program's regions more explicit, we reduce the overhead of reference counting from a maximum of 27% to a maximum of 11% on a suite of realistic benchmarks. We generalise these annotations in a region type system whose main novelty is the use of existentially quantified abstract regions to represent pointers to objects whose region is partially or totally unknown. A distribution of RC is available at http://***/(similar to)dgay/***. gz.
This paper presents the design and implementation of Event-driven State-machines programming (ESP) - a language for programmable devices. In traditional languages, like C, using event-driven state-machines forces a tr...
详细信息
ISBN:
(纸本)9781581134148
This paper presents the design and implementation of Event-driven State-machines programming (ESP) - a language for programmable devices. In traditional languages, like C, using event-driven state-machines forces a tradeoff that requires giving up ease of development and reliability to achieve high performance ESP is designed to provide all of these three properties simultaneously. ESP provides a comprehensive set of features to support development of compact and modular programs. The ESP compiler compiles the programs into two targets - a C file that can be used to generate efficient firmware For the device;and a specification that can be used by a verifier like SPIN to extensively test the firmware. As a case study, we reimplemented VMMC firmware that;runs on Myrinet network interface cards using ESP. We found that ESP simplifies the task of programming with event-driven state machines. It required an order of magnitude fewer lines of code than the previous implementation. We also found that model-checking verifiers like SPIN can be used to effectively debug the firmware. Finally. our measurements indicate that, the performance overhead of using ESP is relatively small.
Asynchronous exceptions, such as timeouts, are important for robust, modular programs, but are extremely difficult to program with - so much so that most programminglanguages either heavily restrict them or ban them ...
详细信息
ISBN:
(纸本)9781581134148
Asynchronous exceptions, such as timeouts, are important for robust, modular programs, but are extremely difficult to program with - so much so that most programminglanguages either heavily restrict them or ban them altogether. We extend our earlier work, in which we added synchronous exceptions to Haskell, to support asynchronous exceptions too. Our design introduces scoped combinators for blocking and unblocking asynchronous interrupts, along with a somewhat surprising semantics for operations that can suspend. Uniquely, we also give a formal semantics for our system.
We present a new pointer and escape analysis. Instead of analyzing the whole program, the algorithm incrementally analyzes only those parts of the program that may deliver useful results. An analysis policy monitors t...
详细信息
ISBN:
(纸本)9781581134148
We present a new pointer and escape analysis. Instead of analyzing the whole program, the algorithm incrementally analyzes only those parts of the program that may deliver useful results. An analysis policy monitors the analysis results to direct the incremental investment of analysis resources to those parts of the program that offer the highest expected optimization return. Our experimental results show that almost all of the objects are allocated at a small number of allocation sites and that an incremental analysis of a small region of the program surrounding each site can deliver almost all of the benefit of a whole-program analysis. Our analysis policy is usually able to deliver this benefit at a fraction of the whole-program analysis cost.
暂无评论