Dynamic binary instrumentation (DBI) frameworks make it easy to build dynamic binary analysis (DBA) tools such as checkers and profilers. Much of the focus on DBI frameworks has been on performance;little attention ha...
详细信息
ISBN:
(纸本)9781595936332
Dynamic binary instrumentation (DBI) frameworks make it easy to build dynamic binary analysis (DBA) tools such as checkers and profilers. Much of the focus on DBI frameworks has been on performance;little attention has been paid to their capabilities. As a result, we believe the potential of DBI has not been fully exploited. In this paper we describe Valgrind, a DBI framework designed for building heavyweight DBA tools. We focus on its unique support for shadow values - a powerful but previously little-studied and difficult-to-implement DBA technique, which requires a tool to shadow every register and memory value with another value that describes it. This support accounts for several crucial design features that distinguish Valgrind from other DBI frameworks. Because of these features, lightweight tools built with Valgrind run comparatively slowly, but Valgrind can be used to build more interesting, heavyweight tools that are difficult or impossible to build with other DBI frameworks such as Pin and DynamoRIO.
Since the introduction of object-oriented programming few programminglanguages have attempted to provide programmers with more than objects and classes, i.e., more than two levels. Those that did, almost exclusively ...
详细信息
Since the introduction of object-oriented programming few programminglanguages have attempted to provide programmers with more than objects and classes, i.e., more than two levels. Those that did, almost exclusively aimed at describing language properties - i. e., their metaclasses exert linguistic control on language concepts and mechanisms often in order to make the language extensible. In terms of supporting logical domain classification levels, however, they are still limited to two levels. In this paper we conservatively extend the object-oriented programming paradigm to feature an unbounded number of domain classification levels. We can therefore avoid the introduction of accidental complexity into programs caused by accommodating multiple domain levels within only two programming levels. We present a corresponding languagedesign featuring "deep instantiation" and demonstrate its features with a running example. Finally, we outline the implementation of our compiler prototype and discuss the potentials of further developing our languagedesign.
A general class of program analyses are a combination of context-free and regular language reachability. We define regularly annotated set constraints, a constraint formalism that captures this class. Our results exte...
详细信息
A general class of program analyses are a combination of context-free and regular language reachability. We define regularly annotated set constraints, a constraint formalism that captures this class. Our results extend the class of reachability problems expressible naturally in a single constraint formalism, including such diverse applications as interprocedural dataflow analysis, precise type-based flow analysis, and pushdown model checking.
Performance of stencil computations can be significantly improved through smart implementations that improve memory locality, computation reuse, or parallelize the computation. Unfortunately, efficient implementations...
详细信息
ISBN:
(纸本)9781595936332
Performance of stencil computations can be significantly improved through smart implementations that improve memory locality, computation reuse, or parallelize the computation. Unfortunately, efficient implementations are hard to obtain because they often involve non-traditional transformations, which means that they cannot be produced by optimizing the reference stencil with a compiler. In fact, many stencils are produced by code generators that were tediously handcrafted. In this paper, we show how stencil implementations can be produced with sketching. Sketching is a software synthesis approach where the programmer develops a partial implementation a sketch-and a separate specification of the desired functionality given by a reference (unoptimized) stencil. The synthesizer then completes the sketch to behave like the specification, filling in code fragments that are difficult to develop manually. Existing sketching systems work only for small finite programs, i.e., programs that can be represented as small Boolean circuits. In this paper, we develop a sketching synthesizer that works for stencil computations, a large class of programs that, unlike circuits, have unbounded inputs and outputs, as well as an unbounded number of computations. The key contribution is a reduction algorithm that turns a stencil into a circuit, allowing us to synthesize stencils using an existing sketching synthesizer.
Divide-and-conquer algorithms are suitable for modern parallel machines, tending to have large amounts of inherent parallelism and working well with caches and deep memory hierarchies. Among others, list homomorphisms...
详细信息
ISBN:
(纸本)9781595936332
Divide-and-conquer algorithms are suitable for modern parallel machines, tending to have large amounts of inherent parallelism and working well with caches and deep memory hierarchies. Among others, list homomorphisms are a class of recursive functions on lists, which match very well with the divide-and-conquer paradigm. However, direct programming with list homomorphisms is a challenge for many programmers. In this paper, we propose and implement a novel system that can automatically derive cost-optimal list homomorphisms from a pair of sequential programs, based on the third homomorphism theorem. Our idea is to reduce extraction of list homomorphisms to derivation of weak right inverses. We show that a weak right inverse always exists and can be automatically generated from a wide class of sequential programs. We demonstrate our system with several nontrivial examples, including the maximum prefix sum problem, the prefix sum computation, the maximum segment sum problem, and the line-of-sight problem. The experimental results show practical efficiency of our automatic parallelization algorithm and good speedups of the generated parallel programs.
C-Rules is a business rules management system developed by Constraint Technologies International(1) (CTI), designed for use in transportation problems. Users define rules describing various aspects of a problem, such ...
详细信息
C-Rules is a business rules management system developed by Constraint Technologies International(1) (CTI), designed for use in transportation problems. Users define rules describing various aspects of a problem, such as solution costs and legality, which are then queried from a host application, typically an optimising solver. At its core, C-Rules provides a functional expression language which affords users both power and flexibility when formulating rules. In this paper we will describe our experiences of using functional programming both at the end-user level, as well as at the implementation level. We highlight some of the benefits we, and the product's users, have enjoyed from the decision to base our rule system on features such as: higher-order functions, referential transparency, and static, polymorphic typing. We also outline some of our experiences in using Haskell to build an efficient compiler for the core language.
Future mainstream microprocessors will likely integrate specialized accelerators, such as GPUs, onto a single die to achieve better performance and power efficiency. However, it remains a keen challenge to program suc...
详细信息
Future mainstream microprocessors will likely integrate specialized accelerators, such as GPUs, onto a single die to achieve better performance and power efficiency. However, it remains a keen challenge to program such a heterogeneous multi-core platform, since these specialized accelerators feature ISAs and functionality that are significantly different from the general purpose CPU cores. In this paper, we present EXOCHI: (1) Exoskeleton Sequencer (EXO), an architecture to represent heterogeneous accelerators as ISA-based MIMD architecture resources, and a shared virtual memory heterogeneous multithreaded program execution model that tightly couples specialized accelerator cores with general purpose CPU cores, and (2) C for Heterogeneous Integration (CHI), an integrated C/C++ programming environment that supports accelerator-specific inline assembly and domain-specific languages. The CHI compiler extends the OpenMP pragma for heterogeneous multithreading programming, and produces a single fat binary with code sections corresponding to different instruction sets. The runtime can judiciously spread parallel computation across the heterogeneous cores to optimize performance and power. We have prototyped the EXO architecture on a physical heterogeneous platform consisting of an Intel (R) Core (TM) 2 Duo processor and an 8-core 32-thread Intel (R) Graphics Media Accelerator X3000. In addition, we have implemented the CHI integrated programming environment with the Intel (R) C++ Compiler, runtime toolset, and debugger. On the EXO prototype system, we have enhanced a suite of production-quality media kernels for video and image processing to utilize the accelerator through the CHI programming interface, achieving significant speedup (1.41X to 10.97X) over execution on the IA32 CPU alone.
This paper proposes to combine two seemingly opposed programming models for building massively concurrent network services: the event-driven model and the multithreaded model. The result is a hybrid design that offers...
详细信息
This paper proposes to combine two seemingly opposed programming models for building massively concurrent network services: the event-driven model and the multithreaded model. The result is a hybrid design that offers the best of both worlds-the ease of use and expressiveness of threads and the flexibility and performance of events. This paper shows how the hybrid model can be implemented entirely at the application level using concurrency monads in Haskell, which provides type-safe abstractions for both events and threads. This approach simplifies the development of massively concurrent software in a way that scales to real-world network services. The Haskell implementation supports exceptions, symmetrical multiprocessing, software transactional memory, asynchronous I/O mechanisms and application-level network protocol stacks. Experimental results demonstrate that this monad-based approach has good performance: the threads are extremely lightweight (scaling to ten million threads), and the I/O performance compares favorably to that of Linux NPTL.
Operators for delimiting control and for capturing composable continuations litter the landscape of theoretical programminglanguage research. Numerous papers explain their advantages, how the operators explain each o...
详细信息
Operators for delimiting control and for capturing composable continuations litter the landscape of theoretical programminglanguage research. Numerous papers explain their advantages, how the operators explain each other (or don't), and other aspects of the operators' existence. Production programminglanguages, however, do not support these operators, partly because their relationship to existing and demonstrably useful constructs-such as exceptions and dynamic binding-remains relatively unexplored. In this paper, we report on our effort of translating the theory of delimited and composable control into a viable implementation for a production system. The report shows how this effort involved a substantial design element, including work with a formal model, as well as significant practical exploration and engineering. The resulting version of PLT Scheme incorporates the expressive combination of delimited and composable control alongside dynamic-wind, dynamic binding, and exception handling. None of the additional operators subvert the intended benefits of existing control operators, so that programmers can freely mix and match control operators.
Embedded systems pose unique challenges to Java application developers and virtual machine designers. Chief among these challenges is the memory footprint of both the virtual machine and the applications that run with...
详细信息
ISBN:
(纸本)9781595936332
Embedded systems pose unique challenges to Java application developers and virtual machine designers. Chief among these challenges is the memory footprint of both the virtual machine and the applications that run within it. With the rapidly increasing set of features provided by the Java language, virtual machine designers are often forced to build custom implementations that make various tradeoffs between the footprint of the virtual machine and the subset of the Java language and class libraries that are supported. In this paper, we present the Exo VM, a system in which an application is initialized in a fully featured virtual machine, and then the code, data, and virtual machine features necessary to execute it are packaged into a binary image. Key to this process is feature analysis, a technique for computing the reachable code and data of a Java program and its implementation inside the VM simultaneously. The Exo VM reduces the need to develop customized embedded virtual machines by reusing a single VM infrastructure and automatically eliding the implementation of unused Java features on a per-program basis. We present a constraint-based instantiation of the analysis technique, an implementation in IBM's J9 Java VM, experiments evaluating our technique for the EEMBC benchmark suite, and some discussion of the individual costs of some of Java's features. Our evaluation shows that our system can reduce the non-heap memory allocation of the virtual machine by as much as 75%. We discuss VM and languagedesign decisions that our work shows are important in targeting embedded systems, supporting the long-term goal of a common VM infrastructure spanning from motes to large servers.
暂无评论