With power-related concerns becoming dominant aspects of hardware and software design, significant research effort has been devoted towards system power minimization. Among run-time power-management techniques, dynami...
详细信息
ISBN:
(纸本)9781581136623
With power-related concerns becoming dominant aspects of hardware and software design, significant research effort has been devoted towards system power minimization. Among run-time power-management techniques, dynamic voltage scaling (DVS) has emerged as an important approach, with the ability to provide significant power savings. DVS exploits the ability to control the power consumption by varying a processor's supply voltage (V) and clock frequency (f). DVS controls energy by scheduling different parts of the computation to different (V, f) pairs;the goal is to minimize energy while meeting performance needs. Although processors like the Intel XScale and Transmeta Crusoe allow software DVS control, such control has thus far largely been used at the process/task level under operating system control. This is mainly because the energy and time overhead for switching DVS modes is considered too large and difficult to manage within a single program. In this paper we explore the opportunities and limits of compile-time DVS scheduling. We derive an analytical model for the maximum energy savings that can be obtained using DVS given a few known program and processor parameters. We use this model to determine scenarios where energy consumption benefits from compile-time DVS and those where there is no benefit. The model helps us extrapolate the benefits of compile-time DVS into the future as processor parameters change. We then examine how much of these predicted benefits can actually be achieved through optimal settings of DVS modes. This is done by extending the existing Mixed-integer Linear Program (MILP) formulation for this problem by accurately accounting for DVS energy switching overhead, by providing finer-grained control on settings and by considering multiple data categories in the optimization. Overall, this research provides a comprehensive view of compile-time DVS management, providing both practical techniques for its immediate deployment as well theoretical
Full precision in garbage collection implies retaining only those heap allocated objects that will actually be used in the future. Since full precision is not computable in general, garbage collectors use safe (i.e., ...
详细信息
Full precision in garbage collection implies retaining only those heap allocated objects that will actually be used in the future. Since full precision is not computable in general, garbage collectors use safe (i.e., conservative) approximations such as reachability from a set of root references. Ambiguous roots collectors (commonly called `conservative') can be overly conservative because they overestimate the root set, and thereby retain unexpectedly large amounts of garbage. We consider two more precise collection schemes for Java virtual machines (JVMs). One uses a type analysis to obtain a type-precise root set (only those variables that contain references);the other adds a live variable analysis to reduce the root set to only the live reference variables. Even with the Java programminglanguage's strong typing, it turns out that the JVM specification has a feature that makes type-precise root sets difficult to compute. We explain the problem and ways in which it can be solved. Our experimental results include measurements of the costs of the type and liveness analyses at load time, of the incremental benefits at run time of the liveness analysis over the type analysis alone, and of various map sizes and counts. We find that the liveness analysis often produces little or no improvement in heap size, sometimes modest improvements, and occasionally the improvement is dramatic. While further study is in order, we conclude that the main benefit of the liveness analysis is preventing bad surprises.
We specify a polymorphic type system for an applied lambda calculus that refines the string type with a subtype hierarchy derived from language containment. It enables us to find a language for each string-type expres...
详细信息
ISBN:
(纸本)1581139993
We specify a polymorphic type system for an applied lambda calculus that refines the string type with a subtype hierarchy derived from language containment. It enables us to find a language for each string-type expression such that the value of the expression is a member of that language. Type inference for this system infers language inclusion constraints that can be viewed as a context-free grammar with a non-terminal for each string-valued expression. Then we present two algorithms that solve language inclusion constraints with respect to a fixed context-free reference grammar. The solutions are sound but incomplete because the general problem of context-free language inclusion is undecidable. Both algorithms are derived from Earley's parsing algorithm for context-free languages. Taking the two parts together enables us to answer questions like: Is the value of a string-type expression derivable from a given nonterminal in the reference grammar? Copyright 2005acm.
In this paper we design a language and runtime support for isolation-only, multithreaded transactions (called tasks). Tasks allow isolation to be declared instead of having to be encoded using the low-level synchroniz...
详细信息
ISBN:
(纸本)1595930906
In this paper we design a language and runtime support for isolation-only, multithreaded transactions (called tasks). Tasks allow isolation to be declared instead of having to be encoded using the low-level synchronization constructs. The key concept of our design is the use of a type system to support rollback-free and safe runtime execution of tasks. We present a first-order type system which can verify information for the concurrency controller. We use an operational semantics to formalize and prove the type soundness result and an isolation property of tasks. The semantics uses a specialized concurrency control algorithm, that is based on access versioning. Copyright 2005acm.
In today's software industry, large-scale, multi-language codebases are the norm. This brings substantial challenges in developing automated tools for code maintenance tasks such as API migration or dead code clea...
详细信息
In today's software industry, large-scale, multi-language codebases are the norm. This brings substantial challenges in developing automated tools for code maintenance tasks such as API migration or dead code cleanup. Tool builders often find themselves caught between two less-than-ideal tooling options: (1) language-specific code rewriting tools or (2) generic, lightweight match-replace transformation tools with limited expressiveness. The former leads to tool fragmentation and a steep learning curve for each language, while the latter forces developers to create ad-hoc, throwaway scripts to handle realistic tasks. To fill this gap, we introduce a new declarative domain-specific language (DSL) for expressing interdependent multi-language code transformations. Our key insight is that we can increase the expressiveness and applicability of lightweight match-replace tools by extending them to support for composition, ordering, and flow. We implemented an open-source tool for our language, called PolyglotPiranha, and deployed it in an industrial setting. We demonstrate its effectiveness through three case studies, where it deleted 210K lines of dead code and migrated 20K lines, across 1611 pull requests. We compare our DSL against state-of-the-art alternatives, and show that the tools we developed are faster, more concise, and easier to maintain.
This paper describes a general framework for representing iteration-reordering transformations. These transformations can be both matrix-based and non-matrix-based. Transformations are defined by rules for mapping dep...
详细信息
ISBN:
(纸本)0897914759
This paper describes a general framework for representing iteration-reordering transformations. These transformations can be both matrix-based and non-matrix-based. Transformations are defined by rules for mapping dependence vectors, rules for mapping loop bound expressions, and rules for creating new initialization statements. The framework is extensible, and can be used to represent any iteration-reordering transformation. Mapping rules for several common transformations are included in the paper.
Lolli is a logic programminglanguage based on the asynchronous propositions of intuitionistic linear logic. It uses a backward chaining, backtracking operational semantics. In this paper we extend Lolli with the rema...
详细信息
ISBN:
(纸本)1595930906
Lolli is a logic programminglanguage based on the asynchronous propositions of intuitionistic linear logic. It uses a backward chaining, backtracking operational semantics. In this paper we extend Lolli with the remaining connectives of intuitionistic linear logic restricted to occur inside a monad, an idea taken from the concurrent logical framework (CLF). The resulting language, called LolliMon, has a natural forward chaining, committed choice operational semantics inside the monad, while retaining Lolli's semantics outside the monad. LolliMon thereby cleanly integrates both concurrency and saturation with logic programming search. We illustrate its expressive power through several examples including an implementation of the pi-calculus, a call-by-need lambda-calculus, and several saturating algorithms presented in logical form. Copyright 2005acm.
When dealing with dynamic, untrusted content, such as on the Web, software behavior must be sandboxed, typically through use of a language like JavaScript. However, even for such specially-designed languages, it is di...
详细信息
More and more server workloads are becoming Web-based. In these Web-based workloads, most of the memory objects are used only during one transaction. We study the effect of the memory management approaches on the perf...
详细信息
ISBN:
(纸本)9781605583921
More and more server workloads are becoming Web-based. In these Web-based workloads, most of the memory objects are used only during one transaction. We study the effect of the memory management approaches on the performance of such Web-based applications on two modern multicore processors. In particular, using six PHP applications, we compare a general-purpose allocator (the default allocator of the PHP runtime) and a region-based allocator, which can reduce the cost of memory management by not supporting per-object free. The region-based allocator achieves better performance for all workloads on one processor core due to its smaller memory management cost. However, when using eight cores, the region-based allocator suffers from hidden costs of increased bus traffics and the performance is reduced for many workloads by as much as 27.2% compared to the default allocator. This is because the memory bandwidth tends to become a bottleneck in systems with multicore processors. We propose a new memory management approach, defrag-dodging, to maximize the performance of the Web-based workloads on multicore processors. In our approach, we reduce the memory management cost by avoiding defragmentation overhead in the malloc and free functions during a transaction. We found that the transactions in Web-based applications are short enough to ignore heap fragmentation, and hence the costs of the defragmentation activities in existing general-purpose allocators outweigh their benefits. By comparing our approach against the region-based approach, we show that a per-object free capability can reduce bus traffic and achieve higher performance on multicore processors. We demonstrate that our defrag-dodging approach improves the performance of all the evaluated applications on both processors by up to 11.4% and 51.5% over the default allocator and the region-based allocator, respectively.
Automatic object inlining [19, 20] transforms heap data structures by fusing parent and child objects together. It can improve runtime by reducing object allocation and pointer dereference costs. We report continuing ...
详细信息
Automatic object inlining [19, 20] transforms heap data structures by fusing parent and child objects together. It can improve runtime by reducing object allocation and pointer dereference costs. We report continuing work studying object inlining optimizations. In particular, we present a new semantic derivation of the correctness conditions for object inlining, and program analysis which extends our previous work. And we present an object inlining transformation, focusing on a new algorithm which optimizes class field layout to minimize code expansion. Finally, we detail a fuller evaluation on eleven programs and libraries (including Xpdf, the 25,000 line Portable Document Format (PDF) file browser) that utilizes hardware measures of impact on the memory system. We show that our analysis scales effectively to large programs, finding many inlinable fields (45 in xpdf) at acceptable cost, and we show that, on some programs, it finds nearly all fields for which object inlining is correct, and averages 40% of such fields across our benchmarks. We implement our analyses in an advanced analysis infrastructure, and we show that, compared to traditional 1-CFA, that infrastructure provides better results and lower and more scalable cost. Across all programs, analysis identified about 30% of objects as inlinable on average. Our transformation increases code size by only 20% while inlining this 30% of fields. Inlining these objects eliminated on average 28% of field reads, 58% of object creations, 12% of all loads. Further, the optimized programs have significantly improved memory reference behavior, producing 25% fewer L1 data cache misses and 25% fewer read stalls. On average the runtime improved by 14%.
暂无评论