In this paper we present a modular interprocedural pointer analysis algorithm based on access-paths for C programs. We argue that access paths can reduce the overhead of representing context-sensitive transfer functio...
ISBN:
(纸本)9781581131994
In this paper we present a modular interprocedural pointer analysis algorithm based on access-paths for C programs. We argue that access paths can reduce the overhead of representing context-sensitive transfer functions and effectively distinguish non-recursive heap objects. And when the modular analysis paradigm is used together with other techniques to handle type casts and function pointers, we are able to handle significant programs like those in the SPECcint92 and SPECcint95 suites. We have implemented the algorithm and tested it on a Pentium II 450 PC running Linux. The observed resource consumption and performance improvement are very encouraging.
Functional programming presents several important advantages in the design, analysis and implementation of parallel algorithms: It discourages iteration and encourages *** supports persistence and hence easy *** encou...
详细信息
ISBN:
(纸本)9781605587943
Functional programming presents several important advantages in the design, analysis and implementation of parallel algorithms: It discourages iteration and encourages *** supports persistence and hence easy *** encourages higher-order aggregate *** is well suited for defining cost models tied to the programminglanguage rather than the *** can avoid false *** can use cheaper weak consistency *** most importantly, it supports safe deterministic *** fact functional programming supports a level of abstraction in which parallel algorithms are often as easy to design and analyze as sequential algorithms. The recent widespread advent of parallel machines therefore presents a great opportunity for functional programminglanguages. However, any changes will require significant education at all levels and involvement of the functional programming *** this talk I will discuss an approach to designing and analyzing parallel algorithms in a strict functional and fully deterministic setting. Key ideas include a cost model defined in term of analyzing work and span, the use of divide-and-conquer and contraction, the need for arrays (immutable) to achieve asymptotic efficiency, and the power of (deterministic) randomized algorithms. These are all ideas I believe can be taught at any level.
GUM is a portable, parallel implementation of the Haskell functional language. Despite sustained research interest in parallel functional programming, GUM is one of the first such systems to be made publicly *** is me...
ISBN:
(纸本)9780897917957
GUM is a portable, parallel implementation of the Haskell functional language. Despite sustained research interest in parallel functional programming, GUM is one of the first such systems to be made publicly *** is message-based, and portability is facilitated by using the PVM communications harness that is available on many multi-processors. As a result, GUM is available for both shared-memory (Sun SPARCserver multiprocessors) and distributed-memory (networks of workstations) architectures. The high message-latency of distributed machines is ameliorated by sending messages asynchronously, and by sending large packets of related data in each *** performance figures demonstrate absolute speedups relative to the best sequential compiler technology. To improve the performance of a parallel Haskell program GUM provides tools for monitoring and visualising the behaviour of threads and of processors during execution.
SKILL is a programminglanguage that supports both command entry and procedural customization in OpusTM design FrameworkTM. After briefly considering some related work, we examine the requirements that motivate the pr...
ISBN:
(纸本)9780897913638
SKILL is a programminglanguage that supports both command entry and procedural customization in OpusTM design FrameworkTM. After briefly considering some related work, we examine the requirements that motivate the provision of a programminglanguage available to the user and describe some of the technical characteristics of the languagedesign and implementation. Finally, we describe our experience with the language and outline future work. A number of programming examples are appended.
Traditional storage systems provide a simple read/write interface, which is inadequate for low-locality update-intensive workloads because it limits the disk scheduling flexibility and results in inefficient use of bu...
详细信息
ISBN:
(纸本)9781450307598
Traditional storage systems provide a simple read/write interface, which is inadequate for low-locality update-intensive workloads because it limits the disk scheduling flexibility and results in inefficient use of buffer memory and raw disk bandwidth. This paper describes an update-aware disk access interface that allows applications to explicitly specify disk update requests and associate with such requests call-back functions that will be invoked when the requested disk blocks are brought into memory. Because call-back functions offer a continuation mechanism after retrieval of requested blocks, storage systems supporting this interface are given more flexibility in scheduling pending disk update requests. In particular, this interface enables a simple but effective technique called Batching mOdifications with Sequential Commit (BOSC), which greatly improves the sustained throughput of a storage system under low-locality update-intensive workloads. In addition, together with a space-efficient low-latency disk logging technique, BOSC is able to deliver the same durability guarantee as synchronous disk updates. Empirical measurements show that the random update throughput of a BOSC-based B+ tree is more than an order of magnitude higher than that of the same B+ tree implementation on a traditional storage system.
The aim of ÆMINIUM is to study the implications of having a concurrent-by-default programminglanguage. This includes languagedesign, runtime system, performance and software engineering *** conduct our study th...
详细信息
ISBN:
(纸本)9781450327848
The aim of ÆMINIUM is to study the implications of having a concurrent-by-default programminglanguage. This includes languagedesign, runtime system, performance and software engineering *** conduct our study through the design of the concurrent-by-default ÆMINIUM programminglanguage. ÆMINIUM leverages the permission flow of object and group permissions through the program to validate the program's correctness and to automatically infer a possible parallelization strategy via a dataflow graph. ÆMINIUM supports not only fork-join parallelism but more general dataflow patterns of *** this paper we present a formal system, called μÆMINIUM, modeling the core concepts of ÆMINIUM. μÆMINIUM's static type system is based on Featherweight Java with ÆMINIUM-specific extensions. Besides checking for correctness ÆMINIUM's type system it also uses the permission flow to compute a potential parallel execution strategy for the program. μÆMINIUM's dynamic semantics use a concurrent-by-default evaluation approach. Along with the formal system we present its soundness *** provide a full description of the implementation along with the description of various optimization techniques we used. We implemented ÆMINIUM as an extension of the Plaid programminglanguage, which has first-class support for permissions built-in. The ÆMINIUM implementation and all case studies are publicly available under the General Public *** use various case studies to evaluate ÆMINIUM's applicability and to demonstrate that ÆMINIUM parallelized code has performance improvements compared to its sequential counterpart. We chose to use case studies from common domains or problems that are known to benefit from parallelization, to show that ÆMINIUM is powerful enough to encode them. We demonstrate through a webserver application, which evaluates ÆMINIUM's impact on latency-bound applications, that ÆMINIUM can achieve a 70% performance improvement over the sequential counterp
This paper presents the design and implementation of a compiler algorithm that effectively optimizes programs for energy usage using dynamic voltage scaling (DVS). The algorithm identifies program regions where the CP...
详细信息
ISBN:
(纸本)9781581136623
This paper presents the design and implementation of a compiler algorithm that effectively optimizes programs for energy usage using dynamic voltage scaling (DVS). The algorithm identifies program regions where the CPU can be slowed down with negligible performance loss. It is implemented as a source-to-source level transformation using the SUIF2 compiler infrastructure. Physical measurements on a high-performance laptop show that total system (i.e., laptop) energy savings of up to 28% can be achieved with performance degradation of less than 5% for the SPECfp95 benchmarks. On average, the system energy and energy-delay product are reduced by 11% and 9%, respectively, with a performance slowdown of 2%. It was also discovered that the energy usage of the programs using our DVS algorithm is within 6% from the theoretical lower bound. To the best of our knowledge, this is one of the first work that evaluates DVS algorithms by physical measurements.
There is a growing utilization gap between modern hardware and modern programminglanguages for data analysis. Due to power and other constraints, recent processor design has sought improved performance through increa...
详细信息
ISBN:
(纸本)9781450311823
There is a growing utilization gap between modern hardware and modern programminglanguages for data analysis. Due to power and other constraints, recent processor design has sought improved performance through increased SIMD and multi-core parallelism. At the same time, high-level, dynamically typed languages for data analysis have become popular. These languages emphasize ease of use and high productivity, but have, in general, low performance and limited support for exploiting hardware parallelism. In this paper, we describe Riposte, a new runtime for the R language, which bridges this gap. Riposte uses tracing, a technique commonly used to accelerate scalar code, to dynamically discover and extract sequences of vector operations from arbitrary R code. Once extracted, we can fuse traces to eliminate unnecessary memory traffic, compile them to use hardware SIMD units, and schedule them to run across multiple cores, allowing us to fully utilize the available parallelism on modern shared-memory machines. Our evaluation shows that Riposte can run vector R code near the speed of hand-optimized C, 5-50x faster than the open source implementation of R, and can also linearly scale to 32 cores for some tasks. Across 12 different workloads we achieve an overall average speed-up of over 150x without explicit programmer parallelization.
We present the functional language CDuce, discuss some design issues, and show its adequacy for working with XML documents. Distinctive features of CDuce are a powerful pattern matching, first class functions, overloa...
详细信息
ISBN:
(纸本)9781581137569
We present the functional language CDuce, discuss some design issues, and show its adequacy for working with XML documents. Distinctive features of CDuce are a powerful pattern matching, first class functions, overloaded functions, a very rich type system (arrows, sequences, pairs, records, intersections, unions, differences), precise type inference for patterns and error localization, and a natural interpretation of types as sets of values. We also outline some important implementation issues; in particular, a dispatch algorithm that demonstrates how static type information can be used to obtain very efficient compilation schemas..
The disadvantages of unconstrained shared-memory multi-threading in Java, especially with regard to latency and determinism in real-time systems, have given rise to a variety of language extensions that place restrict...
详细信息
The disadvantages of unconstrained shared-memory multi-threading in Java, especially with regard to latency and determinism in real-time systems, have given rise to a variety of language extensions that place restrictions on how threads allocate, share, and communicate memory, leading to order-of-magnitude reductions in latency and jitter. However, each model makes different trade-offs with respect to expressiveness, efficiency, enforcement, and latency, and no one model is best for all applications. In this paper we present Flexible Task Graphs (Flexotasks), a single system that allows different isolation policies and mechanisms to be combined in an orthogonal manner, subsuming four previously proposed models as well as making it possible to use new combinations best suited to the needs of particular applications. We evaluate our implementation on top of the IBM Web-Sphere Real Time Java virtual machine using both a microbenchmark and a 30 KLOC avionics collision detector. We show that Flexotasks are capable of executing periodic threads at 10 KHz with a standard deviation of 1.2 mu s and that it achieves significantly better performance than RTSJ's scoped memory constructs while remaining impervious to interference from global garbage collection.
暂无评论