As a value flows across the boundary between interoperating languages, it must be checked and converted to fit the types and representations of the target language. For simple forms of data, the checks and coercions c...
详细信息
As a value flows across the boundary between interoperating languages, it must be checked and converted to fit the types and representations of the target language. For simple forms of data, the checks and coercions can be immediate, for higher order data, such as functions and objects. some must be delayed until the value is used in a particular way. Typically, these coercions and checks are implemented by an ad-hoc mixture of wrappers, reflection, and dynamic predicates. We observe that 1) the wrapper and reflection operations fit the profile of mirrors, 2) the checks correspond to contracts, and 3) the timing and shape of mirror operations coincide with the timing and shape of contract operations. Based on these insights, we present a new model of interoperability that builds on the ideas of mirrors and contracts, and we describe an interoperable implementation of Java and Scheme that is guided by the model.
A critical optimization in the domain of linear signal transforms, such as the discrete Fourier transform (DFT), is loop merging, which increases data locality and reuse and thus performance. In particular, this inclu...
详细信息
A critical optimization in the domain of linear signal transforms, such as the discrete Fourier transform (DFT), is loop merging, which increases data locality and reuse and thus performance. In particular, this includes the conversion of shuffle operations into array reindexings. To date, loop merging is well understood only for the DFT, and only for Cooley-Tukey FFT based algorithms, which excludes DFT sizes divisible by large primes. In this paper, we present a formal loop merging framework for general signal transforms and its implementation within the SPIRAL code generator. The framework consists of Sigma-SPL, a mathematical language to express loops and index mappings;a rewriting system to merge loops in E-SPL;and a compiler that translates Sigma-SPL into code. We apply the framework to DFT sizes that cannot be handled using only the Cooley-Tukey FFT and compare our method to FFTW 3.0.1 and the vendor library Intel MKL 7.2.1. Compared to FFTW our generated code is a factor of 2-4 faster under equal implementation conditions (same algorithms, same unrolling threshold). For some sizes we show a speed-up of a factor of 9 using Bluestein's algorithm. Further, we give a detailed comparison against the Intel vendor library MKL;our generated code is between 2 times faster and 4.5 times slower.
Object abstraction supports the separation of what operations are provided by systems and components from how the operations are implemented, and is essential in enabling the construction of complex systems from compo...
详细信息
Object abstraction supports the separation of what operations are provided by systems and components from how the operations are implemented, and is essential in enabling the construction of complex systems from components. Unfortunately, clear and modular implementations have poor performance when expensive query operations are repeated, while efficient implementations that incrementally maintain these query results are much more difficult to develop and to understand, because the code blows up significantly, and is no longer clear or modular. This paper describes a powerful and systematic method that first allows the "what" of each component to be specified in a clear and modular fashion and implemented straightforwardly in an object-oriented language;then analyzes the queries and updates, across object abstraction, in the straightforward implementation: and finally derives the sophisticated and efficient "how;, of each component by incrementally maintaining the results of repeated expensive queries with respect to updates to their parameters. Our implementation and experimental results for example applications in query optimization, role-based access control, etc. demonstrate tire effectiveness and benefit of the method.
The past decade of experience has demonstrated that the generic programming methodology is highly effective for the design, implementation, and use of large-scale software libraries. The fundamental principle of gener...
详细信息
ISBN:
(纸本)3540291385
The past decade of experience has demonstrated that the generic programming methodology is highly effective for the design, implementation, and use of large-scale software libraries. The fundamental principle of generic programming is the realization of interfaces for entire sets of components, based on their essential syntactic and semantic requirements, rather than for any particular components. Many programminglanguages have features for describing interfaces between software components, but none completely support the approach used in generic programming. We have recently developed G, a languagedesigned to provide first-class language support for generic programming and large-scale libraries. In this paper, we present an overview of g and analyze the interdependence between language features and library design in light of a complete implementation of the Standard Template Library using G. In addition, we discuss important issues related to modularity and encapsulation in large-scale libraries and how language support for validation of components in isolation can prevent many common problems in component integration.
programming network processors is challenging. To sustain high line rates, network processors have extremely tight memory access and instruction budgets. Achieving desired performance has traditionally required hand-c...
详细信息
programming network processors is challenging. To sustain high line rates, network processors have extremely tight memory access and instruction budgets. Achieving desired performance has traditionally required hand-coded assembly. Researchers have recently proposed high-level programminglanguages for packet processing, but the challenges of compiling these languages into code that is competitive with hand-tuned assembly remain unanswered. This paper describes the Shangri-La compiler, which accepts a packet program written in a C-like high-level language and applies scalar and specialized optimizations to generate a highly optimized binary. Hot code paths identified by profiling are mapped across processing elements to maximize processor utilization. Since our compilation target has no hardware caches, software-controlled caches are generated for frequently accessed application data structures. Packet handling optimizations significantly reduce per-packet memory access and instruction counts. Finally, a custom stack model maps stack frames to the fastest levels of the target processor's heterogeneous memory hierarchy. Binaries generated by the compiler were evaluated on the Intel IXP2400 network processor with eight packet processing cores and eight threads per core. Our results show the importance of both traditional and specialized optimization techniques for achieving the maximum forwarding rates on three network applications, L3-Switch, MPLS and Firewall.
AspectJ, an aspect-oriented extension of Java, is becoming increasingly popular. However, not much work has been directed at optimising compilers for AspectJ. Optimising AOP languages provides many new and interesting...
详细信息
AspectJ, an aspect-oriented extension of Java, is becoming increasingly popular. However, not much work has been directed at optimising compilers for AspectJ. Optimising AOP languages provides many new and interesting challenges for compiler writers, and this paper identifies and addresses three such challenges. First, compiling around advice efficiently is particularly challenging. We provide a new code generation strategy for around advice, which (unlike previous implementations) both avoids the use of excessive inlining and the use of closures. We show it leads to more compact code, and can also improve run-time performance. Second, woven code sometimes includes run-time tests to determine whether advice should execute. One important case is the cflow pointcut which uses information about the dynamic calling context. Previous techniques for cflow were very costly in terms of both time and space. We present new techniques to minimise or eliminate the overhead of cflow using both intra- and inter-procedural analyses. Third, we have addressed the general problem of how to structure an optimising compiler so that traditional analyses can be easily adapted to the AOP setting. We have implemented all of the techniques in this paper in abc, our AspectBench Compiler for AspectJ, and we demonstrate significant speedups with empirical results. Some of our techniques have already been integrated into the production AspectJ compiler, ajc 1.2.
We describe and document our experience from developing an AMD64 backend for the HiPE (High Performance Erlang) native code compiler. We consider implementation alternatives and critically examine design choices for o...
详细信息
ISBN:
(纸本)1595930906
We describe and document our experience from developing an AMD64 backend for the HiPE (High Performance Erlang) native code compiler. We consider implementation alternatives and critically examine design choices for obtaining an efficient AMD64 backend. In particular, we consider in detail how other functional language implementors can migrate their existing x86 backends to the AMD64 architecture, a platform which is becoming increasingly important these days. We mention backend components that can be shared between x86 and AMD64, and those that better be different for achieving high performance on AMD64. Finally, we measure the performance of several different alternatives in the hope that this information can save development effort for others who intend to engage in a similar feat. Copyright 2005 acm.
Unanticipated changes to complex software systems can introduce anomalies such as duplicated code, suboptimal inheritance relationships and a proliferation of run-tirne downcasts. Refactoring to eliminate these anomal...
详细信息
Unanticipated changes to complex software systems can introduce anomalies such as duplicated code, suboptimal inheritance relationships and a proliferation of run-tirne downcasts. Refactoring to eliminate these anomalies may not be an option, at least in certain stages of software evolution. Classboxes are modules that restrict the visibility of changes to selected clients only, thereby offering more freedom in the way unanticipated changes may be implemented, and thus reducing the need for convoluted design anomalies. In this paper we demonstrate how classboxes can be implemented in statically-typed languages like Java. We also present an extended case study of Swing, a Java GUI package built on top of AWT, and we document the ensuing anomalies that Swing introduces. We show how Classbox/J, a prototype implementation of classboxes for Java, is used to provide a cleaner implementation of Swing using local refinement rather than subclassing.
AspectJ, an aspect-oriented extension of Java, is becoming increasingly popular. However, not much work has been directed at optimising compilers for AspectJ. Optimising AOP languages provides many new and interesting...
详细信息
AspectJ, an aspect-oriented extension of Java, is becoming increasingly popular. However, not much work has been directed at optimising compilers for AspectJ. Optimising AOP languages provides many new and interesting challenges for compiler writers, and this paper identifies and addresses three such challenges. First, compiling around advice efficiently is particularly challenging. We provide a new code generation strategy for around advice, which (unlike previous implementations) both avoids the use of excessive inlining and the use of closures. We show it leads to more compact code, and can also improve run-time performance. Second, woven code sometimes includes run-time tests to determine whether advice should execute. One important case is the cflow pointcut which uses information about the dynamic calling context. Previous techniques for cflow were very costly in terms of both time and space. We present new techniques to minimise or eliminate the overhead of cflow using both intra- and inter-procedural analyses. Third, we have addressed the general problem of how to structure an optimising compiler so that traditional analyses can be easily adapted to the AOP setting. We have implemented all of the techniques in this paper in abc, our AspectBench Compiler for AspectJ, and we demonstrate significant speedups with empirical results. Some of our techniques have already been integrated into the production AspectJ compiler, ajc 1.2.1. Copyright 2005 acm.
It is now well established that the device scaling predicted by Moore's Law is no longer a viable option for increasing the clock frequency of future uniprocessor systems at the rate that had been sustained during...
详细信息
It is now well established that the device scaling predicted by Moore's Law is no longer a viable option for increasing the clock frequency of future uniprocessor systems at the rate that had been sustained during the last two decades. As a result, future systems are rapidly moving from uniprocessor to multiprocessor configurations, so as to use parallelism instead of frequency scaling as the foundation for increased compute capacity. The dominant emerging multi processor structure for the future is a Non- Uniform Cluster Computing (NUCC) system with nodes that are built out of multi-core SMP chips with non-uniform memory hierarchies, and interconnected in horizontally scalable cluster configurations such as blade servers. Unlike previous generations of hardware evolution, this shift will have a major impact on existing software. Current 00 language facilities for concurrent and distributed programming are inadequate for addressing the needs of NUCC systems because they do not Support the notions of non-uniform data access within a node, or of tight coupling of distributed nodes. We have designed a modern object-oriented programminglanguage, X10, for high performance, high productivity programming of NUCC systems. A member of the partitioned global address space family of languages, X10 highlights the explicit reification of locality in the form of places;lightweight activities embodied in async, future, foreach, and ateach constructs;a construct for termination detection (finish);the use of lock-free synchronization (atomic blocks);and the manipulation of cluster-wide global data structures. We present an overview of the X10 programming model and language, experience with our reference implementation, and results from some initial productivity comparisons between the X10 and JAVA (TM) languages.
暂无评论