The emerging hardware support for thread-level speculation opens new opportunities to parallelize sequential programs beyond the traditional limits. By speculating that many data dependences are unlikely during runtime, consecutive iterations of a sequential loop can be executed speculatively in parallel. Runtime parallelism is obtained when the speculation is correct. To take full advantage of this new execution model, a program needs to be programmed or compiled in such a way that it exhibits a high degree of speculative thread-level parallelism. We propose a comprehensive cost-driven compilation framework to perform speculative parallelization. Based on a misspeculation cost model, the compiler aggressively transforms loops into optimal speculative parallel loops and selects only those loops whose speculative parallel execution is likely to improve program performance. The framework also supports and uses enabling techniques such as loop unrolling, software value prediction and dependence profiling to expose more speculative parallelism. The proposed framework was implemented on the ORC compiler. Our evaluation showed that the cost-driven speculative parallelization was effective. Our compiler was able to generate good speculative parallel loops in ten Spec2000Int benchmarks, which currently achieve an average 8% speedup. We anticipate an average 15.6% speedup when all enabling techniques are in place.
ISBN: 0769520855 (print)
The Java programming language is achieving greater acceptance in high-end embedded systems such as cellphones and PDAs. However, current embedded implementations of Java impose tight constraints on functionality, while requiring significant storage space. In addition, they require that a JVM be ported to each such platform. We demonstrate the first Java-to-C compilation strategy that is suitable for a wide range of embedded systems, thereby enabling broad use of Java on embedded platforms. This strategy removes many of the constraints on functionality and reduces code size without sacrificing performance. The compilation framework described is easily retargetable, and is also applicable to bare-bones embedded systems with no operating system or JVM. On average, we found the size of the generated executables to be over 25 times smaller than those generated by a cutting-edge Java-to-native-code compiler, while providing performance comparable to the best of various Java implementation strategies.
This paper describes a computer architecture, Spatial Computation (SC), which is based on the translation of high-level language programs directly into hardware structures. SC program implementations are completely distributed, with no centralized control. SC circuits are optimized for wires at the expense of computation units. In this paper we investigate a particular implementation of SC: ASH (Application-Specific Hardware). Under the assumption that computation is cheaper than communication, ASH replicates computation units to simplify interconnect, building a system which uses very simple, completely dedicated communication channels. As a consequence, communication on the datapath never requires arbitration; the only arbitration required is for accessing memory. ASH relies on very simple hardware primitives, using no associative structures, no multiported register files, no scheduling logic, no broadcast, and no clocks. As a consequence, ASH hardware is fast and extremely power efficient. In this work we demonstrate three features of ASH: (1) that such architectures can be built by automatic compilation of C programs; (2) that distributed computation is in some respects fundamentally different from monolithic superscalar processors; and (3) that ASIC implementations of ASH use three orders of magnitude less energy compared to high-end superscalar processors, while being on average only 33% slower in performance (3.5x worst-case).
Cache miss stalls hurt performance because of the large gap between memory and processor speeds - for example, the popular server benchmark SPEC JBB2000 spends 45% of its cycles stalled waiting for memory requests on the Itanium(R) 2 processor. Traversing linked data structures causes a large portion of these stalls. Prefetching for linked data structures remains a major challenge because serial data dependencies between elements in a linked data structure preclude the timely materialization of prefetch addresses. This paper presents Mississippi Delta (MS Delta), a novel technique for prefetching linked data structures that closely integrates the hardware performance monitor (HPM), the garbage collector's global view of heap and object layout, the type-level metadata inherent in type-safe programs, and JIT compiler analysis. The garbage collector uses the HPM's data cache miss information to identify cache miss intensive traversal paths through linked data structures, and then discovers regular distances (deltas) between these linked objects. JIT compiler analysis injects prefetch instructions using deltas to materialize prefetch addresses. We have implemented MS Delta in a fully dynamic profile-guided optimization system: the StarJIT dynamic compiler [1] and the ORP Java virtual machine [9]. We demonstrate a 28-29% reduction in stall cycles attributable to the high-latency cache misses targeted by MS Delta and a speedup of 11-14% on the cache miss intensive SPEC JBB2000 benchmark.
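The delta-discovery step described in this abstract can be illustrated with a minimal sketch. It is not the MS Delta implementation: the class name `DeltaSketch`, both method names, and the "most frequent difference" heuristic are assumptions made for illustration, and real addresses would come from the garbage collector and HPM samples rather than from an array.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names and heuristic are hypothetical, not taken from the paper.
final class DeltaSketch {
    /**
     * Returns the most frequent difference between consecutive addresses
     * along a traversal path, or 0 if fewer than two addresses are given.
     */
    static long dominantDelta(long[] addresses) {
        Map<Long, Integer> counts = new HashMap<>();
        long best = 0;
        int bestCount = 0;
        for (int i = 1; i < addresses.length; i++) {
            long delta = addresses[i] - addresses[i - 1];
            int count = counts.merge(delta, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = delta;
            }
        }
        return best;
    }

    /** Address to prefetch, `distance` objects ahead, assuming the delta keeps holding. */
    static long prefetchAddress(long current, long delta, int distance) {
        return current + delta * distance;
    }
}
```

Once a regular delta is known, a prefetch address can be computed from the current object's address alone, which sidesteps the serial pointer-chasing dependence the abstract mentions.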
Authors: Birka, A.; Ernst, M. D.
MIT Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139, USA
This paper describes a type system that is capable of expressing and enforcing immutability constraints. The specific constraint expressed is that the abstract state of the object to which an immutable reference refers cannot be modified using that reference. The abstract state is (part of) the transitively reachable state: that is, the state of the object and all state reachable from it by following references. The type system permits explicitly excluding fields or objects from the abstract state of an object. For a statically type-safe language, the type system guarantees reference immutability. If the language is extended with immutability downcasts, then run-time checks enforce the reference immutability constraints. In order to better understand the usability and efficacy of the type system, we have implemented an extension to Java, called Javari, that includes all the features of our type system. Javari is interoperable with Java and existing JVMs. It can be viewed as a proposal for the semantics of the Java const keyword, though Javari's syntax uses readonly instead. This paper describes the design and implementation of Javari, including the type-checking rules for the language. This paper also discusses experience with 160,000 lines of Javari code. Javari was easy to use and provided a number of benefits, including detecting errors in well-tested code.
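The idea of reference immutability (the object may change, but not through a particular reference) can be approximated in plain Java with a read-only interface. The sketch below is an analogy only: it does not use Javari's `readonly` syntax, it is not transitive the way the paper's abstract state is, and the names `ReadonlySketch`, `ReadonlyCounter`, `Counter`, and `observe` are hypothetical.

```java
// Plain-Java analogy to reference immutability; this is not Javari syntax.
final class ReadonlySketch {
    /** View that exposes no mutating methods. */
    interface ReadonlyCounter {
        int value();
    }

    /** Mutable object; still modifiable through a Counter-typed reference. */
    static final class Counter implements ReadonlyCounter {
        private int value;

        public int value() {
            return value;
        }

        void increment() {
            value++;
        }
    }

    /** Receives a read-only reference; calling c.increment() here would not compile. */
    static int observe(ReadonlyCounter c) {
        return c.value();
    }
}
```

The constraint attaches to the reference type rather than to the object, so a caller holding a `Counter` reference can still call `increment()` while code holding only a `ReadonlyCounter` cannot.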
The genesis of a research effort to develop a Java-based process-oriented simulation framework is described. A key enabler to the framework is an efficient co-routine mechanism implemented within the context of a single Java thread. A design for such a co-routine mechanism is described and some initial results of an implementation within the IBM Jikes Reference Virtual Machine are given.
ISBN: 9781450377942 (print)
ICE (Intermediate Code Engine) and ICE/T (ICE/Translator) are compiler back ends that execute on a Java Virtual Machine (JVM). They allow the student to complete a working compiler quickly and can execute on any platform that supplies a JVM. ICE is a quadruple interpreter that executes ICE code directly, and includes an assembler, which a builder can use to side-step most symbol management issues. ICE/T is a translator that accepts ICE assembly code as input, and generates an equivalent Java class file as output. This paper advocates the use of these tools in compiler implementation courses.
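To make the term "quadruple interpreter" concrete, here is a minimal sketch of one. It does not reproduce the ICE instruction set: the opcodes `LOAD_CONST`, `ADD`, and `MUL`, the `Quad` layout, and the class name `QuadSketch` are all assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical quadruple interpreter; opcodes and layout are not taken from ICE.
final class QuadSketch {
    enum Op { LOAD_CONST, ADD, MUL }

    /** One quadruple: an operator, two operands, and a result name. */
    static final class Quad {
        final Op op;
        final String arg1;
        final String arg2;
        final String result;

        Quad(Op op, String arg1, String arg2, String result) {
            this.op = op;
            this.arg1 = arg1;
            this.arg2 = arg2;
            this.result = result;
        }
    }

    /** Executes the quadruples in order and returns the final variable bindings. */
    static Map<String, Integer> run(List<Quad> program) {
        Map<String, Integer> vars = new HashMap<>();
        for (Quad q : program) {
            switch (q.op) {
                case LOAD_CONST:
                    // arg1 holds the literal; arg2 is unused.
                    vars.put(q.result, Integer.parseInt(q.arg1));
                    break;
                case ADD:
                    vars.put(q.result, vars.get(q.arg1) + vars.get(q.arg2));
                    break;
                case MUL:
                    vars.put(q.result, vars.get(q.arg1) * vars.get(q.arg2));
                    break;
            }
        }
        return vars;
    }
}
```

Because operands are referred to by name and looked up in a map, a student's front end can emit such code without managing storage layout, which is in the spirit of the symbol-management shortcut the abstract describes.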
ISBN: 9781581138313 (print)
We identify three design principles for reflection and metaprogramming facilities in object-oriented programming languages. Encapsulation: meta-level facilities must encapsulate their implementation. Stratification: meta-level facilities must be separated from base-level functionality. Ontological correspondence: the ontology of meta-level facilities should correspond to the ontology of the language they manipulate. Traditional/mainstream reflective architectures do not follow these precepts. In contrast, reflective APIs built around the concept of mirrors are characterized by adherence to these three principles. Consequently, mirror-based architectures have significant advantages with respect to distribution, deployment and general purpose metaprogramming.
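A small sketch can show what a mirror-style API looks like in Java. It is an illustration under assumptions, not an API from the paper: `MirrorSketch`, `ObjectMirror`, `reflect`, and `Point` are hypothetical names, and backing the mirror with `java.lang.reflect` is just one possible implementation choice.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical mirror-style API; names are illustrative, not from the paper.
final class MirrorSketch {
    /** Meta-level view of an object, kept apart from the object's own interface. */
    interface ObjectMirror {
        String className();

        List<String> fieldNames();
    }

    /** Factory for mirrors; this implementation happens to use java.lang.reflect. */
    static ObjectMirror reflect(final Object target) {
        return new ObjectMirror() {
            public String className() {
                return target.getClass().getSimpleName();
            }

            public List<String> fieldNames() {
                List<String> names = new ArrayList<>();
                for (Field f : target.getClass().getDeclaredFields()) {
                    names.add(f.getName());
                }
                return names;
            }
        };
    }

    /** Base-level class with no reflective methods of its own. */
    static final class Point {
        int x;
        int y;
    }
}
```

Clients depend only on the `ObjectMirror` interface (encapsulation), and `Point` carries no meta-level methods (stratification), so a different factory could in principle supply mirrors for objects that live in another process.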
ISBN: 9780769520858 (print)
The Java programming language is achieving greater acceptance in high-end embedded systems such as cellphones and PDAs. However, current embedded implementations of Java impose tight constraints on functionality, while requiring significant storage space. In addition, they require that a JVM be ported to each such platform. We demonstrate the first Java-to-C compilation strategy that is suitable for a wide range of embedded systems, thereby enabling broad use of Java on embedded platforms. This strategy removes many of the constraints on functionality and reduces code size without sacrificing performance. The compilation framework described is easily retargetable, and is also applicable to bare-bones embedded systems with no operating system or JVM. On average, we found the size of the generated executables to be over 25 times smaller than those generated by a cutting-edge Java-to-native-code compiler, while providing performance comparable to the best of various Java implementation strategies.