ISBN:
(Print) 0780378407
Designing a Java processor that supports horizontal multithreading has become more attractive as network computing gains importance. Unlike traditional superscalar processors, which issue multiple instructions from a single instruction stream to exploit instruction-level parallelism (ILP), horizontal multithreading Java processors issue multiple instructions (bytecodes) from multiple threads in parallel to exploit not only ILP but also thread-level parallelism (TLP). Such processors have multiple dispatch slots and require the instruction fetch unit to supply instructions at much higher bandwidth than superscalar processors. Using a traditional superscalar cache architecture in a horizontal multithreading Java processor results in a high cache miss ratio caused by interference among the threads. This paper investigates a multibank instruction cache architecture for horizontal multithreading Java processors that meets the requirement of high instruction fetch bandwidth. To evaluate the cache performance as well as the overall performance of the horizontal multithreading Java processor, we developed a trace-driven simulator. The simulator consists of a trace generator that produces Java bytecode execution traces and an architectural simulator that reads the traces and evaluates the performance of the instruction cache and the overall performance of the Java processor. Our simulation results show that performance improvements are obtained from the low cache miss ratio and the high instruction fetch bandwidth of the proposed cache architecture. The IPC is about 19 when both the number of slots and the number of banks are 8, about five times better than a one-bank cache.
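The bank-interleaving idea behind such a cache can be sketched in a few lines. The model below is illustrative only (it is not the paper's simulator, and the line size and bank-selection function are assumptions): cache lines are interleaved across banks by low-order line-address bits, so simultaneous fetches from different threads proceed in parallel unless they map to the same bank.

```java
// Toy model of a multibank instruction cache: lines are interleaved
// across banks, so per-cycle fetches from different threads conflict
// only when they select the same bank.
public class MultibankICache {
    static final int LINE_BITS = 5;   // 32-byte cache lines (assumed)
    final int numBanks;

    MultibankICache(int numBanks) { this.numBanks = numBanks; }

    // Bank selected by the low-order bits of the line address.
    int bankOf(int address) {
        return (address >>> LINE_BITS) % numBanks;
    }

    // How many of this cycle's fetch addresses stall because an
    // earlier thread already claimed the same bank.
    int conflicts(int[] fetchAddresses) {
        boolean[] busy = new boolean[numBanks];
        int stalls = 0;
        for (int addr : fetchAddresses) {
            int b = bankOf(addr);
            if (busy[b]) stalls++;
            else busy[b] = true;
        }
        return stalls;
    }

    public static void main(String[] args) {
        MultibankICache cache = new MultibankICache(8);
        // Four threads fetching consecutive lines: no conflicts.
        System.out.println(cache.conflicts(new int[]{0x00, 0x20, 0x40, 0x60})); // 0
        // Four threads fetching the same line: three stall.
        System.out.println(cache.conflicts(new int[]{0x100, 0x100, 0x100, 0x100})); // 3
    }
}
```

This is why interference among threads matters: with one bank, every multi-thread fetch cycle serializes, while with as many banks as dispatch slots the common case is conflict-free.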
ISBN:
(Print) 9781581137125
Java is becoming the main software platform for consumer and embedded devices such as mobile phones, PDAs, TV set-top boxes, and in-vehicle systems. Since many of these systems are memory constrained, it is extremely important to keep the memory footprint of Java applications under control. The goal of this work is to enable the execution of Java applications using a smaller heap footprint than is possible with current embedded JVMs. We propose a set of memory management strategies to reduce the heap footprint of embedded Java applications that execute under severe memory constraints. Our first contribution is a new garbage collector, referred to as the Mark-Compact-Compress (MCC) collector, that allows an application to run with a heap smaller than its footprint. An important characteristic of this collector is that it compresses objects when heap compaction is not sufficient to create space for the current allocation request. In addition to employing compression, we also consider a heap management strategy and associated garbage collector, called MCL (Mark-Compact-Lazy Allocate), based on lazy allocation of object portions. This new collector operates like the conventional Mark-Compact (MC) collector, but takes advantage of the observation that many Java applications create large objects of which only a small portion is actually used. In addition, we combine MCC and MCL and present MCCL (Mark-Compact-Compress-Lazy Allocate), which outperforms both MCC and MCL. We have implemented these collectors using KVM and performed extensive experiments with a set of ten embedded Java applications. We have found our new garbage collection strategies to be useful in two main aspects. First, they reduce the minimum heap size necessary to execute an application without an out-of-memory exception. Second, our strategies reduce heap occupancy; that is, at a given time, they reduce the heap memory requirement of the application being executed. We have also conducted experiments…
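The compress-on-demand idea can be illustrated with a small sketch. This is not the KVM implementation: the class and method names are invented, compaction and decompression-on-access are elided, and `java.util.zip.Deflater` merely stands in for the paper's customized compression. The point shown is MCC's fallback: when an allocation does not fit, resident objects are compressed in place until the request fits.

```java
// Toy sketch of the MCC fallback: compress resident objects to make
// room when an allocation request exceeds the remaining heap budget.
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

public class MccSketch {
    final int capacity;
    int used = 0;
    final List<byte[]> objects = new ArrayList<>();

    MccSketch(int capacity) { this.capacity = capacity; }

    static byte[] compress(byte[] data) {
        Deflater d = new Deflater();
        d.setInput(data);
        d.finish();
        byte[] buf = new byte[data.length + 64]; // large enough for one pass
        int n = d.deflate(buf);
        d.end();
        byte[] out = new byte[n];
        System.arraycopy(buf, 0, out, 0, n);
        return out;
    }

    boolean allocate(byte[] obj) {
        // Compress already-resident objects until the request fits
        // (heap compaction, which MCC tries first, is elided here).
        for (int i = 0; used + obj.length > capacity && i < objects.size(); i++) {
            byte[] old = objects.get(i);
            byte[] packed = compress(old);
            if (packed.length < old.length) {
                objects.set(i, packed);
                used -= old.length - packed.length;
            }
        }
        if (used + obj.length > capacity) return false;
        objects.add(obj);
        used += obj.length;
        return true;
    }

    public static void main(String[] args) {
        MccSketch heap = new MccSketch(1024);
        heap.allocate(new byte[900]);              // fits uncompressed
        boolean ok = heap.allocate(new byte[300]); // forces compression of the first object
        System.out.println(ok);
    }
}
```

A real collector would also track which objects are compressed and decompress them lazily on first access, which is where the runtime overhead the paper measures comes from.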
ISBN:
(Print) 9781581132007
To date, systems offering multitasking for the Java™ programming language use either one process or one class loader for each application. Both approaches are unsatisfactory. Using operating system processes is expensive, scales poorly, and does not fully exploit the protection features inherent in a safe language. Class loaders replicate application code, obscure the type system, and treat 'trusted' and 'untrusted' classes non-uniformly, which leads to subtle but potentially harmful forms of undesirable inter-application interaction. In this paper we propose a novel, simple yet powerful solution. The new model improves on existing designs in terms of resource utilization while offering strong isolation among applications. The approach is applicable both on high-end servers and on small devices. The main idea is to maintain only one copy of every class, regardless of how many applications use it. Classes are transparently and automatically modified so that each application has a separate copy of its static fields. Two prototypes are described and selected performance data is analyzed. Various aspects of the proposed architectural changes to the Java virtual machine are discussed.
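The per-application static-field idea can be sketched as follows. All names here are illustrative (the actual systems rewrite `getstatic`/`putstatic` bytecodes and would index a per-application table rather than hash on strings): one copy of a class serves many applications, while each application reads and writes its own copy of the statics.

```java
// Toy sketch: a static field access becomes a lookup keyed by the
// current application, so a single loaded class yields per-application
// static state.
import java.util.HashMap;
import java.util.Map;

public class IsolatedStatics {
    // One slot per (application, field) pair.
    static final Map<String, Map<String, Object>> perApp = new HashMap<>();

    static void putStatic(String app, String field, Object value) {
        perApp.computeIfAbsent(app, a -> new HashMap<>()).put(field, value);
    }

    static Object getStatic(String app, String field) {
        return perApp.getOrDefault(app, Map.of()).get(field);
    }

    public static void main(String[] args) {
        putStatic("app1", "Counter.value", 1);
        putStatic("app2", "Counter.value", 99);
        // Same class, same field name, two isolated values.
        System.out.println(getStatic("app1", "Counter.value")); // 1
        System.out.println(getStatic("app2", "Counter.value")); // 99
    }
}
```

This captures why the model gives strong isolation without code replication: code is shared, only the mutable static state is duplicated.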
We describe a mechanically checked proof of a property of a small system of Java programs involving an unbounded number of threads and synchronization via monitors. We adopt the output of the javac compiler as the semantics and verify the system at the bytecode level under an operational semantics for the JVM. We assume a sequentially consistent memory model and atomicity at the bytecode level. Our operational semantics is expressed in ACL2, a Lisp-based logic of recursive functions. Our proofs are checked with the ACL2 theorem prover. The proof involves reasoning about arithmetic; infinite loops; the creation and modification of instance objects in the heap, including threads; the inheritance of fields from superclasses; pointer chasing and smashing; the invocation of instance methods (and the concomitant dynamic method resolution); use of the start method on thread objects; the use of monitors to attain synchronization between threads; and consideration of all possible interleavings (at the bytecode level) over an unbounded number of threads. Readers familiar with monitor-based proofs of mutual exclusion will recognize our proof as fairly classical. The novelty here comes from (i) the complexity of the individual operations on the abstract machine; (ii) the dependencies between Java threads, heap objects, and synchronization; (iii) the bytecode-level interleaving; (iv) the unbounded number of threads; (v) the presence in the heap of incompletely initialized threads and other objects; and (vi) the proof engineering permitting automatic mechanical verification of code-level theorems. We discuss these issues. The problem posed here is also put forth as a benchmark against which to measure other approaches to formally proving properties of multithreaded Java programs.
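The shape of program the proof reasons about is the classic monitor-synchronized counter (this example is illustrative, not the paper's benchmark program): many threads perform a read-modify-write under a monitor, and mutual exclusion at the `monitorenter`/`monitorexit` bytecode level guarantees no increment is lost.

```java
// Monitor-based mutual exclusion: the synchronized block compiles to
// monitorenter/monitorexit bytecodes, making the increment atomic
// with respect to the other threads.
public class MonitorDemo {
    static int counter = 0;
    static final Object monitor = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) {
                    synchronized (monitor) { // javac emits monitorenter here
                        counter++;           // read-modify-write, now atomic
                    }                        // ... and monitorexit here
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(counter); // 80000 under mutual exclusion
    }
}
```

Without the synchronized block, interleavings at the bytecode level (read, add, write as separate steps) can lose updates, which is exactly the kind of interleaving the proof must enumerate over an unbounded number of threads.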
Dynamic memory management has long been an important part of a large class of computer programs, and with the recent popularity of object-oriented programming languages, more specifically Java, high-performance dynamic memory management algorithms continue to be of great importance. In this paper, an analysis of Java programs provided by the SPECjvm98 benchmark suite, and of their behavior as it relates to fragmentation, is performed. Based on this analysis, a new model is proposed that allows the estimation of the total internal fragmentation that Java systems will incur prior to a program's execution. The proposed model can also accommodate any variation of a segregated-lists implementation. A comparison with a previously introduced fragmentation model is performed, as well as a comparison with actual fragmentation values extracted from SPECjvm98. Finally, the idea of a test-bed application that will use the proposed model to give programmers/developers the ability to know, prior to a program's execution, the fragmentation and memory utilization of their programs is also introduced. With this application at hand, developers as well as designers of applications could better assess the stability, efficiency, and reliability of their applications at compile time. (C) 2002 Elsevier Science Inc. All rights reserved.
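The quantity such a model estimates is easy to state concretely. In a segregated-lists allocator, a request of size s is served from the smallest size class c ≥ s, and the c − s padding bytes are internal fragmentation. The sketch below computes that directly from an allocation trace (the size classes are illustrative, not those of any particular JVM):

```java
// Internal fragmentation under segregated size classes: fraction of
// allocated bytes that is padding between request size and class size.
public class InternalFragmentation {
    static final int[] SIZE_CLASSES = {16, 32, 64, 128, 256}; // assumed classes

    // Smallest size class that can hold a request of the given size.
    static int roundUp(int size) {
        for (int c : SIZE_CLASSES)
            if (c >= size) return c;
        throw new IllegalArgumentException("request too large: " + size);
    }

    // Fraction of allocated bytes wasted as padding.
    static double fragmentation(int[] requests) {
        long requested = 0, allocated = 0;
        for (int s : requests) {
            requested += s;
            allocated += roundUp(s);
        }
        return (double) (allocated - requested) / allocated;
    }

    public static void main(String[] args) {
        int[] requests = {10, 20, 40, 100};
        // requested = 170, allocated = 16 + 32 + 64 + 128 = 240
        System.out.println(fragmentation(requests)); // 70/240 ≈ 0.29
    }
}
```

A predictive model like the paper's replaces the concrete trace with a statistical characterization of request sizes, so the same fraction can be estimated before the program runs.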
Java has become the most important language in the Internet area, but its execution performance is severely limited by the true data dependency inherited from the stack architecture defined by Sun's Java virtual machine (JVM). To enhance the performance of the JVM, a stack operations folding mechanism for the picoJava-II processor was proposed by Sun Microsystems to fold 42.3% of stack push/pop instructions. A systematic folding algorithm, the Producer, Operator, and Consumer (POC) folding model, was proposed in earlier research to eliminate up to 82.9% of stack push/pop instructions. The remaining push and pop instructions cannot be folded due to the sequential checking characteristic of the POC folding model. A new folding algorithm, the enhanced POC (EPOC) folding model, is proposed in this paper to further fold the remaining push and pop instructions. In the EPOC folding model, stack push/pop instructions are folded with the proposed Stack Reorder Buffer (SROB) architecture. With a small SROB size of 584 bits, almost all of the stack push/pop instructions can be folded while retaining precise exception handling capability. Statistical data show that 98.8% of the stack push/pop instructions can be folded, and the average execution performance speedup of a 4-foldable processor with a 7-byte instruction buffer is 1.74 compared to a traditional single-pipelined stack machine without folding. (C) 2002 Elsevier Science B.V. All rights reserved.
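The producer-operator-consumer pattern at the heart of POC folding can be sketched with a toy matcher (this is an illustration of the pattern, not the paper's hardware checker, and the bytecode subset is deliberately tiny): a run of producers (pushes), one operator, and an optional consumer (pop) collapse into a single register-style operation, eliminating the intermediate stack traffic.

```java
// Toy POC pattern matcher: counts how many consecutive bytecodes fold
// into one operation (producers* operator consumer?), or 1 if no fold
// applies at this position.
import java.util.List;

public class PocFolding {
    static boolean isProducer(String op) {
        return op.startsWith("iload") || op.startsWith("iconst");
    }
    static boolean isOperator(String op) {
        return op.equals("iadd") || op.equals("isub") || op.equals("imul");
    }
    static boolean isConsumer(String op) {
        return op.startsWith("istore");
    }

    static int foldLength(List<String> code, int pc) {
        int i = pc;
        while (i < code.size() && isProducer(code.get(i))) i++;
        if (i == pc || i >= code.size() || !isOperator(code.get(i))) return 1;
        i++; // consume the operator
        if (i < code.size() && isConsumer(code.get(i))) i++;
        return i - pc;
    }

    public static void main(String[] args) {
        // i3 = i1 + i2: four stack bytecodes fold into one operation.
        List<String> code = List.of("iload_1", "iload_2", "iadd", "istore_3");
        System.out.println(foldLength(code, 0)); // 4
    }
}
```

EPOC's contribution is precisely the cases this sequential scan misses: when producers and their operator are separated in the instruction stream, the SROB lets non-adjacent push/pop instructions still be folded.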
This paper presents the JAFARDD (a Java Architecture based on a Folding Algorithm, with Reservation stations, Dynamic translation, and Dual processing) processor. JAFARDD dynamically translates Java stack-dependent bytecodes into RISC-style stack-independent instructions to facilitate the use of a general-purpose RISC core. JAFARDD enables the exploitation of instruction-level parallelism among the translated instructions through bytecode folding coupled with Tomasulo's algorithm. We detail the JAFARDD architecture and the global architecture design principles observed while designing each pipeline module. We also illustrate the flow of Java bytecodes through each of the processing phases. Benchmarking of JAFARDD using SPECjvm98 has shown a performance improvement between 1.10 and 2.25.
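The stack-to-register translation step can be sketched in software (the register names and three-address mnemonics below are illustrative, not JAFARDD's actual instruction set): a simulated operand stack tracks which register holds each stack slot, so each stack bytecode becomes a stack-independent three-address instruction suitable for a RISC core and for Tomasulo-style scheduling.

```java
// Toy stack-to-register translator for a tiny bytecode subset:
// the operand stack holds register names instead of values.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class StackToRisc {
    static List<String> translate(List<String> bytecodes) {
        Deque<String> stack = new ArrayDeque<>();
        List<String> risc = new ArrayList<>();
        int temp = 0;
        for (String bc : bytecodes) {
            if (bc.startsWith("iload_")) {
                stack.push("l" + bc.charAt(6));      // local-variable register
            } else if (bc.equals("iadd")) {
                String b = stack.pop(), a = stack.pop();
                String t = "t" + temp++;             // fresh temporary
                risc.add("add " + t + ", " + a + ", " + b);
                stack.push(t);
            } else if (bc.startsWith("istore_")) {
                risc.add("mov l" + bc.charAt(7) + ", " + stack.pop());
            }
        }
        return risc;
    }

    public static void main(String[] args) {
        // i3 = i1 + i2 in bytecode form:
        List<String> out = translate(List.of("iload_1", "iload_2", "iadd", "istore_3"));
        System.out.println(out); // [add t0, l1, l2, mov l3, t0]
    }
}
```

Once expressed this way, the translated instructions carry explicit register dependences instead of implicit stack dependences, which is what lets reservation stations expose ILP among them.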
The Java programming language is being increasingly used for application development for mobile and embedded devices. Limited energy and memory resources are important constraints for such systems. Compression is a useful and widely employed mechanism to reduce the memory requirements of a system. As the leakage energy of a memory system increases with its size, and because of the increasing contribution of leakage to overall system energy, compression also has a significant effect on reducing energy consumption. However, storing compressed data/instructions carries a performance and energy overhead associated with decompression at runtime. The underlying compression algorithm, the corresponding implementation of decompression, and the ability to reuse decompressed information critically impact this overhead. In this paper, we explore the influence of compression on overall memory energy using a commercial embedded Java virtual machine (JVM) and a customized compression algorithm. Our results show that compression is effective in reducing energy, even when considering the runtime decompression overheads, for most applications. Further, we show a mechanism that selectively compresses portions of the memory to enhance energy savings. Finally, a scheme for clustering code and data to improve the reuse of decompressed data is presented.
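The tradeoff being measured can be put in back-of-envelope form. Every constant below is hypothetical (invented for illustration, not taken from the paper): compression shrinks the resident memory, cutting leakage energy, but each access to compressed content pays a decompression cost, so compression wins only while the leakage saved exceeds the decompression overhead.

```java
// Back-of-envelope energy model: leakage proportional to resident
// size, plus a per-access decompression cost when compressed.
// All constants are assumed, illustrative values.
public class CompressionTradeoff {
    static final double LEAKAGE_PER_KB = 1.0;   // energy units per resident KB
    static final double DECOMPRESS_COST = 0.05; // energy units per access

    static double energy(double residentKB, long accesses, boolean compressed) {
        double e = residentKB * LEAKAGE_PER_KB;
        if (compressed) e += accesses * DECOMPRESS_COST;
        return e;
    }

    public static void main(String[] args) {
        double plainKB = 100, packedKB = 60; // assume a 40% size reduction
        // Few accesses: leakage savings dominate, compression wins.
        System.out.println(energy(packedKB, 100, true) < energy(plainKB, 100, false));
        // Many accesses: decompression overhead dominates, compression loses.
        System.out.println(energy(packedKB, 10_000, true) < energy(plainKB, 10_000, false));
    }
}
```

This is also the intuition behind the paper's selective compression and clustering schemes: compress the rarely accessed portions, and cluster hot code and data so decompressed content is reused rather than re-decompressed.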