ISBN: (print) 0769507808; 0769507816
Ever since Java was introduced worldwide, its execution performance has been a problem. As one solution, a bytecode instruction folding process for Java processors was developed in the picoJava model and in a Producer, Operator and Consumer (POC) model, but it could not handle certain types of instruction sequences. In this paper, a new instruction folding scheme based on a new, advanced POC model is proposed and shown to improve bytecode execution. The proposed POC model is able to detect and fold all possible instruction sequence types dynamically in hardware, including a sequence that is separated by other bytecode instructions. SPEC JVM98 benchmark results show that the proposed POC model-based folder can save more than 90% of folding operations. In addition, the hardware design of the proposed POC model-based folding process is much smaller and more efficient than traditional folding mechanisms.
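As a rough illustration of the folding idea described in this abstract, the following minimal sketch classifies a handful of bytecodes as producers, operators, or consumers and greedily folds adjacent runs into single operations. The classification sets, the pattern list, and the function names are assumptions made for illustration and are not taken from the paper. The sketch handles only contiguous sequences, so it does not cover the folding of sequences separated by other bytecodes that the abstract attributes to the advanced model.

```python
# Hypothetical, simplified classification of a few JVM bytecodes.
PRODUCER = {"iload", "aload", "iconst", "bipush"}  # push a value onto the operand stack
OPERATOR = {"iadd", "isub", "imul", "iand", "ior"}  # pop operands, push one result
CONSUMER = {"istore", "astore"}                     # pop a value into a local variable

# Assumed foldable patterns, longest first so the greedy match prefers them.
PATTERNS = [
    ("P", "P", "O", "C"),
    ("P", "P", "O"),
    ("P", "O", "C"),
    ("P", "O"),
    ("O", "C"),
    ("P", "C"),
]


def classify(mnemonic):
    """Map a bytecode mnemonic to 'P', 'O', 'C', or 'X' (not foldable)."""
    if mnemonic in PRODUCER:
        return "P"
    if mnemonic in OPERATOR:
        return "O"
    if mnemonic in CONSUMER:
        return "C"
    return "X"


def fold(bytecodes):
    """Group adjacent bytecodes that match an assumed pattern.

    Each returned group stands for one folded operation; bytecodes that
    match no pattern end up in a group of their own.
    """
    kinds = [classify(bc.split()[0]) for bc in bytecodes]
    groups = []
    i = 0
    while i < len(bytecodes):
        for pattern in PATTERNS:
            if tuple(kinds[i:i + len(pattern)]) == pattern:
                groups.append(bytecodes[i:i + len(pattern)])
                i += len(pattern)
                break
        else:
            # No pattern starts here: emit the bytecode unfolded.
            groups.append([bytecodes[i]])
            i += 1
    return groups
```

For example, `fold(["iload 1", "iload 2", "iadd", "istore 3"])` yields a single group of four bytecodes, which corresponds to one register-style operation instead of four stack operations.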
Chip-multiprocessors are an emerging trend for embedded systems. In this article, we introduce a real-time Java multiprocessor called JopCMP. It is a symmetric shared-memory multiprocessor that consists of up to eight Java Optimized Processor (JOP) cores, an arbitration control device, and a shared memory. All components are interconnected via a system-on-chip bus. The arbiter synchronizes the access of multiple CPUs to the shared main memory. Three different arbitration policies are presented, evaluated, and compared with respect to their real-time and average-case performance: a fixed-priority, a fair-based, and a time-sliced arbiter. Tasks running on different CPUs of a chip-multiprocessor (CMP) influence each other's execution times when accessing a shared memory. Therefore, the system needs an arbiter that is able to limit the worst-case execution time of a task running on a CPU, even though tasks executing simultaneously on other CPUs access the main memory. Our research shows that timing analysis is in fact possible for homogeneous multiprocessor systems with a shared memory. The timing analysis of tasks executing on the CMP with time-sliced memory arbitration leads to viable worst-case execution time bounds. The time-sliced arbiter divides the memory access time into equal time slots, one time slot for each CPU. This memory arbitration scheme allows upper bounds on the worst-case execution times of Java applications to be calculated from the number of CPUs, the time slot size, and the memory access time. Examples of worst-case execution time calculation are presented, and the analyzed results of a real-world application task are compared to measured execution time results. Finally, we evaluate the tradeoffs of using a time-predictable solution compared to using average-case optimized chip-multiprocessors, applying three different benchmarks. These experiments are carried out by executing the programs on the CMP prototype.
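To make the dependence on the number of CPUs, the slot size, and the memory access time concrete, here is a minimal sketch of a worst-case bound under a simplified time-sliced model. The model is an assumption for illustration and is not the formula from the article: slots rotate round-robin, every slot has the same length, and an access must start and finish inside the requesting CPU's own slot. The function and parameter names are hypothetical.

```python
def worst_case_access_cycles(n_cpus, slot_cycles, access_cycles):
    """Upper bound on one memory access under the assumed time-sliced model.

    Worst case: the request arrives one cycle too late to finish in the
    CPU's own slot. It then waits out the rest of that slot
    (access_cycles - 1 cycles), all other CPUs' slots, and finally
    performs the access itself.
    """
    if access_cycles > slot_cycles:
        raise ValueError("an access must fit into a single slot")
    wait = (access_cycles - 1) + (n_cpus - 1) * slot_cycles
    return wait + access_cycles


def task_wcet_bound(compute_cycles, n_accesses, n_cpus, slot_cycles, access_cycles):
    """Pessimistic task bound: every access is charged the worst-case latency."""
    per_access = worst_case_access_cycles(n_cpus, slot_cycles, access_cycles)
    return compute_cycles + n_accesses * per_access
```

Under these assumptions, a 4-CPU system with 6-cycle slots and a 2-cycle access gives a per-access bound of 1 + 18 + 2 = 21 cycles, so a task with 100 compute cycles and 10 memory accesses is bounded by 310 cycles. The bound grows linearly with the number of CPUs, which is the price paid for predictability.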
Designing a Java processor that supports horizontal multithreading has become more attractive as network computing gains importance. Unlike traditional superscalar processors, which issue multiple instructions from a single instruction stream to exploit instruction-level parallelism (ILP), horizontal multithreading Java processors issue multiple instructions (bytecodes) from multiple threads in parallel to exploit not only ILP but also thread-level parallelism (TLP). Such processors have multiple dispatch slots and require the instruction fetch unit to supply instructions at a much higher bandwidth than superscalar processors do. Using a traditional superscalar cache architecture in a horizontal multithreading Java processor results in a high cache miss ratio caused by interference among the threads. This paper investigates a multibank instruction cache architecture for horizontal multithreading Java processors that meets the requirement for high instruction fetch bandwidth. To evaluate the cache performance as well as the performance of the horizontal multithreading Java processor, we developed a trace-driven simulator. The simulator consists of a trace generator that produces Java bytecode execution traces and an architectural simulator that reads the traces and evaluates the performance of the instruction cache and the overall performance of the Java processor. Our simulation results show that the performance improvements come from the low cache miss ratio and the high instruction fetch bandwidth of the proposed cache architecture. The IPC is about 19 when both the number of slots and the number of banks are 8, about 5 times better than with a one-bank cache.
ISBN: (print) 9780769504254
We propose a multithreaded Java microcontroller, called the Komodo microcontroller, with a new hardware event handling mechanism that allows handling of simultaneous overlapping events with hard real-time requirements. Real-time Java threads are used as interrupt service threads (ISTs) instead of interrupt service routines (ISRs). Our proposed Komodo microcontroller supports multiple ISTs with zero-cycle context switching overhead. We evaluate the basic architectural attributes using real-time event parameters of an autonomous guided vehicle. When calculating the maximum vehicle speed that does not violate the real-time constraints, ISTs outperform ISRs with a speed increase of 28%.
This paper describes an RTSJ-based application program interface (API) that aims to provide an advanced concurrent real-time calculation structure, facilitating real-time embedded system development. Further, the development and optimization (of the time and footprint requirements) of real-time Java applications for a specific Java processor are carried out. The use of the proposed API is illustrated by means of a case study that implements a crane control system. This case study highlights the benefits of the proposed API.
ISBN: (print) 0780378407
Designing a Java processor that supports horizontal multithreading has become more attractive as network computing gains importance. Unlike traditional superscalar processors, which issue multiple instructions from a single instruction stream to exploit instruction-level parallelism (ILP), horizontal multithreading Java processors issue multiple instructions (bytecodes) from multiple threads in parallel to exploit not only ILP but also thread-level parallelism (TLP). Such processors have multiple dispatch slots and require the instruction fetch unit to supply instructions at a much higher bandwidth than superscalar processors do. Using a traditional superscalar cache architecture in a horizontal multithreading Java processor results in a high cache miss ratio caused by interference among the threads. This paper investigates a multibank instruction cache architecture for horizontal multithreading Java processors that meets the requirement for high instruction fetch bandwidth. To evaluate the cache performance as well as the performance of the horizontal multithreading Java processor, we developed a trace-driven simulator. The simulator consists of a trace generator that produces Java bytecode execution traces and an architectural simulator that reads the traces and evaluates the performance of the instruction cache and the overall performance of the Java processor. Our simulation results show that the performance improvements come from the low cache miss ratio and the high instruction fetch bandwidth of the proposed cache architecture. The IPC is about 19 when both the number of slots and the number of banks are 8, about 5 times better than with a one-bank cache.
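The benefit of multiple banks for parallel fetch can be sketched with a toy model of one fetch cycle. Everything in it is an assumption for illustration, since the abstract does not specify the mapping or the arbitration: banks are selected by line-interleaving (line address modulo the number of banks), the line size is 32 bytes, each bank serves one request per cycle, and earlier threads win conflicts. The function names are hypothetical.

```python
def bank_of(address, n_banks, line_bytes=32):
    """Assumed line-interleaved mapping from a byte address to a bank index."""
    return (address // line_bytes) % n_banks


def schedule_fetches(addresses, n_banks, line_bytes=32):
    """Simulate one fetch cycle with one fetch address per thread.

    Returns (served, stalled) lists of thread indices. A bank serves at
    most one request per cycle; later threads that hit a busy bank stall.
    """
    busy_banks = set()
    served, stalled = [], []
    for thread_id, address in enumerate(addresses):
        bank = bank_of(address, n_banks, line_bytes)
        if bank in busy_banks:
            stalled.append(thread_id)   # bank conflict: retry in a later cycle
        else:
            busy_banks.add(bank)
            served.append(thread_id)
    return served, stalled
```

With four threads fetching from addresses 0, 32, 64, and 256, an 8-bank cache serves threads 0 to 2 in the same cycle and stalls only thread 3, whose address maps to bank 0 again. A single-bank cache serves only thread 0. This is the fetch-bandwidth effect the abstract refers to, although the toy model ignores cache misses entirely.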