For call-intensive programs, function calls are major bottlenecks during program execution since they usually force register contents to be spilled into memory. Such register-to-memory spills are much more pronounced ...
ISBN: 0769518893 (print)
An adiabatic process in thermodynamics transfers energy across zero temperature difference. The adiabatic CMOS design style attempts to switch a transistor to transfer energy across its source and drain while the voltage difference is zero. We define an adiabatic micro-architecture as one that pushes instructions across a zero IPC gradient. The IPC gradient can be zero across time (the IPC of a given stage does not vary over time) or across space (adjacent pipeline stages have zero IPC variance). The reason to consider adiabatic micro-architectures is that the energy for a given computation can be shown to be minimum for an adiabatic micro-architecture. An adiabatic compiler, really a back-end, is defined as a compiler that helps an adiabatic micro-architecture achieve its goals. The minimal support provided by an adiabatic compiler includes a static estimation of program ILP. We add new passes to the MachineSUIF compiler to flag instruction groups that can potentially walk through a superscalar pipeline as a group. Hence, these instruction groups offer a fairly robust model of superscalar micro-architecture ILP. A compile-time scheduling analysis can also generate instruction slack values. The slack indicates the program region within which an instruction can be scheduled. We also present a dispatch-stage dynamic scheduling algorithm that uses the compiler-annotated slacks to reschedule instructions with the explicit objective of minimizing the dispatch-stage IPC variance. In other words, the proposed dispatch stage is adiabatic. Preliminary experimental results demonstrate an average reduction of 4.16% in IPC variance over SPEC2000 benchmarks with the adiabatic compiler and micro-architecture. The preliminary evaluation also shows an average processor dispatch-stage energy reduction of 3.9% over the same SPEC2000 benchmarks. We expect to add similar IPC smoothing control knobs at the instruction fetch and issue stages as well in the future, which should result in a more signifi
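The notions of instruction slack and IPC variance in the abstract above can be illustrated with a small offline sketch. This is not the paper's MachineSUIF pass or its hardware dispatch algorithm: it assumes unit-latency instructions, unlimited issue width, and a dependence graph given in topological order, and the names `windows`, `smooth`, and `ipc_variance` are hypothetical. Slack is taken as the gap between the earliest (ASAP) and latest (ALAP) cycle an instruction can occupy, and a greedy pass places each non-critical instruction in the least-loaded cycle of its window.

```python
from statistics import pvariance

def windows(deps, n):
    """Return (asap, alap) cycles; deps maps instr -> predecessors, instrs 0..n-1 topologically ordered."""
    asap = [0] * n
    for i in range(n):
        for p in deps.get(i, []):
            asap[i] = max(asap[i], asap[p] + 1)
    length = max(asap) if n else 0
    alap = [length] * n
    for i in reversed(range(n)):
        for p in deps.get(i, []):
            alap[p] = min(alap[p], alap[i] - 1)
    return asap, alap

def smooth(deps, n):
    """Greedy sketch: put each instruction with slack in the least-loaded cycle of its window."""
    asap, alap = windows(deps, n)
    cycle = [None] * n
    load = [0] * (max(alap) + 1 if n else 0)
    for i in range(n):                      # critical instructions (zero slack) are fixed
        if asap[i] == alap[i]:
            cycle[i] = asap[i]
            load[asap[i]] += 1
    for i in range(n):                      # predecessors always have a smaller index
        if cycle[i] is None:
            earliest = max([asap[i]] + [cycle[p] + 1 for p in deps.get(i, [])])
            best = min(range(earliest, alap[i] + 1), key=lambda c: (load[c], c))
            cycle[i] = best
            load[best] += 1
    return cycle

def ipc_variance(cycle):
    """Population variance of instructions-per-cycle over the schedule length."""
    counts = [0] * (max(cycle) + 1)
    for c in cycle:
        counts[c] += 1
    return pvariance(counts)

# Hypothetical 6-instruction block: chain 0 -> 2 -> 3 -> 4, plus independent 1 and 5.
deps = {2: [0], 3: [2], 4: [3]}
asap, alap = windows(deps, 6)
slack = [l - e for e, l in zip(asap, alap)]
print(slack, ipc_variance(asap), ipc_variance(smooth(deps, 6)))
```

In this toy block the two independent instructions carry all the slack, and spreading them across the chain lowers the per-cycle variance relative to the ASAP schedule.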
Full web browsing on smartphones requires a high-performance JavaScript engine since JavaScript execution on a mobile CPU is slow. So, mobile JavaScript engines employ a just-in-time compiler (JITC), which transl...
ISBN: 9781605589213 (print)
This paper introduces the RaPTEX toolchain and its use for rapid prototyping and evaluation of embedded communication systems. This toolchain is unique for several reasons. First, by using static code analysis techniques, it is able to predict both the typical case and bounds for resource usage, such as computational, memory (both static and dynamic), and energy requirements. Typical software toolchains report only partial memory requirements (code and static data, but not stack memory) and ignore the other important resources. Second, it provides a graphical user interface with configurable software building blocks, which allows easy creation and customization of protocol stacks. Third, it targets low-cost, low-energy hardware, allowing the creation of low-cost systems. We demonstrate the RaPTEX toolchain by evaluating different design options for an experimental ultrasonic communication system for biotelemetry in extremely shallow waters. The power, size, mass and cost constraints of this application make it critical to pack as much processing into the available resources as possible. The RaPTEX toolchain analyzes resource use, enabling the system to run safely closer to the edge of the resource envelope. The toolchain also helps users with the rapid prototyping of communication protocols for embedded systems by providing quick feedback on resource requirements. We demonstrate the use and output of the toolchain. We compare the accuracy of its predictions against measurements of the actual underwater communication system executing on real hardware. Copyright 2010 ACM.
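One kind of bound the abstract above mentions, stack memory, can be sketched as a walk over a call graph. This is only an illustration of the general idea, not RaPTEX's analysis: it assumes a recursion-free call graph with known per-function frame sizes, and the function name `worst_case_stack`, the example functions, and the byte counts are all hypothetical.

```python
from functools import lru_cache

def worst_case_stack(frames, calls, entry):
    """Upper bound on stack bytes from `entry`, assuming an acyclic call graph.

    frames: function name -> frame size in bytes
    calls:  function name -> list of callees
    """
    @lru_cache(maxsize=None)
    def depth(fn):
        # A function's bound is its own frame plus the deepest callee chain.
        return frames[fn] + max((depth(c) for c in calls.get(fn, ())), default=0)
    return depth(entry)

# Hypothetical firmware call graph and frame sizes (bytes).
frames = {"main": 32, "sample": 48, "modulate": 64, "crc": 16}
calls = {"main": ["sample", "modulate"], "modulate": ["crc"]}
bound = worst_case_stack(frames, calls, "main")
print(bound)
```

A real tool would also have to account for interrupt handlers and indirect calls, which this sketch ignores.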
ISBN: 9781605589213 (print)
Explosive growth in the availability of various kinds of data in both commercial and scientific domains has resulted in an unprecedented need to develop novel data-driven knowledge discovery techniques. Data mining is one such data-centric application. It consists of methods to discover interesting, nontrivial, and useful patterns hidden within massive amounts of data. Researchers from both academia and industry have recognized that the challenges of data mining applications will help shape the future of multi-core processor and parallelizing compiler designs. However, relatively little has been done to understand the performance characteristics of these applications on modern multi-core processors. The exponential growth of on-chip resources makes it critical to exploit parallelism at all granularities to improve the performance of data mining applications. In this paper, we examine the instruction-level, memory-level and thread-level parallelism available in data mining applications. We observe that (i) data mining applications have a slightly different instruction mix from SPEC integer applications, and this difference can potentially lead to different ILP extraction; (ii) although many data mining applications suffer from data cache miss penalties, similar to SPEC integer applications, different techniques must be developed to enable effective prefetching due to the existence of complex and irregular data structures, such as hash tables; (iii) although data mining applications have a large amount of thread-level parallelism, efficient extraction of such parallelism depends on on-chip cache performance; and (iv) the performance characteristics of data mining applications can vary at runtime, and thus techniques that dynamically tune the applications to adapt to such variations are desired. Copyright 2010 ACM.
We received a total of 25 submissions for INTERACT-8. Each member of the Program Committee was asked to evaluate and rank the papers based on his areas of expertise and interests. The paper reviews were done during December 2003. With the semester ending and the holidays coming, it was rather hectic for all committee members. I would like to express my appreciation and thank all the Program Committee members and external referees for their efforts and timely reviews. The annual workshop on interaction between compilers and computer architectures (INTERACT) has been organized to promote new ideas and to present recent research and developments in compiler and architecture/micro-architecture techniques that enhance each other's capabilities and performance. We are very pleased to include 12 papers in the final workshop program. We hope the program will generate excitement and research interest among the workshop participants, and that these proceedings will further research and development in the interaction between compilers and computer architectures. The INTERACT workshop has entered its eighth year. I would like to thank all the contributors and participants, whose interest and support are the basis for the existence of this workshop. Special thanks are due to Antonio Gonzales and Eric Rotenberg, who have served on the Program Committee for the last five years. As the Program Chair, I am truly impressed by their dedication. I would also like to thank the general chair, Gyungho Lee, for continuously securing the necessary funding to print the workshop proceedings.
ISBN: 9783642243226 (digital)
ISBN: 9783642243219 (print)
This book constitutes the thoroughly refereed post-conference proceedings of the workshops held at the 37th International Symposium on Computer Architecture, ISCA 2010, in Saint-Malo, France, in June 2010. The 28 revised full papers presented were carefully reviewed and selected from the lectures given at 5 of these workshops. The papers address topics ranging from novel memory architectures to emerging application design and performance analysis, and come from the following workshops: A4MMC, applications for multi- and many-cores; AMAS-BT, 3rd workshop on architectural and micro-architectural support for binary translation; EAMA, the 3rd workshop for emerging applications and many-core architectures; WEED, 2nd workshop on energy efficient design; and WIOSCA, the annual workshop on the interaction between operating systems and computer architecture.