Dynamic software optimization methods are becoming increasingly popular for improving software performance and power. The first step in dynamic optimization consists of detecting frequently executed code, or "cri...
详细信息
ISBN:
(纸本)1581136765
Dynamic software optimization methods are becoming increasingly popular for improving software performance and power. The first step in dynamic optimization consists of detecting frequently executed code, or "critical regions." Previous critical region detectors have been targeted to desktop processors. We introduce a critical region detector targeted to embedded processors, with the unique features of being very size and power efficient, and being completely non-intrusive to the software's execution - features needed in timing-sensitive embedded systems. Our detector not only finds the critical regions, but also determines their relative frequencies, a potentially important feature for selecting among alternative dynamic optimization methods. Our detector uses a tiny cache coupled with a small amount of logic. We provide results of extensive explorations across seventeen embedded system benchmarks. We show that highly accurate results can be achieved with only a 0.02% power overhead and acceptable size overhead. Our detector is currently being used as part of a dynamic hardware/software partitioning approach, but is applicable to a wide-variety of situations. Copyright 2003 acm.
Ever increasing complexity and heterogeneity of SoC platforms require on-chip communication schemes beyond the currently omnipresent shared bus architectures. To prevent time consuming design changes late in the desig...
详细信息
ISBN:
(纸本)9781581137422
Ever increasing complexity and heterogeneity of SoC platforms require on-chip communication schemes beyond the currently omnipresent shared bus architectures. To prevent time consuming design changes late in the design flow, we propose the early exploration of the on-chip communication architecture to meet performance and cost requirements. Based on systemC 2.0.1 we have defined a modular exploration framework, which is able to capture the effect on performance for different on-chip networks like dedicated point-to-point, shared bus, and crossbar topologies. Monitoring of performance parameters like utilization, latency and throughput drives the mapping of the intermodule traffic to an efficient communication architecture. The effectiveness of our approach is demonstrated by the exemplary design of a high performance Network Processing Unit (NPU), which is compared against a commercial NPU device.
In this paper, we describe a new methodology based on game theory for minimizing the average power of a circuit during scheduling in behavioral synthesis. The problem of scheduling in data-path synthesis is formulated...
详细信息
ISBN:
(纸本)9781581137422
In this paper, we describe a new methodology based on game theory for minimizing the average power of a circuit during scheduling in behavioral synthesis. The problem of scheduling in data-path synthesis is formulated as an auction based non-cooperative finite game, for which solutions are developed based on the Nash equilibrium function. Each operation in the data-path is modeled as a player bidding for executing an operation in the given control cycle, with the estimated power consumption as the bid. Also, a combined scheduling and binding algorithm is developed using a similar approach in which the two tasks are modeled together such that the Nash equilibrium function needs to be applied only once to accomplish both the scheduling and binding tasks together. The combined algorithm yields further power reduction due to additional savings during binding. The proposed algorithms yield better power reduction than ILP-based methods with comparable run times and no increase in area overhead.
Growing demand for high performance in embedded systems is creating new opportunities to use speech recognition systems. In several ways, the needs of embedded computing differ from those of more traditional general-p...
详细信息
ISBN:
(纸本)9781581137422
Growing demand for high performance in embedded systems is creating new opportunities to use speech recognition systems. In several ways, the needs of embedded computing differ from those of more traditional general-purpose systems. Embedded systems have more stringent constraints on cost and power consumption that lead to design bottlenecks for many computationally-intensive applications. This paper characterizes the speech recognition process on handheld mobile devices and evaluates the use of modern architecture features and compiler techniques for performing real-time speech recognition. We evaluate the University of Colorado sonic speech recognition software on the IMPACT architectural simulator and compiler framework. Experimental results show that by using a strategic set of compiler optimization, a 500 MHz processor with moderate levels of instruction-level parallelism and cache resources can meet the real-time computing and power constraints of an advanced speech recognition application.
Video encoders are an important IP block in mobile multimedia systems. In this paper, we describe a low-cost low-power multi-standard (MPEG4, JPEG, and H.263) video/image encoder. The low-cost and low-power aspects ar...
详细信息
Video encoders are an important IP block in mobile multimedia systems. In this paper, we describe a low-cost low-power multi-standard (MPEG4, JPEG, and H.263) video/image encoder. The low-cost and low-power aspects are achieved by the right choice of algorithms and architectures. In the algorithm front, an embedded compression technique for reducing the size of loop memory has enabled a single-chip low-cost realization of the encoder. Further, the hardware components that accelerate the kernels of encoding are implemented as application specific instruction-set processors (ASIPs) thereby providing flexibility to address multi-standard encoding. The power and area estimates for the encoder for QCIF@15fps in 0.18 /spl mu/m CMOS technology are 30 mW and 20 mm/sup 2/ respectively including the loop memory.
Summary form only given. The article describes a state-centric abstraction for application users to interact with sensor networks. Just as in data-centric routing and storage where physical nodes are less important th...
详细信息
ISBN:
(纸本)9781581137422
Summary form only given. The article describes a state-centric abstraction for application users to interact with sensor networks. Just as in data-centric routing and storage where physical nodes are less important than the data itself, state-centric abstraction introduces "states" as a natural vocabulary to describe spatio-temporal physical phenomena that the sensor networks are typically designed for. Application programmers specify the computation as creation, sharing and transformation of states, which naturally map to descriptions in signal processing and control applications. We argue that due to the dynamic nature of sensor networks, programs written in state-centric abstractions are more invariant to constant changes in data stream configurations and make the resulting software more portable across multiple sensor network platforms. With the help of models of sensor collaboration, sensing and estimation, the state-centric specifications are mapped into collaborative processing tasks at compile time, and further maintained at run time, leveraging the data-centric caching and routing services. We use a multi-target tracking system as an example to show how state-centric programming models can raise the abstraction level for users to interact with sensor networks and help modularize the design.
暂无评论