Power consumption has been a major concern in designing microprocessors for portable systems such as notebook computers, hand-held computing and personal telecommunication devices. As these devices increase in popular...
详细信息
ISBN:
(纸本)1581133383
Power consumption has been a major concern in designing microprocessors for portable systems such as notebook computers, hand-held computing and personal telecommunication devices. As these devices increase in popularity and are used in a wider range of applications, a low power design becomes more critical. In this paper, we propose a new microarchitectural data cache design called region-based caching that can reduce power consumption. Power savings is achieved by re-organizing the the first level cache to more efficiently exploit memory reference characteristics produced by programming language semantics. These characteristics enable the cache to be partitioned by memory region (stack, global, heap), reducing power consumption, while retaining comparable performance to a conventional cache design. Applications from the MediaBench benchmark suite indicate that a design with two additional small region-based caches results in 66% reduction in average in energy-delay product.
In this paper we propose a new hardware mechanism, called Register Queues (RQs), which effectively decouples the architected register space from the physical registers. Using RQs, the compiler can allocate physical re...
详细信息
In this paper we propose a new hardware mechanism, called Register Queues (RQs), which effectively decouples the architected register space from the physical registers. Using RQs, the compiler can allocate physical registers to store live values in the software pipelined loop while minimizing the pressure placed on architected registers. We show that decoupling the architected register space from the physical register space can greatly increase the applicability of software pipelining, even as memory latencies increase. Res combine the major aspects of existing rotating register file and register connection techniques to generate efficient software pipeline schedules. Through the use of RQs, we can minimize the register pressure and code expansion caused by software pipelining. We demonstrate the effect of incorporating register queues and software pipelining with 983 loops taken from the Perfect Club, the SPEC suites, and the Livermore Kernels.
We present a hierarchical technique HCLIP to generate near-optimum layouts of CMOS cells in the two-dimensional (2-D) style. HCLIP is based on integer-linear programming and extends our previously published CLIP techn...
详细信息
We present a hierarchical technique HCLIP to generate near-optimum layouts of CMOS cells in the two-dimensional (2-D) style. HCLIP is based on integer-linear programming and extends our previously published CLIP technique to much larger cells and to 2-D cell-arrays. HCLIP partitions the circuit into clusters, generates minimum-width 1-D placements (chain covers) for each cluster and then selects one cover for each cluster such that the overall 2-D cell width and height is minimized. In doing so, HCLIP explores all diffusion sharing between transistor chains belonging to the selected covers. For width minimization, HCLIP yields 2-D layouts that have minimum width with respect to the given set of covers. For both width and height minimization, since HCLIP is approximate and can overestimate cell height, we analyze the theoretical worst-case approximation. Experimental results demonstrate that HCLIP still yields near-optimal layouts in most cases.
The existence of false paths represents a significant and computationally complex problem in the timing analysis of combinational and sequential circuits. In this paper we describe Propositional Satisfiability based a...
详细信息
We address the problem of CMOS cell width minimization in the general two-dimensional (2-D) layout style and propose a novel technique based on integer linear programming (ILP) to solve it exactly. We formulate a 0-1 ...
详细信息
ISBN:
(纸本)9780818675973
We address the problem of CMOS cell width minimization in the general two-dimensional (2-D) layout style and propose a novel technique based on integer linear programming (ILP) to solve it exactly. We formulate a 0-1 ILP model whose solution minimizes cell width along with the routing complexity across the diffusion rows. We present experimental results that evaluate the performance of two ILP solvers that have very different solution methods, and assess the effect of the number of rows on cell width. Run-times for optimal layouts are in seconds for cells with up to 20 transistors. For larger cells, we propose a practical circuit pre-processing scheme that dramatically reduces the run time with little or no loss in optimality.
Perfect quality represents 100% conformance to specifications. Complex systems make it difficult and expensive to assure conformance to specifications by post production testing alone. As a result quality assurance pr...
详细信息
Perfect quality represents 100% conformance to specifications. Complex systems make it difficult and expensive to assure conformance to specifications by post production testing alone. As a result quality assurance processes are moving upstream in the life cycle, i.e., statistical process control and design for manufacturability. These processes try to avoid defects or make it easier to detect defects. In software systems this move to upstream processes for quality control is still in its early stages. As with more traditional manufacturing systems, maturity of the software design and development process will dramatically reduce the cost of attaining conformance to specifications. Even so, it is difficult to imagine a 'bug free' complex software system. Downstream processes will continue to play an important part in the efforts to achieve defect-free software. This paper presents results of a survey which used the defect detection tool Purify to examine off-the-shelf software products in order to show that software errors continue to escape testing and threaten field failures. Errors detected are compiled and presented. All data were collected on C and C++ programs running in a UNIX operating system environment.
This paper describes a commercial software and hardware platform for telecommunications and multimedia processing. The software architecture loosely follows the CORBA and ODP standards of distributed computing and sup...
详细信息
ISBN:
(纸本)9781450373395
This paper describes a commercial software and hardware platform for telecommunications and multimedia processing. The software architecture loosely follows the CORBA and ODP standards of distributed computing and supports a number of application types on different hardware configurations. This paper is the result of lessons learned in the process of designing, building, and modifying an industrial telecommunications platform. In particular, the use of the trading function in the design of the system led to such benefits as support for the dynamic evolution of the system, the ability to dynamically add services and data types to a running system, support for heterogeneous systems, and a simple design performing well enough to handle traffic in excess of 40,000 busy-hour calls.
This paper presents an algorithm that allocates registers optimally for straight-line code running on a generic multi-issue computer. On such a machine, an optimal register allocation is one that minimizes the number ...
详细信息
ISBN:
(纸本)9780897916653
This paper presents an algorithm that allocates registers optimally for straight-line code running on a generic multi-issue computer. On such a machine, an optimal register allocation is one that minimizes the number of issue slots that the code requires. Optimal spill selection and load/store placement are used to minimize the number of additional issue slots needed, given a schedule for the non-memory reference instructions and a fixed number of available physical registers. The generic multi-issue machine model closely models the operation of vector and VLIW processors, and could be extended to model super-scalar processors. The algorithm uses dynamic programming to search the state space of feasible register allocations; implicit and explicit state pruning are used to make the problem tractable without sacrificing optimality. The optimal allocation produced by the algorithm for a substantial example is presented.
A general formula relating the waveform at the output of a CMOS inverter to the waveform at its input is derived. The formula is applied to three cases: a step input, a ramp input, and an exponential input. A one-dime...
详细信息
A general formula relating the waveform at the output of a CMOS inverter to the waveform at its input is derived. The formula is applied to three cases: a step input, a ramp input, and an exponential input. A one-dimensional functional dependence of the inverter propagation delay and output slew rate on circuit parameters is derived and an inverter macromodel for timing analysis is suggested and experimentally verified.
暂无评论