The energy dissipation associated with driving long wires accounts for a significant fraction of the overall system energy. This is particularly the case with the increasing importance of the inter-wire parasitic capacitance in deep sub-micron technology. A closed-form solution for estimating the energy dissipation of a data bus is presented that uses an elaborate parasitic wire model. The model includes the distributed RLC effects of wires as well as the coupling between wires. We also propose a general class of coding techniques to reduce energy dissipation for data transmission by trading off computation and communication costs. An algorithm is presented to design efficient coding strategies to minimize energy. When the effects of inter-wire capacitance are taken into account, the best coding strategy is not simply to minimize transitions, the approach followed by previous research. Instead, Transition Pattern Coding (TPC) modifies the transition profile to minimize energy, and in many cases higher transition activity can result in lower energy. Results show up to a factor of 2 reduction in energy.
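As a rough illustration of why fewer transitions need not mean less energy, the sketch below uses a simplified lumped-capacitance model. It is not the closed-form distributed RLC model the abstract describes. It assumes each wire has a self capacitance to ground and a coupling capacitance to each adjacent wire, and that each capacitor dissipates 0.5*C*(dV)^2 when the voltage across it changes by dV. The function name, parameter names, and capacitance values are hypothetical.

```python
def transition_energy(prev, curr, c_self=1.0, c_couple=3.0, vdd=1.0):
    """Energy of one bus transition under a simplified lumped model (assumption).

    prev, curr: lists of 0/1 logic values, one entry per wire.
    c_self:     capacitance of each wire to ground (arbitrary units).
    c_couple:   capacitance between each pair of adjacent wires.
    """
    # Per-wire voltage change in units of vdd: -1, 0, or +1.
    delta = [b - a for a, b in zip(prev, curr)]
    # Self capacitance sees the wire's own voltage swing.
    self_term = sum(d * d for d in delta)
    # Coupling capacitance sees the difference between neighboring swings.
    couple_term = sum((delta[i] - delta[i + 1]) ** 2
                      for i in range(len(delta) - 1))
    return 0.5 * vdd ** 2 * (c_self * self_term + c_couple * couple_term)


# One wire toggling between two quiet neighbors charges both coupling capacitors.
single_toggle = transition_energy([0, 0, 0], [0, 1, 0])
# All three wires rising together leave the coupling capacitors unchanged.
all_rise = transition_energy([0, 0, 0], [1, 1, 1])
print(single_toggle, all_rise)
```

With the coupling capacitance set to three times the self capacitance, the single-wire toggle costs more in this model than all three wires switching in the same direction, which is the kind of effect a transition-pattern code can exploit.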
The incremental, 'construct by correction' design methodology has become widespread in constraint-dominated DSM design. We study the problem of ECO for physical design domains in the general context of incremental optimization. We observe that an incremental design methodology is typically built from a full optimizer that generates a solution for an initial instance, and an incremental optimizer that generates a sequence of solutions corresponding to a sequence of perturbed instances. Our hypothesis is that in practice, there can be a mismatch between the strength of the incremental optimizer and the magnitude of the perturbation between successive instances. When such a mismatch occurs, the solution quality will degrade - perhaps to the point where the incremental optimizer should be replaced by the full optimizer. We document this phenomenon for three distinct domains - partitioning, placement and routing - using leading industry and academic tools. Our experiments show that current CAD tools may not be correctly designed for ECO-dominated design processes. Thus, compatibility between optimizer and instance perturbation merits attention both as a research question and as a matter of industry design practice.
Communication designs form the fastest growing segment of the semiconductor market. Both network processors and wireless chipsets have been attracting a great deal of research attention, financial resources and design efforts. However, further progress is limited by the lack of adequate system methodologies and tools. Our goal in this tutorial is to provide impetus for the development of communication design techniques and tools. The first part addresses network processors (NP), which we study from three viewpoints: application, architecture, and system software and compilation tools. In addition to a summary of the main issues and representative case studies, we identify the main system design issues. The second part of the tutorial focuses on wireless design. The main emphasis is on a platform-based design methodology that leverages functional profiling, architecture exploration, and orthogonalization of concerns to facilitate low-power wireless communication systems. The highlight of the paper, an in-depth study of the state-of-the-art wireless design PicoRadio, is used as an explanatory design example.
Sub-micron feature sizes have resulted in a considerable portion of power being dissipated on the buses, drawing increased attention to power savings at the behavioral and RT levels of design. This paper addresses the problem of minimizing the power dissipated in bus switching during data path synthesis. Unlike previous approaches, in which minimization of the power consumed in buses is not considered until operation scheduling is completed, our approach integrates the bus binding problem into scheduling to exploit the impact of scheduling on bus power reduction more fully and effectively. We accomplish this by formulating the problem as a network flow problem and devising an efficient algorithm that iteratively finds maximum-flow, minimum-cost solutions in the network. Experimental results on a number of benchmark problems show that, given resource and global timing constraints, our designs are 22% more power-efficient than the designs produced by a random-move based solution, and 18% more power-efficient than the designs produced by a clock-step based optimal solution.
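To make the cost being minimized concrete, the sketch below counts bit toggles on each bus for two candidate bindings of the same data transfers. It only illustrates a switching-activity metric (Hamming distance between consecutive bus values); it is not the flow formulation or the scheduling-integrated algorithm from the abstract. The function names and the example values are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which two non-negative integers differ."""
    return bin(a ^ b).count("1")


def bus_toggles(bus_sequences):
    """Total bit toggles over all buses.

    bus_sequences: one list per bus, holding the values that bus carries
    in consecutive clock steps (an assumed, simplified view of a binding).
    """
    return sum(
        hamming(seq[i], seq[i + 1])
        for seq in bus_sequences
        for i in range(len(seq) - 1)
    )


# Same four transfers over two clock steps, bound to two buses in two ways.
binding_a = [[0b0000, 0b1110], [0b1111, 0b0001]]
binding_b = [[0b0000, 0b0001], [0b1111, 0b1110]]
print(bus_toggles(binding_a), bus_toggles(binding_b))
```

Both bindings move the same data, yet the second keeps similar values on the same bus and toggles far fewer bits. Because scheduling decides which transfers share a clock step, it also constrains which bindings are available.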
VLIW ASIPs provide an attractive solution for increasingly pervasive real-time multimedia and signal processing embedded applications. In this paper we propose an algorithm to support trade-off exploration during the early phases of the design/specialization of VLIW ASIPs with clustered datapaths. For purposes of an early exploration step, we define a parameterized family of clustered datapaths D(m,n), where m and n denote interconnect capacity and cluster capacity constraints on the family. Given a kernel, the proposed algorithm explores the space of feasible clustered datapaths and returns: a datapath configuration; a binding and scheduling for the operations; and a corresponding estimate for the best achievable latency over the specified family. Moreover, we show how the parameters m and n, as well as a target latency optionally specified by the designer, can be used to effectively explore trade-offs among delay, power/energy, and latency. Extensive empirical evidence is provided showing that the proposed approach is strikingly effective at attacking this complex optimization problem.
Noise effects such as power supply noise and crosstalk can significantly affect the performance of deep submicron designs. These delay effects are highly input-pattern dependent. Existing path selection and timing analysis techniques cannot capture the effects of noise on cell/interconnect delays. Therefore, the selected critical paths may not be the longest paths, and the predicted circuit performance might not reflect the worst-case circuit delay. In this paper, we propose a path selection technique that can consider power supply noise effects on the propagation delays. Next, for the selected critical paths, we propose a pattern generation technique for dynamic timing analysis such that the patterns produce the worst-case power supply noise effects on the delays of these paths. Our experimental results demonstrate the difference in estimated circuit performance when power supply noise effects are considered vs. when these effects are ignored. Thus, they validate the need for considering power supply noise effects on delays during path selection and dynamic timing analysis.
In this paper, we quantify the impact of global interconnect optimization techniques that address such design objectives as delay, peak noise, delay uncertainty due to noise, power, and cost. In doing so, we develop a new system-performance simulation model as a set of studies within the MARCO GSRC Technology Extrapolation (GTX) system. We model a typical point-to-point global interconnect and focus on accurate assessment of both circuit and design technology with respect to such issues as inductance, signal line shielding, dynamic delay, buffer placement uncertainty and repeater staggering. We demonstrate, for example, that optimal wire sizing models need to consider inductive effects - and that use of more accurate {-1,3} worst-case capacitive coupling noise switch factors substantially increases peak noise estimates compared to traditional {0,2} bounds. We also find that optimal repeater sizes are significantly smaller than conventional models would suggest, especially when considering energy-delay issues.
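A switch factor is commonly used to fold the coupling capacitance to a neighboring line into an equivalent grounded capacitance, scaled according to how the neighbor switches. The sketch below compares the spread of that equivalent capacitance under the {0,2} and {-1,3} bounds mentioned above. It is a minimal illustration under assumed values, not the GTX model or the peak-noise analysis from the abstract; the function name and the capacitance numbers are hypothetical.

```python
def effective_capacitance_range(c_ground, c_couple, num_neighbors, switch_factors):
    """Best- and worst-case equivalent grounded capacitance of a victim line.

    c_ground:       capacitance of the line to ground.
    c_couple:       coupling capacitance to each neighbor (assumed equal).
    num_neighbors:  number of coupled neighbors (for example 2 on a bus).
    switch_factors: (low, high) bounds on the switch factor.
    """
    low, high = min(switch_factors), max(switch_factors)
    best = c_ground + num_neighbors * low * c_couple
    worst = c_ground + num_neighbors * high * c_couple
    return best, worst


# Arbitrary units: 1.0 to ground, 0.5 to each of two neighbors.
traditional = effective_capacitance_range(1.0, 0.5, 2, (0, 2))
extended = effective_capacitance_range(1.0, 0.5, 2, (-1, 3))
print(traditional, extended)
```

Under these assumed values the extended bounds widen the interval at both ends, so any estimate derived from the worst-case end grows when the {-1,3} factors are used.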
Deep submicron technology scaling has two major ramifications for the design process. First, reduced feature size significantly increases wire delay, resulting in critical paths being dominated by global interconnect rather than gate delays. Second, ultra-high levels of integration mandate the design of systems-on-chip that encompass numerous intra-synchronous blocks with decreased functional granularity and increased communication demands. To address these issues we have developed an on-chip bus network design methodology and a corresponding set of tools which, for the first time, close the synthesis loop between system and physical design. The approach has three components: a communication profiler, a bus network designer, and a fast approximate floorplanner. The communication profiler collects run-time information about the traffic between system cores. The bus network designer optimizes the bus network structure by coordinating information from the other two components. The floorplanner aims to create a feasible floorplan and to communicate information about the most constrained parts of the network.
In this paper, we propose a unified approach to partitioning, floorplanning, and retiming for effective and efficient performance optimization. The integration enables the partitioner to exploit the more realistic geometric delay model provided by the underlying floorplan. Simultaneous consideration of partitioning and retiming under the geometric delay model enables us to hide global interconnect latency effectively by repositioning flip-flops (FFs) along long wires. Under the proposed geometric embedding based performance driven partitioning problem, our GEO algorithm performs multi-level top-down partitioning while determining the location of the partitions. We adopt the concept of sequential arrival time [14] and develop sequential required time in our retiming based timing analysis engine. GEO performs cluster-move based iterative improvement on top of a multi-level cluster hierarchy [4], where the gain function obtained from the timing analysis is based on the minimization of cutsize, wirelength, and sequential slack. In our comparison to (i) the state-of-the-art partitioner hMetis [9] followed by retiming [11] and simulated annealing based slicing floorplanning [15], and (ii) the state-of-the-art simultaneous partitioning with retiming HPM [7] followed by floorplanning [15], GEO obtains 35% and 23% better delay results, respectively, while maintaining comparable cutsize, wirelength, and runtime results.
The finite-difference time-domain (FDTD) method of solving the full-wave Maxwell's equations has recently been extended to provide accurate and numerically stable operation for time steps exceeding the Courant limit. The elimination of an upper bound on the size of the time step was achieved using an alternating-direction implicit (ADI) time-stepping scheme. This greatly increases the computational efficiency of the FDTD method for classes of problems where the cell size of the three-dimensional space lattice is constrained to be much smaller than the shortest wavelength in the source spectrum. One such class of problems is the analysis of high-speed VLSI interconnects, where full-wave methods are often needed for the accurate analysis of parasitic electromagnetic wave phenomena. In this paper, we present an enhanced FDTD-ADI formulation which permits the modeling of realistic lossy materials such as semiconductor substrates and metal conductors, as well as artificial lossy materials needed for perfectly matched layer (PML) absorbing boundary conditions. Simulations using our generalized FDTD-ADI formulation are presented to demonstrate its accuracy and the extent to which the computational burden is reduced by the ADI scheme.
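For context on the constraint that the ADI scheme removes, the sketch below evaluates the standard Courant stability bound for explicit FDTD on a uniform Cartesian grid, dt <= 1 / (v * sqrt(1/dx^2 + 1/dy^2 + 1/dz^2)). It is not part of the FDTD-ADI formulation itself. The function name, the choice of free-space light speed as the wave velocity, and the example cell sizes are assumptions made for illustration.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def courant_limit(dx, dy, dz, velocity=C0):
    """Largest stable time step (s) for explicit 3-D FDTD with cell size dx, dy, dz (m)."""
    return 1.0 / (velocity * math.sqrt(1.0 / dx**2 + 1.0 / dy**2 + 1.0 / dz**2))


# Cell chosen from the wavelength alone: about 1/20 of a 3 cm wavelength.
dt_wavelength = courant_limit(1.5e-3, 1.5e-3, 1.5e-3)
# Cell forced down to 1 micrometer by fine interconnect geometry (assumed value).
dt_geometry = courant_limit(1e-6, 1e-6, 1e-6)

# Factor by which the explicit scheme needs more time steps for the same simulated time.
print(dt_wavelength / dt_geometry)
```

For cubic cells the bound scales linearly with the cell size, so shrinking the cell to resolve fine geometry shrinks the allowed time step by the same factor. That is the overhead an unconditionally stable scheme such as ADI is meant to avoid.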