Floating point multiplication is one of the most frequently used arithmetic operations in a wide variety of applications, but the high power consumption of the IEEE-754 standard floating point multiplier prohibits its implementation in many low-power systems, such as wireless sensors and other battery-powered embedded systems, and limits performance scaling in high-performance systems, such as CPUs and GPGPUs for scientific computation. This paper presents a low-power, accuracy-configurable floating point multiplier based on Mitchell's Algorithm. Post-layout SPICE simulations in a 45nm process show power reductions at the same delay of up to 26X for single precision and 49X for double precision compared to their IEEE-754 counterparts. Functional simulations on six CPU and GPU benchmarks show significantly better trade-offs between power reduction and quality degradation than existing bit-truncation schemes.
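Mitchell's algorithm replaces the mantissa product with an addition in the logarithmic domain, using the approximation log2(1 + f) ≈ f for a mantissa fraction f in [0, 1). The Python sketch below applies this textbook approximation to finite double-precision operands. It is only an illustration of the underlying algorithm: it does not model the paper's accuracy-configuration mechanism or its hardware datapath, and the function name `mitchell_multiply` is hypothetical. Because the approximation never overestimates, its relative error lies in [0, 1/9], with the worst case when both mantissa fractions equal 0.5 (for example, 3.0 × 3.0).

```python
import math


def mitchell_multiply(a: float, b: float) -> float:
    """Approximate a * b with Mitchell's logarithmic multiplication.

    Each operand is written as (1 + f) * 2**e with 0 <= f < 1, and
    log2(1 + f) is approximated by f, so the logarithm of the product becomes
    (ea + fa) + (eb + fb): one addition instead of a mantissa multiplication.
    """
    if a == 0.0 or b == 0.0:
        return 0.0
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    # math.frexp returns m in [0.5, 1) with x == m * 2**e; rescale so the
    # mantissa is 1 + f with f in [0, 1), as in IEEE-754 normal numbers.
    ma, ea = math.frexp(abs(a))
    mb, eb = math.frexp(abs(b))
    fa, fb = 2.0 * ma - 1.0, 2.0 * mb - 1.0
    exp = (ea - 1) + (eb - 1)
    s = fa + fb  # approximate log2 of the mantissa product
    # Antilogarithm approximation: 2**(k + f) ~= 2**k * (1 + f).
    if s < 1.0:
        mantissa = 1.0 + s
    else:
        # Carry into the exponent: 2**(exp + 1 + (s - 1)) ~= 2**(exp + 1) * s.
        mantissa, exp = s, exp + 1
    return sign * math.ldexp(mantissa, exp)


print(mitchell_multiply(3.0, 3.0))  # prints 8.0 (exact product: 9.0)
```

A hardware implementation benefits because the mantissa multiplier array is replaced by a single adder. The price is a bounded, one-sided error that an application must be able to tolerate.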
Advanced IC process technology nodes (28, 20, 14nm and below) have relied on the synergy of process-aware physical design and physical verification methodologies with design-aware yield engineering (on the manufacturing side) to fulfill ITRS scaling and performance requirements. These capabilities comprise not merely additional design rules, additional modeling capabilities, or incremental verification tools, but a qualitatively new set of DFM/DEM (Design for Manufacturing / Design-Enabled Manufacturing) methodologies aimed at variability management, i.e. at the characterization and remapping of systematic variability effects caused by design/process interaction. A typical example is a “correct by construction” router flow, augmented with yield-detractor and yield-enhancer patterns, implemented for 20nm high-performance processor designs. In such a flow, timing (on the design side) and yield (on the manufacturing side) are co-optimized to guarantee high yield and the specified parametric performance in the first silicon run. In spite of current successes, the incremental path down to 14nm will be disrupted by hard physical limits occurring simultaneously in geometric and electrical scaling, and the transition to the 10 and 7nm nodes will require a (re)volutionary DFM (Design for Manufacturing) paradigm. Building on the state of the art in design/technology co-optimization, this work reviews the three most likely design/process scenarios for 10 and 7nm design enablement, which could potentially allow the “design gap” and the “patterning gap” to be bridged together.
ISBN (print): 9781479962792
The thriving growth in mobile consumer electronics makes energy efficiency in embedded system design an important and recurring theme. Phase Change Memory (PCM) has shown its potential to replace DRAM as the main memory option owing to its lower energy requirements (a 65% reduction). However, when PCM is used as main memory, its write endurance becomes a critical issue, and wear leveling is a common approach to resolving it. Even though wear leveling designs should emphasize operational efficiency and low overhead, existing wear leveling strategies for PCM main memory are usually dedicated to prolonging the lifetime of PCM. In this paper, we argue that, instead of treating the maximization of PCM lifetime as the first priority, we should aim to satisfy the product warranty period. In doing so, operational efficiency can be further enhanced and management overhead further reduced. We thus propose a warranty-aware page management design that enhances operational efficiency when managing the endurance issue in PCM. To show the effectiveness of the proposed design, we collected real traces on Fiasco.OC by running SPEC2006 benchmarks with workloads of different write intensities. The experimental results show that our design reduces the overhead to one third of that of state-of-the-art designs while still providing the same level of performance.
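To make the warranty-first perspective concrete, the Python sketch below shows one hypothetical way a memory manager could decide which pages actually need wear-leveling action. A page is flagged only if its observed write rate, projected over the remaining warranty period, would exhaust its remaining endurance. This is not the paper's warranty-aware page management design. The `Page` fields, the 10^8-write endurance figure, and the per-day write rates are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

# Assumed per-page write endurance (illustrative); real PCM figures vary by device.
ENDURANCE_WRITES = 10**8


@dataclass
class Page:
    page_id: int
    writes_so_far: int       # writes already absorbed by this physical page
    writes_per_day: float    # recently observed write rate (hypothetical metric)


def needs_migration(page: Page, days_left: float,
                    endurance: int = ENDURANCE_WRITES) -> bool:
    """True if, at its current write rate, the page would wear out before the warranty expires."""
    remaining = endurance - page.writes_so_far
    projected = page.writes_per_day * days_left
    return projected > remaining


def pages_to_migrate(pages: List[Page], days_left: float) -> List[int]:
    # Only pages on track to fail within the warranty are touched; all other
    # pages are left alone, which is where an overhead saving would come from.
    return [p.page_id for p in pages if needs_migration(p, days_left)]


warranty_days_left = 3 * 365
pages = [
    Page(1, writes_so_far=0, writes_per_day=50_000),
    Page(2, writes_so_far=0, writes_per_day=200_000),
    Page(3, writes_so_far=90_000_000, writes_per_day=20_000),
]
print(pages_to_migrate(pages, warranty_days_left))  # [2, 3]
```

A lifetime-maximizing policy would keep equalizing wear across all three pages. The warranty-based test leaves page 1 untouched because it will comfortably outlast the warranty.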
Real-time anomaly detection in wide-area monitoring of smart grids is critical to enhancing the reliability of power systems. However, capturing the features of anomalous interruptions and detecting them in real time is difficult for large-scale smart grids. Measurement data volume and complexity increase drastically with the exponential growth of data from the immense number of intelligent monitoring devices to be rolled out, and fast information retrieval from those massive data sets is required. Most existing anomaly detection methods fail to handle this well. This paper proposes a spatial-temporal correlation based anomalous behavior model to capture the characteristics of anomalies, such as transmission line outages, in the smart grid. Inspired by the Ledoit-Wolf Shrinkage (LWS) method, we develop the real-time anomaly detection (ReTAD) algorithm to overcome the issue of gigantic measurement data volume. The proposed algorithm is not only suitable for large power systems with high-dimensional measurement data, but also has low enough computational complexity to be applied to real-time detection. Using the 14-, 30-, and 2383-bus systems, our experimental study demonstrates that the proposed ReTAD algorithm successfully detects anomalous events in real time.
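The Ledoit-Wolf shrinkage estimator blends the sample covariance S with a scaled identity target. It chooses the blend weight from the data so that the estimate stays well-conditioned even when there are more measurement channels than snapshots. The NumPy sketch below implements the standard Ledoit-Wolf formula and then uses the shrunk covariance for a simple Mahalanobis-distance anomaly score. The scoring step is an illustrative assumption, not the ReTAD algorithm. The synthetic "snapshot × channel" data stand in for real grid measurements, and the function names are hypothetical.

```python
import numpy as np


def ledoit_wolf(X: np.ndarray):
    """Shrunk covariance of X (n_samples x n_features) and the shrinkage weight."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                        # sample covariance (1/n scaling)
    mu = np.trace(S) / p                     # average variance
    target = mu * np.eye(p)                  # shrinkage target mu * I
    d2 = np.sum((S - target) ** 2) / p       # squared distance of S from the target
    # Estimated squared error of S around the true covariance.
    b2_bar = sum(np.sum((np.outer(x, x) - S) ** 2) for x in Xc) / (p * n**2)
    b2 = min(b2_bar, d2)
    shrinkage = b2 / d2 if d2 > 0 else 1.0
    return shrinkage * target + (1.0 - shrinkage) * S, shrinkage


def anomaly_scores(train: np.ndarray, test: np.ndarray) -> np.ndarray:
    """Squared Mahalanobis distance of each test row under the shrunk covariance."""
    sigma, _ = ledoit_wolf(train)
    diff = test - train.mean(axis=0)
    # Solve sigma @ z = diff^T rather than forming an explicit inverse.
    z = np.linalg.solve(sigma, diff.T).T
    return np.einsum("ij,ij->i", diff, z)


rng = np.random.default_rng(0)
normal = rng.normal(size=(20, 50))   # 20 snapshots of 50 hypothetical channels
sigma, w = ledoit_wolf(normal)       # p > n: the plain sample covariance is singular
probe = np.vstack([rng.normal(size=50), np.full(50, 5.0)])
print(w, anomaly_scores(normal, probe))
```

Because the shrunk estimate adds a positive multiple of the identity, it stays invertible with fewer snapshots than channels. That is the property that makes shrinkage attractive for high-dimensional grid data.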
This paper presents the development of a Coloured Petri Net (CPN) model for a concurrent application running on a heterogeneous multi/manycore node. The software runtime used (StarPU) allows the application to be expressed as a DAG (Directed Acyclic Graph) of tasks and the heterogeneous hardware to be partitioned into worker units. CPN modelling allows rapid evaluation of how well the implemented scheduling algorithms suit a given problem, and it supports the design and implementation of new algorithms. The scheduler models were validated through runs on the real architecture.
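The CPN model itself is not reproduced here. The Python sketch below shows the kind of policy such a model is used to evaluate: a simple greedy earliest-finish-time list scheduler that places each task of a DAG on whichever heterogeneous worker would finish it first. It is not StarPU's API or one of its scheduling policies. The function name, the cost table, and the omission of data-transfer times are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Placement = Tuple[str, float, float]  # (worker, start, finish)


def list_schedule(tasks: List[str], deps: Dict[str, List[str]],
                  cost: Dict[Tuple[str, str], float],
                  workers: List[Tuple[str, str]]) -> Tuple[Dict[str, Placement], float]:
    """Greedy earliest-finish-time scheduling of a task DAG.

    tasks:   task names in a topological order
    deps:    task -> predecessor tasks
    cost:    (task, worker kind) -> execution time; a missing key means the
             task has no implementation for that kind of worker
    workers: (worker name, worker kind) pairs, e.g. ("gpu0", "gpu")
    Data-transfer times are ignored in this sketch.
    """
    worker_free = {name: 0.0 for name, _ in workers}
    placement: Dict[str, Placement] = {}
    for t in tasks:
        # A task becomes ready when all of its predecessors have finished.
        ready = max((placement[p][2] for p in deps.get(t, [])), default=0.0)
        best = None
        for name, kind in workers:
            if (t, kind) not in cost:
                continue
            start = max(ready, worker_free[name])
            finish = start + cost[(t, kind)]
            if best is None or finish < best[2]:  # ties keep the earlier-listed worker
                best = (name, start, finish)
        if best is None:
            raise ValueError(f"no worker can execute task {t!r}")
        placement[t] = best
        worker_free[best[0]] = best[2]
    return placement, max(finish for _, _, finish in placement.values())


deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
cost = {("A", "cpu"): 2, ("A", "gpu"): 1, ("B", "cpu"): 4, ("B", "gpu"): 1,
        ("C", "cpu"): 3, ("C", "gpu"): 2, ("D", "cpu"): 1}
placement, makespan = list_schedule(["A", "B", "C", "D"], deps, cost,
                                    [("cpu0", "cpu"), ("gpu0", "gpu")])
print(placement, makespan)  # makespan 5.0: A and B on gpu0, C and D on cpu0
```

Swapping in a different placement rule and comparing makespans on the same DAG is the kind of quick what-if exploration that a model of the scheduler makes cheap.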
Recent advances in thin-film thermoelectric (TE) materials have created opportunities for on-chip cooling and energy harvesting with heat fluxes above 100 W/cm². However, it remains unclear how effective these materials are in the context of realistic microprocessor floorplans and workloads. Moreover, these TE materials suffer from contact parasitics that can significantly impact their performance. To evaluate the workload-dependent performance of on-chip TE devices, we developed a hierarchical simulation methodology that connects an architectural simulator and a power estimation tool with a thermal simulator capable of simulating TE devices. The well-known HotSpot thermal simulator is modified to incorporate TE equations, along with contact parasitics, in the TE module. SimpleScalar and McPAT were used to generate the runtime power of different functional units in an out-of-order processor across the SPEC2000 workloads. The power map generated by McPAT is used by our TE-enhanced HotSpot simulator to evaluate the cooling and harvesting capabilities of on-chip TE modules. Our results indicate that it is possible to obtain 11°C of peak cooling at the hot spots, or to harvest up to 85mW of power from them. We also show that on-chip TE devices can help boost the processor clock frequency from 1200MHz to 1600MHz in an iso-temperature comparison with the no-TE case. This framework also allows rapid design space exploration of the TE module's material and physical parameters and of the optimal placement options for the TE module on the chip floorplan.
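The HotSpot modification itself is not reproduced here. The per-device energy balance that such a thermal model couples to its grid can, however, be sketched in a few lines of Python using the standard lumped thermoelectric relations. For cooling these are Peltier heat S·I·T_c, Joule heating ½I²R, and back-conduction K·ΔT. For harvesting, the device acts as a Seebeck voltage source with an internal resistance. As an assumption, contact parasitics are modeled simply as extra series electrical resistance. The function names and the parameter values in the usage lines are hypothetical.

```python
def peltier_cooling_w(seebeck_v_per_k: float, current_a: float, t_cold_k: float,
                      r_te_ohm: float, r_contact_ohm: float,
                      k_w_per_k: float, delta_t_k: float) -> float:
    """Net heat pumped from the cold side of a TE device, in watts.

    Q_c = S*I*T_c - 0.5*I**2*(R_te + R_contact) - K*dT.
    Contact resistance is lumped into the Joule term, and half of the Joule
    heat is assumed to flow to the cold side (a common simplification).
    """
    r_total = r_te_ohm + r_contact_ohm
    peltier = seebeck_v_per_k * current_a * t_cold_k
    joule = 0.5 * current_a**2 * r_total
    conduction = k_w_per_k * delta_t_k
    return peltier - joule - conduction


def harvested_power_w(seebeck_v_per_k: float, delta_t_k: float, r_te_ohm: float,
                      r_contact_ohm: float, r_load_ohm: float) -> float:
    """Power delivered to a resistive load by a TE generator (Seebeck mode)."""
    r_internal = r_te_ohm + r_contact_ohm
    v_open = seebeck_v_per_k * delta_t_k          # open-circuit voltage
    current = v_open / (r_internal + r_load_ohm)
    return current**2 * r_load_ohm


params = dict(seebeck_v_per_k=4e-4, current_a=2.0, t_cold_k=350.0,
              r_te_ohm=0.01, k_w_per_k=0.01, delta_t_k=5.0)
for r_c in (0.0, 0.005):  # without and with a hypothetical contact resistance
    print(r_c, peltier_cooling_w(r_contact_ohm=r_c, **params))
```

Harvested power peaks at V²/(4R) when the load matches the internal resistance. Contact resistance raises R, which lowers that peak and adds Joule heat in cooling mode. This is why contact parasitics matter for thin-film devices with very small intrinsic resistance.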
Craig interpolation is a known method for expressing a target function f as a function of a given set of base functions G. The resulting interpolant represents the dependency function h, such that f = h(G). Generally, the set G contains enough base functions to allow multiple dependency functions, whose quality depends mainly on which base functions are selected for reconstruction. Interpolation is not an optimisation problem, so it often selects base functions arbitrarily and, in particular, omits others that may be required for an optimal implementation of the target function. Notably, it is impossible to force the interpolant to use a specific base function. In this paper, we propose a method that forces a specific base function g_i to be a primary input of a dependency function. Such a dependency function is built as a Shannon expansion of two constrained Craig interpolants, one for the assignments of the primary inputs for which g_i evaluates to 0 and one for those for which it evaluates to 1. We also introduce a method that iteratively imposes a predefined set of base functions. In each iteration, we generate a new dependency function that serves as the target function of the next iteration, in order to force the use of one more base function. We show that, unlike the standard Craig interpolation method, our carving method succeeds in imposing the desired base functions with very high probability. It recomposes single-output logic circuits into their delay- or area-optimised implementations regardless of the input implementation. The proposed methods can be efficiently employed for rewriting circuits in some synthesis-based algorithms.
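The structure of the forced dependency function, h = g_i·h1 + ¬g_i·h0, can be illustrated on small functions by brute force. The Python sketch below replaces the two constrained Craig interpolants with exhaustive truth-table enumeration. This is only feasible for a handful of inputs and is not how an interpolation-based tool would compute them. For each value b of g_i, the sketch checks that f is a function of the remaining base functions on the inputs where g_i = b, and tabulates that cofactor. The function names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Inputs = Tuple[int, ...]
BoolFn = Callable[[Inputs], int]


def constrained_cofactor(f: BoolFn, others: List[BoolFn], g_i: BoolFn,
                         b: int, n: int) -> Optional[Dict[Inputs, int]]:
    """Tabulate f as a function of `others`, only on inputs where g_i == b.

    Returns None if two such inputs agree on every function in `others`
    but disagree on f, i.e. no dependency function exists for this cofactor.
    """
    table: Dict[Inputs, int] = {}
    for x in product((0, 1), repeat=n):
        if g_i(x) != b:
            continue  # outside the care set of this cofactor
        key = tuple(g(x) for g in others)
        if table.setdefault(key, f(x)) != f(x):
            return None
    return table


def force_base_function(f: BoolFn, G: List[BoolFn], i: int,
                        n: int) -> Optional[Callable[[Inputs], int]]:
    """Build h with f(x) == h(G(x)) whose top-level split is on G[i]."""
    others = G[:i] + G[i + 1:]
    h0 = constrained_cofactor(f, others, G[i], 0, n)
    h1 = constrained_cofactor(f, others, G[i], 1, n)
    if h0 is None or h1 is None:
        return None

    def h(values: Inputs) -> int:
        # Shannon expansion: h = g_i ? h1(rest) : h0(rest).
        rest = values[:i] + values[i + 1:]
        cofactor = h1 if values[i] else h0
        return cofactor.get(rest, 0)  # unseen combinations are don't-cares
    return h


# f = x0*x1 + x2, with base functions g0 = x0*x1, g1 = x2, g2 = x0 + x2.
f = lambda x: (x[0] & x[1]) | x[2]
G = [lambda x: x[0] & x[1], lambda x: x[2], lambda x: x[0] | x[2]]
h = force_base_function(f, G, i=2, n=3)  # force g2 to be an input of h
assert h is not None
print(all(h(tuple(g(x) for g in G)) == f(x) for x in product((0, 1), repeat=3)))  # True
```

Here f could be written as g0 + g1 without g2 at all. The construction nevertheless yields an h whose top-level decision is on g2, which is exactly the property that plain interpolation cannot guarantee.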
Given the absence of formalization in the analysis and design of educational content notation, in contrast to other fields where modeling plays an irreplaceable role, we performed a set of experiments. We abstracted an equivalence between the graduate profile and the profession profile. We then represented these profiles as a system of knowledge, abilities, skills, etc. This substitution of a system for the profiles allows us to treat the design process by analogy with a software engineering process, the Unified Process, which describes how requirements are turned into educational content. Such an approach yields models with zero redundancy, which is very important for educational content, including computer-aided education and distance learning. We also evaluated the results of our experiment with a short questionnaire.