Integrated circuits in modern systems-on-chip and microprocessors are typically operated with sufficient timing margins to mitigate the impact of rising process, voltage, and temperature (PVT) variations at advanced process nodes. The widening margins required for robust computation inevitably lead to conservative designs with unacceptable energy-efficiency overheads. Reconciling the conflicting demands of variation mitigation and energy-efficient computing will require fundamental departures from conventional circuit and system design practices. This paper proposes error-resilient general-purpose computing as an effective way to achieve this. We review resilient techniques that exploit tolerance to timing errors to automatically compensate for variations and dynamically tune a system to its most efficient operating point, and we present the Razor approach as a pioneering example of such a technique. We report silicon measurements from multiple industrial and academic demonstration systems that employ Razor-based dynamic voltage and frequency management, highlighting two platforms in particular. The first is an ARM-based industrial prototype in which Razor dynamic adaptation yields 52% energy savings at 1 GHz operation. The second applies Razor to achieve robust operation in the presence of radiation-induced single-event upsets. These efforts demonstrate how energy-efficient compute engines can be designed by combining timing-error resiliency with optimizations that span algorithm, circuit, and microarchitecture boundaries.
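The abstract describes Razor as tuning a system to its most efficient operating point by tolerating timing errors rather than margining them away. The paper's actual controller is not described here, so the following is only a minimal sketch under stated assumptions: a hypothetical bang-bang loop that lowers the supply voltage while the Razor error count per control window stays at or below a threshold, and raises it when recovery overhead would dominate. The error model `toy_error_count` and every parameter value (`v_min_mv`, `step_mv`, `max_errors_per_window`, and so on) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class RazorVoltageController:
    """Hypothetical closed-loop supply-voltage controller (not the paper's design).

    Each control window, Razor flip-flops report how many timing errors they
    detected and recovered from. A small error count is acceptable because Razor
    corrects it; a large count means recovery overhead outweighs the savings.
    """
    v_min_mv: int = 700
    v_max_mv: int = 1200
    step_mv: int = 10
    max_errors_per_window: int = 5

    def update(self, vdd_mv: int, errors_in_window: int) -> int:
        if errors_in_window > self.max_errors_per_window:
            vdd_mv += self.step_mv  # error rate too high: restore some margin
        else:
            vdd_mv -= self.step_mv  # errors are rare: keep reclaiming margin
        return min(self.v_max_mv, max(self.v_min_mv, vdd_mv))


def toy_error_count(vdd_mv: int, v_crit_mv: int) -> int:
    """Hypothetical error model: no errors at or above this chip's critical
    voltage v_crit_mv (a stand-in for its particular PVT corner), and one more
    error per window for every millivolt below it."""
    return max(0, v_crit_mv - vdd_mv)


def run(v_crit_mv: int, windows: int = 100, start_mv: int = 1200) -> int:
    """Simulate the loop and return the supply voltage after `windows` windows."""
    ctrl = RazorVoltageController()
    vdd = start_mv
    for _ in range(windows):
        vdd = ctrl.update(vdd, toy_error_count(vdd, v_crit_mv))
    return vdd


if __name__ == "__main__":
    # A "slow" die and a "fast" die each settle near their own critical voltage
    # instead of at a single worst-case voltage shared by all parts.
    print("slow die:", run(950), "mV")
    print("fast die:", run(880), "mV")
```

Because the decision is binary, the voltage dithers by one step around each die's critical point; a real controller would typically filter the error count and use finer steps, which this sketch leaves out.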
ISBN: 9781665401449 (print)
Augmented reality (AR) aims to run high-performance applications while consuming little power on a small-form-factor device worn all day. Fortunately, many AR workloads, such as neural networks, are error-resilient (i.e., they produce the same results despite errors in computation or memory), which creates an opportunity to use low-power circuit techniques when implementing their building blocks in hardware. Many of these neural networks rely heavily on on-chip memory such as SRAM, a major building block of hardware accelerators, for weight storage. This work shows that up to 30% dynamic energy savings and 30% leakage energy savings can be achieved by lowering the supply voltage of these SRAMs below their rated voltage (thereby introducing errors) without measurable loss in neural network accuracy. Additional energy savings of up to 6% can be captured by modifying the circuits to shape the error probabilities of the SRAMs at low voltage and by incrementally training the neural networks.
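Lowering an SRAM's supply below its rated voltage causes some bit cells to fail, so stored weight bits flip. A common way to study this in software is fault injection: corrupt the stored weights with a chosen bit-error model, then re-evaluate the network. The sketch below assumes independent bit failures with a configurable probability per bit position, which allows a "shaped" profile (for example, protecting the high-order bits of two's-complement int8 weights, where a flip changes the value most). The function names, the error model, and the rates are hypothetical; the abstract does not say which error shaping the authors use.

```python
import random
from typing import List, Sequence


def inject_sram_bit_errors(words: Sequence[int], bit_error_rates: Sequence[float],
                           seed: int = 0) -> List[int]:
    """Flip each stored bit independently with a per-position probability.

    words: unsigned words as held in SRAM (e.g. int8 weights stored as 0..255).
    bit_error_rates: flip probability for bit 0 (LSB) upward; its length sets
        the word width. Independent, position-dependent failures are an
        assumption made for illustration.
    """
    rng = random.Random(seed)
    corrupted = []
    for w in words:
        mask = 0
        for bit, p in enumerate(bit_error_rates):
            if rng.random() < p:  # this cell fails at the reduced voltage
                mask |= 1 << bit
        corrupted.append(w ^ mask)
    return corrupted


def to_int8(u: int) -> int:
    """Reinterpret an 8-bit unsigned word as a two's-complement int8 weight."""
    return u - 256 if u >= 128 else u


if __name__ == "__main__":
    weights = [3, -7, 12, -1, 64, -128]           # hypothetical int8 weights
    stored = [w & 0xFF for w in weights]           # their SRAM bit patterns
    uniform = [1e-2] * 8                           # every bit equally fragile
    shaped = [1e-2] * 6 + [0.0, 0.0]               # hypothetical: two MSBs protected
    for name, rates in (("uniform", uniform), ("shaped", shaped)):
        noisy = [to_int8(u) for u in inject_sram_bit_errors(stored, rates)]
        print(name, noisy)
```

With the two MSBs protected, any single corrupted weight can move by at most 63, whereas a flip of bit 7 alone changes an int8 value by 128; this is one way to see why shaping which bits fail can matter more than the overall error rate.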
Arithmetic units inspired by approximate computing have developed significantly for error-resilient applications, in which accuracy can be traded for improved performance. Most of the existing literature on approximate computing targets ASIC platforms. In this paper, we exploit the features of approximate computing to design efficient digital hardware for FPGA platforms. Specifically, we propose an FPGA implementation of an approximate multiplier based on the CORDIC algorithm. Contemporary FPGA-based approximate multipliers sacrifice considerable accuracy and incur relatively high implementation costs in terms of resource utilization, timing, and energy. We conduct a detailed Pareto analysis to determine the optimal number of computing stages for the proposed CORDIC-based approximate multiplier, justifying its accuracy-performance trade-offs. More importantly, we optimize the logic distribution of the proposed multiplier circuit by restructuring the top-level Boolean network and translating it into a circuit netlist that maps efficiently onto the FPGA's native fabric of LUTs and Carry4 primitives. Our CORDIC-based implementations significantly improve accuracy metrics while maintaining a suitable performance trade-off. We evaluate the proposed multiplier on two image-processing applications, image blending and image smoothing, and the results show substantial improvements over existing state-of-the-art approximate multipliers.
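The abstract does not detail the multiplier's datapath, so the following is a sketch of textbook linear-mode CORDIC in rotation mode, not the authors' netlist. Each stage i adds or subtracts a copy of x shifted by i bits while driving the multiplier z toward zero, so the stage count is exactly the accuracy knob a Pareto analysis would tune. Starting at 2^0, the iteration converges for |z| ≤ 2, and after n stages the residual satisfies |z_n| ≤ 2^(1−n), giving

$$|x z - y_n| = |x|\,|z_n| \le |x| \cdot 2^{1-n}.$$

The function name, the Q-format parameter `frac_bits`, and the use of unbounded Python integers (which removes truncation error, so only the stage-count error remains) are assumptions for illustration; no LUT or Carry4 mapping is modeled.

```python
def cordic_multiply(x: int, z: int, stages: int, frac_bits: int = 15) -> int:
    """Approximate x * z with linear-mode (rotation) CORDIC.

    x: integer multiplicand.
    z: multiplier in fixed point with `frac_bits` fractional bits; its real value
       z / 2**frac_bits must lie in [-2, 2], the convergence range for stages
       starting at 2**0.
    stages: number of CORDIC iterations; each is one shift-and-add/subtract.
    Returns the approximate product x * z, scaled by 2**frac_bits.
    """
    if not 1 <= stages <= frac_bits + 1:
        raise ValueError("stages must be in 1..frac_bits + 1")
    if abs(z) > 1 << (frac_bits + 1):
        raise ValueError("|z| must not exceed 2.0 in real terms")
    y = 0
    for i in range(stages):
        shift = frac_bits - i        # 2**-i expressed in the fixed-point scale
        if z >= 0:                   # d_i = +1: add x * 2**-i, take 2**-i from z
            y += x << shift
            z -= 1 << shift
        else:                        # d_i = -1: subtract x * 2**-i, give 2**-i back
            y -= x << shift
            z += 1 << shift
    return y


if __name__ == "__main__":
    F = 15
    z = round(0.7 * 2**F)            # hypothetical operand, about 0.7
    exact = 100 * z / 2**F
    for n in (2, 4, 8, 12, 16):      # more stages: more adders, less error
        approx = cordic_multiply(100, z, n, F) / 2**F
        print(f"stages={n:2d}  approx={approx:.5f}  abs_error={abs(approx - exact):.5f}")
```

In hardware the shifts are free wiring and each stage costs one adder/subtractor, so the error bound above halves with every added stage while area and delay grow roughly linearly.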