ISBN (Print): 9781665494663
Because of the impressive performance and success of artificial intelligence (AI)-based applications, filters are widely used as a primary component of digital signal processing systems, especially finite impulse response (FIR) filters. Although they offer several advantages, such as stability, they are computationally intensive. Hence, in this paper, we propose a systematic methodology, referred to as ReFACE, to efficiently implement computing in-memory (CIM) accelerators for FIR filters using various CMOS and post-CMOS technologies. ReFACE leverages a residue number system (RNS) to speed up the essential operations of digital filters, instead of a traditional arithmetic implementation that suffers from an inevitably lengthy carry-propagation chain. Moreover, the CIM architecture eliminates off-chip data transfer by exploiting the maximum internal bandwidth of memory chips to perform local, parallel computation on small residues independently. Taking advantage of both RNS and CIM results in significant power and latency reductions. As a proof of concept, ReFACE is leveraged to implement a 4-tap RNS FIR filter. Simulation results verify its superior performance, with up to 85x and 12x improvements in energy consumption and execution time, respectively, compared with an ASIC accelerator.
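The RNS idea in this abstract can be illustrated in software: each input and coefficient is encoded as small residues, the FIR multiply-accumulates run independently per modulus (no carries cross channels), and the Chinese Remainder Theorem recovers the result. The moduli set below is a hypothetical choice for illustration, not the one used by ReFACE.

```python
from functools import reduce

# Hypothetical pairwise co-prime moduli; a real design sizes the
# dynamic range (7 * 15 * 16 = 1680) to the filter's output range.
MODULI = (7, 15, 16)

def to_rns(x, moduli=MODULI):
    """Encode an integer as its residue in each channel."""
    return tuple(x % m for m in moduli)

def crt(residues, moduli=MODULI):
    """Chinese Remainder Theorem: reconstruct the integer from residues."""
    M = reduce(lambda a, b: a * b, moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # modular inverse (Python 3.8+)
    return total % M

def rns_fir(samples, taps):
    """FIR filter where each residue channel is filtered independently,
    avoiding the long carry chain of conventional binary arithmetic."""
    out = []
    for n in range(len(taps) - 1, len(samples)):
        acc = tuple(0 for _ in MODULI)
        for k, h in enumerate(taps):
            xr = to_rns(samples[n - k])
            hr = to_rns(h)
            acc = tuple((a + x * c) % m
                        for a, x, c, m in zip(acc, xr, hr, MODULI))
        out.append(crt(acc))
    return out

print(rns_fir([1, 2, 3, 4, 5, 6], [1, 2, 3, 4]))  # → [20, 30, 40]
```

Each output matches the ordinary convolution (e.g., 1*4 + 2*3 + 3*2 + 4*1 = 20), confirming that the per-channel residue arithmetic loses no information as long as results stay within the dynamic range.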
ISBN (Print): 9781728124780
Various emerging applications such as the Internet of Things (IoT), wearable devices, and neural networks have imposed new challenges on memory design. Conventional memories such as static random access memory (SRAM), dynamic random access memory (DRAM), and flash memory fail to meet many of the requirements of these applications. In particular, power is one of the foremost design metrics to be considered. This paper briefly introduces recent trends in memory design for various emerging applications.
ISBN (Print): 9781450362528
Deep neural networks (DNNs) have emerged as a dominant algorithm for machine learning (ML). High performance and extreme energy efficiency are critical for DNN deployments, especially on mobile platforms such as autonomous vehicles, cameras, and other Internet-of-Things (IoT) devices. However, DNNs incur massive data movement and memory accesses, which prevents them from being integrated into always-on IoT devices. Recently, computing in-memory (CIM) architectures have embedded analog computation circuits in or near the memory arrays, significantly reducing data-movement energy. This paper summarizes the most recent novel methods for CIM architectures based on time-domain computation. Compared with voltage-domain and frequency-domain analog computing methods, time-domain computation provides more flexibility, higher accuracy, and greater scalability for larger neural networks. Thereafter, the first in-memory binary weight network (BWN) processor based on pulse-width modulation, in which the feature is stored in memory, is also presented. This work significantly reduces memory accesses (4x) and achieves a state-of-the-art peak energy efficiency of 119.7 TOPS/W.
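The core arithmetic a binary weight network macro accelerates can be sketched numerically: with weights constrained to {-1, +1}, every multiply-accumulate collapses into signed addition, which is the operation a time-domain CIM array realizes physically (e.g., via pulse widths). This toy function is an illustration of the BWN arithmetic only, not the processor's actual circuit behavior.

```python
import numpy as np

def bwn_dot(features, binary_weights):
    """Dot product with weights restricted to {-1, +1}: the
    multiplier-free kernel that binary weight networks exploit."""
    assert set(np.unique(binary_weights)) <= {-1, 1}
    # +1 weights add the feature value, -1 weights subtract it.
    pos = features[binary_weights == 1].sum()
    neg = features[binary_weights == -1].sum()
    return pos - neg

x = np.array([3.0, 1.0, 4.0, 1.5])   # stored features (hypothetical values)
w = np.array([1, -1, 1, -1])         # binarized weights
print(bwn_dot(x, w))  # 3 - 1 + 4 - 1.5 = 4.5
```

Because no multiplications remain, the entire accumulation can be mapped onto analog or time-domain summation in the memory array, which is where the cited 4x reduction in memory accesses comes from: features stay in place and only the small binary weights move.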
ISBN (Print): 9781665484855
This work proposes a new generic single-cycle compute-in-memory (CiM) accelerator for matrix computation, named SCiMA. SCiMA is developed on top of an existing commodity Spin-Orbit Torque Magnetic Random-Access Memory chip. Every sub-array's peripherals are transformed to realize a full set of single-cycle 2- and 3-input in-memory bulk bitwise functions specifically designed to accelerate a wide variety of graph and matrix multiplication tasks. We explore SCiMA's efficiency by selecting a complex matrix processing operation, calculating the determinant, as an essential and under-explored application in the CiM domain. A cross-layer device-to-architecture simulation framework shows that the presented platform can reduce energy consumption by 70.43% compared with the most recent CiM designs implemented in the same memory technology. SCiMA also achieves up to 2.5x speedup over current CiM platforms.
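For readers unfamiliar with why the determinant is a demanding CiM workload, a reference implementation makes the structure clear: Laplace (cofactor) expansion is a cascade of multiply-accumulate operations over shrinking minors, exactly the kind of kernel bulk bitwise in-memory functions target. This is a plain software sketch of the mathematical operation, not of SCiMA's in-memory algorithm.

```python
def det(m):
    """Determinant by recursive cofactor expansion along the first row.
    Exponential in n; fine for the small matrices used here."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        # Minor: drop row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += ((-1) ** j) * m[0][j] * det(minor)
    return total

print(det([[2, 5], [1, 3]]))  # 2*3 - 5*1 = 1
print(det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))  # → -3
```

Every term in the expansion is a product of matrix entries followed by a signed accumulation, which is why mapping it onto single-cycle in-memory bitwise primitives yields the energy savings the abstract reports.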
ISBN (Print): 9781665459716
Computing in-memory (CIM), which performs in-situ operations directly in memory, is one of the promising solutions for overcoming the von Neumann bottleneck. Previous researchers have proposed an 8T-SRAM-based CIM structure that performs dot-product (DP) computations via analog charging/discharging operations. However, CIM structures may suffer from variations and aging effects such as BTI and HCI, which threaten the reliability of CIM operation results. In this paper, we propose an aging-aware CIM operation framework consisting of an aging detection method and an aging tolerance technique. Specifically, we apply dynamic voltage scaling (DVS) to the affected CIM structure to compensate for the current drop caused by variations and aging effects. Experimental results show that our method can successfully calibrate the dropped current and thus maintain the reliability of CIM operations.
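The compensation principle can be sketched with a first-order square-law current model: aging (e.g., BTI) raises the transistor threshold voltage, which reduces the discharge current, and DVS raises the supply just enough to restore it. All constants below are hypothetical illustration values, and the square-law model is a simplification, not the paper's device model.

```python
import math

K = 1e-3          # transconductance-like constant (hypothetical)
V_TH_FRESH = 0.4  # fresh threshold voltage in volts (hypothetical)

def current(v_dd, v_th):
    """Saturation current under a simple square-law model:
    I ≈ K * (V_dd - V_th)^2."""
    return K * (v_dd - v_th) ** 2

def compensating_vdd(v_th_aged, i_target):
    """Invert the model: find the supply voltage that restores
    the target current after the threshold has shifted."""
    return v_th_aged + math.sqrt(i_target / K)

i_nominal = current(1.0, V_TH_FRESH)          # fresh-cell current at 1.0 V
v_dd_new = compensating_vdd(0.45, i_nominal)  # after a 50 mV BTI shift
print(round(v_dd_new, 3))  # → 1.05
```

Under this idealized model the threshold shift is offset one-for-one by the supply increase; a real framework, as the abstract describes, pairs such compensation with an aging detection step to decide when and how much to scale.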