This work discusses a network architecture, a network training method, and a quantization method for achieving a low-energy, highly energy-efficient system, with the aim of equipping robots and vehicles such as Autonomous Mobile Robots (AMRs) with computation-in-memory (CiM) and performing 3D object detection at low energy consumption. Augmented Point Cloud VoxelNet (APCVN) is a network that improves inference accuracy at the cost of a slight increase in computational complexity. Multi-Stage Quantization Aware Training (MSQAT) and U-Quantization (UQ) are a training method and a quantization strategy, respectively, that improve the quantization tolerance of APCVN. Furthermore, quantitative calculations estimate the ADC energy consumption per inference, operations per second, energy efficiency, memory array area, memory capacity, and latency per inference of the assumed CiM system. The results show that the proposed methods reduce ADC energy consumption per inference by 8.7% and improve energy efficiency by 1.6 times while maintaining high inference accuracy.
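As a minimal sketch of the kind of uniform "fake quantization" forward pass used in quantization-aware training (QAT), the snippet below maps a value to the nearest quantization level and back. This is a generic illustration, not the paper's MSQAT or U-Quantization; the 4-bit width and the [-1, 1] range are assumptions for the example.

```python
def fake_quantize(x, bits, x_min, x_max):
    """Map x to the nearest of 2**bits levels over [x_min, x_max] and back.

    During QAT the forward pass uses this quantized value, while gradients
    bypass the rounding (straight-through estimator)."""
    levels = (1 << bits) - 1
    scale = (x_max - x_min) / levels
    x = min(max(x, x_min), x_max)       # clip to the representable range
    code = round((x - x_min) / scale)   # integer level index
    return x_min + code * scale         # dequantized value

weights = [-0.73, -0.2, 0.05, 0.4, 0.99]
print([round(fake_quantize(w, 4, -1.0, 1.0), 4) for w in weights])
```

Because rounding bounds the error by half a quantization step, the network can be trained to tolerate it; lowering `bits` widens the step and increases the accuracy cost.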
In this work, we propose a 1T1R ReRAM CiM architecture for Hyperdimensional Computing (HDC). The number of Source Lines and Bit Lines is reduced by introducing memory cells connected in series, which is especially advantageous in a 3D implementation. The results of CiM operations contain errors, but HDC is robust against them: even if the XNOR operation has an error rate of 25%, the inference accuracy remains above 90%.
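The robustness claim can be illustrated with a toy HDC classifier: classes are random binary hypervectors, classification picks the prototype with the highest XNOR-popcount similarity, and each XNOR result is flipped with probability `err` to mimic faulty CiM operations. The dimensionality, error model, and data here are illustrative assumptions, not the proposed architecture.

```python
import random

random.seed(0)
D = 2048                               # hypervector dimensionality

def rand_hv():
    return [random.getrandbits(1) for _ in range(D)]

def similarity(a, b, err=0.0):
    """XNOR-popcount; each bitwise XNOR output flips with probability err."""
    s = 0
    for x, y in zip(a, b):
        bit = 1 if x == y else 0       # XNOR
        if err and random.random() < err:
            bit ^= 1                   # faulty CiM operation
        s += bit
    return s

prototypes = [rand_hv() for _ in range(4)]

def classify(query, err=0.0):
    sims = [similarity(query, p, err) for p in prototypes]
    return sims.index(max(sims))

# Queries: prototypes with 10% of bits flipped, classified under 25% XNOR error
correct = 0
for t in range(40):
    label = t % 4
    query = [b ^ (random.random() < 0.10) for b in prototypes[label]]
    correct += classify(query, err=0.25) == label
print(f"accuracy under 25% XNOR error: {correct / 40:.2f}")
```

Because the similarity is a sum over thousands of independent bits, random per-bit errors shift all similarities toward the mean without reordering them, so the correct prototype still wins.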
Computation-in-memory (CIM) chips offer an energy-efficient approach to artificial intelligence computing workloads. Resistive random-access memory (RRAM)-based CIM chips have proven to be a promising solution for overcoming the von Neumann bottleneck. In this paper, we review our recent studies on the architecture-circuit-technology co-optimization of scalable CIM chips and related hardware demonstrations. To further minimize data movement between memory and computing units, architecture optimization methods are introduced. We then propose a device-architecture-algorithm co-design simulator that provides guidelines for designing CIM systems; a physics-based compact RRAM model and an array-level analog computing model are embedded in the simulator. In addition, a CIM compiler is proposed to optimize the on-chip dataflow. Finally, research perspectives for future development are presented.
ISBN:
(print) 9783031697654; 9783031697661
Bit Flipping Key Encapsulation (BIKE) is a code-based key encapsulation mechanism built on Quasi-Cyclic Medium Density Parity-Check (QC-MDPC) codes and a promising candidate in the National Institute of Standards and Technology (NIST) Post-Quantum Cryptography (PQC) standardization process. Polynomial multiplication is the most critical operation in BIKE, limiting the speed of key generation and encapsulation. The high degree of the polynomials not only requires millions of computations but also necessitates complex memory accesses. To address this issue, we propose a computation-in-memory (CIM) based accelerator architecture for the polynomial multiplication operations in BIKE. To minimize the size of the CIM core while maintaining high computational throughput, we propose a folded mapping strategy and a one-memory-multiple-NAND architecture. As a result, a 128x128 array is sufficient for the bike1 parameters, with zero padding in the higher part of the multiplicand. Furthermore, we introduce a dataflow scheme that integrates carry-less multiplication and polynomial reduction operations to improve computational efficiency. Post-layout simulation results in 28nm CMOS technology show that our fastest configuration occupies an area of approximately 1.37 mm² with a low power consumption of around 14.17 mW. Compared with state-of-the-art hardware implementations, our proposed design improves the speed of polynomial multiplication by approximately 2.5x.
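The two operations the accelerator fuses can be sketched as a pure-Python reference model: carry-less (GF(2)) polynomial multiplication followed by reduction modulo x^r - 1, the quasi-cyclic ring used by QC-MDPC arithmetic. The value of r below is a toy example, not a BIKE parameter, and this models only the math, not the proposed mapping or dataflow.

```python
def clmul(a, b):
    """Carry-less multiply of GF(2) polynomials encoded as int bitmasks."""
    res = 0
    while b:
        if b & 1:
            res ^= a      # XOR replaces addition: no carries in GF(2)
        a <<= 1
        b >>= 1
    return res

def reduce_mod(p, r):
    """Reduce modulo x^r - 1: bits at position >= r wrap around (x^r = 1)."""
    mask = (1 << r) - 1
    while p >> r:
        p = (p & mask) ^ (p >> r)
    return p

def polymul(a, b, r):
    return reduce_mod(clmul(a, b), r)

r = 11
a = 0b10000000101          # x^10 + x^2 + 1
b = 0b00000000011          # x + 1
print(bin(polymul(a, b, r)))
```

In BIKE the ring degree r runs into the tens of thousands, which is why the multiplication dominates runtime and motivates folding the operand across a small CIM array.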
Deep neural networks (DNNs) have recently attracted tremendous attention in various fields, but their computational requirements and the memory bottleneck limit the energy efficiency of hardware implementations. Binary quantization has been proposed to relieve the pressure on hardware design, and computation-in-memory (CIM) is regarded as a promising way to resolve the memory-wall challenge. However, the binary computing paradigm is mismatched with the CIM scheme, which has required complex circuits and peripherals to realize binary operations in previous works. To overcome these issues, this work presents Binary Representation computation-in-memory (BR-CIM) with several key features: (1) a lightweight computation unit realized within the 6T SRAM array accelerates binary computing and enlarges the signal margin; (2) a reconfigurable computing scheme and mapping method support extendable bit precision to satisfy the accuracy requirements of various applications; (3) simultaneous computing and weight loading is supported by the column circuitry, which shortens the data loading latency. Several experiments are conducted to evaluate algorithm accuracy, computing latency, and power consumption. The energy efficiency reaches up to 1280 TOPS/W for binary representation, and the algorithm accuracy achieves 97.82%/76.4% on the MNIST/CIFAR-100 datasets.
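The binary computing paradigm being accelerated reduces to a simple identity: with weights and activations restricted to {-1, +1} and encoded one bit each, a dot product becomes XNOR followed by popcount. The sketch below shows this generic reduction; it is not a model of the BR-CIM circuit itself.

```python
def binary_dot(w_bits, x_bits, n):
    """Dot product of n {-1,+1} values packed as bitmasks (bit 1 -> +1)."""
    xnor = ~(w_bits ^ x_bits) & ((1 << n) - 1)   # 1 wherever the bits agree
    matches = bin(xnor).count("1")               # popcount
    return 2 * matches - n                       # agreements minus disagreements

# w = [+1, -1, +1, +1], x = [+1, +1, -1, +1]  (LSB-first packing)
w, x = 0b1101, 0b1011
print(binary_dot(w, x, 4))                       # two matches, two mismatches
```

The mismatch the abstract refers to is that a conventional CIM array natively accumulates products of non-negative cell values, so realizing this signed XNOR-popcount in memory needs extra circuitry unless the array is co-designed for it.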
ISBN:
(print) 9781665456722
For edge AI applications, this paper overviews neuromorphic computing with CiM, i.e., computation-in-memory with non-volatile memories. AI accelerators such as CiM will be heterogeneously integrated with traditional processors such as CPUs. To drastically reduce the energy consumption of edge AI, the heterogeneous integration of CiM with sensors, such as event-based sensors, is promising. Approximate computing across a wide range of levels, from the system level through the circuit level to the device level, resolves the memory trade-off: by tolerating some degree of device error, the performance, energy, and cost of CiM are improved. This paper covers neural networks such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), as well as event-driven Spiking Neural Networks (SNNs) and Reservoir Computing.
This paper proposes a comprehensive computation-in-memory (CiM) simulation platform. The platform can emulate the degradation of inference accuracy caused by device non-idealities. In this paper, the effects of non-idealities are investigated assuming non-volatile memory devices such as PRAM, ReRAM, MRAM, and FeFET in CiM. As device non-idealities, multi-level cell operation, conductance variation during verify-program, data retention errors, sense amplifier offset, read current fluctuation, and device failure are considered. First, the acceptable amount of each single non-ideality is investigated. The results show that a conductance shift has a much more adverse impact on inference accuracy in CiM than conductance variation. Second, interactions among multiple device non-idealities are also comprehensively investigated. The platform reveals the paradoxical result that, under specific conditions, the inference accuracy for the combination of a uniform shift and a low-conductance asymmetric error is higher than that for the low-conductance asymmetric error alone.
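The intuition behind shift being worse than variation can be seen in a toy model of one CiM column: the ADC reads a weighted sum of cell conductances, so a systematic shift accumulates across all active cells, while zero-mean variation partially cancels. All numbers below are made-up illustrations, not the platform's device models.

```python
import random

random.seed(0)
g = [0.2, 0.8, 0.5, 0.1, 0.9]        # normalized cell conductances (weights)
v = [1, 0, 1, 1, 0]                  # word-line input pattern

def column_current(cond):
    """Weighted sum a CiM column delivers to its ADC."""
    return sum(gi * vi for gi, vi in zip(cond, v))

ideal = column_current(g)
shifted = column_current([gi + 0.05 for gi in g])                   # uniform shift
varied = column_current([gi + random.gauss(0, 0.05) for gi in g])   # zero-mean variation

print(f"ideal={ideal:.3f} shift_err={shifted - ideal:+.3f} "
      f"var_err={varied - ideal:+.3f}")
```

With three active cells, the shift error is exactly 3 x 0.05, while the variation error grows only with the square root of the number of cells in expectation.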
This paper proposes an approach to enhance the efficiency of computation-in-memory (CiM) enabled neural networks. The proposed methods involve partial quantization of the learning and inference processes within the neural network to increase training and inference speed while reducing energy and memory consumption. The impact of quantization due to the use of CiM is evaluated in terms of inference accuracy. The effect of non-idealities incurred by employing different memories, such as resistive random-access memory, on network accuracy is documented and reported. The results indicate that a certain quantization bit-precision threshold is necessary for weights, input/output data, and gradients to maintain an acceptable level of inference accuracy. Notably, the experiments show a modest degradation of approximately 2.8% in inference accuracy compared with a neural network trained without computation-in-memory. This accuracy trade-off is accompanied by a substantial memory footprint improvement, with memory usage reduced by 62% and 93% during the training and inference phases, respectively.
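A back-of-envelope sketch shows where such memory savings come from: the footprint of a tensor is simply its element count times its bit width. The layer size and bit widths below are illustrative assumptions, not the paper's network or its reported 62%/93% figures, which also depend on which tensors are quantized in each phase.

```python
def footprint_bytes(n_values, bits):
    """Storage needed for n_values packed at `bits` bits each."""
    return n_values * bits / 8

n = 1_000_000                      # values in a hypothetical layer
fp32 = footprint_bytes(n, 32)
for bits in (8, 4, 2):
    saving = 1 - footprint_bytes(n, bits) / fp32
    print(f"{bits}-bit storage saves {saving:.1%} versus FP32")
```

Inference can save more than training because only weights and activations must be stored, whereas training also keeps gradients and optimizer state, parts of which may need higher precision.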
ISBN:
(print) 9781665484855
In this paper, a domain-specific ReRAM-based computation-in-memory (CiM) design for simulated annealing (SA) is proposed. This paper reveals that the influence of the bit precision and memory cell errors of ReRAM CiM on SA accuracy depends on the domain of the combinatorial optimization problem, such as Max-Cut or Knapsack problems. It is found that, compared with the Knapsack problem, the Max-Cut problem has a smaller circuit structure and tolerates 3-bit lower bit precision, but is 4% less tolerant of the bit-error rate (BER). Considering the bit precision and BER requirements of each domain, this paper presents case studies and a design strategy for optimizing ReRAM CiMs in terms of reliability and memory cell array area. Furthermore, the proposed optimized ReRAM CiM for the Max-Cut problem improves the quality of SA by introducing approximate answers and avoiding local minima.
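Simulated annealing on Max-Cut, the kind of kernel such a CiM would accelerate, can be sketched in a few lines: propose a spin flip, accept it if it enlarges the cut, and accept worsening flips with a temperature-dependent probability to escape local minima. The graph, cooling schedule, and step count below are illustrative assumptions; in the proposed design the energy evaluation would run as in-memory operations.

```python
import math
import random

random.seed(0)
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4)]
n = 5

def cut_size(spins):
    """Number of edges crossing the partition (the quantity to maximize)."""
    return sum(spins[u] != spins[v] for u, v in edges)

spins = [random.choice([0, 1]) for _ in range(n)]
T = 2.0
for step in range(2000):
    i = random.randrange(n)
    before = cut_size(spins)
    spins[i] ^= 1                                 # propose a flip
    delta = cut_size(spins) - before
    if delta < 0 and random.random() >= math.exp(delta / T):
        spins[i] ^= 1                             # reject: revert the flip
    T *= 0.997                                    # geometric cooling schedule

print("cut found:", cut_size(spins))
```

For this toy graph the optimum is a cut of 5 (the triangle 0-1-2 caps the cut at 2 of its 3 edges). Limited bit precision and cell errors in a CiM perturb the computed delta, which is exactly the sensitivity the paper quantifies per problem domain.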
ISBN:
(digital) 9781665459792
ISBN:
(print) 9781665459792
Non-idealities of non-volatile memory (NVM) devices, DACs, and ADCs, as well as bit-line/word-line (BL/WL) failures in computation-in-memory (CiM), are investigated with a proposed comprehensive simulation platform. Quantization, variation, shift, random defects, and short circuits are applied to the weights of convolution layers and to input/output data to analyze the non-idealities of each element in CiM. The NVM device non-idealities assume multi-level cell operation, conductance variation, retention errors, device defects, and short/open contacts. The DAC/ADC non-idealities assume noise, offset, and circuit defects. The BL/WL non-idealities assume fabrication errors and short circuits between adjacent lines. The analysis identifies ADC/DAC noise, data retention of memory devices, and word-line fabrication yield as the crucial items to focus on.