ISBN:
(print) 9781450357043
Data transfer between a processor and memory frequently represents a bottleneck with respect to improving application-level performance. Computing in memory (CiM), where logic and arithmetic operations are performed in memory, could significantly reduce both the energy consumption and the computational overheads associated with data transfer. Compact, low-power, and fast CiM designs could ultimately lead to improved application-level performance. This paper introduces a CiM architecture based on ferroelectric field effect transistors (FeFETs). The CiM design can serve as a general-purpose random access memory (RAM) and can also perform Boolean operations ((N)AND, (N)OR, X(N)OR, INV) as well as addition (ADD) between words in memory. Unlike existing CiM designs based on other emerging technologies, FeFET-CiM accomplishes the aforementioned operations via a single current reference in the sense amplifier, which leads to more compact designs and lower power. Furthermore, the high Ion/Ioff ratio of FeFETs enables an inexpensive voltage-based sense scheme. Simulation-based case studies suggest that our FeFET-CiM can achieve speedups (and energy reductions) of ~119X (~1.6X) and ~1.97X (~1.5X) over ReRAM and STT-RAM CiM designs, respectively, with respect to in-memory addition of 32-bit words. Furthermore, our approach offers an average speedup of ~2.5X and an energy reduction of ~1.7X when compared to a conventional (not in-memory) approach across a wide range of benchmarks.
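As a functional sketch of the word-level interface such an array exposes (32-bit words, matching the paper's case study), the following Python model may help; the function name and dictionary encoding are ours, not the paper's.

```python
# Hypothetical functional model of FeFET-CiM word operations; names are
# illustrative, not from the paper.
MASK = (1 << 32) - 1  # 32-bit words, as in the paper's evaluation

def cim_op(a, b, op):
    """Apply an in-memory operation between two stored 32-bit words."""
    ops = {
        "AND":  a & b,            "NAND": ~(a & b) & MASK,
        "OR":   a | b,            "NOR":  ~(a | b) & MASK,
        "XOR":  a ^ b,            "XNOR": ~(a ^ b) & MASK,
        "INV":  ~a & MASK,        # unary: b is ignored
        "ADD":  (a + b) & MASK,   # carry out of bit 31 wraps
    }
    return ops[op]

print(cim_op(0b1100, 0b1010, "XOR"))  # 6
print(cim_op(5, 7, "ADD"))            # 12
```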
Authors:
Wang, Jinkai; Bai, Yining; Wang, Hongyu; Hao, Zuolei; Wang, Guanda; Zhang, Kun; Zhang, Youguang; Lv, Weifeng; Zhang, Yue
Affiliations:
Beihang Univ, Fert Beijing Inst, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
Beihang Univ, Fert Beijing Inst, Sch Comp Sci & Engn, MIIT Key Lab Spintron, Beijing 100191, Peoples R China
Beihang Univ, Fert Beijing Inst, Sch Integrated Circuit Sci & Engn, MIIT Key Lab Spintron, Beijing 100191, Peoples R China
Beihang Univ, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
Beihang Univ, Res Inst, Shenzhen Key Lab Data Vitalizat Smart City, Shenzhen 518057, Peoples R China
Beihang Univ, Hefei Innovat Res Inst, Nanoelect Sci & Technol Ctr, Hefei 230013, Peoples R China
Computing in memory (CIM) is a promising candidate for high-throughput and energy-efficient data-driven applications, as it mitigates the well-known memory bottleneck of the Von Neumann architecture. In this paper, we present a reconfigurable bit-serial operation using toggle spin-orbit torque magnetic random access memory (TSOT-MRAM) that performs computation entirely in the bit-cell array rather than in peripheral circuits. This bit-serial CIM (BSCIM) scheme achieves higher throughput and energy efficiency in CIM. First, basic Boolean logic operations are realized by utilizing the features of the TSOT device. A bit-cell array that implements the bit-serial operation is then built to provide the column-to-row communication necessary for arithmetic operations, such as the carry propagation of addition and multiplication. Finally, we analyze the reliability of the BSCIM scheme and demonstrate its performance advantage by performing convolution operations on 28 x 28 handwritten digit images in a BSCIM architecture. The results show that the delay and energy of the BSCIM architecture are reduced by 1.16-5.49 times and 1.12-1.43 times, respectively, compared with existing digital CIM architectures. Its throughput and energy efficiency are also enhanced to 51.2 GOPS and 9.9 TOPS/W, respectively.
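The carry propagation described above can be sketched functionally: one bit position is processed per cycle, with the carry handed to the next step. A minimal Python model, using our own LSB-first bit-list encoding purely for illustration:

```python
# Bit-serial ripple addition: one bit position per cycle, carry handed to
# the next step, mirroring the column/row communication described above.
# The encoding (LSB-first 0/1 lists) is ours, for illustration only.
def bitserial_add(a_bits, b_bits):
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)            # sum bit computed in-array
        carry = (a & b) | (carry & (a ^ b))  # carry propagated onward
    out.append(carry)
    return out

# 6 + 3 = 9, LSB-first: [0,1,1] + [1,1,0] -> [1,0,0,1]
print(bitserial_add([0, 1, 1], [1, 1, 0]))
```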
The data transfer bottleneck in the Von Neumann architecture, owing to the separation between processor and memory, hinders the development of high-performance computing. The computing in memory (CIM) concept is widely considered a promising solution for overcoming this issue. In this article, we present a time-domain CIM (TD-CIM) scheme using spintronics, which can be applied to construct energy-efficient convolutional neural networks (CNNs). Basic Boolean logic operations are implemented by recording the bit-line output at different moments. A multi-addend addition mechanism is then introduced based on the TD-CIM circuit, which eliminates cascaded full adders. To further optimize the compatibility of the TD-CIM circuit with CNNs, we also propose a quantization method that transforms floating-point parameters of pre-trained CNN models into fixed-point parameters. Finally, we build a TD-CIM architecture integrating a highly reconfigurable array of field-free spin-orbit torque magnetic random access memory (SOT-MRAM) and evaluate its benefits for the quantized CNN. By performing digit recognition on the MNIST dataset, we find that the delay and energy are reduced by 1.2-2.7 times and 2.4x10^3 - 1.1x10^4 times, respectively, compared with STT-CIM and CRAM based on spintronic memory. The recognition accuracy reaches 98.65% and 91.11% on MNIST and CIFAR10, respectively.
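The abstract does not spell out the float-to-fixed quantization method, so as a sketch of the kind of transform such a flow would use, here is generic symmetric uniform quantization to n-bit signed integers; the names and parameters are ours.

```python
# Symmetric uniform quantization of floats to n-bit signed integers;
# a generic stand-in, not the paper's exact scheme.
def quantize(values, n_bits):
    qmax = 2 ** (n_bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax  # full scale maps to qmax
    fixed = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return fixed, scale

fixed, scale = quantize([0.6, -1.0, 0.25], 8)
print(fixed)  # [76, -127, 32]
```

Multiplying a fixed-point value back by `scale` recovers the original to within one quantization step.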
In this article, four novel approximate full-adder (AXFA) circuits based on the emerging magnetic tunnel junction (MTJ) device are proposed. The proposed magnetic FAs (MFAs) offer full nonvolatility, low area, and considerably lower energy consumption than their previous counterparts. Two of the proposed MFAs also have the advantage of single event upset (SEU) tolerance. Simulation results reveal that the proposed designs offer over 50% energy savings in comparison with the previous fully nonvolatile MFAs considered. Using the proposed adders in the design of an approximate Gaussian filter, we show that the filtered noisy images are almost identical to those produced by the accurate Gaussian filter. The proposed MFAs have an accurate carry-out output and an approximate sum output with an error distance of 2.
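The abstract fixes the carry-out as exact while approximating the sum, but does not give the gate-level approximation. The sketch below therefore uses sum ≈ NOT(carry-out), a common choice in approximate adders, purely to show how such a trade-off is evaluated exhaustively:

```python
from itertools import product

def axfa(a, b, cin):
    cout = (a & b) | (a & cin) | (b & cin)  # exact majority carry-out
    s = 1 - cout                            # approximate sum (our assumption)
    return cout, s

# Exhaustive check over all 8 input patterns: under this particular
# approximation the carry is always exact and the sum bit is wrong on 2.
errors = sum(abs((a ^ b ^ c) - axfa(a, b, c)[1])
             for a, b, c in product((0, 1), repeat=3))
print(errors)  # 2
```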
The computing-in-memory (CiM) approach is a promising option for addressing the processor-memory data transfer bottleneck in data-intensive applications. In this letter, we present a novel CiM architecture based on spin-transfer torque magnetic random-access memory, which can work in computing and memory modes. Two spintronic devices are used per cell to store the main data and its complement, which addresses reliability concerns during the read operation and also enables reliable Boolean operations (all basic functions), binary/ternary content-addressable memory search operations, and multi-input majority functions. Since the developed architecture can perform bitwise XNOR operations in one cycle, a resistive-based accumulator has been designed to compute multi-input majority, improving the structure for implementing fast and low-cost binary neural networks (BNNs). To this end, multiplication, accumulation, and passing through the activation function are accomplished in three cycles. Simulation results of exploiting the architecture in the BNN application indicate an 86%-98% lower power-delay product than existing architectures.
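The three-cycle multiply/accumulate/activate flow corresponds to the standard XNOR-popcount formulation of a binarized neuron; a functional Python sketch, where bit 1 encodes +1 and bit 0 encodes -1 (the function name and encoding are ours):

```python
# XNOR-popcount form of a binarized neuron, matching the three-cycle
# flow described above. Illustrative model, not the paper's circuit.
def bnn_neuron(x_bits, w_bits, n):
    xnor = ~(x_bits ^ w_bits) & ((1 << n) - 1)  # cycle 1: bitwise XNOR
    matches = bin(xnor).count("1")              # cycle 2: accumulate
    acc = 2 * matches - n                       # signed sum of n +/-1 products
    return 1 if acc >= 0 else -1                # cycle 3: activation (sign)

print(bnn_neuron(0b1011, 0b1001, 4))  # 1  (acc = 2)
```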
ISBN:
(print) 9781665401746
The computing in memory (CIM) technique is being researched to efficiently process the multiply-and-accumulate (MAC) operations used in deep neural networks. Conventional CIM architectures support only binary neural networks, which have lower accuracy. Some approaches use capacitors for multi-bit operation; however, because capacitors are large, the area efficiency of the CIM macro is degraded. This paper proposes a CIM macro structure that supports multi-bit operation using the parasitic capacitance of transistors. The proposed structure needs no additional capacitors and thus achieves area-efficient multi-bit operation.
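Functionally, a multi-bit MAC built from bitwise contributions reduces to summing bit products weighted by their binary significance, however those contributions are realized as charge. An idealized Python model of that decomposition (not the paper's circuit):

```python
# Idealized model of a multi-bit MAC decomposed into bit products: each
# (input bit, weight bit) pair contributes with binary-weighted
# significance, which is what capacitive schemes realize in charge.
def analog_mac(inputs, weights, bits=4):
    total = 0.0
    for x, w in zip(inputs, weights):
        for i in range(bits):          # weight-bit significance
            for j in range(bits):      # input-bit significance
                bit_product = ((x >> j) & 1) * ((w >> i) & 1)
                total += bit_product * (2 ** (i + j))
    return total

print(analog_mac([3, 2], [1, 5]))  # 13.0  (3*1 + 2*5)
```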
ISBN:
(print) 9781665419130
Computing-in-memory (CIM) is a promising method to overcome the well-known "Von Neumann bottleneck" by computing inside memory, especially in edge artificial intelligence (AI) devices. In this paper, we propose a 40nm 1Mb multi-level NOR-Flash cell based CIM (MLFlash-CIM) architecture with hardware/software co-design. The proposed MLFlash-CIM is modeled with consideration of cell variation, the number of activated cells, the integral non-linearity (INL) and differential non-linearity (DNL) of the input driver, and the quantization error of the readout circuits. We also propose a multi-bit neural network mapping method with 1/n top values and an adaptive quantization scheme to improve inference accuracy. When applied to a modified VGG-16 network with 16 layers, the proposed MLFlash-CIM achieves 92.73% inference accuracy on the CIFAR-10 dataset. The CIM structure also achieves a peak throughput of 3.277 TOPS and an energy efficiency of 35.6 TOPS/W for 4-bit multiply-and-accumulate (MAC) operations.
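The "1/n top values" mapping is not defined in the abstract. One plausible reading, sketched below entirely as our interpretation, is to clip the quantization range at the mean of the n largest-magnitude weights before quantizing uniformly, trading clipping error on outliers against a smaller quantization step:

```python
# Speculative sketch of an adaptive clip-then-quantize scheme; the
# paper's actual "1/n top values" method may differ.
def adaptive_quantize(weights, levels, n_top=4):
    top = sorted(abs(w) for w in weights)[-n_top:]
    clip = sum(top) / len(top)              # adaptive full-scale value
    step = 2 * clip / (levels - 1)          # uniform step over [-clip, clip]
    return [round(max(-clip, min(clip, w)) / step) for w in weights]

print(adaptive_quantize([0.1, -0.9, 0.4, 2.0, -0.3, 0.5], 16, n_top=2))
```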
ISBN:
(print) 9798350396249
This paper proposes built-in self-test (BIST) and built-in self-repair (BISR) strategies for computing in memory (CIM), including a novel test method and two repair schemes. All of them focus on mitigating the impact of inherent and inevitable CIM inaccuracy on convolutional neural networks (CNNs). The proposed BIST strategy exploits the distributive law to achieve at-speed CIM tests without storing test vectors or golden results. Moreover, it can assess the severity of the inherent inaccuracies among CIM bitlines instead of offering only a pass/fail outcome. In addition to BIST, we propose two BISR strategies. First, we propose to slightly offset the dynamic range of CIM outputs toward the negative side to create a margin for negative noise. By not clipping CIM outputs at zero, negative noise is preserved to statistically cancel out positive noise, mitigating accuracy impacts. Second, we propose to remap the bitlines of CIM according to our BIST outcomes: briefly, the least noisy bitlines are mapped to the MSBs. This remapping can be done in the digital domain without touching the CIM internals. Experiments show that our proposed BIST and BISR strategies can restore CIM to less than 1% Top-1 accuracy loss with slight hardware overhead.
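The distributive-law test can be sketched as follows: exercise a linear (dot-product) engine with x, y, and x + y; a fault-free engine satisfies f(x + y) == f(x) + f(y), so no golden results need storing, and a nonzero residual grades severity. The additive-fault model here is our illustration, not the paper's:

```python
# Self-test of a dot-product engine via the distributive law; the
# stuck-offset fault model is illustrative only.
def make_engine(weights, fault=0):
    return lambda xs: sum(w * v for w, v in zip(weights, xs)) + fault

def bist(engine, x, y):
    xy = [a + b for a, b in zip(x, y)]
    return engine(xy) - engine(x) - engine(y)  # 0 when clean

w, x, y = [1, 2, 3], [4, 5, 6], [7, 8, 9]
print(bist(make_engine(w), x, y))           # 0  (fault-free)
print(bist(make_engine(w, fault=5), x, y))  # -5 (fault exposed)
```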
ISBN:
(print) 9781728125015
This paper uses an STT-MRAM architecture to implement computing in memory (CIM). In a traditional CIM architecture, the sense amplifier often uses a fixed reference current. A fixed reference cannot adapt to the different sensing margins that arise as different numbers of word-lines are turned on, and it causes sensing errors when too many word-lines are enabled. We present an automatic reference current scheme that uses an extra memory area to compute the midpoint between the high and low currents, so the reference current tracks the sensing margin. With this scheme, high and low levels can be sensed correctly with up to 16 word-lines enabled.
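The adaptive reference idea reduces to a simple comparison: replica cells produce the high and low currents for the active word-line count, and their midpoint serves as the sense threshold. A minimal sketch with illustrative current values (in uA), not the paper's measurements:

```python
# Midpoint-reference sensing sketch; current values are illustrative.
def reference_current(i_high, i_low):
    """Midpoint between the replica high and low cell currents."""
    return (i_high + i_low) / 2

def sense(i_cell, i_ref):
    """Sense-amplifier compare: 1 if the cell current exceeds the reference."""
    return 1 if i_cell > i_ref else 0

i_ref = reference_current(40.0, 10.0)          # uA
print(i_ref)                                   # 25.0
print(sense(32.0, i_ref), sense(18.0, i_ref))  # 1 0
```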
ISBN:
(print) 9781538684771
With Von Neumann computing struggling to match the energy efficiency of biological systems, there is a pressing need to explore alternative computing models. Recent experimental studies have revealed that resistive random access memory (RRAM) is a promising alternative to DRAM. Resistive crossbar arrays possess many promising features that can enable not only high-density, low-power storage but also non-Von-Neumann compute models. Most recent works focus on the dot product operation with RRAM crossbar arrays and are therefore not flexible enough to implement various logic functions. We propose a low-power dynamic computing in memory system that can implement various functions in sum-of-products (SOP) form in an RRAM crossbar array architecture. We evaluate the proposed technique by performing simulations over a wide range of MCNC benchmarks. Simulation results show 1.42X and 20X latency improvements as well as 2.6X and 12.6X power savings compared to the static [9] and MAGIC [10] computing in memory methods.
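Functionally, SOP evaluation in a crossbar amounts to each row computing one product term (the AND of its literals) and the shared output line ORing the rows. A minimal Python model, using our own term encoding of (input index, polarity) pairs:

```python
# Functional model of crossbar SOP evaluation; the term encoding is ours.
def sop(terms, inputs):
    def product(term):
        # A row's product term: AND of its literals, with polarity.
        return all(inputs[i] if pol else not inputs[i] for i, pol in term)
    # The output line ORs all row products.
    return int(any(product(t) for t in terms))

# f = a.b + a'.c over inputs (a, b, c)
terms = [[(0, 1), (1, 1)], [(0, 0), (2, 1)]]
print(sop(terms, [1, 1, 0]))  # 1 (a.b fires)
print(sop(terms, [1, 0, 0]))  # 0
print(sop(terms, [0, 0, 1]))  # 1 (a'.c fires)
```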