Accurate programming of non-volatile memory (NVM) devices in analog in-memory computing (AIMC) cores is critical to achieve high matrix-vector multiplication (MVM) accuracy during deep learning inference workloads. In this paper, we propose a novel programming approach that directly minimizes the MVM error by performing stochastic gradient descent optimization with synthetic random input data. The MVM error is significantly reduced compared to conventional unit-cell-by-unit-cell iterative programming. We demonstrate that the optimal hyperparameters in our method are agnostic to the weights being programmed, enabling large-scale deployment across multiple AIMC cores without further fine-tuning. The method also eliminates the need for high-resolution analog-to-digital converters (ADCs) to decipher the small unit-cell conductances during programming. We experimentally validate this approach by demonstrating an inference accuracy increase of 1.26% on ResNet-9. The experiments were performed using phase change memory (PCM)-based AIMC cores fabricated in 14-nm CMOS technology.
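The programming idea described above can be sketched numerically: treat the programmed conductances as free parameters and run SGD on the MVM error measured with synthetic random inputs. The following is a minimal NumPy sketch of that idea only; the array sizes, learning rate, batch size, and normalized conductance range are illustrative assumptions, not the paper's values or hardware model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small tile; a real AIMC core would be far larger.
rows, cols = 8, 4
W_target = rng.uniform(-0.8, 0.8, size=(rows, cols))  # desired weights
G = np.zeros_like(W_target)                           # programmed conductances (normalized)

lr, batch, steps = 0.05, 32, 2000
for _ in range(steps):
    x = rng.normal(size=(batch, rows))     # synthetic random input vectors
    err = x @ (G - W_target)               # MVM error observed at the output
    grad = x.T @ err / batch               # gradient of the mean-squared MVM error
    G -= lr * grad                         # SGD update applied to the conductances
    G = np.clip(G, -1.0, 1.0)              # stay within the programmable range

# Residual MVM error on fresh random inputs after programming.
mvm_err = np.abs(rng.normal(size=(64, rows)) @ (G - W_target)).mean()
```

Because the inputs are random, no per-device readout of small conductances is needed; only the aggregate MVM output drives the updates, which is the property the abstract highlights.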
ISBN:
(Print) 9781665451093
Analog in-memory computing architectures demand high-speed analog-to-digital converters, for which a dynamic comparator is a crucial building block. Speed and common-mode insensitivity are the critical features of such dynamic comparators. Most reported dynamic comparators achieve high speed only for a narrow range of input common-mode voltages; their performance degrades at the extremities of the common-mode range. We propose a common-mode-insensitive cascode cross-coupled dynamic comparator to overcome this drawback. The proposed comparator is designed, simulated, and compared with state-of-the-art techniques in 65-nm CMOS technology. At a 1.1-V supply voltage, the proposed comparator shows a delay of 37 ps for an input difference of 10 mV at a common-mode voltage of 400 mV.
ISBN:
(Print) 9798350300116
In-memory computing is a promising architecture to meet the exploding demand of data-intensive workloads, including deep neural networks. In particular, analog in-memory computing (AIMC) is a promising way to build matrix-multiplication accelerators that take full advantage of data parallelism and reusability. However, most AIMC designs use voltage-readout circuits that gain no benefit from CMOS scaling, which is an obstacle to improving computational density. We propose a method that combines capacitive AIMC with near-memory time-subtraction readout, which is theoretically scalable with respect to miniaturization and row/column parallelism, and is adjustable in output resolution. We evaluated the signed multi-bit dot-product operation in post-layout simulation using circuits designed in a 180-nm process. Even with a 16× increase in row parallelism (9 to 144), the time resolution required for readout was successfully reduced to a variation of 0.39%.
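The signed readout idea can be illustrated behaviorally: split signed weights into positive and negative columns, read each unsigned MAC result out as the time at which a ramp crosses the accumulated voltage, and recover the signed dot product as the difference of the two times. This is a schematic Python sketch of that concept only, with illustrative constants, not a model of the paper's capacitive circuit.

```python
import numpy as np

rng = np.random.default_rng(3)

def ramp_crossing_time(voltage, ramp_rate=1.0):
    # Time at which a linear ramp (slope = ramp_rate) crosses the column voltage.
    return voltage / ramp_rate

x = rng.integers(0, 4, size=16)              # multi-bit unsigned inputs
w = rng.integers(-2, 3, size=16)             # signed weights

# Signed weights are split across a positive and a negative column.
w_pos, w_neg = np.maximum(w, 0), np.maximum(-w, 0)
v_pos = (x * w_pos).sum() * 0.01             # unsigned MAC result, + column (arb. units)
v_neg = (x * w_neg).sum() * 0.01             # unsigned MAC result, - column

# Near-memory time subtraction: the signed result is a difference of times,
# so no explicit analog voltage subtraction or high-resolution ADC is needed.
t_signed = ramp_crossing_time(v_pos) - ramp_crossing_time(v_neg)
```

Encoding the result in time rather than voltage is what makes the readout scale with CMOS miniaturization, since time resolution improves with faster logic.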
ISBN:
(Print) 9781728182810
This paper presents a Flash A/D converter to be integrated at the periphery of mixed-signal computing memories for convolutional neural networks. We investigate the feasibility of true time-multiplexing, which allows the ADC requirements on area and aspect ratio to be greatly relaxed without sacrificing the data throughput of the memory array. The ADC, based on a strong-arm latched comparator combining built-in reference generation, body bias, and offset calibration, exhibits 29.8-dB SNDR at 3.2 GS/s with 1.5-mW power consumption and a silicon area of 900 µm². Integrated with the memory array, the converter enables up to 32-to-1 column multiplexing with 20 ns of A/D conversion latency.
ISBN:
(Print) 9781665409599
Although resistive random-access memories (RRAMs) are capable of analog in-memory computing and can be utilized to accelerate applications such as neural networks, the analog-digital interface incurs considerable overhead and may even counteract the benefits brought by RRAM-based in-memory computing. In this paper, we introduce how to reduce or eliminate the overhead of the analog-digital interface in RRAM-based neural-network accelerators and linear-solver accelerators. In the former, we create an analog inference flow and introduce a new methodology to accelerate the entire analog flow using resistive content-addressable memories (RCAMs), eliminating redundant analog-to-digital conversions. In the latter, we provide an approach to map classical iterative solvers onto RRAM-based crossbar arrays such that the hardware obtains the solution in O(1) time complexity without actual iterations; intermediate analog-to-digital and digital-to-analog conversions are thus completely eliminated. Simulation results demonstrate the superiority of our approaches in performance and energy efficiency. The accuracy of RRAM-based analog computing remains a focus for future research.
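The solver mapping rests on a standard observation: an iterative scheme such as Jacobi has a fixed point that is exactly the solution of Ax = b, so an analog feedback circuit storing the splitting as conductances settles directly to that fixed point without digital iterations or intermediate conversions. The sketch below is a plain digital emulation of that fixed-point idea (explicit iteration stands in for the analog settling); the system here is a made-up diagonally dominant example, not the paper's hardware.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical diagonally dominant system A x = b (guarantees Jacobi convergence).
n = 6
A = rng.uniform(0, 1, size=(n, n)) + n * np.eye(n)
b = rng.uniform(-1, 1, size=n)

# Jacobi splitting A = D + R; a crossbar would hold D^{-1} and R as conductances,
# and the analog feedback loop settles to the fixed point of this recurrence.
D_inv = 1.0 / np.diag(A)
R = A - np.diag(np.diag(A))

x = np.zeros(n)
for _ in range(200):              # digital stand-in for the analog settling
    x = D_inv * (b - R @ x)       # fixed point satisfies A x = b
```

In the analog realization the "iterations" are continuous-time circuit dynamics, which is why the solution appears in O(1) time rather than after a step count that grows with the problem.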
As the demands of big-data applications and deep learning continue to rise, the industry is increasingly looking to artificial intelligence (AI) accelerators. Analog in-memory computing (AiMC) with emerging nonvolatile devices enables good hardware solutions due to its high energy efficiency in accelerating the multiply-and-accumulate (MAC) operation. Herein, an Applied Materials custom-designed system-on-chip (SoC) targeting AI applications, with analog in-memory computing using resistive random-access memory (ReRAM) as the compute element, is demonstrated. The first silicon achieves high energy efficiency in MAC operations. The chip implements the LeNet-1 neural network on ReRAM tiles and is demonstrated on Modified National Institute of Standards and Technology (MNIST) classification with accuracy matching that predicted in simulation. A simulation framework, AI Sim, is also developed to evaluate system performance for large-scale applications and to guide bitcell development and design choices.
ISBN:
(Print) 9798350322255
In-memory computing (IMC) has emerged as a promising paradigm for energy-, throughput-, and area-efficient machine learning at the edge. However, the differences in hardware architectures, array dimensions, and fabrication technologies among published IMC realizations have made it difficult to grasp their relative strengths. Moreover, previous studies have primarily focused on exploring and benchmarking the peak performance of a single IMC macro rather than full-system performance on real workloads. This paper addresses the lack of a quantitative comparison of analog in-memory computing (AIMC) and digital in-memory computing (DIMC) processor architectures. We propose an analytical IMC performance model that is validated against published implementations and integrated into a system-level exploration framework for comprehensive performance assessments on different workloads with varying IMC configurations. Our experiments show that while DIMC generally has higher computational density than AIMC, AIMC with large macro sizes may have better energy efficiency than DIMC on convolutional and pointwise layers, which can exploit high spatial unrolling. On the other hand, DIMC with small macro sizes outperforms AIMC on depthwise layers, which feature limited spatial-unrolling opportunities inside a macro.
As genome sequencing finds utility in a wide variety of domains beyond the confines of traditional medical settings, its computational pipeline faces two significant challenges. First, the creation of up to 0.5 GB of data per minute imposes substantial communication and storage overheads. Second, the sequencing pipeline is bottlenecked at the basecalling step, which consumes >40% of genome-analysis time. A range of proposals have attempted to address these challenges, with limited success. We propose to address them with a Compute-in-Memory Basecalling Accelerator (CiMBA), the first embedded (~25 mm²) accelerator capable of real-time, on-device basecalling, coupled with AL-Dorado, a new family of analog-focused basecalling DNNs. The resulting hardware/software co-design greatly reduces data-communication overhead, achieves a throughput of 4.77 million bases per second (24× that required for real-time operation), and delivers 17×/27× power/area efficiency over the best prior embedded basecalling accelerator while maintaining accuracy comparable to state-of-the-art software basecallers.
Recently, specialized training algorithms for analog cross-point-array-based neural network accelerators have been introduced to counteract device non-idealities such as update asymmetry and cycle-to-cycle variation, achieving software-level performance in neural network training. However, a quantitative analysis of how these algorithms relax device specifications has yet to be conducted. This study provides such an analysis by elucidating the device prerequisites for training with the Tiki-Taka algorithm versions 1 (TTv1) and 2 (TTv2), which leverage the dynamics between multiple arrays to compensate for device non-idealities. A multiparameter simulation is conducted to assess the impact of device non-idealities, including asymmetry, retention, number of pulses, and cycle-to-cycle variation, on neural network training. Using pattern-recognition accuracy as a performance metric, the required device specifications for each algorithm are revealed. The results demonstrate that the standard stochastic gradient descent algorithm requires stringent device specifications. Conversely, TTv2 permits more lenient device specifications than TTv1 across all examined non-idealities. The analysis provides guidelines for the development, optimization, and utilization of devices for high-performance neural network training using Tiki-Taka algorithms. This study investigates the device specifications required for neural network training on analog resistive cross-point arrays with these algorithms. By demonstrating robustness against non-ideal update characteristics, it quantitatively shows how hardware-aware training can relax device specifications, paving the way for successful implementation of analog deep-learning accelerators with actual devices. © 2024 WILEY-VCH GmbH
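The compensation mechanism the Tiki-Taka family exploits can be illustrated with a deliberately simplified scalar model: noisy, asymmetric gradient updates land on a fast auxiliary weight A, which drifts toward its symmetry point, while a slow main weight W receives periodic transfers from A. The sketch below is a toy caricature of that two-array structure, not the published TTv1/TTv2 algorithms; the asymmetry model, learning rates, and transfer period are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def asymmetric_update(w, dw, beta=0.5):
    # Device non-ideality: potentiation and depression have different gains.
    gain = np.where(dw > 0, 1.0 + beta, 1.0 - beta)
    return float(np.clip(w + gain * dw, -1.0, 1.0))

# Toy scalar regression y = w_true * x trained with a two-"array" weight pair.
w_true, A, W = 0.6, 0.0, 0.0
lr, transfer_lr = 0.1, 0.1

for step in range(6000):
    x = rng.normal()
    grad = (W * x - w_true * x) * x            # MSE gradient w.r.t. the weight
    A = asymmetric_update(A, -lr * grad)       # non-ideal update lands on A, not W
    A *= 0.99                                  # drift toward the symmetry point
    if step % 5 == 0:
        W = asymmetric_update(W, transfer_lr * A)  # periodic A -> W transfer
```

Even though every individual update is asymmetric, the main weight converges near the target, which is the qualitative sense in which these algorithms relax device specifications relative to plain SGD applied directly to one array.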
Electrochemical random-access memory (ECRAM) is an emerging three-terminal nonvolatile memory (NVM) with highly controllable channel conductance, which makes it promising for use as an analog memory (or synapse) in analog in-memory computing (IMC) systems. Energy-efficient analog IMC is particularly desirable for power-constrained, high-radiation environments such as satellites. However, little is known about the suitability of ECRAM for use in a total-ionizing-dose (TID) environment. This work investigates the effect of Co-60 gamma radiation on the channel conductance and noise, two properties critical for analog IMC systems, of a TaOx-based ECRAM up to 17.3 Mrad(SiO2) for both low- and high-channel-conductance-state devices. A transient increase in conductance is observed in response to radiation, consisting of two elements: an immediate increase in conductivity due to photocurrent, and a secondary increase that rises and saturates more slowly and can persist for hours after exposure. This secondary, persistent photoconductivity is attributed to charging caused by hole trapping. These transient effects would likely not occur in a space environment, owing to its low dose rate compared with this experiment. No permanent change is found in the low-conductance state (LCS) following exposure, and the minor shift in the high-conductance state would be less significant than the regular retention decay in that state. A permanent increase in random telegraph noise is observed, possibly due to traps created in the channel. This work demonstrates that TaOx-based ECRAM is suitable for use in spaceborne analog IMC systems subject to significant TID.