Bias-scalable analog computing is attractive for implementing machine learning (ML) processors with distinct power-performance specifications. For instance, ML implementations for server workloads are focused on higher computational throughput for faster training, whereas ML implementations for edge devices are focused on energy-efficient inference. In this paper, we demonstrate the implementation of bias-scalable approximate analog computing circuits using a generalization of the margin-propagation principle called shape-based analog computing (S-AC). The resulting S-AC core integrates several near-memory compute elements, which include: (a) non-linear activation functions; (b) inner-product compute circuits; and (c) a mixed-signal compressive memory, all of which can be scaled for performance or power while preserving their functionality. Using measured results from prototypes fabricated in a 180-nm CMOS process, we demonstrate that the performance of the computing modules remains robust to transistor biasing and variations in temperature. We also demonstrate the effect of bias-scalability and computational accuracy on a simple ML regression task.
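The S-AC framework above generalizes the margin-propagation (MP) principle. As a point of reference, classic MP computes a piecewise-linear approximation of log-sum-exp by solving Σᵢ max(0, xᵢ − z) = γ for z, where γ is a hyperparameter. A minimal software model of that constraint solver (function and variable names are illustrative, not from the paper) might look like:

```python
def margin_propagation(x, gamma):
    """Solve sum_i max(0, x_i - z) = gamma for z (reverse water-filling).

    z is a piecewise-linear approximation of log-sum-exp over x,
    with gamma controlling the approximation sharpness.
    """
    xs = sorted(x, reverse=True)
    z = xs[0] - gamma  # candidate with only the largest term active
    for m in range(1, len(xs) + 1):
        # assume exactly the top-m terms are active and solve for z
        z = (sum(xs[:m]) - gamma) / m
        # valid when the (m+1)-th largest input falls below z
        if m == len(xs) or xs[m] <= z:
            return z
    return z


# e.g. margin_propagation([2.0, 1.0], 0.5) returns 1.5,
# since max(0, 2.0 - 1.5) + max(0, 1.0 - 1.5) = 0.5 = gamma
```

In hardware, the same constraint is enforced by current conservation at a circuit node rather than by iteration, which is what makes the computation bias-scalable.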
ISBN:
(Print) 9798350304206
This paper presents a current-domain compute-in-memory (CIM) architecture for accelerating Artificial Intelligence (AI) edge inference. A novel multiply-and-accumulate (MAC) scheme is introduced by exploiting the R-2R resistor ladder as a binary-weighted current recombiner. The area and power of the proposed scheme scale linearly as numerical precision increases for both input activation and weight, while computation latency remains a single cycle. A prototype in a 22-nm FD-SOI CMOS process achieves 2.2-ns system latency, 56 TOPS/W energy efficiency, and 4 TOPS/mm² area efficiency with 6-bit input activations and 8-bit weights.
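The key idea above is that an R-2R ladder attenuates the current injected at successive taps by powers of two, so per-bit partial sums can be recombined into a binary-weighted dot product with no multipliers. An idealized numerical model of that recombination (names and the bit-plane organization are illustrative assumptions, not the paper's circuit) could be sketched as:

```python
def r2r_recombine(tap_currents):
    """Ideal R-2R ladder output: tap k contributes with weight 2^-k (MSB at tap 0)."""
    return sum(i / (1 << k) for k, i in enumerate(tap_currents))


def mac(inputs, weights, w_bits=8):
    """Model of a binary-weighted MAC: each weight bit plane produces a
    partial sum of inputs, and the R-2R ladder scales plane k by 2^-k.

    Result equals dot(inputs, weights) / 2^(w_bits - 1), i.e. the exact
    dot product up to a fixed binary scaling.
    """
    planes = []
    for k in range(w_bits):  # MSB-first bit planes of the weights
        planes.append(sum(x * ((w >> (w_bits - 1 - k)) & 1)
                          for x, w in zip(inputs, weights)))
    return r2r_recombine(planes)


# e.g. mac([2.0, 1.0], [128, 64]) returns 2.5 = (2*128 + 1*64) / 128
```

Because each additional weight bit adds one ladder stage rather than doubling the array, area and power grow linearly with precision, consistent with the scaling claim in the abstract.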