The emerging computation-in-memory (CIM) paradigm processes and stores data at the same physical location, alleviating the critical memory-processor communication bottleneck of conventional von Neumann architectures. Data in a CIM architecture is stored in analog form, so computation is performed in the analog domain, i.e., inputs and outputs are analog values. Because the surrounding computing environment is digital, analog-to-digital converters (ADCs) are used to convert the output data. However, ADCs are bulky, power-hungry circuits that are prone to design variations and therefore play an important role in determining the computing efficiency of CIM architectures. In this paper, we present a scalable and reliable integrate-and-fire circuit ADC (SRIF-ADC) design for CIM architectures, suitable for stringent power and area constraints. We devise a technique to stabilize the node receiving analog inputs, which allows more rows to be activated at the same time and thereby increases the operand size of input vectors. This improves scalability through higher parallelism of operations. To address reliability issues related to the ADC design, we employ a self-timed, variation-aware design approach together with design measures that drastically reduce read disturb of memristor devices. In addition, we present a compact, built-in sample-and-hold circuit that replaces the large capacitance, and a built-in weighting technique that alleviates the need for post-processing. For the multiply-and-accumulate (MAC) operation, our simulation results show a 3X improvement in computational parallelism, and 2X and 11.6X improvements in ADC conversion speed and energy efficiency, respectively, compared to the state-of-the-art design.
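The analog MAC and integrate-and-fire conversion described above can be illustrated with a minimal behavioral sketch. It is an idealized model only, not the SRIF-ADC circuit itself: the function names, the read voltage, the conductance values, the integration capacitance, the firing threshold, and the conversion window are all assumptions chosen for illustration. Integer units (mV, µS, nA, fF, ns, aC) are used so the arithmetic is exact.

```python
# Idealized behavioral model of one crossbar column read out by an
# integrate-and-fire converter. All names and values are illustrative
# assumptions; none are taken from the SRIF-ADC paper.

def column_current_na(input_bits, conductances_us, v_read_mv=200):
    """Ideal column current in nA.

    Each activated row (input bit 1) contributes v_read * G
    (mV * uS = nA); the column wire sums the contributions (the MAC).
    """
    return sum(v_read_mv * g for bit, g in zip(input_bits, conductances_us) if bit)


def spike_count(current_na, c_int_ff=1000, v_th_mv=500, t_window_ns=1000):
    """Ideal number of spikes fired during the conversion window.

    The current charges the integration capacitor; each time the voltage
    reaches v_th the circuit fires and resets. The digital output is the
    number of complete charge packets delivered in the window.
    """
    charge_per_spike_ac = c_int_ff * v_th_mv       # fF * mV = aC
    delivered_charge_ac = current_na * t_window_ns  # nA * ns = aC
    return delivered_charge_ac // charge_per_spike_ac


# Hypothetical column: 5 uS for a high-resistance cell, 50 uS for a
# low-resistance cell; rows 0, 1 and 3 are activated.
inputs = [1, 1, 0, 1]
conductances = [5, 50, 5, 50]
current = column_current_na(inputs, conductances)
count = spike_count(current)
```

In this model, activating more rows raises the column current and therefore the spike count, which is why the number of simultaneously activated rows determines the range the converter has to cover.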
Technological and architectural improvements have been constantly required to meet the demand for faster and cheaper computers. However, CMOS down-scaling is suffering from three technology walls: the leakage wall, the reliability wall, and the cost wall. On top of that, the performance gain from architectural improvements is gradually saturating due to three well-known architecture walls: the memory wall, the power wall, and the instruction-level parallelism (ILP) wall. Hence, a lot of research is focusing on proposing and developing new technologies and architectures. In this article, we present a comprehensive classification of memory-centric computing architectures; it is based on three metrics: computation location, level of parallelism, and memory technology used. The classification not only provides an overview of existing architectures with their pros and cons but also unifies the terminology that uniquely identifies these architectures and highlights potential future architectures that can be explored further. Hence, it sets a direction for future research in the field.
Memristive devices based on the Valence Change Mechanism (VCM) are promising devices for storage class memory, neuromorphic computing and logic-in-memory (LIM) applications. They are suited for such a wide range of ap...
ISBN (digital): 9781665484855
ISBN (print): 9781665484862
This paper proposes a fast and small-area FeFET-based voltage-sensing analog computation-in-memory (CiM) for hyperdimensional computing (HDC) that eliminates large-scale digital circuit overhead. In both training and inference of HDC, the MAP operations on hypervectors (HVs), namely bit-wise XOR, bit-wise majority rule, and 1-bit shift, are performed by the Partially added Text HV FeFET CiM and the Text HV FeFET. In inference, the Similarity search FeFET CiM obtains the classification result. Taking a language classification problem as an example, the proposed voltage-sensing FeFET CiM for HDC encodes HVs in the training phase 5,000 times faster, and with a smaller area, than the conventional method.
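The three MAP operations and the similarity search named above can be sketched in software as follows. This is a generic illustration of binary HDC, not the FeFET circuit: the function names, the tie-breaking rule in the majority vote, the cyclic direction of the shift, and the use of Hamming distance for the similarity search are assumptions made for the example.

```python
# Generic software sketch of binary hyperdimensional computing primitives.
# Hypervectors are lists of 0/1 bits. All names are illustrative.

def bind(a, b):
    """Bit-wise XOR of two hypervectors."""
    return [x ^ y for x, y in zip(a, b)]


def bundle(hvs):
    """Bit-wise majority rule over a list of hypervectors.

    Assumption: a tie (possible only for an even number of inputs)
    resolves to 0.
    """
    n = len(hvs)
    return [1 if 2 * sum(bits) > n else 0 for bits in zip(*hvs)]


def permute(hv):
    """1-bit cyclic shift (assumed direction: toward higher indices)."""
    return hv[-1:] + hv[:-1]


def hamming(a, b):
    """Number of positions in which two hypervectors differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(query, class_hvs):
    """Similarity search: return the label whose class HV is closest."""
    return min(class_hvs, key=lambda label: hamming(query, class_hvs[label]))


# Tiny hypothetical example; real hypervectors have thousands of bits.
classes = {"en": [1, 0, 1, 0], "de": [0, 1, 0, 0]}
label = classify([1, 0, 1, 1], classes)
```

In a text-classification setting, letter HVs are typically shifted and XOR-ed to encode n-grams, the n-gram HVs are combined by majority into one class HV per language, and a query text is encoded the same way before the similarity search.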
ISBN (print): 9781728193694
State-of-the-art computing systems based on traditional von Neumann architectures face the von Neumann bottleneck (VNB), which has a large impact on the computing speed and energy consumption of current computing systems for big-data applications. Computation-in-memory (CIM) architecture is a prominent candidate for breaking the VNB. In this paper, we propose a 9T (P9T) SRAM bit-cell that is energy efficient and, at the same time, enhances the read/write margin. Simulation results show that the P9T SRAM design increases read SNM (RSNM), write SNM (WSNM), dynamic noise margin (DNM), and I-on/I-off by 25.44%, 19.44%, 56.41%, and 102.5%, respectively, at a 1.8 V supply voltage over the conventional 8T (C8T) SRAM cell in a 180 nm CMOS technology. The P9T SRAM design decreases read and write energy per operation by 60.85% and 22.67%, respectively, over the C8T SRAM bit-cell. Finally, to illustrate beyond-von-Neumann computation, in-memory Boolean computation (IMBC) is demonstrated using the P9T SRAM cell, which shows an energy efficiency improvement of 28% over the IMBC operation in the C8T SRAM bit-cell.
ISBN (print): 9781538695043
Memristive devices were initially developed for memories, where they function as non-volatile memory elements. While the different kinds of memristive devices have their own operational and reliability concerns, the specifications are clearly set by the memory application (e.g., embedded, standalone, or storage-class type). More recently, memristive devices have become of high interest for enabling new hardware concepts for a variety of neuromorphic systems, ranging from machine learning and computation-in-memory to full brain-emulating systems. These applications are based on different functionalities of the memristive devices (e.g., analog multi-level programming) and set different, but often not yet fully explored, reliability requirements. Interestingly, however, some of these neuromorphic circuits are more resilient to device failure, while major memory reliability threats such as stochasticity, variability, and noise may even become assets for building self-learning and predictive systems.
ISBN (print): 9781450357999
The infamous memory-processor bottleneck has motivated the search for logic-in-memory architectures. In this paper, we demonstrate how the transitive closure problem can be solved through in-memory computing within a 3D crosspoint memory. The proposed architecture requires only two layers of 1-diode 1-resistor (1D1R) interconnects and external feedback loops.
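For readers unfamiliar with the transitive closure problem, a minimal software reference is sketched below. It uses the standard Warshall algorithm on a 0/1 adjacency matrix and is only meant to define the expected result; it does not model the 1D1R crosspoint mapping or the feedback loops of the proposed architecture, and the function name is an assumption.

```python
# Software reference for transitive closure (Warshall's algorithm).
# This only defines the problem's expected output; it is not a model
# of the in-memory architecture described in the abstract.

def transitive_closure(adj):
    """Return the reachability matrix of a directed graph.

    adj[i][j] == 1 means there is an edge i -> j. In the result,
    reach[i][j] == 1 means j is reachable from i via one or more edges.
    """
    n = len(adj)
    reach = [row[:] for row in adj]  # copy so the input is not modified
    for k in range(n):               # allow node k as an intermediate hop
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = 1
    return reach


# Hypothetical 3-node chain 0 -> 1 -> 2: the closure adds the edge 0 -> 2.
closure = transitive_closure([[0, 1, 0],
                              [0, 0, 1],
                              [0, 0, 0]])
```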
Conventional content addressable memory (BCAM and TCAM) uses specialized 10T/16T bit cells that are significantly larger than 6T SRAM cells. A new BCAM/TCAM is proposed that can operate with standard push-rule 6T SRAM cells, reducing array area by 2-5x and allowing reconfiguration of the SRAM as a CAM. In this way, chip area and overall capacitance can be reduced, leading to higher energy efficiency for search operations. In addition, the configurable memory can perform bit-wise logical operations: "AND" and "NOR" on two or more words stored within the array. Thus, the configurable memory with CAM and logical function capability can be used to off-load specific computational operations to the memory, improving system performance and efficiency. Using a 6T 28 nm FDSOI SRAM bit cell, the 64x64 (4 kb) BCAM achieves 370 MHz at 1 V and consumes 0.6 fJ/search/bit. A logical operation between two 64 bit words achieves 787 MHz at 1 V.
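The search and logic functions mentioned above can be stated as a small functional sketch. It describes only what the operations compute, not how the 6T SRAM array evaluates them: the function names, the representation of stored words as integers, and the encoding of ternary entries as a (value, care mask) pair are assumptions for illustration.

```python
# Functional sketch of CAM search and in-array bit-wise logic.
# Stored words are Python ints; all names and encodings are illustrative.

def bcam_search(words, key):
    """Binary CAM: indices of all stored words that equal the key."""
    return [i for i, word in enumerate(words) if word == key]


def tcam_search(entries, key):
    """Ternary CAM: each entry is (value, care_mask).

    Bits where care_mask is 0 are "don't care"; an entry matches when
    the key agrees with the value on every cared-for bit.
    """
    return [i for i, (value, care) in enumerate(entries)
            if (value ^ key) & care == 0]


def bitwise_and(words):
    """Bit-wise AND across two or more stored words."""
    result = words[0]
    for word in words[1:]:
        result &= word
    return result


def bitwise_nor(words, width):
    """Bit-wise NOR across two or more stored words of the given width."""
    combined = 0
    for word in words:
        combined |= word
    return ~combined & ((1 << width) - 1)


# Hypothetical 4-bit array contents.
stored = [0b1010, 0b0110, 0b1010]
matches = bcam_search(stored, 0b1010)
```

A ternary entry such as `(0b1000, 0b1100)` matches any key whose two upper bits are `10`, whatever the lower two bits are.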