Computing-in-memory (CIM) architecture is a promising approach to breaking the bottleneck of the von Neumann architecture. To shed light on large matrix operations in flash-based CIM with ultrahigh bit density (4-5 bit/cell), this work presents a novel incremental positive-negative step pulse programming (IPNPP) array programming scheme. The proposed scheme uses positive pulses for coarse tuning and subsequent negative pulses for fine-tuning of the cells' threshold voltages. By adopting the IPNPP scheme in 55-nm NOR flash CIM arrays, it is shown that latency and power consumption can be lowered effectively. For dehazing of ultrahigh-resolution images, a high energy efficiency of approximately 180.6 TOPS/W with high accuracy and variation tolerance has been demonstrated. Our results indicate that IPNPP is effective for CIMs that require high precision and low power consumption.
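The coarse-then-fine idea described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the authors' programming algorithm: it assumes each positive pulse raises the threshold voltage by a fixed coarse step and each negative pulse lowers it by a fixed fine step, and the function name, step sizes, and pulse limit are all hypothetical.

```python
def ipnpp_program(target_vth, start_vth=0.0, coarse_step=0.2,
                  fine_step=0.02, max_pulses=1000):
    """Toy coarse/fine programming loop (all parameters are assumptions).

    Returns the final threshold voltage and the number of positive
    and negative pulses applied.
    """
    vth = start_vth
    positive_pulses = 0
    negative_pulses = 0

    # Coarse phase: positive pulses raise Vth until it reaches or
    # overshoots the target.
    while vth < target_vth and positive_pulses < max_pulses:
        vth += coarse_step
        positive_pulses += 1

    # Fine phase: negative pulses pull Vth back down in small steps
    # until it lies within half a fine step of the target.
    while vth - target_vth > fine_step / 2 and negative_pulses < max_pulses:
        vth -= fine_step
        negative_pulses += 1

    return vth, positive_pulses, negative_pulses


if __name__ == "__main__":
    final_vth, pos, neg = ipnpp_program(1.23)
    print(f"Vth={final_vth:.3f} V after {pos} positive and {neg} negative pulses")
```

In this toy model the total pulse count is far smaller than reaching the same resolution with fine steps alone, which is the intuition behind the latency reduction; real cells have nonlinear and variable responses that the sketch ignores.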
ISBN: 9781665423694 (print)
FPGAs can support signal processing usually reserved for CPUs or GPUs. Complex algorithms with extreme parallelism can be implemented in FPGAs using single-precision floating point. The FPGA provides very low and deterministic latency and can operate in challenging embedded processing environments. This paper details the implementation and performance of two representative algorithms, QR decomposition and the FFT, and describes the methods used to achieve high degrees of parallel processing with a single-precision floating-point numerical representation.
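To make the source of parallelism in the FFT concrete, the following is a minimal radix-2 decimation-in-time sketch. It is an illustration only: the abstract does not say which FFT structure the paper uses, and Python computes in double precision rather than the single precision discussed above. The function names are hypothetical.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 FFT; the input length must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]

    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])

    out = [0j] * n
    # Each butterfly below depends only on even[k] and odd[k], so all
    # n/2 butterflies of a stage are independent of one another. That
    # independence is what parallel hardware can exploit.
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft_naive(x):
    """Direct O(n^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    samples = [1.0, 2.0, 0.0, -1.0, 0.5, 3.0, -2.0, 1.5]
    fast = fft_radix2(samples)
    slow = dft_naive(samples)
    print(max(abs(a - b) for a, b in zip(fast, slow)))
```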
This article proposes an analog synapse-based neuromorphic on-chip training system that uses emerging indium gallium zinc oxide (IGZO) thin-film transistor (TFT) synapse cells to store multi-bit states for deep neural networks (DNNs). The IGZO TFT exhibits extremely low leakage currents, preserving the charge stored in capacitors during prolonged training periods. The 6-transistor 1-capacitor (6T1C) structure, characterized by its symmetrical design and current-source configuration, achieves an average of 367 distinct states with high linearity, reflected by an R² value of 0.99 through a neuron circuit. By adjusting currents and capacitor sizes, the system effectively integrates currents from both individual synapses and the overall array. Additionally, the neuron circuit, implemented separately from the IGZO TFT synapse array, demonstrates an effective number of bits (ENOB) of 8.95 in overall performance measurements. The neuron circuit and IGZO TFT array have areas of 7.2 and 10.2 mm², respectively. Using the proposed neuromorphic system with the 6T1C memory structure, we successfully conducted the first analog on-chip training with the last layer, achieving an accuracy of 97.1% on the MNIST dataset.
Matrix computations are at the heart of scientific computing, especially in models involving large-scale linear systems. As the scale and complexity of the problems grow, energy-efficient matrix computation becomes critical in these applications. Meanwhile, the advantages of miniaturizing conventional digital electronic processors, predicted by Dennard scaling, diminish in the post-Moore's law era. Analogue photonic devices based on passive and high-throughput interconnects are becoming promising alternatives as next-generation energy-efficient computing units. However, the limited reconfigurability and precision of an analogue photonic computing device make it unsuitable for scientific computing applications. Here, we report a general-purpose analogue photonic matrix processing unit (MPU) based on coherent analogue photonic cores, which perform signed multiplications, with reconfigurability and memory provided by digital electronics. Combined with error management strategies, our photonic MPU can perform tasks conventionally dominated by floating-point digital processors, elevating analogue photonic platforms toward scientific computing applications. We have experimentally demonstrated its feasibility in a range of computing tasks, including matrix multiplication and inversion as well as solving finite-difference partial differential equations.
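As a small illustration of how a finite-difference PDE reduces to repeated matrix-vector products, the operation a matrix processing unit accelerates, the sketch below solves the 1-D Poisson problem -u'' = 1 on (0, 1) with zero boundary values. The choice of Jacobi iteration is an assumption made here because it needs nothing beyond matrix-vector products; the abstract does not state which solver or error management strategy the authors use, and all function names are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product: the only 'accelerated' operation assumed."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def poisson_matrix(n, h):
    """Second-order finite-difference matrix for -u'' on n interior points."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0 / h**2
        if i > 0:
            A[i][i - 1] = -1.0 / h**2
        if i < n - 1:
            A[i][i + 1] = -1.0 / h**2
    return A


def jacobi_solve(A, b, iterations=500):
    """Jacobi iteration x <- x + D^{-1} (b - A x), using only matvec."""
    x = [0.0] * len(b)
    for _ in range(iterations):
        Ax = matvec(A, x)
        x = [x[i] + (b[i] - Ax[i]) / A[i][i] for i in range(len(b))]
    return x


if __name__ == "__main__":
    n = 9
    h = 1.0 / (n + 1)
    A = poisson_matrix(n, h)
    b = [1.0] * n                      # right-hand side f(x) = 1
    u = jacobi_solve(A, b)
    grid = [(i + 1) * h for i in range(n)]
    exact = [x * (1 - x) / 2 for x in grid]   # analytic solution
    print(max(abs(ui - ei) for ui, ei in zip(u, exact)))
```

Jacobi converges slowly on finer grids, so this is meant to show the structure of the computation rather than a practical solver.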
This paper proposes a high-precision analog compute-in-memory (CIM) neuromorphic system that adopts a nonvolatile electro-chemical random-access memory (ECRAM) to improve linearity, symmetry, and endurance of the syna...
Different subtasks of an application usually have different computational, memory, and I/O requirements, which result in different needs for computer capabilities. Thus, a more appropriate approach for both high performance and a simple programming model is to design a processor with a multi-level instruction set architecture (ISA). This leads to high performance and minimum executable code size. Since the fundamental data structures for a wide variety of existing applications are scalar, vector, and matrix, our research Trident processor has a three-level ISA executed on zero-, one-, and two-dimensional arrays of data. These levels are used to express a great amount of fine-grain data parallelism to the processor, instead of extracting it dynamically with complicated logic or statically with compilers. This reduces the design complexity and provides a high-level programming interface to the hardware. In this paper, the performance of the Trident processor is evaluated on BLAS, which represents the kernel operations of many data-parallel applications. We show that the Trident processor proportionally reduces the number of clock cycles per floating-point operation as the number of execution datapaths increases.
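The three BLAS levels used in the evaluation correspond to vector-vector, matrix-vector, and matrix-matrix operations. The sketch below shows one representative kernel per level in plain Python. It only illustrates the operations themselves, not the Trident ISA, and the way the higher levels are built from the lower ones here is a choice made for this example.

```python
def axpy(alpha, x, y):
    """BLAS level 1: vector-vector, returns alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def gemv(A, x):
    """BLAS level 2: matrix-vector product A @ x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def gemm(A, B):
    """BLAS level 3: matrix-matrix product A @ B.

    Built column by column from gemv, so each output column is an
    independent matrix-vector product.
    """
    result_columns = [gemv(A, column) for column in zip(*B)]
    return [list(row) for row in zip(*result_columns)]


if __name__ == "__main__":
    print(axpy(2, [1, 2], [3, 4]))
    print(gemv([[1, 2], [3, 4]], [1, 1]))
    print(gemm([[1, 2], [3, 4]], [[0, 1], [1, 0]]))
```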
Background: Administration of a single physiological dose of 17β-estradiol (E2; 40 µg/kg) to the ovariectomized immature rat rapidly induces uterine growth and remodeling. The response is characterized by changes in endometrial stromal architecture during an inflammatory-like response that likely involves activated matrix metalloproteinases (MMPs). While estrogen is known as an inducer of endometrial growth, its role in specific expression of MMP family members in vivo is poorly characterized. E2-induced changes in MMP-2, -3, -7, and -9 mRNA and protein expression were analyzed to survey regulation along an extended time course of 0-72 hours post-treatment. Because E2 effects inflammatory-like changes that may alter MMP expression, we assessed changes in tissue levels of TNF-α and MCP-1, and we used dexamethasone (600 µg/kg) to better understand the role of inflammation in matrix remodeling. Methods: Ovariectomized 21-day-old female Sprague-Dawley rats were administered E2, and uterine tissues were extracted and prepared for transmission electron microscopy (TEM), mRNA extraction and real-time RT-PCR, protein extraction and Western blot, or gelatin zymography. In inhibitor studies, pretreatment compounds were administered prior to E2 and tissues were harvested at 4 hours post-hormone challenge. Results: Using a novel TEM method to quantitatively assess changes in stromal collagen density, we show that E2-induced matrix remodeling is rapid in onset (< 1 hour) and leads to a 70% reduction in collagen density by 4 hours. Matrix remodeling is MMP-dependent, as pretreatment with batimastat ablates the hormone effect. MMP-3, -7, and -9 and inflammatory markers (TNF-α and MCP-1) are transiently upregulated, with peak expression at 4 hours post-E2 treatment. MMP-2 expression is increased by E2, but the highest expression and activity occur later in the response (48 hours). Dexamethasone inhibits E2-modulated changes in collagen density and expression of MMPs alt