ISBN (Print): 9781450366694
Memory-augmented neural networks are attracting increasing attention from researchers because they can perform inference using the previous history stored in memory. Among these memory-augmented neural networks, memory networks in particular are known for their strong reasoning power and their ability to learn from a larger number of inputs than other networks. As the size of input datasets rapidly grows, the need for large-scale memory networks continuously arises. Such large-scale memory networks provide excellent reasoning power; however, the current computer infrastructure cannot achieve scalable performance due to its limited system architecture. In this paper, we propose MnnFast, a novel system architecture for large-scale memory networks that achieves fast and scalable reasoning performance. We identify the performance problems of the current architecture by conducting an extensive performance bottleneck analysis. Our in-depth analysis indicates that the current architecture suffers from three major performance problems: high memory bandwidth consumption, heavy computation, and cache contention. To overcome these problems, we propose three novel optimizations. First, to reduce memory bandwidth consumption, we propose a new column-based algorithm with streaming, which minimizes the size of data spills and hides most of the off-chip memory access overhead. Second, to decrease the high computational overhead, we propose a zero-skipping optimization that bypasses a large amount of output computation. Lastly, to eliminate cache contention, we propose a dedicated embedding cache to efficiently cache the embedding matrix. Our evaluations show that MnnFast is significantly effective across various types of hardware: CPU, GPU, and FPGA. MnnFast improves overall throughput by up to 5.38x, 4.34x, and 2.01x on CPU, GPU, and FPGA, respectively. Also, compared to CPU-based MnnFast, our FPGA-based MnnFast achieves 6.54x higher energy efficiency.
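The zero-skipping idea can be illustrated with a short sketch. In a memory network, the output step forms a softmax-weighted sum over the output embedding matrix, and after the softmax most attention weights are negligibly small. The Python sketch below uses an illustrative threshold `eps`, which is our assumption rather than MnnFast's exact skipping criterion:

```python
import numpy as np

# Hedged sketch of zero-skipping in a memory network's output step.
# Rows of the output embedding matrix C whose attention weight falls
# below `eps` contribute almost nothing and are skipped entirely.
def output_with_zero_skipping(q, M, C, eps=1e-6):
    scores = M @ q                      # inner products of query with memories
    p = np.exp(scores - scores.max())   # numerically stable softmax
    p /= p.sum()
    keep = p > eps                      # memories worth computing
    return p[keep] @ C[keep]            # skip near-zero contributions

# Example: 10,000 memory slots with 128-dimensional embeddings.
q = np.random.rand(128)
M = np.random.rand(10_000, 128)
C = np.random.rand(10_000, 128)
o = output_with_zero_skipping(q, M, C)
```

Because the softmax concentrates probability mass on a few memories, the skipped rows never need to be fetched, which is what couples this optimization to the bandwidth savings described above.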
The Low-Power Object Detection Challenge (LPODC) at the IEEE/ACM Design Automation Conference is a premier contest in low-power object detection and algorithm (software)-hardware co-design for edge artificial intelligence, and it has been a success over the past five years. LPODC focuses on designing and implementing novel algorithms on edge platforms for object detection in images taken from unmanned aerial vehicles (UAVs), and it has attracted hundreds of teams from dozens of countries. Our team, SEUer, participated in this competition for three consecutive years from 2020 to 2022, placing sixth in both 2020 and 2021 and winning the championship in 2022. In this paper, we present the LPODC for UAV object detection from 2018 to 2022, including its dataset, hardware platform, and evaluation method. In addition, we introduce and discuss the details of the methods proposed by each year's top three teams from 2018 to 2022 in terms of network, accuracy, quantization method, hardware performance, and total score. We also conduct an in-depth analysis of the selected entries and results and summarize representative methodologies. This analysis serves as a valuable practical resource for researchers and engineers deploying UAV applications on edge platforms and enhancing their feasibility and reliability. The analysis and discussion make it evident that a hardware-algorithm co-design approach is paramount in the context of tiny machine learning (TinyML): it surpasses the mere optimization of software and hardware as separate entities and proves essential for achieving optimal performance and efficiency in TinyML applications.
Graph Neural Networks (GNNs) are of great value in numerous applications and promote the development of cognitive intelligence thanks to their capability to model non-Euclidean data structures. However, their inherent irregularity makes GNNs memory-bound, and their hybrid computing paradigm poses significant challenges for efficient deployment on existing hardware architectures. Near-Memory Processing (NMP) is a promising solution to the memory wall problem. In this paper, we present G-NMP, a practical and efficient DIMM-based NMP solution for accelerating GNNs, which for the first time accelerates both the sparse aggregation and the dense combination computations on DIMMs. We propose a novel G-NMP hardware architecture that efficiently exploits rank-level memory parallelism, together with G-ISA instructions that significantly reduce host memory requests. We apply several data-flow optimizations to G-NMP to improve memory-compute overlap and realize efficient matrix computation. We then develop an adaptive data-allocation strategy for diverse vector sizes to further exploit feature-level parallelism. We also propose a novel memory-request scheduling method to achieve flexible and low-overhead DRAM ownership transitions between the host and G-NMP. Overall, G-NMP achieves consistent performance advantages across diverse GNN models and datasets, offering 1.46x higher performance and 1.29x better energy efficiency on average than the state-of-the-art work.
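To make the hybrid computing paradigm concrete, the sketch below shows the two GNN phases G-NMP targets: the sparse, memory-bound aggregation over the adjacency matrix and the dense, compute-bound combination with the layer weights. All shapes and the sparsity density are illustrative assumptions:

```python
import numpy as np
from scipy.sparse import random as sparse_random

# Hedged sketch of one GNN layer split into the two phases that a
# DIMM-based NMP design like G-NMP would offload.
num_nodes, in_dim, out_dim = 1024, 64, 32
A = sparse_random(num_nodes, num_nodes, density=0.01, format='csr')  # adjacency (sparse)
X = np.random.rand(num_nodes, in_dim)                                # node features
W = np.random.rand(in_dim, out_dim)                                  # layer weights

H = A @ X   # sparse aggregation: irregular memory accesses, memory-bound
Z = H @ W   # dense combination: regular matrix multiply, compute-bound
```

The irregular gather in the first line is why GNNs hit the memory wall on conventional hosts, and the contrast between the two lines is why accelerating both phases near memory, rather than only one, matters.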
Matrix algebra and deep neural networks represent foundational classes of computational algorithms across multiple emerging applications such as augmented and virtual reality, autonomous navigation (cars, drones, robots), data science, and various artificial-intelligence-driven solutions. An accelerator-based architecture can provide performance and energy efficiency for fixed functions through customized datapaths. However, constrained edge systems that must efficiently support multiple applications and diverse matrix operations cannot afford numerous custom accelerators. In this article, we present Mxcore, a unified architecture comprising tightly coupled vector and programmable cores that share data through highly optimized interconnects, along with a configurable hardware scheduler managing their co-execution. We submit Mxcore as a generalized approach to flexibly accelerate multiple matrix-algebra and deep-learning applications across a range of sparsity levels. Unified compute resources improve overall resource utilization and performance per unit area. Aggressive and novel microarchitecture techniques, along with block-level sparsity support, optimize compute and data reuse to minimize bandwidth and power requirements, enabling ultra-low-latency applications for low-power and cost-sensitive edge deployments. Mxcore requires a small silicon footprint of 0.2068 mm² in a modern 7-nm process at 1 GHz, achieves 0.15 FP32 and 0.62 INT8 TMAC/mm², and dissipates only 11.66 µW of leakage power. At iso-technology and iso-frequency, Mxcore provides 651.4x, 159.9x, 104.8x, and 124.2x better energy efficiency than the 128-core Nvidia Maxwell GPU for dense general matrix multiply, sparse deep neural networks, Cholesky decomposition, and triangular matrix solve, respectively.
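Block-level sparsity, mentioned above, can be sketched briefly: instead of testing individual elements, the kernel checks whole operand blocks and skips all-zero ones. The Python sketch below is a minimal illustration under an assumed 16x16 block size, not Mxcore's actual tile geometry or datapath:

```python
import numpy as np

# Hedged sketch of block-level sparsity in matrix multiplication:
# whole zero blocks of the left operand are skipped, saving both the
# multiply-accumulates and the bandwidth to fetch B's matching rows.
def block_sparse_matmul(A, B, block=16):
    n, k = A.shape
    C = np.zeros((n, B.shape[1]))
    for i in range(0, n, block):
        for j in range(0, k, block):
            blk = A[i:i+block, j:j+block]
            if not blk.any():                    # skip all-zero blocks entirely
                continue
            C[i:i+block] += blk @ B[j:j+block]
    return C
```

Testing one flag per block rather than one per element keeps the skip logic cheap in hardware, which is why block granularity suits area- and power-constrained edge designs.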
Organic neuromorphic devices can accelerate neural networks and integrate with biological systems. Devices based on the biocompatible and conductive polymer PEDOT:PSS are fast, consume little energy, and perform well in crossbar simulations. However, parasitic electrochemical reactions lead to self-discharge and to the fading of the learned conductance states over time. This limits a neural network's operating time and requires complex compensation mechanisms. Spiking neural networks (SNNs) take inspiration from biology to implement local and always-on learning. We show that these SNNs can function on organic neuromorphic hardware and compensate for self-discharge by continuously relearning and reinforcing forgotten states. In this work, we use a high-resolution charge-transport model to describe the behavior of organic neuromorphic devices and create a computationally efficient surrogate model. By integrating the surrogate model into a Brian 2 simulation, we can describe the behavior of SNNs on organic neuromorphic hardware. A biologically plausible two-layer network for recognizing 28x28-pixel MNIST images is trained and observed during self-discharge. For its size, the network achieves competitive recognition results of up to 82.5%. Building the network from forgetful devices even yields superior training accuracy of 84.5% compared to ideal devices. However, trained networks without active spike-timing-dependent plasticity quickly lose their predictive performance. We show that online learning can keep the performance at a steady level close to the initial accuracy, even for idle rates of up to 90%. This performance is maintained even when the output neurons' labels are not revalidated for up to 24 h. These findings reconfirm the potential of organic neuromorphic devices for brain-inspired computing. Their biocompatibility and demonstrated adaptability to SNNs open the path toward close integration with multi-electrode arrays, drug-delivery devices, and ot…
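The compensation mechanism, continuous relearning through spike-timing-dependent plasticity (STDP), can be sketched in Brian 2, the simulator named in the abstract. In this minimal sketch the synaptic weight passively decays toward zero to imitate self-discharge; the exponential decay form and its time constant are our assumptions, whereas the paper drives its simulation with a surrogate of a high-resolution charge-transport model:

```python
from brian2 import *

# Hedged Brian 2 sketch: standard STDP plus a passive weight decay
# that stands in for the device's parasitic self-discharge.
tau_pre = tau_post = 20*ms
tau_discharge = 7200*second      # assumed self-discharge time constant
w_max = 0.01
A_pre = 0.01
A_post = -A_pre * tau_pre / tau_post * 1.05

G = NeuronGroup(2, 'dv/dt = -v / (10*ms) : 1',
                threshold='v > 1', reset='v = 0', method='exact')

S = Synapses(G, G,
             '''dw/dt = -w / tau_discharge : 1 (clock-driven)
                dapre/dt = -apre / tau_pre : 1 (event-driven)
                dapost/dt = -apost / tau_post : 1 (event-driven)''',
             on_pre='''v_post += w
                       apre += A_pre
                       w = clip(w + apost, 0, w_max)''',
             on_post='''apost += A_post
                        w = clip(w + apre, 0, w_max)''',
             method='exact')
S.connect(i=0, j=1)
run(1*second)   # weights fade unless spike pairs keep relearning them
```

Because the STDP updates fire on every spike pair, active synapses are continually nudged back toward their learned values, which is the always-on relearning effect the abstract describes.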
Organic neuromorphic device networks can accelerate neural network algorithms and directly integrate with microfluidic systems or living tissues. Proposed devices based on the biocompatible conductive polymer PEDOT:PSS have shown high switching speeds and low energy demand. However, as electrochemical systems, they are prone to self-discharge through parasitic electrochemical reactions, so the network's synapses forget their trained conductance states over time. This work integrates single-device high-resolution charge-transport models to simulate entire neuromorphic device networks and analyze the impact of self-discharge on network performance. Simulation of a single-layer nine-pixel image-classification network commonly used in experimental demonstrations reveals no significant impact of self-discharge on training efficiency, and even though the network's weights drift significantly during self-discharge, its predictions remain 100% accurate for over ten hours. On the other hand, a multi-layer network for approximating the circle function is shown to degrade significantly over twenty minutes, reaching a final mean-squared-error loss of 0.4. We propose to counter this effect by periodically reminding the network, based on a map between a synapse's current state, the time since the last reminder, and the resulting weight drift. We show that this method, with a map obtained through validated simulations, can reduce the effective loss to below 0.1 even under worst-case assumptions. Finally, while the training of this network is affected by self-discharge, good classification is still obtained. Electrochemical organic neuromorphic devices have not yet been integrated into larger device networks. This work predicts their behavior under nonideal conditions, mitigates the worst-case effects of parasitic self-discharge, and opens the path toward implementing fast and efficient neural networks on organic neuromorphic hardware.
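The reminding scheme lends itself to a short sketch: model each weight's drift since the last reminder, then invert that map to recover and rewrite the target conductance. The exponential drift model, its time constant, and the rest state below are placeholder assumptions; the paper derives its map from validated charge-transport simulations:

```python
import numpy as np

# Hedged sketch of the periodic "reminder": predict how far a synapse
# has drifted, invert the drift map, and program the weight back.
TAU = 1200.0    # assumed self-discharge time constant in seconds
W_REST = 0.0    # assumed fully discharged conductance

def drift(w, dt):
    """Predicted conductance after dt seconds of self-discharge."""
    return W_REST + (w - W_REST) * np.exp(-dt / TAU)

def remind(w_measured, dt_since_reminder):
    """Invert the drift map to estimate the trained weight for rewriting."""
    decay = np.exp(-dt_since_reminder / TAU)
    return W_REST + (w_measured - W_REST) / decay

# Example: a weight trained to 0.8 drifts for 10 minutes, then is restored.
w_now = drift(0.8, 600.0)
w_restored = remind(w_now, 600.0)   # ~0.8 again, written back to the device
```

The key design point is that the reminder needs only the measured state and the elapsed time, so no reference copy of the trained weights has to survive on ideal storage.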