Details
ISBN (digital): 9798350330991
ISBN (print): 9798350331004
This paper proposes a hardware-algorithm co-design of an event-driven Spiking Neural Network (SNN) accelerator with structured sparsity for Dynamic Vision Sensor (DVS) applications. The accelerator accommodates up to 1024 neurons and 1 million synapses for a feed-forward, fully connected SNN implementation. Configurable structured sparsity is introduced through modular arithmetic in both the algorithm and the hardware to improve energy efficiency, reduce the memory requirement, and balance the workload across processing elements. With an event-driven neuron update scheme, the accelerator fully exploits the structured sparsity and can directly process DVS output data for classification tasks without encoding. A three-layer SNN trained with backpropagation through time (BPTT) is implemented on a Xilinx ZCU104 FPGA, achieving 96% accuracy on the N-MNIST dataset and 79% accuracy on the DVS-Gesture dataset, both at 50% sparsity. The peak performance of the accelerator on the ZCU104 is 3.82 GSOP/s at 250 MHz, with an energy efficiency of 5.31 GSOP/W.
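To illustrate the kind of modular-arithmetic connectivity rule the abstract describes, a minimal sketch follows. The specific rule `(i + j) mod k < r`, the function name, and the parameters are assumptions for illustration, not the paper's exact scheme; the sketch only shows how a modular rule yields configurable, workload-balanced structured sparsity.

```python
import numpy as np

def modular_sparsity_mask(n_pre, n_post, k=2, r=1):
    """Hypothetical structured-sparsity mask: synapse (i, j) is kept only
    when (i + j) mod k < r, giving sparsity 1 - r/k (50% for k=2, r=1).
    Every row and every column keeps exactly r out of each k synapses, so
    the workload is balanced across processing elements by construction."""
    i = np.arange(n_pre)[:, None]   # presynaptic indices as a column
    j = np.arange(n_post)[None, :]  # postsynaptic indices as a row
    return ((i + j) % k) < r        # boolean connectivity mask

mask = modular_sparsity_mask(1024, 1024)
print(mask.mean())  # fraction of synapses kept: 0.5
```

Because the kept positions follow directly from the indices, hardware need not store the mask explicitly, which is one way such a rule can reduce memory requirements.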
Transformer-based models have achieved huge success in various artificial intelligence (AI) tasks, e.g., natural language processing (NLP) and computer vision (CV). However, they suffer from high computational density, making them difficult to deploy on resource-constrained devices such as field-programmable gate arrays (FPGAs). Within the overall transformer pipeline, self-attention contributes most of the computation load and becomes the bottleneck of transformer-based models. In this paper, we propose TransFRU, a novel FPGA-based accelerator for the self-attention mechanism with full utilization of hardware resources. Specifically, we first leverage 4-bit and 8-bit processing elements (PEs) to pack multiple signed multiplications into one DSP block. Second, we skip the zero and near-zero values in the intermediate results of self-attention using a sorting engine. The sorting engine is also responsible for operand sharing to boost the computation efficiency of each DSP block. Experimental results show that TransFRU achieves $7.86-49.16 \times$ speedup and $151.1 \times$ energy efficiency compared with a CPU, and $1.41 \times$ speedup and $5.9 \times$ energy efficiency compared with a GPU. Furthermore, we observe $1.91-13.56 \times$ better throughput per DSP block and $3.53-9.62 \times$ better energy efficiency compared with previous FPGA accelerators.
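The operand-packing idea behind multi-multiply DSP utilization can be sketched arithmetically. The packing width `k`, the helper name, and the restriction to non-negative operands are assumptions for illustration; real signed packing on DSP blocks (as TransFRU performs) additionally requires a sign-correction step that the abstract does not detail.

```python
def packed_mul(a1, a2, b, k=16):
    """Sketch of operand packing: a single wide multiplier computes
    (a1 * 2**k + a2) * b = a1*b * 2**k + a2*b, so two small products are
    recovered from one DSP-style multiplication. Correct here only for
    non-negative operands with a2*b < 2**k (the two result fields must
    not overlap); signed operands need an extra correction in hardware."""
    wide = (a1 << k) | a2        # pack both multiplicands into one word
    prod = wide * b              # one hardware multiply
    p2 = prod & ((1 << k) - 1)   # low field holds a2*b
    p1 = prod >> k               # high field holds a1*b
    return p1, p2

print(packed_mul(7, 5, 9))  # (63, 45)
```

With 4-bit or 8-bit operands, the fields stay far apart in a 27×18-bit-class multiplier, which is what makes packing several multiplications per DSP block feasible.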
Details
ISBN (digital): 9798350330991
ISBN (print): 9798350331004
With the increasing resolution of Dynamic Vision Sensors (DVS), efficient compression algorithms for event streams are urgently needed. Conventional DVS systems encode output event data in address event representation (AER) while ignoring the data redundancy arising from the correlation between events. To address this challenge, this paper first analyzes the spatiotemporal characteristics of the event stream and the impact of the readout circuits. Based on this analysis, context-based encoding strategies for the spatial address, timestamp, and polarity of events are proposed, taking the data flow in DVS hardware into consideration. In addition, a highly parallel hardware architecture is presented to implement the compression algorithm, achieving high throughput at an affordable cost. The hardware is implemented in a 55-nm process as part of a 512×512-resolution DVS. The experimental results demonstrate that our method achieves a higher average compression ratio than conventional and DVS-specific coding algorithms.
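A minimal sketch of why event correlation enables compression, using plain first-order delta coding of timestamps. This is not the paper's context-based coder; the function names and the choice of delta coding are illustrative assumptions. The point is that correlated event timestamps produce small deltas clustered near zero, which a downstream context/entropy coder can then compress well.

```python
def delta_encode_timestamps(ts):
    """Store the first timestamp absolutely, then first-order differences.
    Consecutive DVS events are temporally correlated, so the deltas are
    small and highly compressible by an entropy coder."""
    return [ts[0]] + [b - a for a, b in zip(ts, ts[1:])]

def delta_decode_timestamps(deltas):
    """Lossless inverse: running sum over the deltas."""
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out

ts = [1000, 1003, 1003, 1010, 1011]
enc = delta_encode_timestamps(ts)
print(enc)  # [1000, 3, 0, 7, 1]
assert delta_decode_timestamps(enc) == ts
```

The same idea extends to spatial addresses and polarity, where readout order imposes additional structure that a context model can exploit.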
Details
ISBN (digital): 9798350372434
ISBN (print): 9798350372441
Stable Diffusion has become one of the mainstream image synthesis algorithms. Its mainstream computing platform is the GPU; however, deploying Stable Diffusion on GPUs still faces power consumption problems. With dedicated hardware design and optimization, an FPGA-based Stable Diffusion accelerator can achieve better energy efficiency. In this paper, we propose SDAcc for efficient inference of Stable Diffusion on FPGA. SDAcc is 4.40× faster than a CPU, and achieves 1.27× and 19.66× energy efficiency improvements over a GPU and a CPU, respectively.
Instant-NGP is the state-of-the-art (SOTA) algorithm for Neural Radiance Fields (NeRF) and shows great potential for adoption in AR/VR. However, the high memory and computation cost limits Instant-NGP's implementation on edge devices. In light of this, we propose Booth-NeRF, a novel FPGA-based accelerator designed to reduce power consumption. Booth-NeRF adopts a fully pipelined design built upon the Booth algorithm. In addition, it introduces a new instruction set to accommodate Multi-Layer Perceptrons (MLPs) of different sizes, ensuring flexibility and efficiency. Moreover, we propose an FPGA-friendly multiplier architecture for matrix multiplication that can perform exact or approximate multiplication using the Booth algorithm and the select-shift-add technique. Evaluations on a Xilinx Kintex XC7K325T board show that Booth-NeRF achieves $2.20\times$ speedup and $1.31\times$ energy efficiency compared with the NVIDIA Jetson Xavier NX-16G GPU.
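For readers unfamiliar with the Booth algorithm the multiplier builds on, a minimal radix-2 sketch follows. This is the generic textbook recoding, not Booth-NeRF's actual datapath; the function name and bit width are assumptions. Booth recoding replaces each run of 1s in the multiplier with one add and one subtract, which maps naturally onto select-shift-add hardware.

```python
def booth_multiply(m, r, bits=16):
    """Radix-2 Booth multiplication of m by a non-negative r-bit multiplier.
    Scans bit pairs (r[i], r[i-1]): 10 starts a run of 1s (subtract m<<i),
    01 ends a run (add m<<i), 00/11 do nothing."""
    prev = 0
    acc = 0
    for i in range(bits):
        bit = (r >> i) & 1
        if (bit, prev) == (0, 1):    # run of 1s just ended: add m << i
            acc += m << i
        elif (bit, prev) == (1, 0):  # run of 1s just started: subtract m << i
            acc -= m << i
        prev = bit
    if prev == 1:                    # run extends past the top scanned bit
        acc += m << bits
    return acc

print(booth_multiply(13, 11))  # 143
```

An approximate variant, as in select-shift-add multipliers, can truncate low-order partial products of this recoding to trade a small error for area and power.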
A high-frequency broadband current-mode logic static divider circuit fabricated in a 40-nm CMOS process is presented. In the proposed circuit, inductive peaking is utilized to improve the locking frequency and working...
详细信息
In this work, a lateral PIN diode is fabricated on an RF-SOI substrate. The diode is further used in UV photodetection and high-speed optical communication. It exhibits excellent linearity under various light i...
详细信息
Field-programmable gate array (FPGA) macro placement plays a crucial role in the FPGA physical design flow, since it substantially influences the subsequent stages of cell placement and routing. With the increasing...
详细信息
This paper presents a 6-bit 800-MS/s successive approximation register (SAR) analog-to-digital converter (ADC) in 28-nm CMOS with a grouped digital-to-analog converter (DAC) capacitor array. High-speed operation is achiev...
详细信息
In this work, we explored an efficient automatic layout routing algorithm for connecting the power and ground pins in analog integrated circuits. A rectilinear minimal spanning tree (RMST) algorithm for two sets of pi...
详细信息