Transformer-based models suffer from a large number of parameters and high inference latency, and their deployment is not green due to the potential environmental damage caused by high inference energy consumption. In addition, it is difficult to deploy such models on devices, especially on resource-constrained devices such as FPGAs. Various model pruning methods have been proposed to shrink the model size and resource consumption so as to fit the models on hardware. However, such methods often adopt floating point operations (FLOPs) as a proxy for hardware performance, which is not accurate. Furthermore, structural pruning methods always follow a single head-wise or layer-wise pattern, which fails to compress the models to the extreme. To resolve the above issues, we propose a green BERT deployment method on FPGA via hardware-aware and hybrid pruning, named g-BERT. Specifically, two hardware-aware metrics obtained via High Level Synthesis (HLS) are introduced to evaluate the latency and power consumption of inference on FPGA, and they can be optimized directly while pruning. Moreover, we simultaneously consider pruning of heads and full encoder layers. To efficiently find the optimal structure, g-BERT applies differentiable neural architecture search (NAS) with a special 0–1 loss function. Compared with BERT-base, g-BERT achieves $2.1\times$ speedup, $1.9\times$ power consumption reduction and $1.8\times$ model size reduction with comparable accuracy, on par with state-of-the-art methods.
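The abstract does not spell out the loss formulation, so the following is only a minimal PyTorch-style sketch of how differentiable head/layer gates, a hardware proxy, and a 0–1 regularizer could be combined; the per-head latency/power tables (`head_latency`, `head_power`) are hypothetical stand-ins for offline HLS estimates and not the authors' actual implementation.

```python
import torch

# Hypothetical HLS-derived costs, one entry per attention head (layers x heads).
head_latency = torch.ones(12, 12)
head_power   = 0.8 * torch.ones(12, 12)

# Differentiable architecture parameters: one gate per head and one per encoder layer.
alpha_head  = torch.nn.Parameter(torch.zeros(12, 12))
alpha_layer = torch.nn.Parameter(torch.zeros(12))

def gates():
    """Relax binary keep/prune decisions into (0, 1) via sigmoid."""
    g_head  = torch.sigmoid(alpha_head)
    g_layer = torch.sigmoid(alpha_layer)
    # A head survives only if both its own gate and its layer's gate are on.
    return g_head * g_layer.unsqueeze(1), g_layer

def hardware_loss():
    """Differentiable latency/power proxy built from the HLS-derived tables."""
    g_head, _ = gates()
    return (g_head * head_latency).sum() + (g_head * head_power).sum()

def zero_one_loss():
    """Pushes every relaxed gate toward exactly 0 or 1 (a '0-1' style regularizer)."""
    g_head, g_layer = gates()
    return (g_head * (1 - g_head)).sum() + (g_layer * (1 - g_layer)).sum()

# total_loss = task_loss + lambda_hw * hardware_loss() + lambda_01 * zero_one_loss()
```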
Temporal Coarse-Grained Reconfigurable Architecture (CGRA) is a typical category of CGRA that supports single-cycle context switching and time-multiplexes hardware resources to perform both spatial and temporal computation. Compared with spatial CGRAs, it can be used in area- and power-constrained scenarios at the cost of throughput. Therefore, achieving the minimum Initiation Interval (II) for higher throughput is the main objective of many works on temporal CGRA mapping.
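As background, the achievable II in modulo-style temporal mapping is commonly bounded from below by a resource-constrained term and a recurrence-constrained term; the sketch below computes this standard lower bound for a DFG, assuming a simple pre-extracted cycle representation introduced here only for illustration.

```python
import math

def res_mii(num_ops: int, num_pes: int) -> int:
    """Resource-constrained lower bound: operations time-multiplexed onto PEs."""
    return math.ceil(num_ops / num_pes)

def rec_mii(cycles) -> int:
    """Recurrence-constrained lower bound.

    `cycles` is a list of (total_latency, total_dependence_distance) pairs,
    one per cycle in the DFG (a hypothetical, pre-extracted representation).
    """
    return max(math.ceil(lat / dist) for lat, dist in cycles) if cycles else 1

def min_ii(num_ops, num_pes, cycles) -> int:
    """The mapper cannot achieve an II below max(ResMII, RecMII)."""
    return max(res_mii(num_ops, num_pes), rec_mii(cycles))

# Example: 20 operations on a 4x4 PE array with one loop-carried cycle of
# latency 3 and distance 1  ->  max(ceil(20/16), ceil(3/1)) = 3.
print(min_ii(20, 16, [(3, 1)]))
```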
Existing accelerators for transformer networks on field-programmable gate arrays (FPGAs) either focus only on attention computation or suffer from fixed data streams without flexibility. Moreover, compression and approximation methods for transformer networks leave room for further optimization. In this article, we propose a low-latency FPGA-based overlay processor, named LTrans-OPU, for general acceleration of transformer networks. Specifically, we design a domain-specific overlay architecture, including a computation unit for matrix multiplication of arbitrary dimensions. An instruction set customized for our overlay architecture is also introduced, which dynamically controls data flows through generated instructions. In addition, we introduce a hybrid pruning method common to various transformer networks, along with an efficient non-linear function approximation method. Experimental results show that our design is highly competitive and achieves low latency. LTrans-OPU achieves 11.10–32.20× speedup compared with a CPU and 2.44–6.18× latency reduction compared with a GPU. We also observe 2.36–12.43× lower latency compared with customized FPGA/ASIC accelerators, and LTrans-OPU can be 3.10× faster than NPE.
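The abstract names a computation unit for matrix multiplication of arbitrary dimensions without describing how dimensions are handled; a common approach, sketched below under that assumption, is to tile any (M, K) × (K, N) product onto a fixed-size engine and pad the boundary tiles. The tile size and the NumPy "engine" stand in for the actual hardware and are purely illustrative.

```python
import numpy as np

TILE = 16  # hypothetical fixed dimension of the hardware compute unit

def engine_matmul(a_tile, b_tile):
    """Stand-in for the fixed-size TILE x TILE hardware multiplier."""
    return a_tile @ b_tile

def tiled_matmul(A, B):
    """Multiply matrices of arbitrary shape by padding to TILE multiples."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    Mp, Kp, Np = (-(-d // TILE) * TILE for d in (M, K, N))  # round up to TILE
    Ap = np.zeros((Mp, Kp)); Ap[:M, :K] = A
    Bp = np.zeros((Kp, Np)); Bp[:K, :N] = B
    C = np.zeros((Mp, Np))
    for i in range(0, Mp, TILE):
        for j in range(0, Np, TILE):
            for k in range(0, Kp, TILE):  # accumulate partial products per tile
                C[i:i+TILE, j:j+TILE] += engine_matmul(
                    Ap[i:i+TILE, k:k+TILE], Bp[k:k+TILE, j:j+TILE])
    return C[:M, :N]

A, B = np.random.rand(50, 37), np.random.rand(37, 70)
assert np.allclose(tiled_matmul(A, B), A @ B)
```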
Conjugate gradient (CG) is widely used in training sparse neural networks. However, CG, which involves a large number of sparse matrix and vector operations, cannot be efficiently implemented on resource-limited edge devices. In this paper, a high-performance and energy-efficient CG accelerator implemented on an edge Field Programmable Gate Array (FPGA) is proposed for fast on-site neural network training. Guided by profiling, we propose a unified matrix multiplier that is compatible with both sparse and dense matrices. We also design a novel T-engine to handle transpose operations in the compressed sparse format. Experimental results show that our proposal outperforms the state-of-the-art FPGA work with a resource reduction of up to 41.3%. In addition, we achieve on average $10.2\times$ and $2.0\times$ speedup, as well as $10.1\times$ and $3.5\times$ better energy efficiency, compared with implementations on CPU and GPU, respectively.
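For reference, the sketch below shows the conjugate gradient iteration that such an accelerator targets, using SciPy's CSR format as a software analogue of the compressed sparse representation; the matrix, size, and tolerance are illustrative only.

```python
import numpy as np
from scipy.sparse import random as sparse_random

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite sparse A (CSR)."""
    x = np.zeros_like(b)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p                         # sparse matrix-vector product
        alpha = rs_old / (p @ Ap)          # step length
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p      # update search direction
        rs_old = rs_new
    return x

# Illustrative SPD sparse system: A = M M^T + I in CSR format.
M = sparse_random(200, 200, density=0.05, format="csr", random_state=0)
A = (M @ M.T).tocsr()
A.setdiag(A.diagonal() + 1.0)
b = np.ones(200)
x = conjugate_gradient(A, b)
print("residual norm:", np.linalg.norm(A @ x - b))
```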
Transformer models have been widely adopted in Natural Language Processing (NLP) and Computer Vision (CV). However, the excellent performance of Transformers comes at the cost of heavy memory footprints and enormous computing complexity. To deploy Transformers on resource-constrained platforms, e.g., FPGAs, diverse weight pruning strategies have been proposed. However, pattern pruning, as an alternative pruning method, is not well explored in the context of Transformers. In this paper, we propose PP-Transformer, a framework specifically designed to efficiently deploy Transformer models on FPGA using pattern pruning. At the algorithm level, we leverage pattern pruning, a coarse-grained structured pruning strategy, to reduce parameter storage. Meanwhile, we develop a dedicated hardware architecture featuring a custom computing engine tailored to the pattern pruning algorithm. Experimental results demonstrate that our algorithm achieves up to $2.26\times$ reduction in parameter storage with acceptable accuracy degradation. Additionally, our hardware implementation achieves $839.72\times$ and $5.72\times$ speedup compared with CPU and GPU implementations, respectively.
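The pattern set itself is not specified in the abstract; the sketch below illustrates the general idea of pattern pruning under the assumption of a hypothetical pattern set in which each 1×4 weight block keeps exactly two entries, with the best-matching pattern chosen per block by weight magnitude. Only the small pattern index and the surviving weights would then need to be stored.

```python
import numpy as np
from itertools import combinations

BLOCK = 4
# Hypothetical pattern set: all binary masks of length 4 that keep 2 weights.
PATTERNS = [np.array([1.0 if i in keep else 0.0 for i in range(BLOCK)])
            for keep in combinations(range(BLOCK), 2)]

def pattern_prune(W):
    """Assign each 1x4 block the pattern preserving the most weight magnitude."""
    rows, cols = W.shape
    assert cols % BLOCK == 0
    masked = np.zeros_like(W)
    pattern_ids = np.zeros((rows, cols // BLOCK), dtype=np.int64)
    for r in range(rows):
        for c in range(0, cols, BLOCK):
            block = W[r, c:c + BLOCK]
            scores = [np.abs(block * p).sum() for p in PATTERNS]
            best = int(np.argmax(scores))
            pattern_ids[r, c // BLOCK] = best        # stored as a small index
            masked[r, c:c + BLOCK] = block * PATTERNS[best]
    return masked, pattern_ids                       # 50% structured sparsity

W = np.random.randn(8, 16)
W_pruned, ids = pattern_prune(W)
```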
Coarse-grained reconfigurable architecture (CGRA), composed of word-level processing elements (PEs) and interconnects, has emerged as a promising architecture due to its high performance, energy efficiency, and flexibility. Although multiple CGRA frameworks have been proposed, a complete heterogeneous CGRA exploration framework with tunable interconnect flexibility and fast design space exploration (DSE) is still lacking. In this paper, we propose an open-source template-based CGRA exploration framework, built on the CGRA framework TRAM, that integrates the modeling of heterogeneous PEs and interconnects, RTL generation, DFG mapping, automatic simulation and verification, and fast DSE. Moreover, we present a novel resource-efficient shared reconfigurable delay unit (RDU) for data synchronization, which saves 7% of the CGRA area compared with separate RDUs. Further, the explored optimal heterogeneous architecture reduces area and power by 44.7% and 42.9%, respectively, and improves PE utilization by 20.4% compared with the 8×8 baseline architecture in TRAM.
When an application is accelerated with a Coarse-Grained Reconfigurable Architecture (CGRA), it is compiled into Data Flow Graphs (DFGs). In conventional CGRA frameworks, only one DFG is accelerated in each epoch. Consequently, single-context CGRAs cannot fully utilize hardware resources when executing multi-kernel applications. In this paper, we propose a dynamically partially reconfigurable CGRA framework for multi-kernel applications. The modeled CGRA can flexibly partition hardware resources and support parallel execution of multiple DFGs by implementing dynamic partial reconfiguration (DPR). A multi-kernel scheduler based on integer linear programming (ILP) builds a timetable for the execution states of the application, and an incremental mapper compiles DFGs according to the timetable. Compared with the baseline, TRAM, our framework achieves an average throughput increase of 67.30% and an average utilization increase of 32.46% for a single multi-kernel task, as well as an average execution time reduction of 55.71% and an average utilization increase of 70.43% for applications with multiple tasks.
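The paper's ILP formulation is not given in the abstract; as a rough illustration of ILP-based timetabling, the toy model below assigns kernels (each with a PE demand and, as a simplifying assumption, a duration of one epoch) to epochs under a PE-count budget while minimizing the last occupied epoch, using the PuLP solver. Kernel names and numbers are hypothetical.

```python
import pulp

# Hypothetical kernels: name -> number of PEs its DFG occupies for one epoch.
kernels = {"k0": 12, "k1": 20, "k2": 8, "k3": 16}
TOTAL_PES, EPOCHS = 32, 4

prob = pulp.LpProblem("multi_kernel_schedule", pulp.LpMinimize)

# x[k][t] = 1 if kernel k runs in epoch t.
x = {k: {t: pulp.LpVariable(f"x_{k}_{t}", cat="Binary") for t in range(EPOCHS)}
     for k in kernels}
makespan = pulp.LpVariable("makespan", lowBound=0, cat="Integer")

for k in kernels:                                   # every kernel is scheduled exactly once
    prob += pulp.lpSum(x[k][t] for t in range(EPOCHS)) == 1
for t in range(EPOCHS):                             # PE budget per epoch (resource partitioning)
    prob += pulp.lpSum(kernels[k] * x[k][t] for k in kernels) <= TOTAL_PES
for k in kernels:                                   # makespan >= epoch of every scheduled kernel
    for t in range(EPOCHS):
        prob += makespan >= t * x[k][t]

prob += makespan                                    # objective: finish as early as possible
prob.solve(pulp.PULP_CBC_CMD(msg=False))

for k in kernels:
    epoch = next(t for t in range(EPOCHS) if pulp.value(x[k][t]) > 0.5)
    print(f"{k}: epoch {epoch}")
```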
Machine learning has been used extensively in bioactivity value (BAV) prediction for G Protein-Coupled Receptor (GPCR)-targeting ligands. However, prediction performance for the more than 140 GPCRs whose endogenous ligands have not been identified, also called orphan GPCRs (oGPCRs), is still unsatisfactory due to the limited sample size. In addition, current works are far from meeting the demand for fast inference and high energy efficiency. We propose the Multi-Source Transfer-Graph Attention Network (MSTL-GAT), as well as its FPGA-based accelerator. Firstly, we make use of three ideal data sources for transfer learning: oGPCRs, experimentally validated GPCRs, and invalidated GPCRs similar to the former. Secondly, we transform the ligands from the SMILES format into graphs as the input of the GAT to improve prediction accuracy. Moreover, we propose an FPGA-based accelerator tailored for the inference phase of MSTL-GAT. Finally, our experimental results show that MSTL-GAT remarkably improves the prediction of GPCR ligand activity values compared with previous studies. On average, the two evaluation indexes we adopt, R2 and RMSE, improve by 34.76% and 13.16%, respectively. The proposed FPGA accelerator achieves 2.7× and 4.7× speedup, and 29.7× and 3.6× better energy efficiency, compared with a GPU implementation and the state-of-the-art FPGA accelerator, respectively.
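The SMILES-to-graph conversion is only named in the abstract; a minimal sketch with RDKit is shown below, producing an atom-feature matrix and edge list of the kind a GAT typically consumes. The specific atom features (atomic number and degree) are illustrative choices, not necessarily those used by MSTL-GAT.

```python
from rdkit import Chem
import numpy as np

def smiles_to_graph(smiles: str):
    """Convert a SMILES string into (node_features, edge_index) for a GAT."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")
    # Simple per-atom features: atomic number and degree (illustrative only).
    node_features = np.array(
        [[atom.GetAtomicNum(), atom.GetDegree()] for atom in mol.GetAtoms()],
        dtype=np.float32)
    # Undirected bonds stored as two directed edges, as GAT layers expect.
    edges = []
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        edges += [(i, j), (j, i)]
    edge_index = np.array(edges, dtype=np.int64).T  # shape [2, num_edges]
    return node_features, edge_index

# Example: caffeine
feats, edge_index = smiles_to_graph("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")
print(feats.shape, edge_index.shape)
```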
In today’s tech-driven society, the emphasis on data privacy and security has skyrocketed. With technological progress, the emergence of new encryption algorithms and advanced attack techniques compels the need for algorithm upgrades. With rising hardware costs and the demand for flexible cryptographic platforms, single-algorithm accelerators are insufficient, making versatile accelerators that support multiple encryption algorithms essential. Besides flexibility, energy and area efficiency are increasingly important for encrypted application platforms such as embedded devices. Currently, only a few cryptographic accelerators, such as Anole [1], prioritize energy efficiency and flexibility. However, their primary focus is on symmetric key algorithms and hash algorithms, without covering Fully Homomorphic Encryption over the Torus (TFHE) [2]. Solutions like MATCHA [3] focus on TFHE but compromise compatibility with other algorithms. Additionally, a user-friendly, end-to-end toolchain is lacking in existing solutions. To address these challenges, we propose $\mathrm{E}^{2}$-ACE, based on TRAM [4], supporting symmetric key algorithms, hash algorithms, and TFHE.
The approximation of non-linear operations can simplify the logic design and save system resources during neural network inference on Field-Programmable Gate Arrays (FPGAs). Prior work approximates non-linear operations with piecewise linear (PWL) functions, but such approximation neglects the accompanying hardware overhead. This paper proposes a novel approximation framework called Auto-LUT, which leverages a neural network to automatically approximate non-linear operations. The framework formulates the approximation error and hardware overhead as a multi-objective optimization problem and employs an automated search mechanism to find the minimum number of segments and the minimum data bit width. To improve approximation accuracy, we propose a bias clipping operation during the training of the approximation networks, which forces the model to approximate within the range of interest. Moreover, a hardware-friendly quantization scheme is further introduced to simulate the hardware behavior, thereby reducing the hardware overhead. Finally, a customized FPGA-based hardware architecture is utilized to deploy the quantized result. Experimental results show that Auto-LUT uses 56.32% fewer LUTs and 32.31% fewer flip-flops (FFs) while reducing the approximation error by 4.32% compared with the state-of-the-art method.
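The Auto-LUT search itself is network-based and not reproduced here; the sketch below only shows the simpler underlying trade-off it automates, fitting a uniform-breakpoint PWL approximation of GELU with quantized slopes and intercepts, then scanning segment counts and bit widths against a toy cost that weighs error against a LUT-style overhead term. All constants, ranges, and weights are illustrative assumptions.

```python
import numpy as np

def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def quantize(v, bits, scale):
    """Uniform symmetric quantization used to mimic fixed-point hardware."""
    q = np.clip(np.round(v / scale), -(2**(bits - 1)), 2**(bits - 1) - 1)
    return q * scale

def pwl_error(f, segments, bits, lo=-4.0, hi=4.0):
    """Max error of a uniform-breakpoint PWL fit with quantized parameters."""
    knots = np.linspace(lo, hi, segments + 1)
    x = np.linspace(lo, hi, 4096)
    slopes = (f(knots[1:]) - f(knots[:-1])) / (knots[1:] - knots[:-1])
    intercepts = f(knots[:-1]) - slopes * knots[:-1]
    scale = 2.0 ** -(bits - 3)                       # illustrative fixed-point scale
    slopes = quantize(slopes, bits, scale)
    intercepts = quantize(intercepts, bits, scale)
    seg = np.clip(np.searchsorted(knots, x, side="right") - 1, 0, segments - 1)
    approx = slopes[seg] * x + intercepts[seg]
    return np.max(np.abs(approx - f(x)))

# Toy multi-objective search: error plus a penalty proportional to table size.
best = min(((s, b) for s in (4, 8, 16, 32) for b in (6, 8, 10, 12)),
           key=lambda sb: pwl_error(gelu, *sb) + 1e-4 * sb[0] * sb[1])
print("chosen (segments, bits):", best)
```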