Search Results - Inner Mongolia University Library


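Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016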
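The field codes and operator rules above can be sketched as a small query builder. This is only an illustration: the helper names `term`, `group`, and `combine` are hypothetical and not part of the library's system, and the sketch only assembles query strings in the documented syntax.

```python
# Field codes and operators copied from the help text above.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Build one FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}={value}"


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


def combine(operator: str, *parts: str) -> str:
    """Join parts with an uppercase operator and one space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(parts)


# Rebuild the first search example from the help text.
example_one = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(example_one)
```

Rejecting lowercase operators in `combine` mirrors the rule that operators must be uppercase; the single spaces come from the join string.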


Refine search results

Document type

1,335 items: Conference
41 items: Journal article

Collection scope

1,376 items: Electronic documents
0 titles: Print holdings

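Subject classification

811 items: Engineering
- 784 items: Computer Science and Technology...
- 348 items: Software Engineering
- 281 items: Electrical Engineering
- 259 items: Electronic Science and Technology (may...
- 122 items: Information and Communication Engineering
- 36 items: Power Engineering and Engineering Thermo...
- 35 items: Control Science and Engineering
- 30 items: Mechanical Engineering
- 17 items: Bioengineering
- 14 items: Instrument Science and Technology
- 12 items: Architecture
- 12 items: Civil Engineering
- 10 items: Biomedical Engineering (may be awarded...
- 9 items: Metallurgical Engineering
- 8 items: Chemical Engineering and Technology
- 7 items: Optical Engineering
- 7 items: Materials Science and Engineering (may...
- 3 items: Agricultural Engineering
224 items: Natural Science
- 184 items: Mathematics
- 36 items: Physics
- 20 items: Statistics (may be awarded in Science, ...
- 19 items: Biology
- 11 items: Systems Science
- 7 items: Chemistry
62 items: Management
- 50 items: Management Science and Engineering (may...
- 34 items: Business Administration
- 13 items: Library, Information and Archives Manage...
16 items: Economics
- 16 items: Applied Economics
11 items: Law
- 9 items: Sociology
3 items: Agronomy
- 3 items: Crop Science
2 items: Education
1 item: Medicine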

Topics

451 items: fpga
269 items: field programmab...
171 items: field programmab...
27 items: high-level synth...
26 items: reconfigurable c...
22 items: deep learning
22 items: opencl
20 items: computer archite...
18 items: hls
18 items: routing
18 items: hardware acceler...
18 items: hardware
17 items: fpgas
16 items: accelerator
16 items: placement
14 items: neural networks
14 items: cnn
14 items: machine learning
14 items: convolutional ne...
13 items: clocks

Institutions

19 items: university of ca...
12 items: tsinghua univers...
12 items: fudan university
12 items: imperial college...
11 items: university of to...
10 items: peking universit...
10 items: university of to...
9 items: university of ce...
9 items: university of so...
8 items: univ of californ...
7 items: university of sc...
7 items: univ of toronto ...
7 items: epfl lausanne
7 items: école polytechni...
7 items: univ toronto dep...
7 items: nanyang technolo...
7 items: tsinghua univ pe...
6 items: univ british col...
6 items: univ calif los a...
6 items: northeastern uni...

Authors

37 items: cong jason
26 items: rose jonathan
22 items: jason cong
17 items: betz vaughn
15 items: zhang zhiru
14 items: chen deming
13 items: ienne paolo
12 items: chow paul
12 items: wawrzynek john
11 items: hauck scott
10 items: dehon andré
10 items: luk wayne
10 items: prasanna viktor ...
10 items: langhammer marti...
9 items: jinmei lai
9 items: anderson jason h...
9 items: wilton steven j....
9 items: schmit herman
9 items: jonathan rose
9 items: constantinides g...

Language

1,356 items: English
19 items: Other
1 item: Chinese

Search condition: "Any field=FPGA 2000: ACM/SIGDA International Symposium on Field Programmable Gate Arrays"

1,376 records in total; showing records 281-290

ADAM: Automated Design Analysis and Merging for Speeding up FPGA Development

Source: ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2018

Authors: Ng, Ho-Cheung; Liu, Shuanglong; Luk, Wayne (Imperial Coll London, Dept Comp, London, England)

ISBN: (print) 9781450356145

This paper introduces ADAM, an approach for merging multiple FPGA designs into a single hardware design, so that multiple place-and-route tasks can be replaced by a single task to speed up functional evaluation of designs, especially during the development process. ADAM has three key elements. First, a novel approximate maximum common subgraph detection algorithm with linear time complexity to maximize sharing of resources in the merged design. Second, a prototype tool implementing this common subgraph detection algorithm for dataflow graphs derived from Verilog designs; this tool also generates the appropriate control circuits to enable selection of the original designs at runtime. Third, a comprehensive analysis of compilation time versus degree of similarity to identify the optimized user parameters for the proposed approach. Experimental results show that ADAM can reduce compilation time by around 5 times when each design is 95% similar to the others, and the compilation time is reduced from 1 hour to 10 minutes in the case of binomial filters.

Keywords: Design productivity; FPGA; Maximum common subgraph; Design merging

Scalable Window Generation for the Intel Broadwell+Arria 10 and High-Bandwidth FPGA Systems

Source: ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2018

Authors: Stitt, Greg; Gupta, Abhay; Emas, Madison N.; Wilson, David; Baylis, Austin (Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611, USA)

ISBN: (print) 9781450356145

Emerging FPGA systems are providing higher external memory bandwidth to compete with GPU performance. However, because FPGAs often achieve parallelism through deep pipelines, traditional FPGA design strategies do not necessarily scale well to large amounts of replicated pipelines that can take advantage of higher bandwidth. We show that sliding-window applications, an important subset of digital signal processing, demonstrate this scalability problem. We introduce a window generator architecture that enables replication to over 330 GB/s, which is an 8.7x improvement over previous work. We evaluate the window generator on the Intel Broadwell+Arria 10 system for 2D convolution and show that for traditional convolution (one filter per image), our approach outperforms a 12-core Xeon Broadwell E5 by 81x and a high-end Nvidia P6000 GPU by an order of magnitude for most input sizes, while improving energy by 15.7x. For convolutional neural nets (CNNs), we show that although the GPU and Xeon typically outperform existing FPGA systems, projected performances of the window generator running on FPGAs with sufficient bandwidth can outperform high-end GPUs for many common CNN parameters.

Keywords: FPGA; convolution; neural networks

Analysis and Optimization of the Implicit Broadcasts in FPGA HLS to Improve Maximum Frequency

Source: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Authors: Licheng Guo; Jason Lau; Yuze Chi; Jie Wang; Cody Hao Yu; Zhe Chen; Zhiru Zhang; Jason Cong (University of California Los Angeles, Los Angeles, CA, USA; Cornell University, Ithaca, NY, USA)

ISBN: (print) 9781450370998

Designs generated by high-level synthesis (HLS) tools typically achieve a lower frequency compared to manual RTL designs. We study the timing issues in a diverse set of nine realistic HLS designs and observe that in most cases the frequency degradation is related to the signal broadcast structures. In this work, we classify the common broadcast types in HLS designs, including the data signal broadcast and two types of control signal broadcast: the pipeline control broadcast and the synchronization signal broadcast. We further identify several common limitations of the current HLS tools, which lead to improper handling of the broadcasts. First, the HLS delay model does not consider the extra delay caused by broadcasts, so the scheduling results will be suboptimal. To solve the issue, we implement a set of comprehensive synthetic designs and benchmark the extra delay to calibrate the HLS delay model. Second, HLS adopts back-pressure signals for pipeline control, which leads to large broadcasts. Instead, we propose to use skid-buffer-based pipeline control, where the back-pressure signal is removed and an extra skid buffer is used for flow control. We use dynamic programming to minimize the area of the extra FIFO. Third, there exist redundant synchronizations among concurrent modules that may lead to huge broadcasts. We propose methods to identify and prune unnecessary synchronization signals. Our solutions boost the frequency of nine real-world HLS benchmarks by 53% on average, with marginal area and latency overhead. In some cases, the gain is more than 100 MHz.

Keywords: high-level synthesis; broadcast; FPGA; frequency; fan-out

LPAC: A Low-Precision Accelerator for CNN on FPGAs

Source: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Authors: Tianyu Zhang; Tiantian Han; Lu Tian; Yi Li; Xijie Jia; Guangdong Liu; Pingbo An; Yingran Tan; Lingzhi Sui; Shaoxie Fang; Dongliang Xie; Michaela Blott; Yi Shan (Xilinx Inc., Beijing Shi, China; Xilinx Inc., Dublin, Ireland)

ISBN: (print) 9781450370998

Low-bit quantization of neural networks is required on edge devices to achieve lower power consumption and higher performance. An 8-bit or binary network either consumes a lot of resources or suffers accuracy degradation. Thus, a full-process hardware-friendly quantization solution of 4A4W (4-bit activations and 4-bit weights) is proposed to achieve a better accuracy/resource trade-off. It does not contain any additional floating-point operations and achieves accuracy comparable to full precision. We also implement a low-precision accelerator for CNN (LPAC) on the Xilinx FPGA, which takes full advantage of its DSP by efficiently mapping convolutional computations. Through on-chip reassign management and resource-saving analysis, high performance can be achieved on small chips. Our 4A4W solution achieves 1.8x higher performance than 8A8W and a 2.42x increase in power efficiency under the same resources. On ImageNet classification, the Top-5 accuracy is within 1% of full precision. On human pose estimation, we achieve 261 frames per second on ZU2EG, which is a 1.78x speedup compared to 8A8W, and the accuracy has only a 1.62% gap to full precision. This proves that our solution has better universality.

Keywords: low-precision hardware accelerator; DSP; FPGA; low-precision CNNs; full-process hardware-friendly quantization solution
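To make the 4A4W idea in the LPAC abstract concrete, here is a minimal sketch of generic symmetric uniform quantization to signed 4-bit integers, followed by an integer-only dot product. It is not the quantization scheme from the paper: the function names, the shared scale of 0.25, and the sample values are all assumptions chosen for illustration.

```python
def quantize_4bit(values, scale):
    """Map floats to signed 4-bit integers in [-8, 7] using one shared scale."""
    quantized = []
    for v in values:
        level = round(v / scale)                  # nearest integer level
        quantized.append(max(-8, min(7, level)))  # clamp to the 4-bit range
    return quantized


def int_dot(q_activations, q_weights):
    """Dot product computed entirely on integers."""
    return sum(a * w for a, w in zip(q_activations, q_weights))


# Hypothetical sample data; -10.0 saturates at the lowest 4-bit level.
activations = quantize_4bit([0.5, -0.25, 1.0], scale=0.25)
weights = quantize_4bit([0.25, 0.5, -10.0], scale=0.25)

acc = int_dot(activations, weights)  # integer accumulator
approx = acc * 0.25 * 0.25           # rescale once, after accumulation
```

Keeping the accumulation in integers and rescaling once at the end is what lets a multiply-accumulate datapath avoid floating-point work in the inner loop.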

Accelerating Graph Analytics by Co-Optimizing Storage and Access on an FPGA-HMC Platform

Source: ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2018

Authors: Khoram, Soroosh; Zhang, Jialiang; Strange, Maxwell; Li, Jing (Univ Wisconsin, Dept Elect & Comp Engn, Madison, WI 53706, USA)

ISBN: (print) 9781450356145

Graph analytics, which explores the relationships among interconnected entities, is becoming increasingly important due to its broad applicability, from machine learning to social sciences. However, due to the irregular data access patterns in graph computations, one major challenge for graph processing systems is performance. The algorithms, software, and hardware that have been tailored for mainstream parallel applications are generally not effective for massive, sparse graphs from real-world problems, due to their complex and irregular structures. To address the performance issues in large-scale graph analytics, we leverage the exceptional random access performance of the emerging Hybrid Memory Cube (HMC) combined with the flexibility and efficiency of modern FPGAs. In particular, we develop a collaborative software/hardware technique to perform a level-synchronized Breadth First Search (BFS) on an FPGA-HMC platform. From the software perspective, we develop an architecture-aware graph clustering algorithm that exploits the FPGA-HMC platform's capability to improve data locality and memory access efficiency. From the hardware perspective, we further improve the FPGA-HMC graph processor architecture by designing a memory request merging unit to take advantage of the increased data locality resulting from graph clustering. We evaluate the performance of our BFS implementation using the AC-510 development kit from Micron and achieve a 2.8x average performance improvement compared to the latest FPGA-HMC based graph processing system over a set of benchmarks from a wide range of applications.

Keywords: Graph Analytics; Graph Clustering; Hybrid Memory Cube; Reconfigurable Logic; Hardware Accelerators
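The level-synchronized BFS named in the abstract above can be sketched in plain software as follows. This is the textbook frontier-by-frontier formulation, not the paper's FPGA-HMC implementation; the function name and the small sample graph are assumptions for illustration.

```python
def level_synchronized_bfs(adjacency, source):
    """Return {vertex: BFS level}, expanding one whole frontier per step."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # Every vertex in the current frontier is processed before any
        # vertex of the next level; this is the synchronization point.
        for u in frontier:
            for v in adjacency.get(u, []):
                if v not in level:  # first visit fixes the level
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level


# Hypothetical 4-vertex graph given as adjacency lists.
sample_graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
levels = level_synchronized_bfs(sample_graph, 0)
```

Because each level finishes before the next begins, the neighbor lookups within one level are independent of each other, which is the property parallel graph processors typically exploit.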

Degree-aware Hybrid Graph Traversal on FPGA-HMC Platform

Source: ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2018

Authors: Zhang, Jialiang; Li, Jing (Univ Wisconsin, Dept Elect & Comp Engn, 1415 Johnson Dr, Madison, WI 53706, USA)

ISBN: (print) 9781450356145

Graph traversal is a core primitive for graph analytics and a basis for many higher-level graph analysis methods. However, irregularities in the structure of scale-free graphs (e.g., social networks) limit our ability to analyze these important and growing datasets. A key challenge is the redundant graph computations caused by the presence of high-degree vertices, which not only increase the total amount of computation but also incur unnecessary random data access. In this paper, we present a graph processing system on an FPGA-HMC platform, based on software/hardware co-design and co-optimization. For the first time, we leverage an inherent graph property, i.e., vertex degree, to co-optimize algorithm and hardware architecture. In particular, we first develop two algorithm optimization techniques: degree-aware adjacency list reordering and degree-aware vertex index sorting. The former can reduce the number of redundant graph computations, while the latter can create a strong correlation between vertex index and data access frequency, which can be effectively applied to guide the hardware design. We further implement the optimized hybrid graph traversal algorithm on an FPGA-HMC platform. By leveraging the strong correlation between vertex index and data access frequency made by degree-aware vertex index sorting, we develop two platform-dependent hardware optimization techniques, namely degree-aware data placement and degree-aware adjacency list compression. These two techniques together substantially reduce the amount of access to external memory. Finally, we conduct extensive experiments on an FPGA-HMC platform to verify the effectiveness of the proposed techniques. To the best of our knowledge, our implementation achieves the highest performance (45.8 billion traversed edges per second) among existing FPGA-based graph processing systems.

Keywords: hybrid memory cube; graph processing; FPGA
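One plausible reading of the "degree-aware vertex index sorting" mentioned in the abstract above is to relabel vertices so that higher-degree vertices receive smaller indices. The sketch below shows only that reading: the abstract does not state the sort direction, the degree definition, or the tie-breaking rule, so descending out-degree with ties broken by the old index is an assumption, and the function name is hypothetical.

```python
def degree_sorted_relabel(adjacency):
    """Relabel vertices so index 0 is the highest-degree vertex.

    `adjacency` maps every vertex to its neighbor list; each neighbor
    must also appear as a key.
    """
    # Descending degree, ties broken by the original index (an assumption).
    order = sorted(adjacency, key=lambda v: (-len(adjacency[v]), v))
    new_id = {old: new for new, old in enumerate(order)}
    relabeled = {
        new_id[u]: sorted(new_id[v] for v in adjacency[u]) for u in adjacency
    }
    return new_id, relabeled


# Hypothetical undirected graph; vertex 1 has the highest degree.
sample_graph = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
mapping, relabeled_graph = degree_sorted_relabel(sample_graph)
```

After relabeling, a small index implies a frequently referenced vertex, which is the kind of index-to-access-frequency correlation the abstract says can guide data placement.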

Configurable FPGA Packet Parser for Terabit Networks with Guaranteed Wire-Speed Throughput

Source: ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2018

Authors: Cabal, Jakub; Benacek, Pavel; Kekely, Lukas; Kekely, Michal; Pus, Viktor; Korenek, Jan (CESNET Ale, Prague, Czech Republic; Netcope Technol, Brno, Czech Republic; FIT BUT, IT4Innovat Ctr Excellence, Brno, Czech Republic)

ISBN: (print) 9781450356145

As the throughput of computer networks is constantly rising, there is a need for ever-faster packet parsing modules at all points of the networking infrastructure. Parsing is a crucial operation which influences the final throughput of a network device. Moreover, this operation must precede any kind of further traffic processing like filtering/classification, deep packet inspection, and so on. This paper presents a parser architecture which is currently capable of scaling up to terabit throughput in a single FPGA, while the overall processing speed is sustained even on the shortest frame lengths and for an arbitrary number of supported protocols. The architecture of our parser can also be automatically generated from a high-level description of a protocol stack in the P4 language, which makes the rapid deployment of new protocols considerably easier. The results presented in the paper confirm that our automatically generated parsers are capable of reaching an effective throughput of over 1 Tbps (or more than 2 000 Mpps) on the Xilinx UltraScale+ FPGAs and around 800 Gbps (or more than 1 200 Mpps) on their previous generation Virtex-7 FPGAs.

Keywords: packet parser; HLS; P4; Ethernet; high-speed networks; VHDL

Rosetta: A Realistic High-Level Synthesis Benchmark Suite for Software Programmable FPGAs

Source: ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2018

Authors: Zhou, Yuan; Gupta, Udit; Dai, Steve; Zhao, Ritchie; Srivastava, Nitish; Jin, Hanchen; Featherston, Joseph; Lai, Yi-Hsiang; Liu, Gai; Velasquez, Gustavo Angarita; Wang, Wenping; Zhang, Zhiru (Cornell Univ, Sch Elect & Comp Engn, Ithaca, NY 14853, USA; Harvard Univ, Comp Sci, Cambridge, MA 02138, USA; Univ Nacl Colombia, Syst Engn & Comp Sci, Bogota, Colombia; Zhejiang Univ, Elect & Informat Engn, Hangzhou, Peoples R China)

ISBN: (print) 9781450356145

Modern high-level synthesis (HLS) tools greatly reduce the turnaround time of designing and implementing complex FPGA-based accelerators. They also expose various optimization opportunities, which cannot be easily explored at the register-transfer level. With the increasing adoption of the HLS design methodology and continued advances of synthesis optimization, there is a growing need for realistic benchmarks to (1) facilitate comparisons between tools, (2) evaluate and stress-test new synthesis techniques, and (3) establish meaningful performance baselines to track progress of the HLS technology. While several HLS benchmark suites already exist, they are primarily comprised of small textbook-style function kernels, instead of complete and complex applications. To address this limitation, we introduce Rosetta, a realistic benchmark suite for software programmable FPGAs. Designs in Rosetta are fully-developed applications. They are associated with realistic performance constraints, and optimized with advanced features of modern HLS tools. We believe that Rosetta is not only useful for the HLS research community, but can also serve as a set of design tutorials for non-expert HLS users. In this paper we describe the characteristics of our benchmarks and the optimization techniques applied to them. We further report experimental results on an embedded FPGA device as well as a cloud FPGA platform.

Keywords: high-level synthesis; FPGA; heterogeneous computing; reconfigurable computing; benchmarking

Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL

Source: ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2018

Authors: Zohouri, Hamid Reza; Podobas, Artur; Matsuoka, Satoshi (Tokyo Inst Technol, Tokyo, Japan)

ISBN: (print) 9781450356145

Recent developments in High Level Synthesis tools have attracted software programmers to accelerate their high-performance computing applications on FPGAs. Even though it has been shown that FPGAs can compete with GPUs in terms of performance for stencil computation, most previous work achieves this by avoiding spatial blocking and restricting input dimensions relative to FPGA on-chip memory. In this work we create a stencil accelerator using Intel FPGA SDK for OpenCL that achieves high performance without such restrictions. We combine spatial and temporal blocking to avoid input size restrictions, and employ multiple FPGA-specific optimizations to tackle issues arising from the added design complexity. Accelerator parameter tuning is guided by our performance model, which we also use to project performance for the upcoming Intel Stratix 10 devices. On an Arria 10 GX 1150 device, our accelerator can reach up to 760 and 375 GFLOP/s of compute performance, for 2D and 3D stencils, respectively, which rivals the performance of a highly-optimized GPU implementation. Furthermore, we estimate that the upcoming Stratix 10 devices can achieve a performance of up to 3.5 TFLOP/s and 1.6 TFLOP/s for 2D and 3D stencil computation, respectively.

Keywords: FPGA; Stencil; OpenCL; Spatial Blocking; Temporal Blocking

Architecture and Circuit Design of an All-Spintronic FPGA

Source: ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2018

Authors: Williams, Stephen M.; Lin, Mingjie (Univ Cent Florida, Dept Elect & Comp Engn, Orlando, FL 32816, USA)

ISBN: (print) 9781450356145

Reconfigurable logic devices, such as FPGAs, are well known to be drivers of cutting-edge device technology. In the last five years, there have been extensive studies on constructing novel FPGA devices using CMOS technology combined with emerging spintronic devices. Unfortunately, although spintronic device technology promises desirable features such as non-volatility and high area density, its relatively slow switching speed makes it quite challenging to use spintronic devices as drop-in replacements for CMOS transistors. As such, to fully unlock the performance benefits of spintronic devices, it is imperative to develop innovative circuit and architecture design techniques that are custom-made for building high-performance FPGA devices. In this paper, we aim at fully extracting the benefits of new spin-based device technology through innovative circuit and architecture design techniques for FPGAs. Specifically, we exploit the unique characteristics of a domain-wall logic device called the mCell to achieve a direct mapping to NAND-NOR logic and in doing so create a high-throughput non-volatile alternative to LUT-based CMOS reconfigurable logic. To empirically validate our approach, we have performed extensive HSpice circuit simulations. Our simulation results have shown that, for a similar logic capacity, the NAND-NOR FPGA design with mCell devices excels across all metrics when compared to the CMOS NAND-NOR FPGA design. Not only do we reduce average delay by about 17%, but we also improve path delay variance between different logic block configurations by about 59%, which can ease the burden on the FPGA timing analysis CAD tools by having more consistent delay between configurations. To judge the performance of our mCell FPGA in practical applications, we measured it against the Stratix IV LUT-based FPGA for the MCNC and VTR benchmark suites. Our mCell-based FPGA devices prove to be quite competitive against the CMOS LUT-based FPGA design, on average reducing delay and

Keywords: Spintronic; FPGA; mCell


235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
