Search Results - Inner Mongolia University Library


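Help

Field codes:

T = title (book title or article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = institution (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Note: operators must be uppercase and have one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016

In the examples, 图书馆学 is "library science", 情报学 is "information science", 范并思 is an author's name, and 计算机应用与软件 is the journal Computer Applications and Software.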
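The field codes and operator rules above amount to a small query language. As an illustration only, the following Python sketch tokenizes and parses such queries into a tree. It assumes that operators are applied left to right with no precedence (the page does not document precedence), that NOT is binary ("X NOT Y" means X but not Y), and that values contain no spaces or parentheses. `tokenize`, `parse`, and `FIELDS` are hypothetical names, not part of the library's system.

```python
import re

# Field codes listed in the help text.
FIELDS = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
# Operators must be uppercase; a lowercase "and" is rejected as a bad term.
OPERATORS = {"AND", "OR", "NOT"}


def tokenize(query: str) -> list[str]:
    """Split a query into parentheses and whitespace-free words.

    Assumption: values such as "C++" or "1982-2016" contain no spaces or parentheses.
    """
    return re.findall(r"\(|\)|[^\s()]+", query)


def parse(query: str):
    """Parse a query into nested tuples: ("TERM", field, value) or (op, left, right).

    Assumption: operators are applied left to right with equal precedence.
    """
    tokens = tokenize(query)
    pos = 0

    def parse_expr():
        nonlocal pos
        node = parse_operand()
        while pos < len(tokens) and tokens[pos] in OPERATORS:
            op = tokens[pos]
            pos += 1
            node = (op, node, parse_operand())
        return node

    def parse_operand():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        token = tokens[pos]
        pos += 1
        if token == "(":
            node = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return node
        field, sep, value = token.partition("=")
        if not sep or field not in FIELDS or not value:
            raise ValueError(f"bad term: {token!r}")
        return ("TERM", field, value)

    tree = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return tree


if __name__ == "__main__":
    print(parse("P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016"))
```

For Example 2, this sketch groups the parenthesized OR first and then applies AND, NOT, and AND strictly from left to right; the catalog itself may group these differently.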


Refine results (counts are numbers of records)

Document type

1,335 conference papers
41 journal articles

Holdings

1,376 electronic documents
0 print holdings


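Subject classification

811 Engineering
- 784 Computer Science and Technology...
- 348 Software Engineering
- 281 Electrical Engineering
- 259 Electronic Science and Technology (...
- 122 Information and Communication Engineering
- 36 Power Engineering and Engineering Thermo...
- 35 Control Science and Engineering
- 30 Mechanical Engineering
- 17 Bioengineering
- 14 Instrument Science and Technology
- 12 Architecture
- 12 Civil Engineering
- 10 Biomedical Engineering (may confer...
- 9 Metallurgical Engineering
- 8 Chemical Engineering and Technology
- 7 Optical Engineering
- 7 Materials Science and Engineering (...
- 3 Agricultural Engineering
224 Science
- 184 Mathematics
- 36 Physics
- 20 Statistics (may confer Science, ...
- 19 Biology
- 11 Systems Science
- 7 Chemistry
62 Management
- 50 Management Science and Engineering (...
- 34 Business Administration
- 13 Library, Information and Archives Manag...
16 Economics
- 16 Applied Economics
11 Law
- 9 Sociology
3 Agriculture
- 3 Crop Science
2 Education
1 Medicine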

Topic

451 fpga
269 field programmab...
171 field programmab...
27 high-level synth...
26 reconfigurable c...
22 deep learning
22 opencl
20 computer archite...
18 hls
18 routing
18 hardware acceler...
18 hardware
17 fpgas
16 accelerator
16 placement
14 neural networks
14 cnn
14 machine learning
14 convolutional ne...
13 clocks

Institution

19 university of ca...
12 tsinghua univers...
12 fudan university
12 imperial college...
11 university of to...
10 peking universit...
10 university of to...
9 university of ce...
9 university of so...
8 univ of californ...
7 university of sc...
7 univ of toronto ...
7 epfl lausanne
7 école polytechni...
7 univ toronto dep...
7 nanyang technolo...
7 tsinghua univ pe...
6 univ british col...
6 univ calif los a...
6 northeastern uni...

Author

37 cong jason
26 rose jonathan
22 jason cong
17 betz vaughn
15 zhang zhiru
14 chen deming
13 ienne paolo
12 chow paul
12 wawrzynek john
11 hauck scott
10 dehon andré
10 luk wayne
10 prasanna viktor ...
10 langhammer marti...
9 jinmei lai
9 anderson jason h...
9 wilton steven j....
9 schmit herman
9 jonathan rose
9 constantinides g...

Language

1,356 English
19 Other
1 Chinese

Search condition: "Any field = FPGA 2000: ACM/SIGDA International Symposium on Field Programmable Gate Arrays"

1,376 records in total; showing records 261-270


On the Exploration of Connection-aware Partitioning for Parallel FPGA Routing (2020)


Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Authors: Yun Zhou, Dries Vercruyce, Dirk Stroobandt (Ghent University, Ghent, Belgium)

ISBN (print): 9781450370998

Routing is one of the most time-consuming steps in the FPGA synthesis flow. Existing works have described several ways to accelerate the routing process. Partitioning-based parallel routing, which leverages the high-performance computing capability of multi-core processors, has recently been gaining popularity. Specifically, such parallel routers partition nets into regions according to the nets' bounding boxes and then run a parallel routing procedure. Nets can be split into source-sink connections that share wire segments as much as possible. To exploit more parallelism through finer granularity in both spatial partitioning and routing, this work introduces a connection-aware routing bounding box model. By analyzing the workloads of parallel routers, we first show in detail that connection-aware partitioning using the new routing bounding boxes gives parallel routing better runtime efficiency than the existing net-based partitioning. It reduces the number of connections spanning more than one region and exploits more parallelism. The large heterogeneous Titan23 designs and a detailed representation of the Stratix IV FPGA are used for benchmarking. Experimental results show that the parallel FPGA router is faster with our connection-aware partitioning than with the existing net-based partitioning, while achieving similar routing quality in terms of wirelength and critical path delay. The connection-aware routing bounding box model can easily be embedded into other existing parallel routers to make them faster as well.

Keywords: better runtime efficiency; parallel FPGA routing; connection-aware partitioning; routing bounding box
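The difference between net-based and connection-aware partitioning can be illustrated with bounding boxes. The Python sketch below is a simplification: it assumes a single vertical cut that splits the device into two regions and treats any box that reaches the cut as spanning both. The helper names are hypothetical, and the paper's actual partitioning and load-balancing scheme is not described in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in tile coordinates (inclusive)."""
    xmin: int
    ymin: int
    xmax: int
    ymax: int

    @staticmethod
    def around(points):
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        return BBox(min(xs), min(ys), max(xs), max(ys))


def net_level_items(source, sinks):
    """Net-based partitioning: one routing item whose box covers the source and every sink."""
    return [BBox.around([source, *sinks])]


def connection_level_items(source, sinks):
    """Connection-aware partitioning: one routing item per source-sink connection."""
    return [BBox.around([source, sink]) for sink in sinks]


def split_by_cut(items, x_cut):
    """Assign each box to the left region, the right region, or the 'spanning' set.

    Boxes in the left and right lists could be routed in parallel; spanning boxes
    reach both regions, so this sketch sets them aside rather than assigning them.
    """
    left, right, spanning = [], [], []
    for box in items:
        if box.xmax < x_cut:
            left.append(box)
        elif box.xmin > x_cut:
            right.append(box)
        else:
            spanning.append(box)
    return left, right, spanning


if __name__ == "__main__":
    source, sinks = (1, 1), [(2, 3), (3, 0), (8, 6)]
    for label, items in [("net-based", net_level_items(source, sinks)),
                         ("connection-aware", connection_level_items(source, sinks))]:
        left, right, spanning = split_by_cut(items, x_cut=4.5)
        print(label, len(left), len(right), len(spanning))
```

For a net with its source at (1, 1) and sinks at (2, 3), (3, 0), and (8, 6), the single net-level box straddles the cut, so the whole net spans both regions; at connection granularity, two of the three connections fit entirely in the left region.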


QTAccel: A Generic FPGA based Design for Q-Table based Reinforcement Learning Accelerators (2020)


Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Authors: Rachit Rajat, Yuan Meng, Sanmukh Kuppannagari, Ajitesh Srivastava, Viktor Prasanna, Rajgopal Kannan (University of Southern California, Los Angeles, CA, USA; US Army Research Lab, Adelphi, MD, USA)

ISBN (print): 9781450370998

Q-Table based Reinforcement Learning (QRL) is a class of widely used AI algorithms that work by successively improving the estimates of Q values (the quality of state-action pairs), which are stored in a table. They significantly outperform Neural Network based techniques when the state space is tractable. Fast learning for AI applications in several domains (e.g., robotics) with tractable "mid-sized" Q-tables still requires a large number of rapid updates. State-of-the-art FPGA implementations of QRL do not scale with the increasing Q-Table state space and are thus not efficient for such applications. In this work, we develop a novel FPGA implementation of QRL that scales to large state spaces and supports a large class of AI applications. Our pipelined architecture provides higher throughput while using significantly fewer on-chip resources, and thereby supports a variety of action selection policies that cover Q-Learning and variations of bandit algorithms. Possible dependencies caused by consecutive Q value updates are handled, allowing the design to process one Q-sample every clock cycle. Additionally, we provide the first known FPGA implementation of the SARSA (State-Action-Reward-State-Action) algorithm. We evaluate our architecture for the Q-Learning and SARSA algorithms and show that our designs achieve a throughput of up to 180 million Q-samples per second.

Keywords: artificial intelligence; Q learning; FPGA acceleration; reinforcement learning accelerator
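The abstract builds on the standard tabular update rules. The Python sketch below shows textbook Q-learning and SARSA updates together with epsilon-greedy action selection. It is a sequential software reference, not the QTAccel pipeline; the function names and the parameters `alpha` (learning rate), `gamma` (discount), and `epsilon` are illustrative assumptions.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy (argmax) action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_learning_update(q, s, a, r, s_next, alpha, gamma):
    """Off-policy update: bootstrap from the best action available in s_next."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy update: bootstrap from the action a_next actually chosen in s_next."""
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])


if __name__ == "__main__":
    q = [[0.0, 0.0], [0.0, 2.0]]   # 2 states x 2 actions
    q_learning_update(q, s=0, a=0, r=1.0, s_next=1, alpha=0.5, gamma=0.5)
    print(q[0][0])                  # 1.0 = 0.5 * ((1.0 + 0.5 * 2.0) - 0.0)
```

Each update may read a Q value that the previous update has just written. A hardware pipeline therefore has to forward or stall on such read-after-write dependencies, which is what the abstract refers to when it says dependencies between consecutive updates are handled; in this sequential sketch they are satisfied trivially.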


INCAME: INterruptible CNN Accelerator for Multi-robot Exploration (2020)


Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Authors: Jincheng Yu, Zhilin Xu, Shulin Zeng, Chao Yu, Jiantao Qiu, Chaoyang Shen, Yuanfan Xu, Guohao Dai, Yu Wang, Huazhong Yang (Tsinghua University, Beijing, China)

ISBN (print): 9781450370998

Multi-Robot Exploration (MR-Exploration), which provides location and map information, is a basic task for many multi-robot applications. Recent research introduces Convolutional Neural Networks (CNNs) into critical components of MR-Exploration, such as Feature-point Extraction (FE) and Place Recognition (PR), to improve system performance. Such CNN-based MR-Exploration requires running multiple CNN models simultaneously together with complex post-processing algorithms, which greatly challenges the hardware platforms, usually embedded systems. Previous research has shown that the FPGA is a good candidate for CNN processing on embedded platforms. However, such accelerators usually process different models sequentially and lack the ability to schedule multiple tasks at runtime. Furthermore, the post-processing of CNNs in FE is also computationally expensive and becomes the system bottleneck once the CNN models are accelerated. To address these problems, we propose INCAME, an INterruptible CNN Accelerator for Multi-Robot Exploration framework for rapid deployment of robot applications on FPGAs. In INCAME, we propose a virtual-instruction-based interrupt method to support multi-tasking on CNN accelerators. INCAME also includes hardware modules to accelerate the post-processing of the CNN-based components. Experimental results show that INCAME enables multi-task scheduling on the CNN accelerator with negligible performance degradation (0.3%). With multi-task support and post-processing acceleration, INCAME enables an embedded FPGA to execute MR-Exploration in real time (20 fps).

Keywords: robot; deep learning; FPGA; multi-agent


INTB: A New FPGA Interconnect Model for Architecture Exploration (2020)


Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Authors: Chengyu Hu, Qinghua Duan, Peng Lu, Wei Liu, Jian Wang, Jinmei Lai (Fudan University, Shanghai, China; Chengdu Sino Microelectronic Technology Co. Ltd, Chengdu, China)

ISBN (print): 9781450370998

CAD exploration is important for designing FPGA interconnect topologies. It includes two steps: first, design a parameterized model that can express as much of the architecture space as possible; second, use a CAD flow to analyze the described interconnect architecture. In this paper, we present a new interconnect model named INTB (Interconnect Block). At a given logical position, one INTB represents all related routing resources, and hierarchical parameters are designed to simplify the description. Compared with the existing CB-SB model, the INTB model can support more interconnect features of modern FPGAs, such as various types of wire segments and complex connections. These features can improve FPGA routability. To apply the INTB model, two modifications are made to the CAD flow. The first is the generation of the routing resource graph (RRG): a tile-based method is proposed to generate the RRG from parameters. The second is cost computation during routing: two strategies are applied for cost estimation of short and curved wire segments, respectively, which do not exist in the CB-SB model. The INTB model and the CAD improvements are implemented in VTR 8.0. The experiments consist of two parts. First, the INTB model is used to re-describe CB-SB architectures to verify its descriptive capacity; after the CAD flow, the average differences in routing area and timing between the two models are about 4% and 5%, respectively. Second, the INTB model is used to explore the architecture space with modern FPGA features. Experimental results show a clear performance improvement, exceeding 10% on some benchmarks.

Keywords: FPGA; RRG; interconnect model; cost; routing
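A routing resource graph (RRG) has nodes for routing wires and pins and edges for the programmable switches between them. To give a feel for generating an RRG tile by tile from parameters, the Python sketch below builds a toy graph for a single left-to-right routing channel from a list of wire-segment lengths. It is a hypothetical illustration, not the INTB generator: it ignores pins, vertical wires, switch patterns, and the short and curved segments mentioned in the abstract.

```python
from collections import defaultdict


def build_channel_rrg(num_tiles, segment_lengths):
    """Toy tile-based RRG for one left-to-right routing channel.

    Each tile x starts one wire of every length L in segment_lengths, provided the
    wire fits (x + L <= num_tiles). A node is identified by (x_start, L) and covers
    tiles x_start .. x_start + L - 1. At its end, a wire can switch onto any wire
    that starts in the next tile, x_start + L; each such switch becomes an edge.
    """
    nodes = [(x, length)
             for x in range(num_tiles)
             for length in segment_lengths
             if x + length <= num_tiles]
    starts_at = defaultdict(list)          # tile index -> wires starting there
    for x, length in nodes:
        starts_at[x].append((x, length))
    # .get() does not insert missing keys, so wires ending at the last tile get no edges.
    edges = {node: list(starts_at.get(node[0] + node[1], [])) for node in nodes}
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = build_channel_rrg(num_tiles=4, segment_lengths=[1, 2])
    print(len(nodes), sum(len(v) for v in edges.values()))
```

Describing the channel by parameters (here, only the tile count and segment lengths) and expanding them per tile is what keeps such a generator short; richer models add more parameters, not more hand-written graph code.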


Pipeline-aware Logic Deduplication in High-Level Synthesis for Post-Quantum Cryptography Algorithms (2020)


Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Authors: Changsu Kim, Yongwoo Lee, Shinnung Jeong, Wen Wang, Jakub Szefer, Hanjun Kim (POSTECH, Seoul, South Korea; Yonsei University, Seoul, South Korea; Yale University, New Haven, CT, USA)

ISBN (print): 9781450370998

With the technical advance of quantum computers, which can solve problems that are intractable for conventional computers, many currently used public-key cryptosystems become vulnerable. Recently proposed post-quantum cryptography (PQC) is secure against both classical and quantum computers, but existing embedded systems such as smart cards cannot easily support PQC algorithms because of their much larger key sizes and more complex arithmetic. To accelerate PQC algorithms, embedded systems must include PQC hardware blocks, which can lead to huge hardware design costs. Although High-Level Synthesis (HLS) helps significantly reduce design costs, current HLS frameworks produce hardware designs for PQC algorithms that are inefficient in terms of area and performance. This work analyzes common features of PQC algorithms and proposes a new pipeline-aware logic deduplication method in HLS. The proposed method shares commonly invoked logic across the hardware design while considering load balancing in the pipeline and resolving dynamic memory accesses. This work implements FPGA hardware designs of seven PQC algorithms from the Round 2 candidates of the National Institute of Standards and Technology (NIST) PQC standardization process. Compared to a commercial HLS framework, the proposed method reduces the area-delay product by 34.5%.

Keywords: pipeline processing; algorithm analysis; post-quantum cryptography; high-level synthesis; FPGA; resource sharing


High-Performance QR Decomposition for FPGAs (2018)


ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA)

Authors: Martin Langhammer, Bogdan Pasca (Intel Programmable Solutions Group, Swindon, Wiltshire, England; Intel Programmable Solutions Group, Paris, France)

ISBN (print): 9781450356145

QR decomposition (QRD) is of increasing importance for many current applications, such as wireless and radar. Data dependencies in known algorithms and approaches, combined with the data access patterns used in many of these methods, restrict the achievable performance on software-programmable targets. Some FPGA architectures now incorporate hard floating-point (HFP) resources, which, combined with distributed memories and flexible internal connectivity, can support high-performance matrix arithmetic. In this work, we present the mapping of a new QRD algorithm to parallel structures with inter-vector connectivity. Based on the Modified Gram-Schmidt (MGS) algorithm, this new algorithm has a different loop organization, but the dependent functional sequences are unchanged, so error analysis and numerical stability are unaffected. This work has a theoretical sustained-to-peak performance close to 100% for large matrices, which is roughly three times the functional density of the previously best known implementations. Mapped to an Intel Arria 10 device, the design completes a 256×256 single-precision real matrix in 80 µs, a 417 GFLOPS equivalent. This corresponds to a 95% sustained-to-peak ratio for the portion of the device used in this work.

Keywords: QRD; MGS; FPGA; Arria10; throughput
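For reference, the textbook Modified Gram-Schmidt (MGS) QR decomposition on which the new algorithm is based fits in a few lines. The Python sketch below uses plain lists and the standard loop order; the paper reorganizes the loops for its FPGA mapping, which this sketch does not attempt. The function names are illustrative.

```python
import math


def mgs_qr(a):
    """Modified Gram-Schmidt QR of an m x n matrix given as a list of rows.

    Assumes m >= n and full column rank. Returns (q, r): q is m x n with
    orthonormal columns and r is n x n upper triangular, so that a = q r.
    """
    m, n = len(a), len(a[0])
    v = [[float(a[i][j]) for i in range(m)] for j in range(n)]   # working copy of the columns
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        r[k][k] = math.sqrt(sum(x * x for x in v[k]))
        q_k = [x / r[k][k] for x in v[k]]
        q_cols.append(q_k)
        # MGS step: immediately remove the q_k component from every remaining column.
        for j in range(k + 1, n):
            r[k][j] = sum(qi * vi for qi, vi in zip(q_k, v[j]))
            v[j] = [vi - r[k][j] * qi for qi, vi in zip(q_k, v[j])]
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]       # back to row-major
    return q, r


def mgs_flops(m, n):
    """Leading-order floating-point operation count of MGS: about 2 * m * n^2."""
    return 2 * m * n * n
```

Under that operation count, a 256×256 decomposition is 2 · 256³ ≈ 33.6 million floating-point operations; completing it in 80 µs corresponds to roughly 419 GFLOPS, in line with the 417 GFLOPS equivalent reported above (the paper's exact counting convention may differ).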


Advanced Dataflow Programming using Actor Machines for High-Level Synthesis (2020)


Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Authors: Endri Bezati, Mahyar Emami, James Larus (École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland)

ISBN (print): 9781450370998

The use of parallelism has increased drastically in recent years. Parallel platforms come in many forms: multi-core processors, embedded hybrid solutions such as multi-processor systems-on-chip with reconfigurable logic, and cloud datacenters with multi-core processors and reconfigurable logic. These heterogeneous platforms can offer massive parallelism, but it can be difficult to exploit, particularly when combining solutions built on multiple architectures. To program a heterogeneous platform, a developer must master different programming languages, tools, and APIs to program each part of the platform separately, and must then find a way to connect the parts through communication interfaces. The motivation of this work is to provide a single programming model and framework for hardware-software stream programs on heterogeneous platforms. Our framework, StreamBlocks, starts from a dataflow programming model for both embedded and datacenter platforms. Dataflow programming is an alternative model of computation that captures both data and task parallelism. We describe a compiler infrastructure that generates hardware code from CAL dataflow programs. CAL is a dataflow programming language that can express multiple dataflow models of computation. StreamBlocks is based on the Tycho compiler infrastructure, which transforms each actor in a dataflow program into an abstract machine model called an Actor Machine. Actor Machines provide a unified model for executing actors in both hardware and software, and allow our compiler extension and backend to generate efficient FPGA code. Unlike other systems, the programming model and compiler directly support hardware-software systems in which an FPGA functions as a coprocessor to a CPU. This permits easy integration with existing workflows.

Keywords: HLS; OpenCL; CAL; dataflow; actor machine; stream programming; FPGA
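To make the dataflow model concrete, the Python sketch below shows the basic idea of actors that communicate only through FIFO queues and fire when enough input tokens are available. It is a generic illustration under stated assumptions (a fixed number of tokens consumed per input per firing, an acyclic actor graph, a naive scheduler); it is not CAL syntax and not the Tycho Actor Machine model described in the abstract, and the class and function names are hypothetical.

```python
from collections import deque


class Actor:
    """A dataflow actor with a fixed number of tokens consumed per input per firing.

    fn receives one list of tokens per input and returns one list of tokens per output.
    Assumption for this toy: the actor graph is acyclic and every actor consumes at
    least one token per firing, so run() terminates.
    """

    def __init__(self, name, inputs, outputs, rates, fn):
        self.name = name
        self.inputs = inputs      # list of deques (input FIFOs)
        self.outputs = outputs    # list of deques (output FIFOs)
        self.rates = rates        # tokens consumed from each input per firing
        self.fn = fn

    def can_fire(self):
        # Firing rule: enough tokens are waiting on every input.
        return all(len(fifo) >= n for fifo, n in zip(self.inputs, self.rates))

    def fire(self):
        consumed = [[fifo.popleft() for _ in range(n)] for fifo, n in zip(self.inputs, self.rates)]
        for fifo, produced in zip(self.outputs, self.fn(*consumed)):
            fifo.extend(produced)


def run(actors):
    """Naive dynamic scheduler: keep firing enabled actors until none can fire."""
    progress = True
    while progress:
        progress = False
        for actor in actors:
            while actor.can_fire():
                actor.fire()
                progress = True


if __name__ == "__main__":
    source, mid, sink = deque([1, 2, 3, 4]), deque(), deque()
    add_pairs = Actor("add_pairs", [source], [mid], [2], lambda xs: [[xs[0] + xs[1]]])
    scale = Actor("scale", [mid], [sink], [1], lambda xs: [[10 * xs[0]]])
    run([add_pairs, scale])
    print(list(sink))   # [30, 70]
```

Because actors share no state and interact only through FIFOs, the same actor description can in principle be scheduled in software or mapped to hardware pipelines, which is the property a hardware-software dataflow framework relies on.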


Performance Evaluation and Power Analysis of Teraflop-scale Fluid Simulation with Stratix 10 FPGA (2020)


Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Authors: Atsushi Koshiba, Kouki Watanabe, Takaaki Miyajima, Kentaro Sano (RIKEN Center for Computational Science, Kobe, Japan; Tohoku University, Sendai, Japan)

ISBN (print): 9781450370998

Stream computing is a suitable approach for improving both the performance and power efficiency of numerical computations on FPGAs. To achieve further performance gains, temporal and spatial parallelism were exploited: temporal parallelism deepens the pipelines of streamed computation cores, and spatial parallelism duplicates them. These two types of parallelism were previously evaluated on an Arria 10 FPGA. However, it has not been verified whether they are also effective on the latest FPGA, Stratix 10, which has more logic elements (2.4× as many as Arria 10) and a new feature for improving the maximum clock frequency (the HyperFlex architecture). To show scalability on such state-of-the-art FPGAs, in this paper we first implemented a streamed fluid simulation accelerator with both types of parallelism on Stratix 10. We then thoroughly evaluated it in terms of computational performance (FLOPS), power efficiency (FLOPS/W), resource utilization, and maximum clock frequency (Fmax). The results showed that this implementation used an excessive number of DSP blocks because of inefficient mapping of floating-point operations, which reduced Fmax and the number of pipelined cores. To improve scalability, we optimized the implementation to reduce DSP block usage by using a Multiply-Add function in a single DSP block. As a result, the optimized fluid simulation achieves 1.06 TFLOPS and 12.6 GFLOPS/W, which are 1.36× and 1.24× higher than the non-optimized version, respectively. Moreover, we estimate that the fluid simulation on Stratix 10 could outperform a GPU-based implementation on a Tesla V100 if it were optimized for the HyperFlex architecture.

Keywords: hardware accelerators; fluid simulation; FPGA; stream computing; floating-point
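The reported figures also determine the implied power draw. Assuming power equals throughput divided by power efficiency for the same measurement, the short Python calculation below derives about 84 W for the optimized design and about 77 W for the non-optimized one; these are derived estimates, not numbers stated in the abstract.

```python
def implied_power_watts(flops, flops_per_watt):
    """Power implied by a throughput and an efficiency measured together."""
    return flops / flops_per_watt


# Figures reported in the abstract for the optimized design.
OPT_FLOPS = 1.06e12          # 1.06 TFLOPS
OPT_EFFICIENCY = 12.6e9      # 12.6 GFLOPS/W
# Reported improvement ratios over the non-optimized version.
SPEEDUP = 1.36
EFFICIENCY_GAIN = 1.24

opt_power = implied_power_watts(OPT_FLOPS, OPT_EFFICIENCY)
base_flops = OPT_FLOPS / SPEEDUP                     # back out the non-optimized throughput
base_efficiency = OPT_EFFICIENCY / EFFICIENCY_GAIN   # and its efficiency
base_power = implied_power_watts(base_flops, base_efficiency)

if __name__ == "__main__":
    print(f"optimized:     {OPT_FLOPS / 1e12:.2f} TFLOPS at ~{opt_power:.1f} W")
    print(f"non-optimized: {base_flops / 1e12:.2f} TFLOPS at ~{base_power:.1f} W")
```

In other words, the 1.36× throughput gain came with roughly a 10% increase in implied power (1.36 / 1.24 ≈ 1.10).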


A HOG-based Real-time and Multi-scale Pedestrian Detector Demonstration System on FPGA (2018)


ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA)

Authors: Jan Duerre, Dario Paradzik, Holger Blume (Leibniz University Hannover, Institute of Microelectronic Systems, Hannover, Germany)

ISBN (print): 9781450356145

Pedestrian detection will play a major role in future driver assistance and autonomous driving. One powerful algorithm in this field uses HOG (histogram of oriented gradients) features to describe the specific properties of pedestrians in images. To determine their locations, features are extracted and classified window by window at different scales of an input image. The classification results are finally merged to remove overlapping detections. Real-time execution of this method requires specialized FPGA or ASIC architectures. Recent work has focused on accelerating feature extraction and classification. Although merging is an important step in the algorithm, it is rarely considered in hardware implementations. One reason may be its complexity and irregularity, which make it non-trivial to implement in hardware. In this paper, we present a new bottom-up FPGA architecture that maps the full HOG-based pedestrian detection algorithm, including feature extraction, SVM classification, and multi-scale processing combined with merging. For this purpose, we also propose a new hardware-optimized merging method. The resulting architecture is highly efficient. Additionally, we present an FPGA-based, fully real-time, multi-scale pedestrian detection demonstration system.

Keywords: SVM; FPGA; real-time; demonstration system; pedestrian detection; multi-scale; merging; HOG
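The merging step removes overlapping detections that the window classifier reports for the same pedestrian, often at several scales. The paper's hardware-optimized merging method is not described in the abstract; as a point of comparison, the Python sketch below shows greedy non-maximum suppression (NMS), a common software approach to this step. It assumes boxes have already been mapped back to input-image coordinates, and the overlap threshold of 0.5 is an illustrative choice.

```python
def iou(a, b):
    """Intersection over union of boxes given as (x0, y0, x1, y1) with x1 > x0 and y1 > y0."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy

    def area(box):
        return (box[2] - box[0]) * (box[3] - box[1])

    return inter / (area(a) + area(b) - inter)


def greedy_nms(detections, threshold=0.5):
    """Keep detections in descending score order, dropping any that overlap a kept box.

    detections: list of (score, box) pairs, e.g. SVM scores of detection windows
    from all scales, already converted to input-image coordinates.
    """
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, kept_box) <= threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept


if __name__ == "__main__":
    windows = [(0.9, (0, 0, 64, 128)), (0.8, (4, 8, 68, 136)), (0.7, (200, 40, 264, 168))]
    print(greedy_nms(windows))
```

Here the first two windows overlap with an IoU of about 0.78, so only the higher-scoring one survives, while the distant third window is kept. The sort and the all-pairs overlap test are data-dependent and irregular, which illustrates why merging is harder to pipeline in hardware than feature extraction.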


Improving FPGA Performance with a S44 LUT Structure (2018)


ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA)

Authors: Wenyi Feng, Jonathan Greene, Alan Mishchenko (Microsemi Corp., SoC Products Group, San Jose, CA 95134, USA; University of California, Berkeley, Dept. of EECS, Berkeley, CA 94720, USA)

ISBN (print): 9781450356145

FPGA performance depends in part on the choice of the basic logic cell. Previous work dating back to 1999-2005 found that the best look-up table (LUT) sizes for area-delay product are 4 to 6, with 4 being better for area and 6 for performance. Since then, several things have changed. A new "LUT structure" mapping technique can target cells with a larger number of inputs (cut size) without assuming that the cell implements all possible functions of those inputs. In particular, we consider a 7-input function composed of two tightly coupled 4-input LUTs. Changes in process technology have increased the relative importance of wiring delay and configuration memory area. Finally, modern benchmark applications include carry chains, math blocks, and memory blocks. Because of these changes, we show that mapping to a 7-input LUT structure can approach the performance of 6-input LUTs while retaining the area and static power advantages of 4-input LUTs.

Keywords: placement; mapping; routing; FPGA; logic module
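Two 4-input LUTs can cover seven inputs only if one LUT's output drives one input of the other (4 + 3 = 7). The Python sketch below models that cascade with truth tables stored as 16-bit integers; it assumes this is the wiring meant by "S44", and the function names and encodings are illustrative. Because the pair holds only 2 × 16 = 32 configuration bits, versus 2^7 = 128 for a full 7-input LUT, it can realize at most 2^32 of the 2^128 possible 7-input functions, which is why a LUT-structure mapper cannot assume that every function of a 7-input cut fits.

```python
def lut4(truth_table, a, b, c, d):
    """Evaluate a 4-input LUT; bit i of the 16-bit truth_table is the output for i = a + 2b + 4c + 8d."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (truth_table >> index) & 1


def s44(tt_first, tt_second, x):
    """Cascade of two 4-LUTs over seven inputs x[0..6] (assumed S44 wiring).

    The first LUT sees x[0..3]; its output feeds the second LUT together with x[4..6].
    """
    inner = lut4(tt_first, x[0], x[1], x[2], x[3])
    return lut4(tt_second, inner, x[4], x[5], x[6])


AND4 = 0x8000   # output 1 only when all four inputs are 1 (bit 15)
XOR4 = 0x6996   # 4-input parity

if __name__ == "__main__":
    # A 7-input AND and a 7-input parity both decompose into this cascade.
    inputs = [[(i >> k) & 1 for k in range(7)] for i in range(128)]
    print(all(s44(AND4, AND4, x) == int(all(x)) for x in inputs))
    print(all(s44(XOR4, XOR4, x) == sum(x) % 2 for x in inputs))
```

Both checks print True: AND and parity are associative, so each splits cleanly into a 4-input stage followed by a stage that combines its result with the remaining three inputs.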



No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region, postcode 010021
