ISBN (print): 9783030445331; 9783030445348
Field-programmable gate array (FPGA) based accelerators are widely used to accelerate convolutional neural networks (CNNs) because of their potential to improve performance and reconfigurability for specific application instances. To determine the optimal configuration of an FPGA-based accelerator, it is necessary to explore the design space, and accurate performance prediction plays an important role during the exploration. This work introduces a novel method for fast and accurate estimation of latency based on a Gaussian process parametrised by an analytic approximation and coupled with runtime data. The experiments conducted on three different CNNs on an FPGA-based accelerator on Intel Arria 10 GX 1150 demonstrated a 30.7% improvement in accuracy with respect to the mean absolute error in comparison to a standard analytic method in leave-one-out cross-validation.
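The evaluation protocol named in this abstract, mean absolute error under leave-one-out cross-validation, can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration: the linear `analytic_latency` model, the `ops_per_ms` parameter, and the sample values are made up, and a least-squares scale factor stands in for the Gaussian process, which the abstract does not specify in detail.

```python
def analytic_latency(ops, ops_per_ms=100.0):
    """Hypothetical analytic model: latency grows linearly with operation count."""
    return ops / ops_per_ms


def fit_scale(samples):
    """Least-squares scale s minimising sum((measured - s * analytic)^2).

    This simple correction is a stand-in for the Gaussian process in the abstract.
    """
    num = sum(analytic_latency(ops) * measured for ops, measured in samples)
    den = sum(analytic_latency(ops) ** 2 for ops, _ in samples)
    return num / den


def loocv_mae(samples, corrected):
    """Mean absolute error where each sample is predicted from all the others."""
    errors = []
    for i, (ops, measured) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # hold out sample i
        scale = fit_scale(train) if corrected else 1.0
        errors.append(abs(measured - scale * analytic_latency(ops)))
    return sum(errors) / len(errors)


# Made-up (operation count, measured latency in ms) pairs, for illustration only.
SAMPLES = [(1000, 12.1), (2000, 23.8), (4000, 48.5), (8000, 95.9)]

mae_analytic = loocv_mae(SAMPLES, corrected=False)
mae_corrected = loocv_mae(SAMPLES, corrected=True)
print(f"analytic MAE: {mae_analytic:.3f} ms, corrected MAE: {mae_corrected:.3f} ms")
```

On this toy data the corrected predictor has the lower error, which mirrors the kind of comparison the abstract reports without reproducing its numbers.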
ISBN (print): 9783319701363; 9783319701356
A novel hardware-efficient central pattern generator (CPG) model based on the nonlinear dynamics of an asynchronous cellular automaton is presented. It is shown that the presented model can generate multi-phase synchronized periodic signals, which are suitable for controlling a snake robot. Then, the presented model is implemented on a field-programmable gate array (FPGA) and is connected to snake robot hardware. It is shown by real machine experiments that the presented model can realize rhythmic spinal locomotion of the snake robot. Moreover, it is shown that the presented model consumes far fewer hardware resources (FPGA slices) than a standard simple CPG model.
ISBN (print): 9781728102139
In this paper, an optimized data structure for managing triples used in a Semantic Web database and a hardware engine for index construction are presented. We propose an FPGA-centric design, which we call Hardware-Triplestore. As part of the design, a scalable and parallel architecture for triplestore construction is introduced. We propose a hybrid data structure consisting of three layers, one for every element of the semantic triple. The data structure is optimized for our hardware-centric design and is stored in external DDR4 memory. The Hardware-Triplestore is evaluated separately from the rest of the database system and achieves an insertion rate of 1.24 million triples per second, which is 17 times faster than RDF-3X, one of the fastest software triplestores.
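The idea of one layer per triple element can be sketched in software with nested maps. This is only an illustrative analogue, not the Hardware-Triplestore layout: the layer order (subject, then predicate, then object), the `TripleIndex` class, and the `ex:` identifiers are all assumptions introduced here.

```python
from collections import defaultdict


class TripleIndex:
    """Three nested layers: subject -> predicate -> set of objects (assumed order)."""

    def __init__(self):
        self._layers = defaultdict(lambda: defaultdict(set))
        self.size = 0  # number of distinct triples stored

    def insert(self, subject, predicate, obj):
        """Add a triple; return True if it was new, False if already present."""
        objects = self._layers[subject][predicate]
        if obj in objects:
            return False
        objects.add(obj)
        self.size += 1
        return True

    def objects(self, subject, predicate):
        """Return a copy of the objects for a subject/predicate pair."""
        if subject not in self._layers:
            return set()  # avoid creating empty layers on lookup
        return set(self._layers[subject].get(predicate, set()))


index = TripleIndex()
index.insert("ex:alice", "ex:knows", "ex:bob")
index.insert("ex:alice", "ex:knows", "ex:carol")
print(index.size, sorted(index.objects("ex:alice", "ex:knows")))
```

Each lookup walks one layer per triple element, which is the property the layered structure in the abstract is built around. A hardware design would replace the hash maps with memory-friendly structures.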
ISBN (print): 9781665436977
Bubble detection and correction logic is vital in modern data capture devices to resolve bubbles in the output thermometer codes, which arise when non-linearities in the scale cause negative bin widths. Previous bubble correction techniques are either unsuitable for short pulse widths and multiple registration (ones-encoder) or have a very short range (all other methods). In this paper, we propose a hardware technique to detect and correct bubbles up to the length of the pulse width while preserving position information, using a hybrid between the ones-encoder and a single stage of a modified insertion sort. This design was shown to meet timing on a Xilinx Artix-7 FPGA at 100 MHz or above using only 13% of the device, demonstrating hardware viability. The design is also fully pipelined to demonstrate high bandwidths. The limitations of the algorithm are stated and some possible improvements are suggested.
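For readers unfamiliar with the terminology, the following Python sketch shows what a bubble in a thermometer code is and how a plain ones-encoder handles it. It is not the hybrid design proposed in the paper: the function names and the bit ordering (index 0 is the lowest tap) are assumptions, and this baseline discards the position information that the paper's method preserves.

```python
def has_bubble(code):
    """True if a 1 appears after a 0, i.e. the code is not of the form 1...10...0."""
    seen_zero = False
    for bit in code:
        if bit == 0:
            seen_zero = True
        elif seen_zero:
            return True
    return False


def ones_encode(code):
    """Ones-encoder: the binary value is simply the number of set bits."""
    return sum(code)


def correct(code):
    """Rebuild a bubble-free thermometer code with the same number of ones."""
    ones = ones_encode(code)
    return [1] * ones + [0] * (len(code) - ones)


clean = [1, 1, 1, 0, 0, 0]
bubbled = [1, 1, 0, 1, 0, 0]  # the 0 at index 2 is a bubble

print(has_bubble(clean), has_bubble(bubbled))
print(ones_encode(bubbled), correct(bubbled))
```

Counting ones is equivalent to fully sorting the bits, which is why a sorting stage is a natural building block for bubble correction.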
ISBN (print): 9798350383515; 9798350383508
The demand for virtual private networks (VPNs) that provide confidentiality, integrity, and authenticity of communications is growing every year. IPsec is one of the oldest and most widely used VPN protocols, implemented between the internet protocol (IP) layer and the data link layer of the Linux kernel. This implementation method, known as bump-in-the-stack, has the advantage of being able to transparently apply IPsec to traffic without changing the application. However, its throughput efficiency (Gbps/core) is worse than that of regular Linux communication. Therefore, we chose the bump-in-the-wire (BITW) architecture, which handles IPsec in hardware separate from the host. Our proposed BITW architecture consists of inline cryptographic accelerators implemented in field-programmable gate arrays and a programmable switch that connects multiple such accelerators. A VPN gateway implemented with our architecture is transparent and improves the throughput efficiency by 3.51 times and power efficiency by 3.40 times over a VPN gateway implemented in the Linux kernel. It also demonstrates excellent scalability, and has been confirmed to scale to a maximum of 386.24 Gbps per tunnel, exceeding state-of-the-art technology in maximum throughput and efficiency per tunnel. In multiple-tunnel use cases, the proposed architecture improves the energy efficiency by 2.49 times.
ISBN (print): 9789810579432
Fluorescence endoscopy is a novel imaging technique that offers a non-invasive means to diagnose and stage cancers without the need to conduct biopsies of suspected lesions. Work carried out by our research group showed that fluorescence diagnosis can be further enhanced by incorporating a ratio diagnostic algorithm in the system. Currently, images captured using these imaging techniques have to be processed and analyzed off-line, adding a delay to the diagnosis process. We aim to develop a real-time image processing and analysis system to be used with fluorescence endoscopy for early diagnosis and staging of oral and bladder cancers. Fast capturing of suspicious features implied by the fluorescence images provides a means for efficient communication, and an optimized and focused in vivo imaging process. The ultimate aim is to provide an accurate, sensitive and non-invasive real-time cancer diagnosis and staging system that can be used in an outpatient clinical setting. In this paper, we describe the framework of such an imaging system, as well as the initial algorithm development and implementation on field-programmable gate arrays.
ISBN (digital): 9781665497862
ISBN (print): 9781665497862
High-Level Synthesis (HLS) tools are aimed at enabling performant FPGA designs that are authored in a high-level language. While commercial HLS tools are available today, there is still a substantial performance gap between most designs developed via HLS and those produced by traditional, labor-intensive approaches. We report on several cases where an anticipated performance improvement was either not realized or resulted in decreased performance. These include: programming paradigm choices between data parallel vs. pipelined designs; dataflow implementations; configuration parameter choices; and handling odd data set sizes. The results point to a number of improvements that are needed for HLS tool flows, including a strong need for performance modeling that can reliably guide the compilation optimization process.
ISBN (print): 9781538648810
The cosine number transform (CNT) is a cosine-like number-theoretic transform, which has been employed as the basis for multimedia security schemes. In this paper, we propose hardware architectures for computing an 8-point CNT. The architectures include a pipelined approach and are based on a recently introduced fast algorithm, which has been demonstrated to be more efficient than the direct computation of the CNT. We quantify this efficiency by considering several aspects inherent to modular arithmetic and comparing metrics obtained from field-programmable gate array (FPGA) implementations of the proposed architectures.
ISBN (digital): 9781510638198
ISBN (print): 9781510638198; 9781510638181
The results of numerical simulations and experiments on the correction of turbulent distortions of a laser beam are presented. The experiments were carried out using an adaptive optical system with a bandwidth of 2000 Hz. It was shown that, for effective correction, the bandwidth of the adaptive optical system should be an order of magnitude larger than the bandwidth of the turbulent distortions.
ISBN (print): 9781728109961
Functional hardware description languages (FHDLs) provide powerful tools for building new abstractions that enable sophisticated hardware systems to be constructed by composing small reusable parts. Raising the level of abstraction in hardware designs means the programmer can focus on high-level circuit structure rather than mundane low-level details. The language features that facilitate this include higher-order functions, a rich static type system with type inference, and parametric polymorphism. We use hand-written structural and behavioral VHDL, Simulink, and the Kansas Lava FHDL to re-implement several components taken from a Simulink model of an orthogonal frequency-division multiplexing (OFDM) physical layer (PHY). Our development demonstrates that an FHDL can require fewer lines of code than traditional design languages without sacrificing performance.