ISBN: 9781424410590 (print)
In this paper, we investigate three different realizations of the same block from different points of view. The realizations comprise two based on embedded processors (a custom 16-bit RISC processor and a general soft-core processor) and a third that uses Handel-C as an example of a synthesisable high-level abstraction language. The results show that the development time of the complete solution (HW and SW) is approximately the same for the Handel-C design and the soft-core processor design; the development time of the custom 16-bit RISC processor is about five times higher. Moreover, the throughput of the Handel-C design, measured in bits processed per second, is the highest. The achieved frequency and occupied area of the Handel-C design depend on the complexity of the program used. However, the results are comparable to or even better than those of the embedded processors.
ISBN: 9781728148847 (print)
Flow-in-Cloud (FiC) is an acceleration platform designed to build a virtual, monolithic, large FPGA image from a number of economical mid-range FPGAs. We will give a live demonstration of an acceleration example running on FiC with 24 boards through the network.
ISBN: 9789090304281 (print)
This paper presents the FISH (FPGA-Initiated Software-Handled) framework, which allows FPGA accelerators to make system calls to the Linux operating system in CPU-FPGA systems. A special FISH Linux kernel module running on the CPU provides a system call interface for FPGA accelerators, much like the ABI that exists for software programs. We provide a proof-of-concept implementation of this framework running on the Intel Cyclone V SoC device, and show that an FPGA accelerator can seamlessly make system calls as if it were the host program. We see the FISH framework as especially useful for high-level synthesis (HLS), because it makes it possible to synthesize software code that contains system calls.
ISBN: 9781728199023 (print)
A True Random Number Generator (TRNG) is an essential component for security applications of FPGAs. Its requirements include small logic area, high throughput, sufficient randomness backed by a mathematical model, and feasibility, that is, ease of implementation. This paper focuses on TRNGs based on a Transition Effect Ring Oscillator (TERO) and presents a three-path configurable TERO (TC-TERO), an improved implementation of TERO that achieves high feasibility with a minimal amount of hardware. According to an evaluation on a Xilinx Artix-7 FPGA, a TC-TERO with a 20-bit configurable parameter required only 40 LUTs. By selecting one of the promising parameters, the proposed TRNG passed AIS-31 Procedure A without post-processing and NIST SP 800-22 with simple debiasing.
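The abstract does not say which debiasing scheme was applied. As an illustration only, the sketch below assumes the classic von Neumann corrector, one common example of a simple debiasing step; the function name `von_neumann_debias` is hypothetical and not taken from the paper.

```python
def von_neumann_debias(bits):
    """Von Neumann corrector (assumed scheme, not confirmed by the paper).

    Consumes the input in non-overlapping pairs: the pair (0, 1) yields 0,
    the pair (1, 0) yields 1, and the pairs (0, 0) and (1, 1) are discarded.
    A trailing unpaired bit is ignored.
    """
    out = []
    for i in range(0, len(bits) - 1, 2):
        first, second = bits[i], bits[i + 1]
        if first != second:
            out.append(first)
    return out
```

For independent bits with a constant bias, the two unequal pairs are equally likely, so the output is unbiased; the cost is throughput, since at least half of the raw bits are always dropped.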
ISBN: 9781424419609 (print)
This paper presents a complexity analysis of bit-parallel multipliers in polynomial basis on FPGAs, both without and with carry logic. We directly present the Look-Up-Table (LUT) complexity and estimate the resource upper bound based on existing gate-oriented architectures. Experimental results show that no FPGA synthesis tool reaches the estimated upper bound. Furthermore, area optimization with fast carry logic can save an additional 17% of resources. Implementation results for a manually mapped design on a Xilinx Virtex-4 device are reported.
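For readers unfamiliar with polynomial-basis multiplication in GF(2^m), the following is a minimal software model of the arithmetic such a multiplier computes. It is a sketch under stated assumptions: the function name `gf2m_mul` and its parameters are hypothetical, and the loops stand in for what a bit-parallel circuit evaluates combinationally as a network of AND and XOR gates. It does not reflect the paper's LUT mapping.

```python
def gf2m_mul(a, b, m, poly):
    """Multiply a and b in GF(2^m) using the polynomial basis.

    a, b : field elements encoded as integers below 2**m (bit i = coefficient of x^i)
    poly : irreducible polynomial of degree m, including the x^m term
    """
    # Step 1: carry-less (XOR) product of the two polynomials, degree <= 2m - 2.
    prod = 0
    for i in range(m):
        if (b >> i) & 1:
            prod ^= a << i

    # Step 2: reduce modulo poly, clearing bits from the top down.
    for i in range(2 * m - 2, m - 1, -1):
        if (prod >> i) & 1:
            prod ^= poly << (i - m)
    return prod
```

As a usage example, with the AES field polynomial x^8 + x^4 + x^3 + x + 1 (`poly=0x11B`, `m=8`), `gf2m_mul(0x57, 0x83, 8, 0x11B)` gives `0xC1`, the worked example in FIPS 197.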
ISBN: 9781538685174 (digital); 9781538685174 (print)
FPGAs have become a popular technology for implementing Convolutional Neural Networks (CNNs) in recent years. Most CNN applications on FPGAs are domain-specific, e.g., detecting objects from specific categories, for which commonly used CNN models pre-trained on general datasets may not be efficient enough. This paper presents TuRF, an end-to-end CNN acceleration framework that efficiently deploys domain-specific applications on FPGAs by using transfer learning to adapt pre-trained models to specific domains, replacing standard convolution layers with efficient convolution blocks, and applying layer fusion to enhance hardware design performance. We evaluate TuRF by deploying a pre-trained VGG-16 model for a domain-specific image recognition task onto a Stratix V FPGA. Results show that designs generated by TuRF achieve better performance than prior methods for the original VGG-16 and ResNet-50 models, while for the optimised VGG-16 model TuRF designs are more accurate and easier to process.
ISBN: 9781424410590 (print)
This paper introduces a new flow that fits a parallel application onto an FPGA according to FPGA characteristics such as computing power and I/Os. The flow is based on iterative refactoring and transformation of the application. VHDL code is generated from the resulting application and is then used to simulate or synthesize it. Significant experiments have validated the approach.
ISBN: 9781538685174 (digital); 9781538685174 (print)
Supervised machine learning for data classification is increasingly implemented in hardware so that it can be integrated close to the source of the data. The ability to update a trained machine learning model is the most important property any classification system must fulfill. This is often achieved by implementing the algorithm on reconfigurable hardware, but some applications require speed, size, or power efficiency that only application-specific integrated circuits (ASICs) can offer. Architectures that have proven to be very efficient on reconfigurable hardware are not always suited to custom ASIC designs. We therefore propose to integrate commonly used field-programmable technology into an application-specific architecture to allow updates of the trained model. This design pattern allows deep integration into full-custom ASICs while leveraging all the advantages of reconfigurable hardware.
ISBN: 9789090304281 (print)
Physically unclonable functions (PUFs) are used for IP protection, hardware authentication, and supply chain security. While many PUF constructions have been put forward in the past decade, only a few of them are applicable to FPGA platforms. Strict constraints on placement and routing are the main disadvantage of existing PUFs on FPGAs, because they demand considerable effort from the designer. In this paper we propose a new delay-based PUF construction, called the Monte Carlo PUF, that does not require low-level placement and routing control. This construction relies on an on-chip Monte Carlo method that measures the delays of logic elements in order to extract a unique device fingerprint. The proposed construction allows a trade-off between evaluation time and error rate. The Monte Carlo PUF is implemented and evaluated on Xilinx Spartan-6 FPGAs.
ISBN: 9781467381239 (print)
Variable-latency, or speculative, addition is an effective technique for implementing fast adders that work on very long operands. Most approaches to speculative addition either assume that operands have equiprobable, independent bits, which is rarely the case in real applications because of sign extension, or handle signed numbers at the price of considerable area overhead. Furthermore, many existing approaches require ad hoc schemes that prevent the reuse of the standard adders typically available as optimized library components in many technologies, most notably Field-Programmable Gate Arrays. This paper introduces an innovative scheme for speculative addition that effectively addresses both problems, yielding fast, low-area circuits that handle sign-extended numbers speculatively and use only optimized carry-propagation adders based on fast carry circuitry as basic building blocks.
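To make the idea of variable-latency addition concrete, the sketch below models a generic segmented speculative adder. It is not the scheme proposed in the paper: the function name `speculative_add`, the segment size, and the one-or-two-cycle latency model are all assumptions chosen for illustration. Each segment guesses its carry-in from the previous segment alone (as if that segment had no carry-in of its own); when a guess is wrong, the addition is charged a second cycle for correction.

```python
def speculative_add(a, b, width=32, seg=8):
    """Behavioural model of a generic segmented speculative adder (illustrative).

    Returns (sum modulo 2**width, cycles), where cycles is 1 if every
    speculated carry was right and 2 if a correction was needed.
    Assumes width is a multiple of seg.
    """
    mask = (1 << seg) - 1
    result = 0
    true_carry = 0
    mispredicted = False
    for pos in range(0, width, seg):
        seg_a = (a >> pos) & mask
        seg_b = (b >> pos) & mask
        if pos == 0:
            spec_carry = 0
        else:
            # Guess: carry out of the previous segment, ignoring its carry-in.
            prev_a = (a >> (pos - seg)) & mask
            prev_b = (b >> (pos - seg)) & mask
            spec_carry = (prev_a + prev_b) >> seg
        if spec_carry != true_carry:
            mispredicted = True
        total = seg_a + seg_b + true_carry
        result |= (total & mask) << pos
        true_carry = total >> seg
    return result, (2 if mispredicted else 1)
```

In this model, adding a sign-extended small negative number to a small positive one, for example `speculative_add(0xFFFFFF, 1, width=24)`, propagates a carry through every segment and triggers the slow path, which illustrates why the equiprobable-bits assumption mentioned above breaks down for sign-extended operands.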