ISBN (digital): 9781538685174
ISBN (print): 9781538685174
We implement the YOLO (You Only Look Once) object detector on an FPGA, achieving both high speed and high accuracy. The detector is based on a convolutional deep neural network (CNN), which dominates both the performance and the area. It is widely used in embedded systems such as robotics, autonomous driving, security, and drones, all of which require high performance and low power consumption. A frame object detection problem consists of two sub-problems: a regression problem over spatially separated bounding boxes, and the associated classification of the objects, both within a real-time frame rate. We use a binary (1-bit) precision CNN for feature extraction and a half-precision (16-bit) CNN for both classification and localization. We implement a pipelined architecture for the mixed-precision YOLOv2 on the Xilinx Inc. ZCU102 board, which carries a Xilinx Inc. Zynq UltraScale+ MPSoC. The implemented object detector achieved 35.71 frames per second (FPS), which is faster than the standard video speed (29.9 FPS). Compared with a CPU and a GPU, the FPGA-based accelerator was superior in power-performance efficiency. Our method is therefore suitable as a frame object detector for an embedded vision system.
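A minimal PyTorch-style sketch of the mixed-precision idea described in this abstract: binarized (1-bit weight) convolutions for feature extraction and half-precision layers for classification and localization. The layer shapes, channel counts, and the BinaryConv2d helper are illustrative assumptions, not the authors' actual YOLOv2 network or FPGA pipeline.

# Sketch only: 1-bit-weight feature extractor, FP16 detection head (assumed sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryConv2d(nn.Conv2d):
    """Convolution whose weights are binarized to {-1, +1} in the forward pass."""
    def forward(self, x):
        w_bin = torch.sign(self.weight)
        w_bin = w_bin + self.weight - self.weight.detach()   # straight-through estimator for training
        return F.conv2d(x, w_bin, self.bias, self.stride, self.padding)

feature_extractor = nn.Sequential(                 # binary-precision part
    BinaryConv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.Hardtanh(),
    BinaryConv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.Hardtanh(),
)

device = "cuda" if torch.cuda.is_available() else "cpu"
head_dtype = torch.float16 if device == "cuda" else torch.float32   # FP16 conv generally needs a GPU
detection_head = nn.Sequential(                    # half-precision part
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 5 * 25, 1),                      # boxes + class scores per grid cell (illustrative)
).to(device=device, dtype=head_dtype)

x = torch.randn(1, 3, 224, 224)
features = feature_extractor(x)                    # feature extraction with binarized weights
predictions = detection_head(features.to(device=device, dtype=head_dtype))
print(predictions.shape)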
ISBN (print): 9781424419609
In image processing, FPGAs have shown very high performance in spite of their low operating frequency. This high performance comes from (1) the high parallelism of image-processing applications, (2) the high ratio of 8-bit operations, and (3) the large number of internal memory banks on FPGAs, which can be accessed in parallel. Recent microprocessors can execute SIMD instructions on 128-bit data in one clock cycle. Furthermore, these processors provide multiple cores and large cache memories that can hold all image data for each core. In this paper, we compare the performance of FPGAs with these processors using three image-processing applications: two-dimensional filters, stereo vision, and k-means clustering, and make clear how fast an FPGA is in image processing and how many hardware resources are required to achieve that performance.
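A quick software reference for one of the three benchmarks mentioned above, a 3x3 two-dimensional filter over an 8-bit grayscale image. On an FPGA the nine multiply-accumulates per pixel map to parallel 8-bit datapaths fed from line buffers in on-chip memory; this NumPy version is only a functional model under assumed image sizes and kernel values, not the paper's implementation.

import numpy as np

def filter2d_3x3(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Apply a 3x3 filter to an 8-bit image (valid region only)."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2), dtype=np.int32)
    for dy in range(3):                      # nine taps; all independent,
        for dx in range(3):                  # so they parallelize trivially in hardware
            out += kernel[dy, dx] * image[dy:dy + h - 2, dx:dx + w - 2].astype(np.int32)
    return np.clip(out, 0, 255).astype(np.uint8)

img = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
print(filter2d_3x3(img, sharpen).shape)      # (478, 638)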
ISBN (print): 9781728148847
With the increasingly wide application of neural networks, there has been significant focus on accelerating this class of computations. Larger, more complex networks are being proposed in a variety of domains, requiring more powerful computation platforms. The inherent parallelism and regularity of neural network structures mean that custom architectures can be adopted for this purpose. FPGAs have been widely used to implement such accelerators because of their flexibility, achievable performance, efficiency, and abundant peripherals. While platforms that utilize multicore CPUs and GPUs are also competitive, FPGAs offer superior energy efficiency and a wider space of optimisations to enhance performance and efficiency. FPGAs are also more suitable for performing such computations at the edge, where multicore CPUs and GPUs are less likely to be used and energy efficiency is paramount.
ISBN (digital): 9781538685174
ISBN (print): 9781538685174
This paper describes a deterministic and parallel implementation of the VPR routability-driven router for FPGAs. We considered two parallelization strategies: (1) routing multiple nets in parallel; and (2) routing one net at a time while parallelizing the maze-expansion step. Using eight threads running on eight cores, the two methods achieved speedups of 1.84x and 3.67x, respectively, compared to VPR's single-threaded routability-driven router. Removing the determinism requirement increased these respective speedups to 2.67x and 5.46x, while sacrificing the guarantee of reproducible results.
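A minimal sketch of the first strategy above (routing multiple nets in parallel) and of one way determinism can be preserved: nets are explored concurrently against a shared cost snapshot, but their routes are committed in a fixed, serial order. The route_net function and the data structures are placeholders invented for illustration, not VPR's actual API or algorithm.

from concurrent.futures import ThreadPoolExecutor

def route_net(net, routing_graph_snapshot):
    """Maze-expand one net against a snapshot of congestion costs (placeholder)."""
    return {"net": net, "path": [net]}        # stand-in for a routed path

def route_iteration(nets, routing_graph, threads=8):
    snapshot = dict(routing_graph)            # all workers see the same costs
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in input order, so commits happen deterministically
        for result in pool.map(lambda n: route_net(n, snapshot), nets):
            routing_graph[result["net"]] = result["path"]   # serial commit
    return routing_graph

print(route_iteration(["clk", "rst", "data0"], {}))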
ISBN (print): 9789090304281
GUINNESS is a GUI (Graphical User Interface)-based tool flow for implementing deep neural networks on FPGAs [3,4,5], covering both binarized deep neural network training on GPUs and inference on an FPGA. It trains the binarized deep neural network [2] on a desktop PC and then generates the bitstream using a standard FPGA CAD tool flow. All operations are performed through the GUI, so the designer does not need to write any scripts to describe the neural network structure or training behaviour, and only has to specify the hyperparameter values. After training finishes, the tool automatically generates C++ code from which the bitstream is synthesized using the Xilinx SDSoC system design tool flow. Our tool flow is therefore suitable for software programmers who are not familiar with FPGA design.
ISBN (print): 9781728148847
The ability to accurately and efficiently estimate the routability of a circuit based on its placement is one of the most challenging and difficult tasks in the Field-Programmable Gate Array (FPGA) flow. In this paper, we present a novel deep-learning framework based on a Convolutional Neural Network (CNN) model for predicting the routability of a placement. We also incorporate the deep-learning model into a state-of-the-art placement tool and show how the model can be used to (1) avoid costly, but futile, place-and-route iterations, and (2) improve the placer's ability to produce routable placements for hard-to-route circuits using feedback based on routability estimates generated by the proposed model. The model is trained and evaluated using over 26K placement images derived from 372 benchmarks supplied by Xilinx Inc. Experimental results show that the proposed framework achieves a routability prediction accuracy of 97% while exhibiting runtimes of only a few milliseconds.
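A hedged sketch of the kind of CNN classifier this abstract describes: it takes a rasterized placement image and outputs a routable/unroutable probability that a placer could use as feedback. The input resolution, channel counts, and layer sizes below are assumptions; the paper's actual architecture and training setup are not reproduced here.

import torch
import torch.nn as nn

routability_net = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 1-channel placement density map
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 1), nn.Sigmoid(),                         # P(placement is routable)
)

placement_image = torch.rand(1, 1, 256, 256)   # e.g. utilization per placement tile
p_routable = routability_net(placement_image)
if p_routable.item() < 0.5:
    print("skip place-and-route; feed the estimate back to the placer")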
Reduced device-level reliability and increased within-die process variability will become serious issues for future field-programmable gate arrays (FPGAs), and will result in faults developing dynamically during the lifetime of the integrated circuit. Fortunately, FPGAs have the ability to be reconfigured in the field and at runtime, providing opportunities to overcome such degradation-induced faults. This study provides a comprehensive survey of fault detection methods and fault-tolerance schemes specifically for FPGAs and in the context of device degradation, with the goal of laying a strong foundation for future research in this field. All methods and schemes are quantitatively compared, and some particularly promising approaches are highlighted.
ISBN (print): 9781424410590
We describe the application of a hybrid functional level power analysis (FLPA) and instruction level power analysis (ILPA) approach to a processor model implemented on an FPGA. This technique enables the estimation of the task-specific power consumption of the modeled processor, in our case a LEON2, very early in a system design flow, based on the software that will run on it. The FLPA/ILPA model used in our work, as well as the test scenarios and the measured results, are described. We then discuss the function block separation and the power consumption modeling. Finally, the model is validated by benchmarking. The obtained model is promising in the sense that (a) its estimations are close (4% on average) to the measured data, and (b) the model structure is similar to that of hard-core processors, which is not a trivial result.
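A toy illustration of the hybrid FLPA/ILPA idea: total power is modelled as a base term, plus functional-block contributions weighted by activity (FLPA), plus per-instruction-class energy costs scaled by the instruction mix and rate of the software to be run (ILPA). All coefficients below are invented for illustration; the paper derives its model for the LEON2 from measurement and benchmarking.

BLOCK_POWER_MW = {"integer_unit": 45.0, "cache": 30.0, "mmu": 12.0}      # FLPA terms (assumed)
INSTR_ENERGY_NJ = {"alu": 0.8, "load_store": 1.4, "branch": 0.6}         # ILPA terms (assumed)

def estimate_power_mw(block_activity, instr_mix, clock_hz, cpi=1.2, base_mw=20.0):
    flpa = sum(BLOCK_POWER_MW[b] * a for b, a in block_activity.items())
    energy_per_instr = sum(INSTR_ENERGY_NJ[i] * f for i, f in instr_mix.items())  # nJ/instruction
    ilpa = energy_per_instr * (clock_hz / cpi) * 1e-9 * 1e3                       # nJ/s -> mW
    return base_mw + flpa + ilpa

print(estimate_power_mw(
    block_activity={"integer_unit": 0.7, "cache": 0.5, "mmu": 0.2},
    instr_mix={"alu": 0.6, "load_store": 0.3, "branch": 0.1},
    clock_hz=50e6,
))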
ISBN (print): 9781424410590
Structured ASICs have recently emerged as a mid-way point between cell-based ASICs, with their high NRE costs, and FPGAs, with their high unit costs. Though the structured-ASIC fabric attacks mask and other fixed costs, it does not solve verification, particularly the physical verification issues of ASICs or logic errors missed by simulation, which would require re-spins. These can be avoided by testing in-system with an FPGA and migrating the FPGA design to a closely coupled structured-ASIC fabric. Here we describe a practical methodology for a fast, push-button, and thorough verification approach tying an FPGA prototype to a matching structured-ASIC implementation for cost reduction. Our focus is the equivalence verification between the respective revisions of a design, including netlist, compiler settings, macro-block parameters, timing constraints, pin layout, and resource count.
ISBN (digital): 9781538685174
ISBN (print): 9781538685174
FPGA accelerators are being applied in various types of systems, ranging from embedded systems to cloud computing, for their high performance and energy efficiency. Given the scale of deployment, there is a need for efficient application development, resource management, and scalable systems, which makes FPGA virtualization extremely important. Consequently, FPGA virtualization methods and hardware infrastructures have frequently been proposed in both academia and industry to address multi-tenant execution, multi-FPGA acceleration, flexibility, resource management, and security. In this survey, we identify and classify the various techniques and approaches into three main categories: (1) resource level, (2) node level, and (3) multi-node level. In addition, we identify current trends and developments and highlight important future directions for FPGA virtualization that require further work.