Simultaneous Localization and Mapping (SLAM) is intended for robotic and autonomous vehicle applications. These targets require an optimal embedded implementation that respects real-time constraints, limited hardware resources, and energy budgets. SLAM algorithms are computationally intensive to run on embedded targets, and they are often deployed on CPU or CPU-GPGPU architectures. With the growth of embedded heterogeneous computing systems, research is increasingly focused on the algorithm-architecture mapping of existing SLAM algorithms. The latest trend is pushing processing closer to the sensor. FPGAs constitute an ideal architecture for designing smart sensors, providing the low latency required by real-time applications such as video streaming, since sensor data can be fed directly into the FPGA without needing a CPU. In this work, we propose an implementation of the HOOFR-SLAM front end on a CPU-FPGA architecture, including both the feature extraction and matching processing blocks. A high-level synthesis (HLS) approach based on the OpenCL paradigm has been used to design a new system architecture. The performance of the FPGA-based architecture was compared to a high-performance CPU. The proposed architecture delivers superior performance compared to existing state-of-the-art systems.
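As a point of reference for the two front-end blocks the paper maps to the FPGA, the sketch below shows a plain CPU version of feature extraction and binary-descriptor matching in Python/OpenCV. ORB is used only as a stand-in because the HOOFR detector/descriptor is not detailed in the abstract; the function names, parameters, and file names are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical CPU reference for the two front-end blocks mapped to the FPGA:
# feature extraction and descriptor matching. ORB stands in for HOOFR here.
import cv2

def extract_features(gray, n_features=1000):
    """Detect keypoints and compute binary descriptors on one frame."""
    detector = cv2.ORB_create(nfeatures=n_features)
    return detector.detectAndCompute(gray, None)

def match_features(desc_prev, desc_curr, ratio=0.8):
    """Brute-force Hamming matching with a ratio test."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    knn = matcher.knnMatch(desc_prev, desc_curr, k=2)
    return [p[0] for p in knn
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]

if __name__ == "__main__":
    img0 = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
    img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
    kp0, d0 = extract_features(img0)
    kp1, d1 = extract_features(img1)
    matches = match_features(d0, d1)
    print(f"{len(matches)} matches between consecutive frames")
```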
Aiming at the problem of the low defect detection rate of PCB images captured by cameras in industrial scenarios under low-light environments, an MGIE (Mean-Gamma Image Enhancement) image brightness enhancement algorithm and the corresponding FPGA design scheme are proposed. First, the RGB image is converted into the YCrCb color space and the illumination component Y is separated. The Y component is then enhanced by an MSR (Multi-Scale Retinex) algorithm based on multi-scale mean filtering, and a Gamma correction algorithm is used to adjust the brightness. The processed Y channel is subsequently fused with the Cr and Cb channels to obtain the final output. Second, the paper elaborates the FPGA-based design and deployment scheme: the MGIE IP core is designed in an HLS (High-Level Synthesis) environment and is optimized and accelerated by creating look-up tables and applying the PIPELINE directive. Notably, the design processes video in real time: images are captured in real time by an OV5640 camera, and the processed images are immediately displayed on an LCD screen. The experimental results show that the MGIE algorithm is remarkably effective on low-light PCB images, with a PSNR (Peak Signal-to-Noise Ratio) reaching 17.34 dB and an SSIM (Structural Similarity Index Measure) reaching 0.79. After the end-to-end deployment, the processing speed for 1280 x 720 and 640 x 640 pixel images reaches 30 fps and 70 fps, respectively, meeting real-time processing needs.
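As a rough software reference for the processing chain just described, the following NumPy/OpenCV sketch performs the YCrCb split, a multi-scale Retinex on Y built from mean (box) filters, gamma correction, and channel re-fusion. The scale sizes, gamma value, and file names are illustrative assumptions, and the log/normalization steps stand in for what the FPGA version realizes with look-up tables.

```python
# Minimal sketch of the MGIE chain: YCrCb split, mean-filtered MSR on Y,
# gamma correction, and channel re-fusion. Parameters are assumptions.
import cv2
import numpy as np

def mgie_enhance(bgr, scales=(15, 61, 121), gamma=0.6):
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    y = ycrcb[:, :, 0].astype(np.float32) + 1.0  # avoid log(0)

    # Multi-scale Retinex with mean (box) filters instead of Gaussians.
    msr = np.zeros_like(y)
    for k in scales:
        illum = cv2.blur(y, (k, k))
        msr += np.log(y) - np.log(illum + 1e-6)
    msr /= len(scales)

    # Normalize the reflectance estimate back to [0, 255].
    msr = cv2.normalize(msr, None, 0, 255, cv2.NORM_MINMAX)

    # Gamma correction to lift the overall brightness.
    y_out = 255.0 * np.power(msr / 255.0, gamma)

    ycrcb[:, :, 0] = np.clip(y_out, 0, 255).astype(np.uint8)
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

if __name__ == "__main__":
    enhanced = mgie_enhance(cv2.imread("pcb_lowlight.png"))
    cv2.imwrite("pcb_enhanced.png", enhanced)
```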
ISBN:
(Print) 9798331529543; 9798331529550
This study investigates the practical performance of the neural-network post-filters standardized in ITU-T H.274. We implement the neural-network models on a Field-Programmable Gate Array (FPGA), allowing real-time processing of 4K 60 fps encoded videos transmitted via 12G-SDI. Experimental results suggest that a minor bitrate increase for the transmission of the neural-network model weights can enhance the quality of videos encoded with Versatile Video Coding (VVC).
Experiments on public datasets demonstrate the method's effectiveness: it reaches human-level performance and outperforms current state-of-the-art methods with 92.8% on the extended Cohn-Kanade (CK+) dataset and 87.0% on FERPLUS.
“A locally-processed light-weight deep neural network for detecting colorectal polyps in wireless capsule endoscopes” proposes a light-weight DNN model that has the potential to run locally in the WCE [2].
[...]only images indicating potential diseases are transmitted, saving energy on data transmission.
Background subtraction is a fundamentally important video processing task that aims at separating the foreground from a video so that post-processing tasks become efficient.
[...]several different techniques have been proposed for this task, but most of them cannot perform well on videos with variations in both the foreground and the background.
In “Background subtraction in videos using LRMF and CWM algorithm,” a novel background subtraction technique is proposed that progressively fits a subspace to the background, obtained via L1 low-rank matrix factorization solved with the cyclic weighted median algorithm, while the foreground is modeled by a mixture-of-Gaussian noise distribution [3].
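To make the preceding description concrete, here is a hedged rank-1 sketch of the cyclic weighted median (CWM) update for L1 low-rank matrix factorization: each factor entry is the weighted median of the ratios it must explain. The full method in [3] handles higher ranks and a mixture-of-Gaussian foreground model; this toy version only illustrates how the background subspace is fitted.

```python
# Rank-1 L1 low-rank factorization via cyclic weighted median updates.
# Columns of X are vectorized frames; u v^T approximates the background.
import numpy as np

def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]

def l1_rank1_cwm(X, iters=20):
    """Fit X ~ u v^T under an L1 loss using alternating weighted medians."""
    u = np.median(X, axis=1)
    v = np.ones(X.shape[1])
    for _ in range(iters):
        for j in range(X.shape[1]):          # update v given u
            mask = np.abs(u) > 1e-9
            v[j] = weighted_median(X[mask, j] / u[mask], np.abs(u[mask]))
        for i in range(X.shape[0]):          # update u given v
            mask = np.abs(v) > 1e-9
            u[i] = weighted_median(X[i, mask] / v[mask], np.abs(v[mask]))
    return np.outer(u, v)                    # background estimate

if __name__ == "__main__":
    frames = np.random.rand(100, 30)         # 100 pixels x 30 frames (toy data)
    background = l1_rank1_cwm(frames)
    foreground = frames - background         # residual holds moving objects
```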
This paper introduces an efficient, lightweight, invisible, blind, real-time video watermarking system. Symmetric chaotic key encryption enhances the system's security, ensuring robustness by randomly selecting pixels or coefficients for watermark embedding. A first-level discrete wavelet transform (DWT) is applied to the selected data, embedding the watermark into the low-frequency band (LL sub-band). The approach involves random selection of data for quantization using the quantization index modulation (QIM) technique. The proposed scheme is implemented on a low-cost FPGA board (Zybo Z7-20) using a software/hardware (SW/HW) co-design approach. Experimental results demonstrate high fidelity, with a peak signal-to-noise ratio (PSNR) exceeding 35 dB and normalized correlation (NC) around 0.99. The architecture achieves a balanced compromise between low FPGA area, high operational speed (up to 127 MHz), and minimal power consumption (not exceeding 51 mW). Performance evaluation confirms the system's robustness against various attacks, including filtering, additive noise, geometric modifications, and contrast adjustments. This makes it highly suitable for real-time embedded video applications where data integrity is paramount.
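A simplified software sketch of the embedding path described above is given below: a logistic-map key stream selects LL-band coefficients after a first-level DWT, and each selected coefficient is quantized with QIM to carry one watermark bit. The chaotic map, key, quantization step, and wavelet choice are illustrative assumptions rather than the paper's exact parameters.

```python
# Sketch of chaotic-selection DWT-QIM watermark embedding on a luma frame.
import numpy as np
import pywt

def logistic_indices(n_needed, n_total, x0=0.7, r=3.99):
    """Chaotic (logistic map) ordering used to pick embedding positions."""
    x, seq = x0, []
    for _ in range(n_total):
        x = r * x * (1.0 - x)
        seq.append(x)
    return np.argsort(seq)[:n_needed]

def embed_qim(frame_y, bits, step=12.0, key=0.7):
    """Embed watermark bits into the LL sub-band with QIM quantization."""
    LL, (LH, HL, HH) = pywt.dwt2(frame_y.astype(np.float32), "haar")
    flat = LL.flatten()
    idx = logistic_indices(len(bits), flat.size, x0=key)
    for b, i in zip(bits, idx):
        # Snap to an even or odd multiple of `step` depending on the bit.
        q = np.round(flat[i] / step)
        if int(q) % 2 != b:
            q += 1
        flat[i] = q * step
    LL = flat.reshape(LL.shape)
    return pywt.idwt2((LL, (LH, HL, HH)), "haar")
```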
Robotic systems employed in tasks such as navigation, target tracking, security, and surveillance often use camera gimbal systems to enhance their monitoring and security capabilities. These camera gimbal systems undergo fast to-and-fro rotational motion to surveil the extended field of view (FOV). A high steering rate (rotation angle per second) of the gimbal is essential to revisit a given scene as fast as possible, which results in significant motion blur in the captured video frames. Real-time motion deblurring is essential in surveillance robots since the subsequent image-processing tasks demand immediate availability of blur-free images. Existing deep learning (DL) based motion deblurring methods either lack real-time performance due to network complexity or suffer from poor deblurring quality for large motion blurs. In this work, we propose a Gyro-guided Network for real-time motion deblurring (GRNet) which makes effective use of existing prior information to improve deblurring without increasing the complexity of the network. The steering rate of the gimbal is taken as a prior for data generation. A contrastive learning scheme is introduced for the network to learn the amount of blur in an image by utilizing the knowledge of blur content in images during training. A sharp reference image is additionally given as input to GRNet to guide the deblurring process, and the most relevant features from the reference image are selected using a cross-attention module. Our method works in real time at 30 fps. We also propose the first Gimbal Yaw motion real-wOrld (GYRO) dataset of infrared (IR) and color images with significant motion blur, along with the inertial measurements of camera rotation, captured by a gimbal-based imaging setup in which the gimbal undergoes rotational yaw motion. Both qualitative and quantitative evaluations on the proposed GYRO dataset demonstrate the practical utility of our method.
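The cross-attention idea mentioned above can be sketched in PyTorch as follows: features of the blurred frame act as queries and select the most relevant features from the sharp reference image, which provides keys and values. The module name, channel size, and head count are assumptions; GRNet's actual layout and training losses are not reproduced here.

```python
# Hypothetical reference-guided cross-attention block for deblurring features.
import torch
import torch.nn as nn

class ReferenceCrossAttention(nn.Module):
    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, blur_feat, ref_feat):
        # blur_feat, ref_feat: (B, C, H, W) feature maps from a shared encoder.
        b, c, h, w = blur_feat.shape
        q = blur_feat.flatten(2).transpose(1, 2)   # (B, H*W, C) queries
        kv = ref_feat.flatten(2).transpose(1, 2)   # reference keys/values
        fused, _ = self.attn(q, kv, kv)            # queries attend to reference
        fused = self.norm(fused + q)               # residual connection
        return fused.transpose(1, 2).reshape(b, c, h, w)

if __name__ == "__main__":
    module = ReferenceCrossAttention(channels=64)
    blur = torch.randn(1, 64, 32, 32)
    ref = torch.randn(1, 64, 32, 32)
    print(module(blur, ref).shape)  # torch.Size([1, 64, 32, 32])
```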
Versatile Video Coding (VVC), the latest advancement in video coding standards, significantly outperforms its predecessor, High Efficiency Video Coding (HEVC), in terms of coding efficiency. Traditional motion representation methods, which rely solely on translational motion, often fall short in accurately depicting object positions in high-definition (HD) video content. Computing this motion in software is highly complex and can be very time-consuming; therefore, it is essential to utilize a hardware parallel pipeline for efficient processing. This paper introduces a novel, high-throughput iterative method optimized for hardware implementation, which significantly enhances affine motion prediction efficiency within VVC. The hardware architecture employs highly parallel pipeline operations to evaluate the matching criteria used to compute the optimal combination of affine motion vectors, using 64×64 processing units to support 64×64 to 128×128 coding units. Additionally, a high-efficiency affine iterative array was designed to adjust the motion vectors, along with an edge detection operator to streamline the computation of image gradients. Experimental results demonstrate that the hardware-based approach efficiently processes 4K video.
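For reference, the sketch below shows the well-known affine motion model that such a pipeline evaluates in parallel: given control-point motion vectors (CPMVs) at the block corners, the 6-parameter model derives one motion vector per 4x4 sub-block from the sub-block centre position. Fixed-point arithmetic, the iterative CPMV refinement, and the gradient-based matching criterion from the paper are omitted.

```python
# Per-sub-block motion vectors from the 6-parameter affine model used in VVC.
import numpy as np

def affine_subblock_mvs(mv0, mv1, mv2, width, height, sub=4):
    """mv0: top-left, mv1: top-right, mv2: bottom-left CPMVs (x, y) in pixels."""
    dx = (np.asarray(mv1) - np.asarray(mv0)) / width    # horizontal gradient
    dy = (np.asarray(mv2) - np.asarray(mv0)) / height   # vertical gradient
    mvs = np.zeros((height // sub, width // sub, 2))
    for j in range(height // sub):
        for i in range(width // sub):
            cx, cy = i * sub + sub / 2, j * sub + sub / 2  # sub-block centre
            mvs[j, i] = np.asarray(mv0) + dx * cx + dy * cy
    return mvs

if __name__ == "__main__":
    # Toy CPMVs for a 64x64 coding unit.
    field = affine_subblock_mvs(mv0=(1.0, 0.5), mv1=(2.0, 0.5), mv2=(1.0, 1.5),
                                width=64, height=64)
    print(field.shape)  # (16, 16, 2)
```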
Due to the rapidly increasing number of vehicles and ongoing urbanization, the use of on-street parking spaces has increased significantly. Many studies have detected parking spaces by using the painted lines in parking areas; however, the applicability of this method is very limited since such lines are not found in every parking area. In this research, a unique study is presented to determine the empty and occupied parking spaces in a parking area by processing the images from cameras located at high points on the streets, using depth calculation, perspective transformation, and certain image processing techniques within the framework of specific features. Empty and occupied parking spaces were determined by utilizing perspective transformation and depth measurement techniques, and the resulting data were transferred to a real-time database environment. In addition to determining the parking spaces, the study also aims to inform users through a mobile application and to prevent traffic congestion, extra fuel consumption, waste of time, and the air pollution caused by that fuel consumption.
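An illustrative OpenCV sketch of the perspective-transformation step described above follows: four manually chosen corners of the parking area are mapped to a rectangular top-down view before occupancy analysis. The corner coordinates, output size, and file names are placeholder assumptions.

```python
# Bird's-eye (top-down) warp of a street-camera frame for parking analysis.
import cv2
import numpy as np

def birdseye_view(frame, src_corners, out_size=(800, 400)):
    dst = np.float32([[0, 0], [out_size[0], 0],
                      [out_size[0], out_size[1]], [0, out_size[1]]])
    H = cv2.getPerspectiveTransform(np.float32(src_corners), dst)
    return cv2.warpPerspective(frame, H, out_size)

if __name__ == "__main__":
    frame = cv2.imread("street_camera.jpg")
    # Corners of the parking area in the camera image, ordered top-left,
    # top-right, bottom-right, bottom-left (placeholder values).
    corners = [(420, 310), (1180, 330), (1500, 700), (120, 680)]
    top_down = birdseye_view(frame, corners)
    cv2.imwrite("parking_topdown.jpg", top_down)
```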
ISBN:
(Print) 9798331529543; 9798331529550
With the rapid development of video-on-demand (VOD) and real-time streaming video technologies, the accurate objective assessment of streaming video Quality of Experience (QoE) has become a focal point for optimizing streaming-related technologies. However, the transmission distortions caused by poor Quality of Service (QoS) conditions, such as intermittent stalling, rebuffering, and drastic changes in video sharpness due to bitrate fluctuations, make evaluating streaming video QoE challenging. This paper introduces a large and diverse in-the-wild streaming video QoE evaluation dataset, the SJLIVE-1k dataset. This work addresses the limitations of existing datasets, which lack in-the-wild video sequences captured under real network conditions and contain an insufficient amount of video content. Furthermore, we propose an end-to-end objective QoE evaluation strategy that extracts video content and QoS features from the video itself without using any extra information. By employing self-supervised contrastive learning as the "reminder" that bridges the gap between the different types of features, our approach achieves state-of-the-art results across three datasets. Our proposed dataset will be released to facilitate further research.
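The contrastive bridging step can be illustrated with a standard InfoNCE-style loss, sketched below in PyTorch: embeddings of content features and QoS features from the same video are treated as positives, and all cross-video pairs as negatives. The backbones that produce the embeddings, and whether the paper uses exactly this loss, are assumptions not taken from the abstract.

```python
# Symmetric InfoNCE loss between content and QoS feature embeddings.
import torch
import torch.nn.functional as F

def info_nce(content_emb, qos_emb, temperature=0.1):
    """content_emb, qos_emb: (B, D) embeddings of the same batch of videos."""
    a = F.normalize(content_emb, dim=1)
    b = F.normalize(qos_emb, dim=1)
    logits = a @ b.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    # Matching pairs (the diagonal) are positives; all others are negatives.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```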
Speeded-up robust features (SURF) is one of the most popular feature-based algorithms for image matching. Compared to emerging deep-learning-based image matching algorithms, SURF is much faster with comparable accuracy, and it remains one of the dominant algorithms adopted in the majority of real-time applications. With the increasing popularity of video-based computer vision applications, matching between an image and different frames of a video stream is required. Traditional algorithms can fail on live video because spatiotemporal differences between frames cause significant fluctuation in the results. In this study, we propose a self-adaptive methodology to improve the stability and precision of image-video matching. The proposed methodology dynamically adjusts the threshold in feature point extraction to control the number of extracted feature points based on the content of the previous frame. Minimum ratio of distance (MROD) matching is integrated to preclude false matches while keeping abundant sample sizes. Finally, multiple homography matrices (H-Matrix) are estimated using progressive sample consensus (PROSAC) with various reprojection errors, and the model with the lowest mean square error (MSE) is selected for image-to-video-frame matching. The experimental results show that the self-adaptive SURF offers more accurate and stable results while balancing single-frame processing time in image-video matching.
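A hedged OpenCV sketch of that adaptation loop is shown below: the Hessian threshold is nudged after each frame so the keypoint count stays near a target, a ratio test (in the spirit of MROD) filters matches, and a homography is fit with RANSAC. PROSAC with multiple reprojection errors is replaced here by a single cv2.findHomography call, SURF requires the opencv-contrib nonfree build, and the thresholds, gain, and targets are illustrative.

```python
# Per-frame matching with a self-adapting SURF Hessian threshold.
import cv2
import numpy as np

def match_frame(img, frame, threshold, target_kp=800, gain=0.1, ratio=0.75):
    # SURF lives in the opencv-contrib "nonfree" module.
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=threshold)
    kp1, d1 = surf.detectAndCompute(img, None)
    kp2, d2 = surf.detectAndCompute(frame, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(d1, d2, k=2)
    good = [p[0] for p in knn
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]

    H = None
    if len(good) >= 4:
        src = np.float32([kp1[m.queryIdx].pt for m in good])
        dst = np.float32([kp2[m.trainIdx].pt for m in good])
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)

    # Adapt the threshold for the next frame from this frame's keypoint count.
    threshold *= 1.0 + gain * (len(kp2) - target_kp) / target_kp
    return H, threshold
```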