Video frame prediction represents a fundamental challenge in computer vision, necessitating precise modeling of both spatial and temporal dynamics within video sequences. This computational task holds substantial implications across diverse domains, including video compression optimization, robust object tracking systems, and advanced motion forecasting applications. In this investigation, we present a novel hybrid architecture that synthesizes the complementary strengths of Convolutional Long Short-Term Memory (ConvLSTM) networks and three-dimensional Convolutional Neural Networks (3D CNN) for enhanced frame prediction capabilities. Our methodological framework incorporates a ConvLSTM component that fundamentally augments the traditional LSTM architecture through the integration of convolutional operations, thereby facilitating sophisticated modeling of sequential dependencies. Concurrently, the 3D CNN component employs volumetric convolutional layers to extract rich spatio-temporal features from the input sequences. Rigorous empirical evaluation demonstrates the superior performance of the ConvLSTM architecture, which consistently yields reduced validation errors and elevated coefficients of determination. Specifically, the ConvLSTM model achieves a validation Mean Squared Error (MSE) of 0.0237 and an R² value of 0.6951, substantially outperforming the 3D CNN model, which exhibits a validation MSE of 0.0471 and an R² value of 0.3939. These empiri...
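To make the two compared architectures concrete, here is a minimal Keras sketch contrasting a ConvLSTM predictor with a 3D CNN predictor. The input shape (10 frames of 64x64 grayscale), layer widths, and output head are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: ConvLSTM vs. 3D CNN next-frame predictors (layer sizes and
# the 10x64x64x1 input shape are assumptions for illustration).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_convlstm(input_shape=(10, 64, 64, 1)):
    """ConvLSTM: convolutional gates carry spatial structure through the recurrence."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.ConvLSTM2D(32, kernel_size=3, padding="same", return_sequences=True),
        layers.ConvLSTM2D(32, kernel_size=3, padding="same", return_sequences=False),
        layers.Conv2D(1, kernel_size=3, padding="same", activation="sigmoid"),  # next frame
    ])

def build_3dcnn(input_shape=(10, 64, 64, 1)):
    """3D CNN: volumetric kernels extract joint spatio-temporal features."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv3D(32, kernel_size=(3, 3, 3), padding="same", activation="relu"),
        layers.Conv3D(32, kernel_size=(3, 3, 3), padding="same", activation="relu"),
        layers.Lambda(lambda x: tf.reduce_mean(x, axis=1)),  # collapse the time axis
        layers.Conv2D(1, kernel_size=3, padding="same", activation="sigmoid"),
    ])

model = build_convlstm()
model.compile(optimizer="adam", loss="mse")  # MSE matches the reported validation metric
```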
This paper extends a previous conference publication that proposed a real-time task scheduling framework for criticality-based machine perception, leveraging image resizing as the tool to control the accuracy and execution time trade-off. Criticality-based machine perception reduces the computing demand of on-board AI-based machine inference pipelines (that run on embedded hardware) in applications such as autonomous drones and cars. By segmenting inputs, such as individual video frames, into smaller parts and allowing the downstream AI-based perception module to process some segments ahead of (or at a higher quality than) others, limited machine resources are spent more judiciously on more important parts of the input (e.g., on foreground objects in lieu of backgrounds). In recent work, we explored the use of image resizing as a way to offer a middle ground between full-resolution processing and dropping, thus allowing more flexibility in handling less important parts of the input. In this journal extension, we make the following contributions: (i) We relax a limiting assumption of our prior work; namely, the need for a "perfect sensor" to identify which parts of the image are more critical. Instead, we investigate the use of real LiDAR measurements for quick-and-dirty image segmentation ahead of AI-based processing. (ii) We explore another dimension of freedom in the scheduler: namely, merging several nearby objects into a consolidated segment for downstream processing. We formulate the scheduling problem as an optimal resize-merge problem and design a solution for it. Experiments on an AI-powered embedded platform with a real-world driving dataset demonstrate the practicality and effectiveness of our proposed framework.
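As a rough illustration of the resize side of this scheduling problem, the sketch below greedily gives each segment the largest resize scale that fits a shared per-frame time budget, visiting high-criticality segments first. The cost model, scale ladder, and segment fields are assumptions; the paper's optimal resize-merge formulation (including segment merging) is more involved.

```python
# Greedy criticality-ordered resize scheduling (illustrative; the cost model
# and scale ladder are assumptions, and the merge dimension is omitted).
from dataclasses import dataclass

@dataclass
class Segment:
    criticality: float  # importance weight (e.g., from LiDAR-based segmentation)
    pixels: int         # full-resolution pixel count

def schedule_resizes(segments, budget_ms, cost_per_mpixel_ms=40.0,
                     scales=(1.0, 0.75, 0.5, 0.25)):
    """Assign each segment the largest scale fitting the remaining budget."""
    plan, remaining = {}, budget_ms
    for i, seg in sorted(enumerate(segments),
                         key=lambda p: p[1].criticality, reverse=True):
        for scale in scales:  # try full resolution first, then coarser
            cost = cost_per_mpixel_ms * (seg.pixels * scale * scale) / 1e6
            if cost <= remaining:
                plan[i], remaining = scale, remaining - cost
                break
        else:
            plan[i] = 0.0  # drop: nothing fits the remaining budget
    return plan
```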
The research in this paper focuses on three main aspects: improving the dark channel prior defogging algorithm to make it suitable for implementation on FPGA; designing a high-definition real-time video defogging syste...
Dehazing algorithms have been developed in response to the need for effectively and instantaneously removing atmospheric turbidities such as mist, haze, and fog from media. The removal of haze from an image or video ...
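Both this and the preceding FPGA paper build on the dark channel prior (DCP). A minimal NumPy/SciPy sketch of the core computation follows; the patch size, omega, and the crude airlight estimate are common defaults, not either paper's hardware implementation.

```python
# Dark channel prior (DCP) dehazing core (simplified defaults, not the
# FPGA/GPU implementations described in these abstracts).
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(image, patch=15):
    """image: H x W x 3 floats in [0, 1]; per-pixel RGB minimum, then a
    local minimum filter over a patch x patch window."""
    return minimum_filter(image.min(axis=2), size=patch)

def transmission(image, airlight, omega=0.95, patch=15):
    """Haze transmission estimate: t = 1 - omega * dark_channel(I / A)."""
    return 1.0 - omega * dark_channel(image / airlight, patch)

def dehaze(image, t_min=0.1):
    """Recover scene radiance J = (I - A) / max(t, t_min) + A."""
    airlight = image.reshape(-1, 3).max(axis=0)  # crude per-channel estimate
    t = np.clip(transmission(image, airlight), t_min, 1.0)
    return np.clip((image - airlight) / t[..., None] + airlight, 0.0, 1.0)
```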
ISBN (Print): 9798350390155; 9798350390162
In this paper, we propose a region-of-interest (RoI) reinforced real-time communication system, RoIRTC, for improving the quality of videos delivered in real-time communication. RoIRTC uses a novel RoI magnification transformation for spatially adapting the camera-captured video frame. To automatically detect the RoI, it intelligently leverages a deep-learning-based saliency prediction model without affecting the video collector's processing throughput or the encoder's efficiency. Evaluation results based on actual remote-learning videos show that RoIRTC with RoI magnification can improve the median PSNR by 2.6 dB compared to the naive WebRTC implementation. Compared to an approach that mimics the "background blur" scheme used in many real-time communication systems, RoIRTC can also improve the median PSNR by 4.2 dB.
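The magnification idea can be sketched as a separable, piecewise-linear resampling that gives the RoI a larger share of the output raster while compressing the background, avoiding hard seams. The mapping below is an illustrative stand-in for RoIRTC's actual transformation.

```python
# Illustrative RoI magnification warp (a stand-in, not RoIRTC's transform):
# stretch the RoI span by `gain` along each axis and compress the background.
import numpy as np
import cv2

def roi_magnify(frame, roi, gain=1.5):
    """frame: H x W x 3 uint8; roi: (x0, y0, x1, y1) in pixels."""
    h, w = frame.shape[:2]
    x0, y0, x1, y1 = roi

    def axis_map(lo, hi, n):
        # Output-to-input coordinates: RoI stretched, background compressed.
        out_span = min(int((hi - lo) * gain), n - 2)
        bg_out = n - out_span
        left = int(bg_out * lo / max(lo + (n - 1 - hi), 1))
        return np.concatenate([
            np.linspace(0, lo, left, endpoint=False),
            np.linspace(lo, hi, out_span, endpoint=False),
            np.linspace(hi, n - 1, bg_out - left),
        ]).astype(np.float32)

    map_x = np.tile(axis_map(x0, x1, w), (h, 1))
    map_y = np.tile(axis_map(y0, y1, h)[:, None], (1, w))
    return cv2.remap(frame, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```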
ISBN (Print): 9783031538292; 9783031538308
With the growth of car manufacturing, the rate of road traffic accidents is increasing. To address this problem, much research attention has been devoted to the development of driver assistance systems, in which a key innovation is traffic sign recognition (TSR). In this article, a special convolutional neural network model with higher accuracy than traditional models is used for TSR. The Uzbek Traffic Sign Dataset (UTSD), covering the territory of Uzbekistan, was created, consisting of 21,923 images belonging to 56 classes. We propose a parallel computing method for real-time processing of video haze removal: our implementation can process a 1920 x 1080 video sequence at 176 frames per second with the dark channel prior (DCP) algorithm. An 8.94-times reduction in computation time compared to the Central Processing Unit (CPU) was achieved by performing the TSR process on the Graphics Processing Unit (GPU). The traffic sign detection algorithm is an improved YOLOv5, and the results showed a 3.9% increase in accuracy.
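For orientation, the snippet below runs the stock YOLOv5 hub model on a frame; the paper's improved YOLOv5 variant and UTSD-trained weights are not public here, so the 'yolov5s' checkpoint, threshold, and COCO class names are stand-in assumptions.

```python
# Stock YOLOv5 inference via torch.hub (a stand-in for the improved
# UTSD-trained model described above).
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
model.conf = 0.4  # confidence threshold (assumption)

results = model("frame.jpg")           # path, URL, or numpy image
det = results.pandas().xyxy[0]         # xmin, ymin, xmax, ymax, confidence, class, name
print(det[det["name"] == "stop sign"]) # COCO's closest traffic-sign class
```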
Intelligent recognition algorithms deployed on edge devices offer strong real-time processing capabilities and high security for online video image analysis. However, real-time video image recognition remains challeng...
ISBN (Print): 9781728198354
With the advancements in deep learning, video colorization by propagating color information from a colorized reference frame to a monochrome video sequence has been well explored. However, existing approaches often overfit the training dataset and consequently deliver suboptimal performance when colorizing test samples. To address this issue, we propose an effective method that enhances video colorization through test-time tuning. By exploiting the reference frame to construct additional training samples during testing, our approach achieves a performance boost of 1 to 3 dB in PSNR on average compared to the baseline. Code is available at: https://***/IndigoPurple/T3.
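The test-time tuning idea can be sketched as a brief fine-tune on samples synthesized from the sequence's own reference frame before inference. The model interface, crop-based sample construction, and hyperparameters below are placeholder assumptions, not the released implementation.

```python
# Test-time tuning sketch (model signature, sample construction, and
# hyperparameters are placeholder assumptions).
import torch
import torch.nn.functional as F

def make_pair(ref_rgb):
    """Hypothetical sample: a random crop of the reference; its mean
    luminance as input, its colors as target. ref_rgb: 3 x H x W in [0, 1]."""
    _, h, w = ref_rgb.shape
    y = torch.randint(0, h - 64, (1,)).item()
    x = torch.randint(0, w - 64, (1,)).item()
    crop = ref_rgb[:, y:y + 64, x:x + 64]
    gray = crop.mean(dim=0, keepdim=True)  # crude luminance proxy
    return gray.unsqueeze(0), crop.unsqueeze(0)

def test_time_tune(model, ref_rgb, steps=50, lr=1e-5):
    """Briefly adapt a colorization net to this sequence's reference frame."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(steps):
        gray, target = make_pair(ref_rgb)
        loss = F.l1_loss(model(gray), target)  # assumed gray -> RGB interface
        opt.zero_grad()
        loss.backward()
        opt.step()
    model.eval()
    return model
```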
Compared with traditional video, omnidirectional stereo video (ODSV) provides a larger field of view (FOV) with depth perception, but makes capturing, processing, and displaying more complicated. Even though many attempts have been made to address these challenges, they suffer from one or more of the following problems: a complicated camera rig, high latency, and visible distortions. This paper presents a practical end-to-end solution based on a novel hybrid representation that solves these problems simultaneously. The proposed pipeline goes directly from capturing to displaying, removing the intermediate processing step and thus reducing total time consumption and visible stitching distortions. The hybrid representation is piecewise linear in the horizontal viewing direction, defined over 0 to 360 degrees, and, under the assumption that the background is static, partitions the scene into static and moving regions. Using this representation, an ODSV can be represented by omnidirectional stereo images and an ordinary stereo video pair, respectively. Moreover, a single panoramic camera can be used to capture the omnidirectional stereo images in a real environment, and an ordinary binocular camera can capture the stereo video pair. To display the ODSV, this paper presents a real-time tracking-based rendering algorithm for head-mounted displays (HMD). Experiments show that the proposed method is effective and cost-efficient. In contrast to state-of-the-art methods, it significantly reduces camera-rig complexity and data volume while preserving competitive stereo quality without visible distortions.
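A toy version of the hybrid lookup: headings inside a fixed dynamic sector are served from the stereo video pair, everything else from the precaptured omnidirectional stereo images. The sector bounds and the plain switch (no blending) are illustrative assumptions; the paper's piecewise-linear representation and tracking-based rendering are more involved.

```python
# Hybrid-representation view selection (sector bounds are assumptions).
def select_source(yaw_deg, sector=(150.0, 210.0)):
    """Map a viewer heading in degrees to (source, normalized coordinate)."""
    yaw = yaw_deg % 360.0
    lo, hi = sector
    if lo <= yaw <= hi:
        return "video", (yaw - lo) / (hi - lo)  # position within the video FOV
    return "pano", yaw / 360.0                  # position within the panorama

for yaw in (0.0, 180.0, 300.0):
    print(yaw, select_source(yaw))  # pano / video / pano
```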