Video frame prediction represents a fundamental challenge in computer vision, necessitating precise modeling of both spatial and temporal dynamics within video sequences. This computational task holds substantial implications across diverse domains, including video compression optimization, robust object tracking systems, and advanced motion forecasting applications. In this investigation, we present a novel hybrid architecture that synthesizes the complementary strengths of Convolutional Long Short-Term Memory (ConvLSTM) networks and three-dimensional Convolutional Neural Networks (3D CNN) for enhanced frame prediction capabilities. Our methodological framework incorporates a ConvLSTM component that fundamentally augments the traditional LSTM architecture through the integration of convolutional operations, thereby facilitating sophisticated modeling of sequential dependencies. Concurrently, the 3D CNN component employs volumetric convolutional layers to extract rich spatio-temporal features from the input sequences. Rigorous empirical evaluation demonstrates the superior performance of the ConvLSTM architecture, which consistently yields reduced validation errors and elevated coefficients of determination. Specifically, the ConvLSTM model achieves a validation Mean Squared Error (MSE) of 0.0237 and an R² value of 0.6951, substantially outperforming the 3D CNN model, which exhibits a validation MSE of 0.0471 and an R² value of 0.3939. These empiri...
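To make the two compared architectures concrete, here is a minimal Keras sketch contrasting a ConvLSTM predictor with a 3D CNN predictor. The input shape (10 frames of 64x64 grayscale), layer widths, and output head are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: ConvLSTM vs. 3D CNN next-frame predictors (layer sizes and
# the 10x64x64x1 input shape are assumptions for illustration).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_convlstm(input_shape=(10, 64, 64, 1)):
    """ConvLSTM: convolutional gates carry spatial structure through the recurrence."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.ConvLSTM2D(32, kernel_size=3, padding="same", return_sequences=True),
        layers.ConvLSTM2D(32, kernel_size=3, padding="same", return_sequences=False),
        layers.Conv2D(1, kernel_size=3, padding="same", activation="sigmoid"),  # next frame
    ])

def build_3dcnn(input_shape=(10, 64, 64, 1)):
    """3D CNN: volumetric kernels extract joint spatio-temporal features."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv3D(32, kernel_size=(3, 3, 3), padding="same", activation="relu"),
        layers.Conv3D(32, kernel_size=(3, 3, 3), padding="same", activation="relu"),
        layers.Lambda(lambda x: tf.reduce_mean(x, axis=1)),  # collapse the time axis
        layers.Conv2D(1, kernel_size=3, padding="same", activation="sigmoid"),
    ])

model = build_convlstm()
model.compile(optimizer="adam", loss="mse")  # MSE matches the reported validation metric
```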
This paper extends a previous conference publication that proposed a real-time task scheduling framework for criticality-based machine perception, leveraging image resizing as the tool to control the accuracy and execution time trade-off. Criticality-based machine perception reduces the computing demand of on-board AI-based machine inference pipelines (that run on embedded hardware) in applications such as autonomous drones and cars. By segmenting inputs, such as individual video frames, into smaller parts and allowing the downstream AI-based perception module to process some segments ahead of (or at a higher quality than) others, limited machine resources are spent more judiciously on more important parts of the input (e.g., on foreground objects in lieu of backgrounds). In recent work, we explored the use of image resizing as a way to offer a middle ground between full-resolution processing and dropping, thus allowing more flexibility in handling less important parts of the input. In this journal extension, we make the following contributions: (i) We relax a limiting assumption of our prior work; namely, the need for a "perfect sensor" to identify which parts of the image are more critical. Instead, we investigate the use of real LiDAR measurements for quick-and-dirty image segmentation ahead of AI-based processing. (ii) We explore another dimension of freedom in the scheduler: namely, merging several nearby objects into a consolidated segment for downstream processing. We formulate the scheduling problem as an optimal resize-merge problem and design a solution for it. Experiments on an AI-powered embedded platform with a real-world driving dataset demonstrate the practicality and effectiveness of our proposed framework.
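As a rough illustration of the resize side of this scheduling problem, the sketch below greedily gives each segment the largest resize scale that fits a shared per-frame time budget, visiting high-criticality segments first. The cost model, scale ladder, and segment fields are assumptions; the paper's optimal resize-merge formulation (including segment merging) is more involved.

```python
# Greedy criticality-ordered resize scheduling (illustrative; the cost model
# and scale ladder are assumptions, and the merge dimension is omitted).
from dataclasses import dataclass

@dataclass
class Segment:
    criticality: float  # importance weight (e.g., from LiDAR-based segmentation)
    pixels: int         # full-resolution pixel count

def schedule_resizes(segments, budget_ms, cost_per_mpixel_ms=40.0,
                     scales=(1.0, 0.75, 0.5, 0.25)):
    """Assign each segment the largest scale fitting the remaining budget."""
    plan, remaining = {}, budget_ms
    for i, seg in sorted(enumerate(segments),
                         key=lambda p: p[1].criticality, reverse=True):
        for scale in scales:  # try full resolution first, then coarser
            cost = cost_per_mpixel_ms * (seg.pixels * scale * scale) / 1e6
            if cost <= remaining:
                plan[i], remaining = scale, remaining - cost
                break
        else:
            plan[i] = 0.0  # drop: nothing fits the remaining budget
    return plan
```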
The research in this paper focuses on three main aspects: improving the dark channel prior defogging algorithm to make it suitable for implementation on FPGA; designing a high-definition real-time video defogging syste...
Dehazing algorithms have been developed in response to the need for effectively and instantaneously removing atmospheric turbidities such as mist, haze, and fog from media. The removal of haze from an image or video ...
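Both this and the preceding FPGA paper build on the dark channel prior (DCP). A minimal NumPy/SciPy sketch of the core computation follows; the patch size, omega, and the crude airlight estimate are common defaults, not either paper's hardware implementation.

```python
# Dark channel prior (DCP) dehazing core (simplified defaults, not the
# FPGA/GPU implementations described in these abstracts).
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(image, patch=15):
    """image: H x W x 3 floats in [0, 1]; per-pixel RGB minimum, then a
    local minimum filter over a patch x patch window."""
    return minimum_filter(image.min(axis=2), size=patch)

def transmission(image, airlight, omega=0.95, patch=15):
    """Haze transmission estimate: t = 1 - omega * dark_channel(I / A)."""
    return 1.0 - omega * dark_channel(image / airlight, patch)

def dehaze(image, t_min=0.1):
    """Recover scene radiance J = (I - A) / max(t, t_min) + A."""
    airlight = image.reshape(-1, 3).max(axis=0)  # crude per-channel estimate
    t = np.clip(transmission(image, airlight), t_min, 1.0)
    return np.clip((image - airlight) / t[..., None] + airlight, 0.0, 1.0)
```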
ISBN (Print): 9798350390155; 9798350390162
In this paper, we propose a region-of-interest (RoI) reinforced real-time communication system, RoIRTC, for improving the quality of videos delivered in real-time communication. RoIRTC uses a novel RoI magnification transformation for spatially adapting the camera-captured video frame. To automatically detect the RoI, it intelligently leverages a deep-learning-based saliency prediction model without affecting the video collector's processing throughput or the encoder's efficiency. Evaluation results based on actual remote-learning videos show that RoIRTC with RoI magnification can improve the median PSNR by 2.6 dB compared to the naive WebRTC implementation. Compared to an approach that mimics the "background blur" scheme used in many real-time communication systems, RoIRTC can also improve the median PSNR by 4.2 dB.
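The magnification idea can be sketched as a separable, piecewise-linear resampling that gives the RoI a larger share of the output raster while compressing the background, avoiding hard seams. The mapping below is an illustrative stand-in for RoIRTC's actual transformation.

```python
# Illustrative RoI magnification warp (a stand-in, not RoIRTC's transform):
# stretch the RoI span by `gain` along each axis and compress the background.
import numpy as np
import cv2

def roi_magnify(frame, roi, gain=1.5):
    """frame: H x W x 3 uint8; roi: (x0, y0, x1, y1) in pixels."""
    h, w = frame.shape[:2]
    x0, y0, x1, y1 = roi

    def axis_map(lo, hi, n):
        # Output-to-input coordinates: RoI stretched, background compressed.
        out_span = min(int((hi - lo) * gain), n - 2)
        bg_out = n - out_span
        left = int(bg_out * lo / max(lo + (n - 1 - hi), 1))
        return np.concatenate([
            np.linspace(0, lo, left, endpoint=False),
            np.linspace(lo, hi, out_span, endpoint=False),
            np.linspace(hi, n - 1, bg_out - left),
        ]).astype(np.float32)

    map_x = np.tile(axis_map(x0, x1, w), (h, 1))
    map_y = np.tile(axis_map(y0, y1, h)[:, None], (1, w))
    return cv2.remap(frame, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```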
ISBN (Print): 9783031538292; 9783031538308
With the growth of car manufacturing, the rate of road traffic accidents is increasing. To address this problem, much research attention has been devoted to the development of driver assistance systems, in which a key innovation is traffic sign recognition (TSR). In this article, a special convolutional neural network model with higher accuracy than traditional models is used for TSR. The Uzbek Traffic Sign Dataset (UTSD), covering the territory of Uzbekistan, was created, consisting of 21,923 images belonging to 56 classes. We propose a parallel computing method for real-time processing of video haze removal: our implementation can process a 1920 x 1080 video sequence at 176 frames per second with the dark channel prior (DCP) algorithm. An 8.94-times reduction in computation time compared to the Central Processing Unit (CPU) was achieved by performing the TSR process on the Graphics Processing Unit (GPU). The traffic sign detection algorithm is an improved YOLOv5, and the results showed a 3.9% increase in accuracy.
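For orientation, the snippet below runs the stock YOLOv5 hub model on a frame; the paper's improved YOLOv5 variant and UTSD-trained weights are not public here, so the 'yolov5s' checkpoint, threshold, and COCO class names are stand-in assumptions.

```python
# Stock YOLOv5 inference via torch.hub (a stand-in for the improved
# UTSD-trained model described above).
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
model.conf = 0.4  # confidence threshold (assumption)

results = model("frame.jpg")           # path, URL, or numpy image
det = results.pandas().xyxy[0]         # xmin, ymin, xmax, ymax, confidence, class, name
print(det[det["name"] == "stop sign"]) # COCO's closest traffic-sign class
```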
Intelligent recognition algorithms deployed on edge devices offer strong real-time processing capabilities and high security for online video image analysis. However, real-time video image recognition remains challeng...
ISBN (Print): 9781728198354
With the advancements in deep learning, video colorization by propagating color information from a colorized reference frame to a monochrome video sequence has been well explored. However, existing approaches often overfit the training dataset and consequently deliver suboptimal performance when colorizing test samples. To address this issue, we propose an effective method that enhances video colorization through test-time tuning. By exploiting the reference frame to construct additional training samples during testing, our approach achieves a performance boost of 1 to 3 dB in PSNR on average compared to the baseline. Code is available at: https://***/IndigoPurple/T3.
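The test-time tuning idea can be sketched as a brief fine-tune on samples synthesized from the sequence's own reference frame before inference. The model interface, crop-based sample construction, and hyperparameters below are placeholder assumptions, not the released implementation.

```python
# Test-time tuning sketch (model signature, sample construction, and
# hyperparameters are placeholder assumptions).
import torch
import torch.nn.functional as F

def make_pair(ref_rgb):
    """Hypothetical sample: a random crop of the reference; its mean
    luminance as input, its colors as target. ref_rgb: 3 x H x W in [0, 1]."""
    _, h, w = ref_rgb.shape
    y = torch.randint(0, h - 64, (1,)).item()
    x = torch.randint(0, w - 64, (1,)).item()
    crop = ref_rgb[:, y:y + 64, x:x + 64]
    gray = crop.mean(dim=0, keepdim=True)  # crude luminance proxy
    return gray.unsqueeze(0), crop.unsqueeze(0)

def test_time_tune(model, ref_rgb, steps=50, lr=1e-5):
    """Briefly adapt a colorization net to this sequence's reference frame."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(steps):
        gray, target = make_pair(ref_rgb)
        loss = F.l1_loss(model(gray), target)  # assumed gray -> RGB interface
        opt.zero_grad()
        loss.backward()
        opt.step()
    model.eval()
    return model
```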
Compared with traditional video, omnidirectional stereo video (ODSV) provides a larger field of view (FOV) with depth perception, but makes capturing, processing, and displaying more complicated. Even though many attempts have been made to address these challenges, they suffer from one or more of the following problems: a complicated camera rig, high latency, and visible distortions. This paper presents a practical end-to-end solution based on a novel hybrid representation that solves these problems simultaneously. The proposed pipeline goes directly from capturing to displaying, removing the intermediate processing step and thus reducing total time consumption and visible stitching distortions. The hybrid representation is piecewise linear in the horizontal viewing direction, defined over 0 to 360 degrees, and, under the assumption that the background is static, partitions the scene into static and moving regions. Using this representation, an ODSV can be represented by omnidirectional stereo images and an ordinary stereo video pair, respectively. Moreover, a single panoramic camera can be used to capture the omnidirectional stereo images in a real environment, and an ordinary binocular camera can capture the stereo video pair. To display the ODSV, this paper presents a real-time tracking-based rendering algorithm for head-mounted displays (HMD). Experiments show that the proposed method is effective and cost-efficient. In contrast to state-of-the-art methods, it significantly reduces camera-rig complexity and data volume while preserving competitive stereo quality without visible distortions.
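A toy version of the hybrid lookup: headings inside a fixed dynamic sector are served from the stereo video pair, everything else from the precaptured omnidirectional stereo images. The sector bounds and the plain switch (no blending) are illustrative assumptions; the paper's piecewise-linear representation and tracking-based rendering are more involved.

```python
# Hybrid-representation view selection (sector bounds are assumptions).
def select_source(yaw_deg, sector=(150.0, 210.0)):
    """Map a viewer heading in degrees to (source, normalized coordinate)."""
    yaw = yaw_deg % 360.0
    lo, hi = sector
    if lo <= yaw <= hi:
        return "video", (yaw - lo) / (hi - lo)  # position within the video FOV
    return "pano", yaw / 360.0                  # position within the panorama

for yaw in (0.0, 180.0, 300.0):
    print(yaw, select_source(yaw))  # pano / video / pano
```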