Search Results - Inner Mongolia University Library

Small object detection model for UAV aerial image based on YOLOv7

Signal, Image and Video Processing, 2024, Vol. 18, No. 3, pp. 2695-2707

Authors: Chen, Jinguang; Wen, Ronghui; Ma, Lili. Affiliation: Xian Polytech Univ, Sch Comp Sci, Shaanxi Key Lab Clothing Intelligence, Xian 710048, Peoples R China

Unmanned Aerial Vehicle (UAV) aerial image target detection faces two main problems: small targets and target occlusion. To improve detection accuracy while maintaining efficiency, this work introduces a UAV aerial image small object detection model based on the real-time detector YOLOv7 (SOD-YOLOv7). To address the challenge of small object detection, we design a module that combines Swin Transformer and convolution to better capture the global context information of small objects in the image. Additionally, we introduce the Bi-Level Routing Attention (BRA) mechanism to strengthen the model's focus on small objects. To improve detection at multiple scales, we add detection branches. To handle occluded objects, we incorporate a dynamic detection head with deformable convolution and attention mechanisms that enhances the model's spatial awareness of targets. Experimental results on the VisDrone and CARPK UAV image datasets show that the mean average precision (mAP@0.5) of our model reaches 53.2% and 98.5%, respectively. Compared with the original YOLOv7, our model improves by 4.3% and 0.3%, demonstrating better performance in detecting small objects. The code will soon be released at https://***/Gentle-Hui/SOD-YOLOv7.
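
mAP@0.5 counts a predicted box as correct only if its intersection-over-union (IoU) with a ground-truth box is at least 0.5, which is part of why small objects are hard: a few pixels of localization error cost far more IoU on a small box than on a large one. A minimal Python sketch of the IoU test, assuming axis-aligned boxes given as (x1, y1, x2, y2); it illustrates the standard metric, not code from SOD-YOLOv7:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Overlap is zero when the boxes do not intersect on either axis.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.5):
    """mAP@0.5 matching rule: a prediction matches a ground-truth box if IoU >= threshold."""
    return iou(pred, gt) >= threshold
```

Shifting a box 4 pixels sideways gives an IoU of about 0.43 for a 10 x 10 object (a miss at the 0.5 threshold) but about 0.92 for a 100 x 100 object.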

Keywords: UAV image detection; Small object detection; YOLOv7; Swin transformer; Detection head

A Real-Time Video Quality Metric for HTTP Adaptive Streaming

49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Amirpour, Hadi; Zhu, Jingwen; Le Callet, Patrick; Timmerer, Christian. Affiliations: Alpen Adria Univ, Christian Doppler Lab ATHENA, Klagenfurt, Austria; Nantes Univ, CNRS, Ecole Centrale Nantes, LS2N, UMR 6004, IUF, Nantes, France

ISBN: (print) 9798350344868; 9798350344851

In HTTP Adaptive Streaming (HAS), a video is encoded at multiple bitrate-resolution pairs, referred to as representations, which lets users choose the most suitable representation for their network connection. To optimize the set of bitrate-resolution pairs and improve users' Quality of Experience (QoE), it is essential to measure the quality of the representations. VMAF is a highly reliable metric used in HAS to assess representation quality. In practice, however, using it for optimization can be very time-consuming, and it is infeasible for live streaming applications. To tackle this high complexity, our paper introduces a new method called VQM4HAS, which extracts low-complexity features, including (i) video complexity features, (ii) bitstream features logged during the encoding process, and (iii) basic video quality metrics. These features are then fed into a regression model to predict VMAF. Our experimental results demonstrate that VQM4HAS achieves a high Pearson Correlation Coefficient (PCC) with VMAF, ranging from 0.95 to 0.96 depending on the resolution, while having significantly lower complexity, which makes it suitable for live streaming scenarios.
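
The agreement figure above is the Pearson correlation between predicted and reference VMAF scores. A plain-Python sketch of the statistic, for checking such a figure on one's own data; it computes only the correlation, not the VQM4HAS feature extraction or regression model:

```python
import math


def pearson_cc(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all left unnormalized since the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

PCC measures linear agreement only: predictions that differ from VMAF by a constant offset or a positive scale factor still score 1, so it is usually reported alongside an error measure.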

Keywords: video quality; HAS; VMAF; QoE; bitstream

Real-Time Video PLC Using In-Painting

16th International Conference on COMmunication Systems and NETworkS, COMSNETS 2024

Authors: Katiyar, Rajani; Chakraborty, Prasenjit; Surana, Ravi; Holla, Ravishankar; Sanjana, Sanka; Acharya, Sathvik; Sonia Singh, B.; Agrawal, Yash. Affiliations: RV College of Engineering, CSE, Bangalore, India; Samsung R&D Institute, Bangalore, India

ISBN: (print) 9798350383119

Video packet loss during network transmission can cause visible artifacts, freezing, and interruptions in video playback. Packet loss concealment techniques aim to mitigate these effects by concealing missing packets and improving the viewing experience. This work focuses on real-time concealment of packet loss using a deep learning model deployed within an Android application, using frame prediction to conceal packet loss and improve video quality. It presents a novel approach to real-time concealment of video packet loss, addressing the lack of existing solutions for real-time concealment. A dataset was created for this work and carefully curated to support training and testing of the deep learning model. The model is integrated into an Android application for real-time packet loss detection. Frame prediction conceals lost packets by generating predicted frames, reducing the impact of packet loss and allowing uninterrupted viewing. The implemented model achieves efficient packet loss detection and frame prediction, with an average prediction time of under 30 ms per frame. This fast prediction contributes to minimal latency and fewer visual artifacts during packet loss concealment. In response to the challenges inherent in real-time Packet Loss Concealment (PLC), this work presents a solution that prioritizes video quality enhancement while meeting stringent low inference time requirements, minimizing battery consumption, and handling real-time processing of video data, most of which have not been adequately addressed in the existing literature. © 2024 IEEE.
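
An average prediction time of under 30 ms per frame counts as real-time only if it fits inside one frame interval, 1000 / fps milliseconds. A small Python check of that budget; the abstract does not state the stream's frame rate, so the rates used below are assumptions, and the function names are hypothetical:

```python
def frame_interval_ms(fps):
    """Time available to produce one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def fits_realtime(prediction_ms, fps):
    """True if predicting one concealment frame takes no longer than one frame interval.

    This ignores decoding and display overhead, which would shrink the budget further.
    """
    return prediction_ms <= frame_interval_ms(fps)
```

At an assumed 30 fps the budget is about 33.3 ms, so a 30 ms prediction fits; at 60 fps the budget shrinks to about 16.7 ms and the same prediction time would not.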

Keywords: Android Application; Deep Learning; Frame Prediction; Packet Loss; Packet Loss Concealment; Video Transmission

Proceedings - 2024 Asia-Pacific Conference on Image Processing, Electronics and Computers, IPEC 2024

5th Asia-Pacific Conference on Image Processing, Electronics and Computers, IPEC 2024

ISBN: (print) 9798350374407

The proceedings contain 114 papers. The topics discussed include: research on intelligent English-Chinese translation proofreading system based on gated feedback recurrent neural networks; intelligent recognition of structures in earth and rock dam images based on MASK-RCNN; a pupil diameter measurement system based on image processing; image enhancement and deep learning in predicting the Gleason score of transrectal ultrasound images of prostate cancer; research on video logo removal processing method based on MATLAB; analysis and evaluation of quality control throughout production of real scene 3D modeling based on oblique aerial photography; improving remote sensing image classification through stochastic bilevel optimization; cross-domain image translation algorithm based on self-cross auto-encoder; a fast image mosaic algorithm based on feature partition extraction; and video compression and action recognition in self-supervised learning.

Boosting Accuracy and Reducing Time in Laser Shooting Practices: Video Processing Solutions for Impact Evaluation

8th International Conference on Information and Communication Technology for Intelligent Systems, ICTIS 2024

Authors: Rivas-Lalaleo, David; Bautista-Naranjo, Víctor; Vayas-Ortega, Germania; Iza-Chango, Erika; Lema-Jumbo, Pamela; Bran, Carlos. Affiliations: Universidad de las Fuerzas Armadas ESPE, Sangolquí, Ecuador; Universidad Indoamerica, Ambato, Ecuador; Instituto Investigación e Innovación en Electrónica, Universidad Don Bosco, Soyapango, El Salvador

ISBN: (electronic) 9789819757992

ISBN: (print) 9789819757985

This study explores the most effective method for impact measurement in laser shooting ranges, crucial for security training, accident prevention, and cost reduction. It utilizes video surveillance and image processing algorithms in Python to accurately determine laser impact locations. The research found that employing a single target per silhouette is most efficient, minimizing errors to ±1 impact and enabling heat map generation for feedback. The combination of video surveillance and image processing significantly enhances safety and reduces costs by providing real-time data. The findings advocate for the single target approach due to its precision and the comprehensive insights gained from heat maps, maintaining accuracy within a two-hit margin. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
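
The abstract describes locating laser impacts in video and building heat maps but gives no algorithm. A minimal sketch under explicit assumptions: each frame is a grayscale 2D list, the laser spot is the brightest pixel above a hypothetical threshold of 240, and the heat map is a count of hits per square grid cell. Function names and parameters are hypothetical, not the authors' Python code:

```python
def find_laser_spot(frame, threshold=240):
    """Return (x, y) of the brightest pixel in a grayscale frame, or None.

    frame: list of rows of 0-255 intensities. The brightest pixel counts as the
    laser spot only if its intensity is at least `threshold` (a hypothetical value).
    """
    best, best_val = None, -1
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if value > best_val:
                best, best_val = (x, y), value
    return best if best_val >= threshold else None


def impact_heatmap(impacts, width, height, cell):
    """Count impacts per square grid cell of side `cell` pixels.

    impacts: iterable of (x, y) pixel coordinates; points outside the image are ignored.
    Returns a list of rows where heatmap[r][c] is the number of hits in that cell.
    """
    cols = -(-width // cell)  # ceiling division
    rows = -(-height // cell)
    grid = [[0] * cols for _ in range(rows)]
    for x, y in impacts:
        if 0 <= x < width and 0 <= y < height:
            grid[int(y) // cell][int(x) // cell] += 1
    return grid
```

Feeding the non-`None` results of `find_laser_spot` over a session into `impact_heatmap` gives the per-cell hit counts a heat map would display. A real system would also need to map camera pixels onto target coordinates and count a shot that spans several frames only once.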

Keywords: Video signal processing

Detection and recognition of digital instrument in substation using improved YOLO-v3

Signal, Image and Video Processing, 2023, Vol. 17, No. 6, pp. 2971-2979

Authors: Shi, Huibin; Hua, Zexi; Chen, Jianyi; Tang, Yongchuan; He, Rujiang. Affiliations: Southwest Jiaotong Univ, Sch Informat Sci & Technol, Chengdu 611756, Sichuan, Peoples R China; China Railway Guangzhou Grp Co Ltd, Guangzhou 510088, Guangdong, Peoples R China; Northwestern Polytech Univ, Sch Microelect, Xian 710072, Shaanxi, Peoples R China; Qianghua Times Chengdu Technol Co Ltd, Chengdu 610095, Peoples R China; Southwest Jiaotong Univ, Sch Electrical Engn, Chengdu 611756, Sichuan, Peoples R China

To monitor substations intelligently, it is important to read substation instruments automatically and accurately. This paper takes the digital instruments of a substation in a real-world scene as the research object and proposes a detection and identification method for them based on an improved YOLO-v3. To enrich the limited image data, this paper augments the collected substation images and constructs a dataset. Starting from YOLO-v3, and targeting the accuracy of substation instrument detection and identification while also considering real-time performance, this paper proposes an improved YOLO-v3 model that uses the PANet structure. The effectiveness of the proposed method is verified in substation digital instrument detection experiments. Experimental results show that the improved YOLO-v3 achieves a mean average precision 0.23% higher than the classical YOLO-v3 network, giving better accuracy in substation digital instrument detection and identification. The proposed method still guarantees real-time performance, processing images at 29 frames per second (FPS), which meets the engineering needs of intelligent data acquisition, detection and identification in real substations.
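
The improvement above is reported in mean average precision: AP is computed per class from detections ranked by confidence, and mAP averages AP over classes. A minimal sketch using all-point interpolation (the PASCAL VOC 2010+ convention), assuming each detection has already been marked as a true or false positive by IoU matching; the paper does not say which AP variant it uses, and the function names are illustrative:

```python
def average_precision(detections, num_gt):
    """All-point interpolated average precision for a single class.

    detections: list of (confidence, is_true_positive) pairs; is_true_positive
        is assumed to come from IoU matching against ground truth beforehand.
    num_gt: number of ground-truth objects of this class.
    """
    if num_gt <= 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Pad the curve so it starts at recall 0 and ends at recall 1.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    # Precision envelope: each point takes the maximum precision at this or any later point.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Area under the stepwise curve; steps where recall is unchanged add nothing.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def mean_average_precision(per_class_ap):
    """Mean of per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0
```

For example, detections ranked as hit, miss, hit against two ground-truth instruments give an AP of 0.5 * 1.0 + 0.5 * 2/3 ≈ 0.833.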

Keywords: Digital instrument recognition; Image detection; YOLO-v3; Data augmentation

Aerial image deblurring via progressive residual recurrent network

Signal, Image and Video Processing, 2024, Vol. 18, No. 8-9, pp. 5879-5892

Authors: Liu, Keshun; Zhang, Yuhua; Li, Aihua; Wang, Changlong; Ma, Xiaolin. Affiliation: Army Engn Univ PLA, Shijiazhuang Campus, Shijiazhuang 050003, Peoples R China

Limited by hardware conditions and complex degradation processes, aerial images obtained by drone reconnaissance are usually blurry and lack high-frequency information. Many image deblurring algorithms have been proposed to address this problem. Although significant progress has been made, aerial image deblurring still faces challenges such as low deblurring performance and non-real-time processing. In this work, we propose a progressive residual recurrent network (PRRN) for aerial image deblurring and make four contributions toward overcoming these challenges: (1) We design a lightweight encoder-decoder module (LEDM) that includes the progressive residual block and the feature recurrent structure (FRS); the number of LEDMs can be adjusted to balance deblurring efficiency and performance. (2) We present the progressive residual block, which adopts a simple gate to reduce system complexity and introduces layer normalization to stabilize training. (3) We present the FRS, composed of feature map recurrence and latent code recurrence, to retain and remove the feature information of previous encoder-decoder modules. (4) We take aerial images from the DOTA dataset as initial data and apply motion blur kernels to generate blurry aerial images, forming a dataset named AID for aerial image deblurring. Extensive experiments on synthetic datasets and our own dataset demonstrate the superior performance of PRRN in both quantitative and qualitative evaluation. Notably, our network reaches 30.80 dB PSNR on the AID dataset and 77.73% mAP on realistic blurry aerial images, achieving state-of-the-art deblurring performance.
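
The 30.80 dB figure is a peak signal-to-noise ratio (PSNR), 10 * log10(MAX^2 / MSE) with MAX = 255 for 8-bit images. The sketch below pairs that metric with a simple way to synthesize a blurry image from a sharp one, in the spirit of contribution (4); the horizontal uniform line kernel is an assumption, since the abstract does not specify the motion blur kernels used to build AID, and the function names are hypothetical:

```python
import math


def horizontal_motion_blur(image, length):
    """Blur a grayscale image (list of rows) with a 1 x length uniform kernel.

    A uniform line kernel approximates linear camera motion along the x axis;
    border pixels are handled by clamping indices to the image edge.
    """
    if length < 1:
        raise ValueError("kernel length must be >= 1")
    half = length // 2
    blurred = []
    for row in image:
        w = len(row)
        out = []
        for x in range(w):
            taps = [row[min(max(x + k - half, 0), w - 1)] for k in range(length)]
            out.append(sum(taps) / length)
        blurred.append(out)
    return blurred


def psnr(reference, test, max_value=255.0):
    """PSNR in dB between two equal-size grayscale images (lists of rows)."""
    if len(reference) != len(test) or any(len(a) != len(b) for a, b in zip(reference, test)):
        raise ValueError("images must have the same shape")
    diffs = [(a - b) ** 2 for ra, rb in zip(reference, test) for a, b in zip(ra, rb)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Comparing `psnr(sharp, horizontal_motion_blur(sharp, 9))` with the PSNR of a deblurred output against the same sharp image shows how much a restoration network recovers; higher is better.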

Keywords: Aerial image deblurring; Progressive residual recurrent network; Lightweight encoder-decoder module; Progressive residual block; Feature recurrent structure; AID dataset

Real-Time 3D Skeleton Reconstruction: A Comprehensive Approach from Multiple Views

2nd Asia Conference on Computer Vision, Image Processing and Pattern Recognition (CVIPPR)

Authors: Wilser, Nicola; Cordier, Frederic; Maillot, Yvan; Souchet, Philippe. Affiliations: Univ Haute Alsace, Bry Sur Marnes, France; XD Prod, Bry Sur Marnes, France; Univ Haute Alsace, Mulhouse, France

ISBN: (print) 9798400716607

This paper presents a comprehensive method for real-time 3D human skeleton reconstruction from calibrated camera sets, addressing challenges in scenes with multiple individuals. Accurate 3D pose estimation is crucial for various applications such as 3D model animation, augmented reality, and human-computer interaction. The approach involves initial 2D skeleton estimation, followed by skeleton identification through a matching algorithm and reconstruction via triangulation. Three key enhancements were implemented: refining the matching algorithm using 3D reconstruction reprojection, accelerating execution with skeleton tracking, and validation on a diverse dataset with over 9,000 frames. The method achieves accurate 3D reconstruction and robust performance in multi-individual scenarios, making it suitable for real-world applications. Project page: https://instant- ***.
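
The reconstruction step recovers each 3D joint from its 2D detections in several calibrated views, and the matching refinement relies on reprojection. A minimal NumPy sketch of linear (DLT) triangulation plus a reprojection-error helper, assuming each camera is given as a 3x4 projection matrix and joints are already matched across views; this is the textbook method, not necessarily the authors' implementation, and the function names are hypothetical:

```python
import numpy as np


def triangulate_point(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from N >= 2 calibrated views.

    projections: list of 3x4 camera projection matrices P_i.
    points_2d: list of (u, v) image coordinates of the same joint in each view.
    Returns the 3D point as a length-3 array.
    """
    if len(projections) < 2 or len(projections) != len(points_2d):
        raise ValueError("need one 2D point per camera and at least two cameras")
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        P = np.asarray(P, dtype=float)
        # Each view gives two linear constraints on the homogeneous point X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # Least-squares solution: right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def triangulate_skeleton(projections, skeletons_2d):
    """Triangulate every joint; skeletons_2d[i][j] is joint j's (u, v) in view i."""
    num_joints = len(skeletons_2d[0])
    return np.array([
        triangulate_point(projections, [view[j] for view in skeletons_2d])
        for j in range(num_joints)
    ])


def reprojection_error(P, X, uv):
    """Image-plane distance between the projection of 3D point X through P and observed (u, v)."""
    x = np.asarray(P, dtype=float) @ np.append(X, 1.0)
    return float(np.hypot(x[0] / x[2] - uv[0], x[1] / x[2] - uv[1]))
```

Each extra camera adds two rows to the linear system, and a large reprojection error for a candidate match is one signal that 2D skeletons from different views do not belong to the same person.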

Keywords: 3D Human Pose Estimation; Temporal Coherence in 3D Reconstruction; Real-Time 3D Skeleton Reconstruction; Multi-view Image Processing

Advances in Multi-Style and Real-Time Transfer

1st International Conference on Software, Systems and Information Technology, SSITCON 2024

Authors: Kini, Nisha; Kamath, Vaishnavi; Kelkar, Tanay; Kanikar, Prashasti; Nanade, Archana; Kolhe, Abhay. Affiliation: SVKM's NMIMS MPSTME, Dept. of Computer Engineering, Mumbai, India

ISBN: (print) 9798350352931

This paper provides an overview of neural style transfer techniques, focusing on multi-style and real-time applications for both images and videos. Multi-style transfer refers to the technique of combining several art styles based on a given content image to create a unique and intriguing visual result. In contrast, real-time style transfer involves applying artistic styles almost instantaneously using pre-trained feedforward networks, making it particularly well-suited for video editing and live broadcasts. This paper reviews the most significant research and technical advancements in the field, while also addressing existing challenges that hinder the full potential of these transformative technologies. Furthermore, attention is given to emerging applications and the areas still in need of improvement, to enhance understanding and practical implementation. © 2024 IEEE.
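
Many of the style transfer methods such reviews cover, including feed-forward networks trained for real-time use, represent an image's style by Gram matrices of CNN feature maps. A NumPy sketch of that statistic and of one simple way to express multiple styles, a weighted sum of per-style losses; constant arrays stand in for real CNN activations, and the weighting scheme is an illustrative assumption rather than a technique from this paper:

```python
import numpy as np


def gram_matrix(features):
    """Channel-by-channel feature correlations of a C x H x W feature map.

    Normalizing by H * W keeps the scale independent of image size.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (h * w)


def multi_style_loss(generated, styles, weights):
    """Weighted sum over styles k of w_k * ||G(generated) - G(style_k)||_F^2.

    generated: C x H x W features of the generated image.
    styles: list of C x H' x W' feature maps, one per style image (sizes may differ).
    weights: one non-negative blending weight per style.
    """
    g = gram_matrix(generated)
    return float(sum(w * np.sum((g - gram_matrix(s)) ** 2) for s, w in zip(styles, weights)))
```

Raising one weight pulls the result toward that style; in a full system these losses are computed on features from several layers of a pretrained network and combined with a content loss.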

Keywords: Image Style Transfer; Multi-Style Transfer; Real-time Style Transfer; Video Style Transfer

Real-Time Platform Identification of VPN Video Streaming Based on Side-Channel Attack

38th International Conference on Information Security and Privacy Protection (SEC)

Authors: Lu, Anting; Wu, Hua; Luo, Hao; Cheng, Guang; Hu, Xiaoyan. Affiliations: Southeast Univ, Sch Cyber Sci & Engn, Nanjing, Peoples R China; Jiangsu Prov Engn Res Ctr Secur Ubiquitous Networ, Nanjing, Peoples R China

ISBN: (print) 9783031563287; 9783031563263; 9783031563256

The video platforms that users watch can reveal their private preferences. Video streaming is increasingly encrypted to protect user privacy, and many users also use a VPN for further protection. A VPN makes video platform identification challenging because it adds traffic obfuscation and another layer of encryption. Although the segment-based transmission mechanism and Variable BitRate encoding in HAS still leave identifiable patterns in network video traffic, most existing work cannot distinguish between platforms because their video streams are so similar. We therefore propose a traffic-based side-channel attack method to identify VPN video streaming platforms in real time. An aggregated feature sequence is extracted from the unidirectional video stream to retain the distinguishing characteristics of different video platforms. Experiments on 10 Gbps backbone background traffic show that the method's F1-score exceeds 97% and that traffic can be processed in real time. In addition, we verify the method's robustness on datasets with different path features and encryption techniques. A comparison with similar methods shows that our method requires only 1/1260 of the storage and 1/60 of the processing time to achieve accurate identification.
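
Because HAS delivers video in segments, the downstream direction of a tunnel tends to carry bursts whose sizes follow segment bitrates, even when payloads are encrypted. A minimal sketch of turning one unidirectional flow into a per-window byte sequence, assuming packets have already been grouped into flows; the tuple layout and one-second window are hypothetical, and the paper's actual aggregated features may differ:

```python
from collections import defaultdict


def aggregate_downstream(packets, window_s=1.0):
    """Sum downstream bytes per fixed time window.

    packets: iterable of (timestamp_seconds, size_bytes, is_downstream) tuples.
    Returns a list where element i is the byte count in the window
    [i * window_s, (i + 1) * window_s) measured from the first downstream packet.
    """
    down = [(t, size) for t, size, is_down in packets if is_down]
    if not down:
        return []
    t0 = min(t for t, _ in down)
    bins = defaultdict(int)
    for t, size in down:
        bins[int((t - t0) // window_s)] += size
    # Emit every window up to the last busy one, with zeros for idle gaps.
    return [bins[i] for i in range(max(bins) + 1)]
```

Only packet sizes and timing are used, so the same sequence can be computed whether or not the payload is readable; a classifier would then compare such sequences across platforms.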

Keywords: Video Streaming; Side-channel attack; VPN; Privacy
