Target detection in unmanned aerial vehicle (UAV) aerial images faces two main problems: small targets and target occlusion. To improve detection accuracy while maintaining efficiency, this work introduces SOD-YOLOv7, a small object detection model for UAV aerial images based on the real-time detector YOLOv7. To address the challenge of small object detection, we design a module that combines Swin Transformer and convolution to better capture the global context information of small objects in the image. Additionally, we introduce the Bi-Level Routing Attention (BRA) mechanism to enhance the model's focus on small objects. To improve the model's detection capabilities at multiple scales, we add detection branches. To handle occluded objects, we incorporate a dynamic detection head with deformable convolution and attention mechanisms to enhance the model's spatial awareness of targets. Experimental results on the VisDrone and CARPK UAV image datasets show that the average precision (mAP@0.5) of our model reaches 53.2% and 98.5%, respectively. Compared to the original YOLOv7, our model achieves improvements of 4.3% and 0.3%, respectively, demonstrating better performance in detecting small objects. The code will be released soon at https://***/Gentle-Hui/SOD-YOLOv7.
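The mAP@0.5 figures above rely on an intersection-over-union (IoU) threshold of 0.5 to decide whether a predicted box matches a ground-truth box. The following is a minimal sketch of that matching criterion only, assuming boxes are given as (x1, y1, x2, y2) corner tuples; the function names are hypothetical and this is not the authors' evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction counts as a match when its IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold
```

For a 10x10 pixel target, a horizontal shift of 4 pixels already lowers the IoU to about 0.43, below the 0.5 threshold, which illustrates why small objects are sensitive to localization error under this criterion.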
ISBN: (print) 9798350344868; 9798350344851
In HTTP Adaptive Streaming (HAS), a video is encoded at multiple bitrate-resolution pairs, referred to as representations, which enables users to choose the most suitable representation based on their network connection. To optimize the set of bitrate-resolution pairs and improve the Quality of Experience (QoE) for users, it is essential to measure the quality of the representations. VMAF is a highly reliable metric used in HAS to assess the quality of representations. In practice, however, using it for optimization can be very time-consuming, and it is infeasible for live streaming applications. To tackle this high complexity, our paper introduces a new method called VQM4HAS, which extracts low-complexity features, including (i) video complexity features, (ii) bitstream features logged during the encoding process, and (iii) basic video quality metrics. These extracted features are then fed into a regression model to predict VMAF. Our experimental results demonstrate that VQM4HAS achieves a high Pearson Correlation Coefficient (PCC) with VMAF, ranging from 0.95 to 0.96 depending on the resolution, while exhibiting significantly lower complexity, which makes it suitable for live streaming scenarios.
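The evaluation idea above (predict a quality score from cheap features, then report the Pearson correlation with the reference score) can be sketched as follows. This is an illustration under stated assumptions: the abstract does not specify the regression model, so a single-feature ordinary least-squares line stands in for it, and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (stand-in model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("cannot fit a line to a constant feature")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return slope, mean_y - slope * mean_x


def predict(model, xs):
    """Apply a fitted (slope, intercept) pair to a sequence of feature values."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]
```

A typical use would fit the model on one set of (feature, VMAF) pairs and compute `pearson(predict(model, features), vmaf_scores)` on a held-out set.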
Video packet loss during network transmission can lead to visible artifacts, freezing, and interruptions in video playback. Packet loss concealment techniques aim to mitigate these effects by concealing missing packet...
ISBN: (print) 9798350374407
The proceedings contain 114 papers. The topics discussed include: research on intelligent English-Chinese translation proofreading system based on gated feedback recurrent neural networks; intelligent recognition of structures in earth and rock dam images based on MASK-RCNN; a pupil diameter measurement system based on image processing; image enhancement and deep learning in predicting the Gleason score of transrectal ultrasound images of prostate cancer; research on video logo removal processing method based on MATLAB; analysis and evaluation of quality control throughout production of real scene 3D modeling based on oblique aerial photography; improving remote sensing image classification through stochastic bilevel optimization; cross-domain image translation algorithm based on self-cross auto-encoder; a fast image mosaic algorithm based on feature partition extraction; and video compression and action recognition in self-supervised learning.
This study explores the most effective method for impact measurement in laser shooting ranges, crucial for security training, accident prevention, and cost reduction. It utilizes video surveillance and image processin...
For intelligent substation monitoring, it is important to read substation instruments automatically and accurately. This paper takes the digital instruments of substations in real scenes as the research object and proposes a detection and identification method for substation digital instruments based on an improved YOLO-v3. To enrich the limited image data, this paper augments the collected substation image data and constructs a data set. Starting from YOLO-v3, to improve the accuracy of substation instrument detection and identification while also taking real-time performance into account, this paper proposes an improved YOLO-v3 model that uses the PANet structure. The effectiveness of the proposed method is verified in a substation digital instrument detection experiment. Experimental results show that the improved YOLO-v3 achieves a mean average precision 0.23% higher than the classical YOLO-v3 network and is more accurate in substation digital instrument detection and identification. The proposed method still guarantees real-time performance, with an image processing speed of 29 frames per second (FPS), which meets the practical engineering needs of intelligent data acquisition, detection, and identification in substations.
Limited by hardware conditions and complex degradation processes, aerial images obtained by drone reconnaissance are usually blurry and lack high-frequency information. To address this problem, many image deblurring algorithms have been proposed. Although significant progress has been made, challenges remain in aerial image deblurring, such as limited deblurring performance and non-real-time processing. In this work, we propose a progressive residual recurrent network (PRRN) for aerial image deblurring and make four contributions to overcoming the above challenges: (1) We design a lightweight encoder-decoder module (LEDM), which includes the progressive residual block and the feature recurrent structure (FRS), and the number of LEDMs can be adjusted to balance deblurring efficiency and performance. (2) We present the progressive residual block, which adopts a simple gate to reduce system complexity and introduces layer normalization to stabilize the training process. (3) We present the FRS, composed of feature map recurrence and latent code recurrence, to retain and remove the feature information of previous encoder-decoder modules. (4) We adopt aerial images from the DOTA dataset as the initial data and use a motion blur kernel to generate blurry aerial images, forming a dataset named AID for aerial image deblurring. Extensive experiments on synthetic datasets and our dataset demonstrate the superior performance of PRRN in both quantitative and qualitative evaluation. Notably, our proposed network reaches 30.80 dB PSNR on the AID dataset and 77.73% mAP on realistic blurry aerial images, achieving state-of-the-art deblurring performance.
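The PSNR figure quoted above is a standard fidelity measure between a sharp reference image and a restored one. A minimal sketch of the usual definition follows, assuming images are flattened into equal-length sequences of pixel values with a peak value of 255; the function name and the flat-list representation are assumptions made for illustration, not the authors' evaluation code.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate a restored image closer to the reference; the result depends on the chosen peak value, so scores are only comparable when that convention matches.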
ISBN: (print) 9798400716607
This paper presents a comprehensive method for real-time 3D human skeleton reconstruction from calibrated camera sets, addressing challenges in scenes with multiple individuals. Accurate 3D pose estimation is crucial for various applications such as 3D model animation, augmented reality, and human-computer interaction. The approach involves initial 2D skeleton estimation, followed by skeleton identification through a matching algorithm and reconstruction via triangulation. Three key enhancements were implemented: refining the matching algorithm using 3D reconstruction reprojection, accelerating execution with skeleton tracking, and validating the method on a diverse dataset with over 9,000 frames. The method achieves accurate 3D reconstruction and robust performance in multi-individual scenarios, making it suitable for real-world applications. Project page: https://instant- ***.
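The triangulation step mentioned above recovers a 3D joint position from its 2D detections in several calibrated views. The abstract does not say which triangulation variant is used, so the sketch below shows one simple option, the two-view midpoint method, and assumes each 2D joint has already been back-projected into a ray given by a camera center and a direction vector. All names are hypothetical.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(center1, dir1, center2, dir2):
    """Midpoint of the shortest segment between two 3D rays.

    Each ray is center + s * direction. Returns the estimated 3D point.
    """
    w = [a - b for a, b in zip(center1, center2)]
    a = _dot(dir1, dir1)
    b = _dot(dir1, dir2)
    c = _dot(dir2, dir2)
    d = _dot(dir1, w)
    e = _dot(dir2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point is not determined")
    # Ray parameters of the mutually closest points.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [o + s * k for o, k in zip(center1, dir1)]
    p2 = [o + t * k for o, k in zip(center2, dir2)]
    return [(x + y) / 2.0 for x, y in zip(p1, p2)]
```

With more than two views, a least-squares formulation over all rays is a common alternative; the two-view case is shown here only because it keeps the geometry easy to follow.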
This paper provides an overview of neural style transfer techniques, focusing on multi-style and real-time applications for both images and videos. Multi-style transfer refers to the technique of combining several art...
Knowing which video platforms users watch leaks private information about their preferences. More and more video streaming is being encrypted to protect users' privacy. In addition, many users use VPNs to further enhance their privacy protection. VPNs make video platform identification challenging because they introduce traffic obfuscation and additional data encryption. Although the segment-based transmission mechanism and variable bitrate encoding in HTTP Adaptive Streaming (HAS) leave identifiable patterns in network video traffic, most existing work cannot distinguish different platforms because their video streams are similar. Therefore, we propose a traffic-based side-channel attack method to identify VPN video streaming platforms in real time. The aggregated feature sequence of the unidirectional video stream is extracted to retain the characteristics of different video platforms. Experiments on 10 Gbps backbone background traffic show that the F1-score of the method exceeds 97% and that it can run in real time. In addition, we verify the method's robustness on datasets with different path features and encryption techniques. A comparison with similar methods shows that our method requires only 1/1260 of the storage and 1/60 of the processing time to identify platforms accurately.