ISBN: (Print) 9798350350920
Current deep learning-based target detection algorithms suffer from a limited receptive field, poor adaptation to scale changes, feature mismatch during feature fusion, and small datasets. To address these problems in infrared target detection, this paper proposes a global infrared image detection method based on a graph convolutional neural network. A global feature interaction module and a feature pyramid module are designed. A graph-based knowledge distillation method for model compression is also proposed to support hardware deployment. The proposed algorithm is evaluated on a classical infrared small target dataset against mainstream infrared small target detection algorithms, and the comparison verifies that it improves detection performance on infrared small targets. Ablation experiments are designed to verify the performance of the individual modules and the fused modules [2] and to demonstrate the effectiveness of each module. Finally, visualization analysis supports subjective evaluation by the human eye and further demonstrates the quality of the proposed algorithm.
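As a rough illustration of what knowledge distillation for model compression involves, the sketch below computes the generic temperature-softened distillation loss between a teacher's and a student's logits. It is not the graph-based variant the abstract describes, whose details are not given here; the function names and the default temperature are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature squared.

    Generic logit-based distillation; an assumption for illustration,
    not the graph-based method named in the abstract.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
    return temperature ** 2 * kl


# Hypothetical logits for a three-class toy problem.
loss = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge, which is what lets a small student be trained to mimic a larger teacher.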
Small targets in infrared imagery exhibit challenging characteristics due to their minimal semantic information and the extremely imbalanced distribution between the targets and the background. In this paper, we propo...
In complex scenarios, the utilization of temporal motion information can improve the detection performance of infrared small and dim targets. However, existing multiframe methods only consider short-term motion information at each moment, which makes it difficult to capture reliable motion information for small and dim targets. In addition, existing multiframe data-driven methods generally use complex network structures that have longer inference times than single-frame models of infrared target detection. This limits the applicability of the existing multiframe methods. In this article, we propose a long-term optical flow (OF)-based infrared small target motion pattern extractor (IR-MPE) to generate long-term OF energy (OFE) maps, which reflect the motion patterns of targets at the current moment. First, we design a long-term OF adaptive accumulation module (OFAAM) to adaptively control the update of current motion information and the retention of previous motion information. Second, we design an offset correction module (OCM), which is embedded in the OFAAM to rectify the OFE from the previous frame. The OCM also corrects the output of the previous frame to assist in the detection of the current frame. Embedding our IR-MPE module into existing single-frame methods easily extends them into multiframe methods. The only modification is adding an extra input channel to the first layer, whose input is the concatenation of our OFE and the original infrared image. Such a simple structure can significantly improve detection accuracy while maintaining fast inference speed. Extensive experiments on various public datasets show that our approach outperforms the state-of-the-art methods in challenging scenarios.
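The two ideas in this abstract, accumulating motion energy over time and feeding it to a single-frame detector as an extra input channel, can be sketched as follows. This is a minimal sketch under strong assumptions: the fixed `retain` weight stands in for the adaptive, learned control of the OFAAM, there is no offset correction, and the function names are hypothetical.

```python
def accumulate_energy(prev_energy, flow_magnitude, retain=0.8):
    """Blend the previous energy map with the current flow magnitude.

    A fixed-weight moving average; an assumption standing in for the
    adaptive accumulation described in the abstract.
    """
    return [
        [retain * e + (1.0 - retain) * m for e, m in zip(e_row, m_row)]
        for e_row, m_row in zip(prev_energy, flow_magnitude)
    ]


def stack_input(image, energy):
    """Build a channel-first, two-channel input: [infrared image, energy map]."""
    return [image, energy]


# Toy 1x2 frames: a target moves in the first pixel for two steps.
energy = [[0.0, 0.0]]
for flow in ([[1.0, 0.0]], [[1.0, 0.0]]):
    energy = accumulate_energy(energy, flow, retain=0.5)

network_input = stack_input([[0.3, 0.1]], energy)
```

Because only the input gains a channel, the rest of a single-frame network can stay unchanged, which matches the abstract's point about keeping inference fast.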
ISBN: (Print) 9798350344868; 9798350344851
Self-supervised learning has recently demonstrated significant success in various speech processing applications. Recent studies report that pre-training with contextualized continuous targets plays a crucial role in fine-tuning for better performance on downstream speech tasks. However, unlike continuous targets, contextualized targets in a discrete space are challenging to produce because training is unstable. To address this issue, we introduce a new hierarchical product quantizer that enables the full utilization of multi-layer features by reducing the number of possible quantized targets and preventing mode collapse through a diversity loss for all codebooks. Our ablation study confirms the effectiveness of the proposed quantizer and contextualized discrete targets. For supervised ASR, the proposed model outperforms wav2vec2 and shows results comparable to data2vec. In addition, for unsupervised ASR, the proposed method surpasses two baselines.
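For readers unfamiliar with product quantization, the sketch below shows the basic operation: a feature vector is split into groups, and each group is replaced by the index of its nearest codeword in that group's codebook. The entropy helper illustrates the kind of quantity a diversity loss rewards (even use of all codewords). The hierarchical structure of the proposed quantizer is not reproduced, and all names and toy codebooks are assumptions.

```python
import math


def nearest_index(sub_vector, codebook):
    """Index of the codeword with the smallest squared distance to sub_vector."""
    def squared_distance(code):
        return sum((a - b) ** 2 for a, b in zip(sub_vector, code))
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i]))


def product_quantize(vector, codebooks):
    """Split vector into len(codebooks) equal groups and quantize each one."""
    group_size = len(vector) // len(codebooks)
    return [
        nearest_index(vector[g * group_size:(g + 1) * group_size], codebook)
        for g, codebook in enumerate(codebooks)
    ]


def usage_entropy(indices, codebook_size):
    """Entropy of codeword usage; zero signals collapse onto a single codeword."""
    counts = [0] * codebook_size
    for index in indices:
        counts[index] += 1
    total = len(indices)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Two toy codebooks, each holding two 2-dimensional codewords.
codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]]]
codes = product_quantize([0.1, 0.0, 0.9, 1.0], codebooks)
```

With G codebooks of V codewords each, the number of distinct targets is V to the power G, which is why the way codebooks are combined controls how many quantized targets are possible.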
Target detection methods based on multidimensional features are often applied to the detection of small targets in sea clutter. However, existing algorithms do not fully utilize the correlation information between fea...
Deep learning-based infrared small target detection (IRSTD) methods typically exploit spatial domain cues to infer dim and weak infrared targets. However, relying solely on spatial domain information is sub-optimal du...
The objective of infrared multi-frame super-resolution for small targets is to enhance the target's resolution by leveraging complementary information from multiple frames. However, the presence of motion variatio...
The task of infrared dense small target detection aims to accurately locate densely distributed thermal radiation targets in complex scenes. However, in complex scenes, dense small targets often suffer from occlusion,...
Object detection in unmanned aerial vehicle (UAV) images has become an important research area in computer vision due to its unique value and challenges. UAV images are characterized by densely distributed small targets, significant changes in target scale, and background noise, all of which affect the accuracy and reliability of detection. To address these issues, we propose a small target detection network based on Enhanced Scale Sequence Fusion and a channel space fusion cross-attention mechanism, called CSFCANet. To tackle the high proportion of small targets and the scale variation in UAV images, we employ Enhanced Scale Sequence Fusion, integrating fine-grained information from shallow feature maps with semantic information from deep feature maps. Additionally, we incorporate a tiny target detection head to enhance the network's ability to extract fine-grained features for small targets. To address background noise, we propose a channel space fusion cross-attention mechanism, which first computes attention on local patch block feature maps and then computes attention on global patch blocks. This captures both long-range dependencies and detailed information. The attention computation combines spatial description information and channel description information. Experiments were conducted to validate the effectiveness of the model on the VisDrone benchmark dataset, the UAVDT dataset, and our self-made UAV power inspection dataset, PIDrone. Compared with the YOLOv8s model, CSFCANet improved mAP by 7% on PIDrone, 2.4% on VisDrone, and 3.6% on UAVDT.
In this study, Cross-YOLO, an enhanced version of the YOLOv8 model, is designed specifically to address the challenge of detecting small objects in UAV target detection scenarios. The model refines the original YOLOv8 through several improvements. First, to improve the detection accuracy of small targets, we propose Cross-FPN to strengthen the original FPN. Second, we redesign a lightweight detection head, DELDH, to solve the network bloat caused by introducing small object detection heads. Third, we design a new attention mechanism, CMCA, which unifies the coordinate attention mechanism with the multi-scale convolutional attention mechanism to further enhance feature extraction for small targets. Finally, the WIoU loss function is introduced to improve the accuracy of bounding box regression and overall detection performance. Experimental results on the VisDrone dataset indicate that, at model size n, Cross-YOLO reduces the parameter count by 35.4% compared to YOLOv8n, with only a 7.8% increase in computational load and a 5.3% improvement in mAP0.5. Furthermore, its strong performance on the DOTA v1.5 and TinyPerson datasets confirms the model's generalization capabilities and practical applicability.
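Bounding box regression losses such as WIoU build on the intersection over union (IoU) between a predicted and a ground-truth box. The sketch below computes plain IoU and the basic loss 1 - IoU for boxes given as (x1, y1, x2, y2); it does not reproduce the WIoU weighting itself, and the function names are chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap extents are clamped at zero so disjoint boxes give no intersection.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def iou_loss(predicted, target):
    """Plain IoU loss; WIoU-style losses reweight this term (weighting not shown)."""
    return 1.0 - iou(predicted, target)


# Two 2x2 boxes that overlap in a 1x1 square.
overlap = iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

For small targets a shift of a pixel or two changes IoU sharply, which is one reason small object detectors pay close attention to the choice of regression loss.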