Aiming to detect novel objects from only a few annotated samples, few-shot object detection (FSOD) has undergone remarkable development. Previous works rarely approach the optimization of existing methods from the perspective of gradient propagation. As a result, they fail to make full use of the information about novel objects that gradient propagation carries. We propose a method based on two-stage fine-tuning to solve this problem. A domain adaptation module with multiple constraints promotes the propagation of gradients, a classification promotion network improves classification, and a multi-path mask head is added to enrich RoI features. Experiments on the PASCAL VOC and COCO datasets show that our model significantly improves performance over previous methods (by 1-5% on average).
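The abstract builds on two-stage fine-tuning but does not spell out the schedule. As a rough illustration only, the sketch below follows the widely used TFA-style recipe:

- train the whole detector on base classes;
- then fine-tune only the final layers on a small, class-balanced K-shot set.

The module names and the functions `trainable_modules` and `build_k_shot_set` are hypothetical. The paper's own modules (domain adaptation, classification promotion, multi-path mask head) are not modeled.

```python
import random
from collections import defaultdict

# Hypothetical module names; a real detector groups its parameters differently.
BASE_STAGE_MODULES = {"backbone", "rpn", "roi_head", "box_predictor"}
FINETUNE_STAGE_MODULES = {"box_predictor"}


def trainable_modules(stage):
    """Return the modules updated in each stage of a TFA-style two-stage schedule."""
    if stage == "base":
        # Stage 1: train the whole detector on abundant base-class data.
        return set(BASE_STAGE_MODULES)
    if stage == "finetune":
        # Stage 2: freeze the feature extractor and adapt only the last layers.
        return set(FINETUNE_STAGE_MODULES)
    raise ValueError(f"unknown stage: {stage!r}")


def build_k_shot_set(annotations, k, seed=0):
    """Sample up to k labeled instances per class for the fine-tuning stage.

    annotations: list of (image_id, class_name) pairs, one per labeled box.
    Returns {class_name: [image_id, ...]} with at most k entries per class.
    """
    by_class = defaultdict(list)
    for image_id, cls in annotations:
        by_class[cls].append(image_id)
    rng = random.Random(seed)  # fixed seed so the few-shot split is reproducible
    return {cls: rng.sample(ids, min(k, len(ids))) for cls, ids in sorted(by_class.items())}
```

Keeping base and novel classes at the same number of shots during fine-tuning is what makes the second stage "balanced". Classes with fewer than k instances simply keep all of them.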
Few-shot object detection (FSOD) based on fine-tuning is essential for analyzing optical remote sensing images. However, existing methods mainly focus on natural images and overlook the scale variations in remote sensing images, leading to feature confusion among foreground instances of different classes. Additionally, only a subset of instances is labeled in FSOD training data. The model might therefore mistakenly treat unlabeled instances as background, confusing foreground and background features, particularly for novel classes. These phenomena indicate that severe feature confusion in remote sensing FSOD hampers the model's ability to accurately classify and localize instances. To address these issues, this paper proposes a two-stage FSOD framework based on transfer learning via pseudo-sample generation and feature enhancement (PSGFE). It comprises a pseudo-sample generation module (PSGM) and a feature enhancement module (FEM). The former reduces the feature confusion between foreground and background by generating pseudo-samples for unannotated background areas. The latter dynamically captures and enhances multi-scale features in the region of interest (ROI) and extracts unique core information for each class, eliminating feature confusion among foreground instances of different classes. Our method has been validated on the optical remote sensing datasets DIOR and RSOD, where it outperforms existing methods.
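The abstract does not say how PSGM decides where pseudo-samples go. One common heuristic, shown here purely as an assumption, is to keep confident detections that do not overlap any annotated box. Probable but unlabeled objects are then not trained as background. The thresholds `score_thr` and `iou_thr` and the function names are illustrative.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def generate_pseudo_samples(predictions, gt_boxes, score_thr=0.8, iou_thr=0.3):
    """Turn confident detections in unannotated regions into pseudo-labels.

    predictions: list of (box, class_name, score) from the current detector.
    gt_boxes: annotated boxes for the same image.
    Returns (box, class_name) pairs to treat as foreground instead of background.
    """
    pseudo = []
    for box, cls, score in predictions:
        if score < score_thr:
            continue  # too uncertain to override the background label
        if all(iou(box, gt) < iou_thr for gt in gt_boxes):
            pseudo.append((box, cls))  # lies outside every annotated box
    return pseudo
```

A high score threshold trades recall of missing labels for fewer wrong pseudo-labels. With very few shots, a false pseudo-label can hurt as much as a missed one.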
Few-shot object detection receives much attention for its ability to detect novel class objects using limited annotated samples. The transfer learning-based solution has become popular due to its simple training and good accuracy. However, it is still challenging to enrich feature diversity during training, and fine-grained features are also insufficient for novel classes. To deal with these problems, this paper proposes a novel few-shot object detection method based on dual-domain feature fusion and patch-level attention. Alongside the original base domain, an elementary domain with more category-agnostic features is superposed to construct a two-stream backbone, which helps enrich feature diversity. To better integrate the various features, a dual-domain feature fusion is designed, in which feature pairs of the same size are complementarily fused to extract more discriminative features. Furthermore, a patch-wise feature refinement, termed patch-level attention, is presented to mine internal relations among patches, which enhances adaptability to novel classes. In addition, a weighted classification loss assists the fine-tuning of the classifier by combining extra features from the FPN of the base training stage. In this way, the few-shot detection quality for novel class objects is improved. Experiments on the PASCAL VOC and MS COCO datasets verify the effectiveness of the method.
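Patch-level attention, as described, relates the patches of a feature map to one another. The sketch below is a simplified, parameter-free version under our own assumptions:

- the map is split into non-overlapping patches;
- each patch attends to all patches through scaled dot-product similarity;
- the attended mix is added back as a residual.

The learned query/key/value projections a real module would use are omitted, and the function names are hypothetical.

```python
import math


def split_into_patches(feature, patch):
    """Split an H x W map (list of lists) into flattened, non-overlapping patch vectors.

    Assumes H and W are divisible by `patch`.
    """
    h, w = len(feature), len(feature[0])
    return [
        [feature[r][c] for r in range(i, i + patch) for c in range(j, j + patch)]
        for i in range(0, h, patch)
        for j in range(0, w, patch)
    ]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def patch_level_attention(patches):
    """Each patch attends to every patch; the attended mix is added back as a residual."""
    d = len(patches[0])
    out = []
    for q in patches:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in patches])
        mixed = [sum(w * p[t] for w, p in zip(weights, patches)) for t in range(d)]
        out.append([x + m for x, m in zip(q, mixed)])
    return out
```

The residual connection keeps each patch's own content while adding context from related patches. This is the usual way such refinements are kept stable.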
ISBN: (print) 9789819784929; 9789819784936
Few-shot object detection (FSOD) is affected by the long-tailed distribution of data and the discrepancy in sample quantities between base and novel classes, which leads to evident data bias. As a result, the generated feature distribution struggles to represent class features effectively. When samples are scarce, irrelevant factors in the features may have a greater impact on the feature distribution and may even dominate the feature representation. To obtain more compact and accurate class-specific feature representations, this paper introduces disentangled representation into few-shot object detection. It proposes a semantic disentanglement representation meta-learning model, referred to as FSOD-SDR. First, in the feature extraction phase, a feature information aggregation module aggregates features from different scales of the backbone. This enables a more comprehensive representation of support features that contain limited information. Second, to address highly coupled features, a semantic disentanglement representation module simultaneously disentangles background-relevant and label-relevant semantic factor distributions from the aggregated features. The label-relevant feature distribution represents class features more accurately. To achieve the disentanglement goal effectively, the Evidence Lower Bound (ELBO) loss function is extended during model optimization. Finally, experiments on the PASCAL VOC and MS COCO datasets show that FSOD-SDR achieves a significant performance improvement over previous state-of-the-art methods (an average improvement of 5.7% across all metrics), achieving comparably good detection performance.
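The abstract mentions extending the ELBO but not the exact form. The sketch below rests on several assumptions. It computes a negative ELBO with two diagonal-Gaussian latent groups, one background-relevant and one label-relevant. Each group is regularized toward a standard normal prior by the closed-form KL term KL(N(mu, sigma^2) || N(0, 1)) = 0.5 (mu^2 + sigma^2 - 1 - log sigma^2). The squared-error reconstruction term, the `beta_*` weights and the function names are illustrative choices, not the paper's.

```python
import math


def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def disentangled_negative_elbo(x, x_recon, z_background, z_label, beta_bg=1.0, beta_label=1.0):
    """Negative ELBO (to minimize) with two latent groups.

    z_background, z_label: (mu, logvar) pairs for the background-relevant and
    label-relevant latent factors; both are pulled toward a standard normal prior here.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))  # squared-error reconstruction
    return recon + beta_bg * gaussian_kl(*z_background) + beta_label * gaussian_kl(*z_label)
```

Separate weights for the two KL terms are one simple lever for making the two factor groups behave differently. A class-conditional prior for the label-relevant group would be another option.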
Few-shot object detection (FSOD) in remote sensing images (RSIs) faces challenges such as data scarcity, difficulty in detecting small objects, and underutilization of frequency-domain information. Existing methods often rely on spatial-domain features, neglecting the complementary insights from low- and high-frequency characteristics. Additionally, their performance in detecting small objects is hindered by inadequate feature extraction in cluttered backgrounds. To tackle these problems, we propose a novel detection framework, Spatial-Frequency Interaction and Distribution Matching (SFIDM), which significantly enhances FSOD performance in RSIs. SFIDM focuses on rapid adaptation to target datasets and efficient fine-tuning with limited data. First, to improve feature representation, we introduce the Spatial-Frequency Interaction (SFI) module, which leverages the complementarity between low-frequency and high-frequency information. The SFI module decomposes input images into their frequency components and extracts features critical for classification and precise localization. This enables the framework to capture the fine details essential for detecting small objects. Second, to overcome the limitations of traditional label assignment strategies for small bounding boxes, we construct the Distribution Matching (DM) module, which models bounding boxes as 2D Gaussian distributions. This allows accurate handling of subtle offsets and of small objects, whether overlapping or not. Moreover, a feature reweighting module leverages the learned base-class information to improve novel-class detection. It adaptively fuses features extracted from the backbone network to generate representations better suited to downstream detection tasks. We conducted extensive experiments on two benchmark FSOD datasets to demonstrate the effectiveness and performance improvements of the proposed SFIDM framework.
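Modeling boxes as 2D Gaussians lets two small boxes be compared even when they do not overlap, a case where IoU is simply zero. The abstract does not name the distance the DM module uses. The sketch assumes the normalized Wasserstein distance (NWD) from the tiny-object detection literature. A box (cx, cy, w, h) is mapped to a Gaussian with mean (cx, cy) and standard deviations (w/2, h/2). For such diagonal Gaussians, the squared 2-Wasserstein distance reduces to the squared distance between means plus the squared distance between standard deviations. The constant `c` is dataset-dependent; 12.8 here is only a placeholder.

```python
import math


def box_to_gaussian(box):
    """(cx, cy, w, h) -> mean and per-axis standard deviation of a 2D Gaussian."""
    cx, cy, w, h = box
    return (cx, cy), (w / 2.0, h / 2.0)


def wasserstein2_sq(box_a, box_b):
    """Squared 2-Wasserstein distance between the Gaussians of two boxes."""
    (ma, sa), (mb, sb) = box_to_gaussian(box_a), box_to_gaussian(box_b)
    # Closed form for diagonal covariances: mean term plus standard-deviation term.
    mean_term = (ma[0] - mb[0]) ** 2 + (ma[1] - mb[1]) ** 2
    std_term = (sa[0] - sb[0]) ** 2 + (sa[1] - sb[1]) ** 2
    return mean_term + std_term


def normalized_wasserstein(box_a, box_b, c=12.8):
    """Similarity in (0, 1]; equals 1 for identical boxes and decays smoothly with distance."""
    return math.exp(-math.sqrt(wasserstein2_sq(box_a, box_b)) / c)
```

For example, boxes (10, 10, 4, 4) and (16, 10, 4, 4) do not overlap, so their IoU is 0. Their NWD is exp(-6/12.8), and it keeps shrinking as the boxes move apart. This gives label assignment a usable signal for tiny objects.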
Few-shot object detection (FSOD) methods can detect novel classes with only a small number of annotated samples and have received widespread attention in recent years. Meta-learning has been shown to be a key technology for addressing few-shot problems. Typically, meta-learning-based methods require an additional support branch to extract class prototypes for the few-shot classes. The detection head then performs classification and detection by measuring the distance between the class prototypes and the query features. Since the input to the support branch is an object image annotated with a bounding box, it often contains a large amount of background information, which degrades the quality of the class prototypes. Through careful observation, we found that the center of the bounding box is often the core feature area of the object. Based on this, we designed a lightweight Background Suppression (BS) module. It suppresses background features by measuring the similarity between the peripheral and central features of the support features, thereby providing high-quality support features for class prototype extraction. Additionally, for class prototype extraction, we designed a more robust Comprehensive Prototype Pyramid Distillation (CPPD) module. This module first captures multi-scale feature information of the object from the background-suppressed support features. It then uses a pyramid structure to hierarchically distill the multi-scale features into more comprehensive and purer class prototypes. Extensive experiments on the PASCAL VOC and COCO datasets show that our method achieves the best results compared with other models using the same architecture and techniques.
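A minimal sketch of the idea behind the BS module, under our own simplifying assumptions:

- the mean feature of a central window serves as an object prototype;
- every location of the support feature map is scaled by its non-negative cosine similarity to that prototype;
- peripheral background locations are therefore attenuated.

The real module is presumably learned. `center_frac` and the function names are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def suppress_background(feature, center_frac=0.5):
    """feature: H x W grid of C-dim vectors. Returns a reweighted grid of the same shape."""
    h, w = len(feature), len(feature[0])
    # Central window covering roughly `center_frac` of each side of the support box.
    ch, cw = max(1, int(h * center_frac)), max(1, int(w * center_frac))
    top, left = (h - ch) // 2, (w - cw) // 2
    center = [feature[i][j] for i in range(top, top + ch) for j in range(left, left + cw)]
    dim = len(center[0])
    proto = [sum(v[d] for v in center) / len(center) for d in range(dim)]
    out = []
    for row in feature:
        new_row = []
        for v in row:
            # Locations dissimilar to the central prototype are scaled toward zero.
            weight = max(0.0, cosine(v, proto))
            new_row.append([weight * x for x in v])
        out.append(new_row)
    return out
```

Because the weights come from a single cosine comparison per location, the suppression step adds almost no cost. That fits the "lightweight" description, though the paper's exact similarity measure is not given.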
With the advancement of object detection technology, few-shot object detection (FSOD) has become a research hotspot. Existing methods face two major challenges. First, base models generalize poorly to unseen categories, especially with limited few-shot data, where a shared feature representation fails to meet the distinct needs of the classification and regression tasks. Second, FSOD is susceptible to overfitting during training. To address these issues, this paper proposes a Multi-Task Decoupled Method (MTDM), which enhances the model's generalization to new categories by separating the feature extraction processes for different tasks. Additionally, a dynamic adjustment strategy adaptively modifies the IoU threshold and loss function parameters based on variations in the training data. This reduces the risk of overfitting and makes the most of limited data. Experimental results show that the proposed hybrid model performs well on multiple few-shot datasets, effectively overcoming the challenges posed by limited annotated data.
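The abstract does not give the adjustment rule. As an illustration only, the sketch borrows the idea popularized by Dynamic R-CNN. The positive-sample IoU threshold is set from the k-th largest proposal IoU per image, averaged over images and clamped. The Smooth L1 transition point `beta` is exposed as a parameter that a similar rule could adjust. All names and default values are assumptions.

```python
def dynamic_iou_threshold(proposal_ious, k=3, floor=0.4, ceil=0.7):
    """Derive the positive-sample IoU threshold from current proposal quality.

    proposal_ious: one list per image holding each proposal's best IoU with any GT box.
    The k-th largest IoU of every image is averaged and clamped to [floor, ceil].
    """
    kth = []
    for ious in proposal_ious:
        if ious:
            ranked = sorted(ious, reverse=True)
            kth.append(ranked[min(k, len(ranked)) - 1])
    if not kth:
        return floor  # no proposals yet: fall back to the lowest threshold
    return min(ceil, max(floor, sum(kth) / len(kth)))


def smooth_l1(x, beta):
    """Smooth L1 loss whose transition point `beta` can be adjusted during training."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta
```

Early in training, proposals are poor and the threshold stays low, so enough positives survive. As proposals improve, the threshold rises, which is one way to avoid fitting noisy low-quality samples.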
Few-shot object detection (FSOD) methods are mainly designed and evaluated on natural image datasets such as Pascal VOC and MS COCO. However, it is not clear whether the best methods for natural images are also the best for aerial images. Furthermore, a direct performance comparison between FSOD methods is difficult due to the wide variety of detection frameworks and training strategies. To this end, our contributions are twofold. First, we propose a benchmarking framework that provides a flexible environment to implement and compare attention-based FSOD methods. The proposed framework focuses on attention mechanisms and is divided into three modules: spatial alignment, global attention, and fusion layer. To remain competitive with existing methods, which often leverage complex training, we propose new augmentation techniques designed specifically for object detection. Using this framework, several FSOD methods are reimplemented and compared. This comparison reveals two distinct performance regimes on aerial and natural images: FSOD performs worse on aerial images. Our experiments confirm that small objects account for the poor performance. Small objects are difficult to detect in general, and in the few-shot regime this challenge is greatly amplified. While small object detection is a well-known issue, to our knowledge this few-shot complication has never been reported in the literature. Second, still within the proposed framework, we develop a novel alignment method called Cross-Scales Query-Support Alignment (XQSA) for FSOD to improve the detection of small objects. XQSA significantly outperforms the state of the art on DOTA and DIOR, two aerial image datasets.
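The framework splits attention-based FSOD into spatial alignment, global attention and a fusion layer. Below is a minimal, parameter-free sketch of query-support alignment under our own assumptions:

- Each query-image token attends over support tokens.
- The support tokens are pooled from several pyramid levels, so a small query object can match support features at any scale. This is our rough reading of "cross-scales".
- The attended context is fused with the query by element-wise product.

This is not the XQSA implementation, and the function names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def query_support_attention(query_tokens, support_levels):
    """Fuse query tokens with support context gathered across several scales.

    query_tokens: list of d-dim vectors from the query feature map.
    support_levels: list of pyramid levels, each a list of d-dim support vectors.
    """
    # Pool support tokens from all levels so any query token can match any scale.
    support = [tok for level in support_levels for tok in level]
    d = len(query_tokens[0])
    fused = []
    for q in query_tokens:
        weights = softmax([sum(a * b for a, b in zip(q, s)) / math.sqrt(d) for s in support])
        context = [sum(w * s[t] for w, s in zip(weights, support)) for t in range(d)]
        # Assumed fusion layer: element-wise product of query and attended support features.
        fused.append([a * c for a, c in zip(q, context)])
    return fused
```

Mixing levels in one attention pool is the simplest way to let scales interact. A real module would add learned projections and keep positional information.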
In this paper, we propose an improved Region Proposal Network (RPN) that introduces a metric-based nonlinear classifier to compute the similarity between features extracted from the backbone network and those of new classes. This enhancement aims to improve the detection precision of candidate boxes for new classes and to filter out candidate boxes with high Intersection over Union (IoU). At the same time, we introduce an attention-based Feature Aggregation Module (AFM) into Region of Interest (RoI) Align to aggregate feature information from different levels. This yields more comprehensive information and feature representations, addressing the loss of feature information caused by scale differences. Combining these two improvements, we present a novel few-shot object detection algorithm, IFA-FSOD. We conduct extensive experiments on datasets. Compared with several mainstream few-shot object detection algorithms, IFA-FSOD selects more accurate candidate boxes and addresses the issues of missed high-IoU candidate boxes and incomplete capture of feature information. It thereby achieves higher precision.
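The two components can be illustrated roughly as follows:

- The first function scores an anchor by its best scaled cosine similarity to new-class prototypes, passed through a sigmoid. This is one simple form of a metric-based non-linear classifier.
- The second mixes one RoI's features from several pyramid levels with softmax weights, standing in for AFM.

The `scale`/`bias` values, the prototypes and the level scores (which a real module would learn) are assumptions, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def metric_objectness(anchor_feature, prototypes, scale=10.0, bias=-5.0):
    """Score an anchor by its best scaled cosine similarity to any class prototype,
    passed through a sigmoid (the non-linear part of the classifier)."""
    best = max(cosine(anchor_feature, p) for p in prototypes)
    return 1.0 / (1.0 + math.exp(-(scale * best + bias)))


def aggregate_levels(level_features, level_scores):
    """Softmax-weighted sum of one RoI's features pooled from several pyramid levels.

    level_scores would come from a small learned attention network; here they are inputs.
    """
    m = max(level_scores)
    exps = [math.exp(s - m) for s in level_scores]
    weights = [e / sum(exps) for e in exps]
    dim = len(level_features[0])
    return [sum(w * f[t] for w, f in zip(weights, level_features)) for t in range(dim)]
```

Cosine similarity ignores feature magnitude, which tends to make the scores of data-poor new classes less biased toward well-trained base classes. The `scale` factor restores a usable dynamic range before the sigmoid.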
Few-shot object detection (FSOD) aims to detect objects of novel classes using only a few manually annotated samples. With so few novel-class samples, learning the inter-class relationships among foreground classes and constructing the corresponding class hierarchy is a challenging task. A poorly constructed class hierarchy results in the inter-class confusion problem, which recent FSOD methods have identified as a primary cause of inferior performance on novel classes. In this work, we further find that intra-super-class confusion is the main challenge in solving the confusion problem. In this failure mode, samples are misclassified as other classes within their associated super-class. To solve this issue, this work generates class-confusion-aware samples with a pre-defined tree-structured graph to help models construct a precise class hierarchy. Specifically, to generate class-confusion-aware samples, we add noise to available samples and update the noise to maximize the confidence scores on the samples' associated confusion categories. Then, a confusion-aware curriculum learning strategy makes the generated samples gradually participate in training, which benefits model convergence while learning from the generated samples. Experimental results show that our method can be used as a plug-in in recent FSOD methods and consistently improves model performance.
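The generation step resembles a targeted adversarial perturbation. The sketch below works on feature vectors rather than images and uses a linear softmax classifier for simplicity. It proceeds in three steps:

- pick a confusion target from a hypothetical two-level class tree (a sibling within the same super-class);
- ascend the gradient of the target's log-probability with respect to the noise, clipping the noise to `[-eps, eps]`;
- ramp the share of generated samples linearly, as a stand-in curriculum.

All names, step sizes and the linear schedule are assumptions.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sibling_classes(cls, super_of):
    """Classes sharing cls's super-class in a hypothetical two-level class tree."""
    return [c for c, s in super_of.items() if s == super_of[cls] and c != cls]


def confusion_aware_sample(x, weights, biases, target, steps=10, lr=0.1, eps=0.5):
    """Perturb feature x so a linear softmax classifier grows more confident in `target`."""
    noise = [0.0] * len(x)
    for _ in range(steps):
        xn = [a + n for a, n in zip(x, noise)]
        logits = [sum(w * v for w, v in zip(row, xn)) + b for row, b in zip(weights, biases)]
        probs = softmax(logits)
        # For a linear classifier: d log p_target / d x = W_target - sum_k p_k * W_k.
        grad = [weights[target][d] - sum(p * row[d] for p, row in zip(probs, weights))
                for d in range(len(x))]
        # Gradient ascent on the noise, clipped so the sample stays near the original.
        noise = [max(-eps, min(eps, n + lr * g)) for n, g in zip(noise, grad)]
    return [a + n for a, n in zip(x, noise)]


def curriculum_ratio(epoch, total_epochs):
    """Fraction of generated samples mixed into training; grows linearly from 0 to 1."""
    return min(1.0, epoch / max(1, total_epochs))
```

Clipping the noise keeps each generated sample close to a real one, so it stays on the boundary between sibling classes rather than drifting into an unrealistic region. The ramp lets the model first fit real data before the harder generated samples dominate.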