Drone object detection faces numerous challenges, such as dense and overlapping clusters, scale diversity, and long-tail distributions. Tiling inference with a uniform sliding window is an effective way to enlarge tiny objects while remaining efficient enough for real-world applications. However, merely partitioning input images may cause heavy truncation and an unexpected performance drop on large objects. Therefore, in this work, we develop an improved tiling detection framework that offers both competitive performance and high efficiency. First, we formulate the tiling inference and training pipeline with a mixed data strategy. To avoid truncation and handle objects at all scales, we simultaneously perform global detection on the original image and local detection on the corresponding sub-patches, using appropriate patch settings. Correspondingly, the training data includes both the original images and patches generated by random online anchor-cropping, which ensures that the patches are effective and enriches the image scenarios. Furthermore, a scale filtering mechanism assigns objects at diverse scales to the global and local detection tasks, which preserves the scale invariance of the detector and yields optimal fused predictions. As most of the additional operations are performed in parallel, the tiling inference remains highly efficient. Additionally, we devise two augmentations customized for tiling detection that effectively increase the number of valid annotations; they generate more challenging drone scenarios and simulate realistic overlapping clusters, especially for rare categories. Comprehensive experiments on public drone benchmarks and on our customized real-world images demonstrate that, compared with other drone detection frameworks, the proposed tiling framework significantly improves the performance of general detectors in drone scenarios at a lower additional computational cost.
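The global-plus-local scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`tile_origins`, `make_tiles`, `to_image_coords`, `scale_filter`), the square tile with a fixed overlap, and the single area threshold are all hypothetical, since the abstract does not specify the patch settings or the exact filtering rule.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels
Tile = Tuple[int, int, int, int]          # (x1, y1, x2, y2) of a sub-patch


def tile_origins(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of a uniform sliding window along one axis."""
    assert 0 <= overlap < tile, "overlap must be smaller than the tile size"
    if tile >= length:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last window is flush with the border
    return starts


def make_tiles(width: int, height: int, tile: int, overlap: int) -> List[Tile]:
    """Uniform grid of sub-patches covering the whole image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]


def to_image_coords(box: Box, tile: Tile) -> Box:
    """Shift a box predicted inside a tile back to full-image coordinates."""
    ox, oy = tile[0], tile[1]
    return (box[0] + ox, box[1] + oy, box[2] + ox, box[3] + oy)


def scale_filter(global_boxes: List[Box], local_boxes: List[Box],
                 area_threshold: float) -> List[Box]:
    """Assumed rule: keep large boxes from the global pass (no truncation)
    and small boxes from the tiles (where tiny objects appear enlarged)."""
    def area(b: Box) -> float:
        return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

    kept_global = [b for b in global_boxes if area(b) >= area_threshold]
    kept_local = [b for b in local_boxes if area(b) < area_threshold]
    return kept_global + kept_local
```

In practice the global pass and the per-tile passes would be batched through the detector together, and the fused list would still go through a final duplicate-suppression step; both are omitted here to keep the sketch short.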
In aerial image scenes, objects exhibit arbitrary orientation, a large range of scales, and dense distribution. Object detectors therefore use an oriented bounding box (OBB) to locate objects, which is more complex and challenging than detection with a horizontal bounding box (HBB). Mainstream OBB detectors mostly use a one-to-many label assignment strategy that predicts multiple bounding boxes for the same object and then filter out duplicate predictions by nonmaximum suppression (NMS). NMS ranks detections by confidence and drops every box whose intersection over union (IoU) with a higher-ranked box exceeds a threshold, so it tends to settle on a locally optimal result. The clustered synthesis method yields more accurate results than the original NMS, but applying it to an OBB detector leads to border shift, which arises from the angular discontinuity problem. Therefore, we use the Gaussian OBB (G-OBB) to deal with the angular discontinuity and thus eliminate the offset generated by direct synthesis. The G-OBB representation, however, is difficult to interpret and describe. For this reason, we analyze the properties of the G-OBB and design a decoding method that converts a G-OBB to a rotated rectangular box, and we further discuss the conditions under which the decoding holds. Based on the decoding method, we propose a Gaussian synthesis (GauS) algorithm, which transforms the OBBs into Gaussian space, performs the synthesis there, and finally transforms the synthesis result back into a new OBB. We derive the synthesis and decoding methods and verify their effectiveness. Extensive experiments on several existing models show that GauS requires very little computation and improves the detector's high-precision performance, and they verify the effectiveness, stability, and universality of the proposed algorithm. In addition, RTMDet with GauS reaches 81.61 AP50 and gains a 0.39% improvement in mean average precision (mAP), achieving state-of-the-art (SOTA) performance. Our implementation is available a
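The encode, synthesize, decode pipeline described in this abstract can be illustrated with a short sketch. Several things here are assumptions rather than the paper's method: the covariance convention diag(w²/4, h²/4), the long-edge output convention, and above all the `synthesize` function, which uses a plain confidence-weighted average of means and covariances as a stand-in because the abstract does not give the actual GauS synthesis rule. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

OBB = Tuple[float, float, float, float, float]       # (cx, cy, w, h, theta in radians)
Gaussian = Tuple[float, float, float, float, float]  # (mx, my, a, b, c), cov = [[a, b], [b, c]]


def obb_to_gaussian(box: OBB) -> Gaussian:
    """Encode an OBB as a 2-D Gaussian: cov = R diag(w^2/4, h^2/4) R^T (assumed convention)."""
    cx, cy, w, h, theta = box
    sw, sh = w * w / 4.0, h * h / 4.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    a = sw * cos_t ** 2 + sh * sin_t ** 2
    b = (sw - sh) * cos_t * sin_t
    c = sw * sin_t ** 2 + sh * cos_t ** 2
    return (cx, cy, a, b, c)


def gaussian_to_obb(g: Gaussian) -> OBB:
    """Decode via the closed-form eigen-decomposition of a symmetric 2x2 matrix.
    The result uses a long-edge convention: w >= h, theta in (-pi/2, pi/2]."""
    mx, my, a, b, c = g
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    lam_major = mean + radius
    lam_minor = max(mean - radius, 0.0)
    # If the covariance is isotropic (square box) the angle is not identifiable;
    # this sketch returns 0.0 by convention in that case.
    theta = 0.0 if radius < 1e-12 else 0.5 * math.atan2(2.0 * b, a - c)
    return (mx, my, 2.0 * math.sqrt(lam_major), 2.0 * math.sqrt(lam_minor), theta)


def synthesize(boxes: Sequence[OBB], scores: Sequence[float]) -> OBB:
    """Stand-in synthesis (an assumption): score-weighted average in Gaussian space."""
    total = sum(scores)
    gaussians = [obb_to_gaussian(bx) for bx in boxes]
    fused = tuple(
        sum(s * g[i] for s, g in zip(scores, gaussians)) / total for i in range(5)
    )
    return gaussian_to_obb(fused)
```

Two nearly identical boxes at +89° and -89° show why working in Gaussian space helps: averaging the raw angles would give 0°, a box rotated by a quarter turn, whereas the averaged covariance decodes to a box of almost the same size whose long edge is still close to vertical.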
Recent improvements in detection performance on remote sensing images have relied on deeper convolutional layers and more complex convolutional structures, which significantly increase the storage space and computational complexity of the detector. Although previous work has designed various novel lightweight convolutions, applying these structures to remote sensing detection tasks often ignores two inconsistencies in the detection architecture, one between features and targets and one between features and tasks: (1) Features extracted by a convolution that slides in a fixed direction cannot effectively model targets with arbitrary orientations, so the detector needs more parameters to encode direction information and the network parameters become highly redundant; (2) The detector shares features from the backbone, but the classification task requires rotation-invariant features while the regression task requires rotation-sensitive features. This task inconsistency can lead to inefficient convolutional structures. Therefore, this paper proposes a detector based on Feature Decoupling for Lightweight Oriented object detection (FDLO-Det). Specifically, we construct a rotational separable convolution that extracts rotation-equivariant features while significantly reducing network parameters and computational complexity through heavily shared parameters. Next, we introduce an orthogonal polarization transformation module that decomposes the rotation-equivariant features along the horizontal and vertical orthogonal directions and uses polarization functions to select the features required by the classification and regression tasks, effectively improving detector performance. Extensive experiments on DOTA, HRSC2016, and UCAS-AOD show that the proposed detector achieves the best performance and an effective balance between computational complexity and detection accuracy.
Localization regression in oriented object detection tasks has long faced boundary discontinuity and angular discontinuity problems induced by periodic angles. These problems were successfully resolved by using a 2-D Gaussian distribution to model the oriented bounding box (OBB). However, the angular information of square-like objects is lost when they are converted to a 2-D Gaussian distribution, which is a systematic problem. The fundamental reason is that, as the aspect ratio of the object tends to 1, the equiprobability curve of the 2-D Gaussian distribution degenerates from an ellipse to a circle, thus losing the orientation information of the rotated object. As a result, the bounding boxes of such square-like objects cannot be learned effectively. To resolve this problem, we use the Lamé curve (or superellipse) to modify the existing 2-D Gaussian function and design a super-Gaussian distribution. This distribution maintains anisotropy at arbitrary aspect ratios, thus preserving the angular information of the oriented object. We use the Kullback-Leibler (KL) divergence to measure the distance between two super-Gaussian distributions and convert it into a localization loss (SGKLD) through a function. SGKLD is an improved version of the KLD loss: by modifying the form of the probability distribution, it fixes the missing-angle problem of the traditional Gaussian distribution. We validate the effectiveness of the proposed algorithm on several datasets and achieve state-of-the-art (SOTA) performance. Our algorithm achieves a mean average precision (mAP) of 80.07, 76.59, 62.27, and 90.55/98.13 on the DOTA-v1.0, DOTA-v1.5, DOTA-v2.0, and HRSC2016 datasets, respectively.
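The degeneration for square-like objects can be made explicit with a short derivation. It assumes the covariance convention commonly used in Gaussian-based OBB losses, where a box of width w, height h, and angle θ maps to a Gaussian with the covariance below; the abstract itself does not state the convention, and the symbols Σ, R, and s are introduced here for illustration only.

```latex
\begin{aligned}
\Sigma(w,h,\theta) &= R(\theta)\,\operatorname{diag}\!\left(\tfrac{w^{2}}{4},\,\tfrac{h^{2}}{4}\right) R(\theta)^{\top},
\qquad
R(\theta)=\begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix},\\[4pt]
\Sigma(s,s,\theta) &= \tfrac{s^{2}}{4}\,R(\theta)\,R(\theta)^{\top} = \tfrac{s^{2}}{4}\,I
\quad\text{for every } \theta .
\end{aligned}
```

For a square box (w = h = s) the angle cancels out of the covariance entirely, so any loss that sees the box only through its Gaussian mean and covariance, including the KL divergence between two ordinary Gaussians, cannot penalize an angle error for that box.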
An oriented bounding box (OBB) is preferable to a horizontal bounding box (HBB) for accurate object detection. Most existing works use a two-stage detector to locate the HBB and the OBB in turn, and suffer from misaligned horizontal proposals and interference from complex backgrounds. To tackle these issues, the region of interest transformer and attention models were proposed, yet they are extremely computationally intensive. To this end, we propose a semi-anchor-free detector (SAFDet) for object detection in aerial images, in which a rotation-anchor-free-branch (RAFB) enhances the foreground features by precisely regressing the OBB. Meanwhile, a center-prediction-module (CPM) is introduced to enhance object localization and suppress background noise. Both RAFB and CPM are deployed only during training, which avoids any increase in the computational cost of inference. Evaluation on the DOTA and HRSC2016 datasets validates the efficacy of our approach, which achieves a good balance between accuracy and computational cost.