We propose a scheme for supervised image classification that uses privileged information, in the form of keypoint annotations for the training data, to learn strong models from small and/or biased training sets. Our main motivation is the recognition of animal species for ecological applications such as biodiversity modelling, which is challenging because of long-tailed species distributions due to rare species, and strong dataset biases such as repetitive scene background in camera traps. To counteract these challenges, we propose a visual attention mechanism that is supervised via keypoint annotations that highlight important object parts. This privileged information, implemented as a novel privileged pooling operation, is only required during training and helps the model to focus on regions that are discriminative. In experiments with three different animal species datasets, we show that deep networks with privileged pooling can use small training sets more efficiently and generalize better.
Leveraging visual sensing technologies for the detection and tracking of vehicles represents a critical application domain for unmanned aerial vehicles (UAVs), notably in challenging operational *** study focuses on e...
Optical remote sensing images are widely used in feature recognition, scene semantic segmentation, and other fields. However, their quality is degraded by various kinds of noise, which seriously affects their practical use. Because remote sensing images have more complex texture features than ordinary images, previous denoising algorithms fail to achieve the desired result. Therefore, we propose a novel remote sensing image denoising network (RSIDNet) based on a deep learning approach, which mainly consists of a multi-scale feature extraction module (MFE), multiple local skip-connected enhanced attention blocks (ECA), a global feature fusion block (GFF), and a noisy image reconstruction block (NR). The combination of these modules greatly improves the model's use of the extracted features and increases the model's denoising capability. Extensive experiments on synthetic Gaussian noise datasets and real noise datasets have shown that RSIDNet achieves satisfactory results. RSIDNet reduces the loss of detail information that occurs with traditional denoising methods and retains more of the higher-frequency components, which can improve the performance of subsequent image processing.
Deep-learning-based models usually require a large amount of data for training, which guarantees the effectiveness of the trained model. Generative models are no exception, and sufficient training data are necessary for the diversity of generated images. However, for synthetic aperture radar (SAR) images, data acquisition is expensive. Therefore, SAR image generation from only a few training samples remains a challenging problem. In this article, we propose an attribute-guided generative adversarial network (AGGAN) with an improved episode training strategy for few-shot SAR image generation. First, we design the AGGAN structure, and spectral normalization is used to stabilize the training in the few-shot situation. The attribute labels of AGGAN are designed to be the category and aspect angle labels, which are essential information for SAR images. Second, an improved episode training strategy is proposed according to the characteristics of the few-shot generative task, and it can improve the quality of generated images in the few-shot situation. In addition, we explore the effectiveness of the proposed method when using different auxiliary data for training and use the Moving and Stationary Target Acquisition and Recognition benchmark dataset and a simulated SAR dataset for verification. The experimental results show that AGGAN and the proposed improved episode training strategy generate images of better quality than some existing methods, as verified through visual observation, image similarity measures, and recognition experiments. When the generated images are applied to the 5-shot SAR image recognition problem, the average recognition accuracy improves by at least 4%.
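To make the role of spectral normalization more concrete, here is a minimal sketch, not the AGGAN implementation: it estimates the largest singular value of a weight matrix by power iteration and divides the weights by it. The function name, the fixed start vector, and the iteration count are assumptions chosen for illustration; practical implementations typically use a random start vector and a single iteration per training step.

```python
import math


def _matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def _normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def spectral_normalize(weight, n_iter=50):
    """Return (weight / sigma, sigma), where sigma approximates the
    largest singular value of `weight`, found by power iteration.

    Assumption: the fixed start vector is not orthogonal to the
    dominant right singular vector.
    """
    cols = len(weight[0])
    weight_t = [list(col) for col in zip(*weight)]  # transpose
    v = _normalize([1.0] * cols)
    for _ in range(n_iter):
        u = _normalize(_matvec(weight, v))    # left singular vector estimate
        v = _normalize(_matvec(weight_t, u))  # right singular vector estimate
    sigma = sum(ui * wi for ui, wi in zip(u, _matvec(weight, v)))
    normalized = [[w / sigma for w in row] for row in weight]
    return normalized, sigma
```

After normalization the matrix has a spectral norm of about 1, which bounds how much the layer can stretch its input and is the usual argument for why the technique stabilizes GAN training.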
To address the difficulties of object detection and recognition in remote sensing images caused by high background complexity, large scale variations of targets, and the presence of numerous small objects, an improved method for remote sensing image object detection based on YOLOv7-tiny is proposed. This method combines a loss function based on the normalized Gaussian Wasserstein distance (NWD) with the CIoU loss function to address the sensitivity of IoU-based losses to positional deviation of small objects. The addition of a global attention mechanism (GAM) in the backbone network reduces information diffusion and enhances interaction at the global dimension, mitigating the interference of complex backgrounds in remote sensing images and enabling the model to focus on feature extraction for the desired targets. Finally, the coupled detection head (Coupled Head) of the model is replaced with a decoupled detection head (Decoupled Head), so that the classification and regression tasks are output from different branches, avoiding the decrease in detection accuracy caused by conflicts between classification and regression. On the public DIOR dataset, the method achieved 88.73% accuracy, an improvement of 1.78 percentage points over the unimproved method's 86.95%. Furthermore, compared to other researchers' methods tested on DIOR, the proposed method also shows improvement, thus validating its effectiveness.
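As a rough illustration of why a Wasserstein-based term helps with small objects, the sketch below computes the normalized Gaussian Wasserstein distance between two boxes given as (cx, cy, w, h) and compares it with plain IoU. The constant `c`, the weight `alpha`, and the use of plain IoU in place of CIoU are assumptions made for brevity, not details taken from the abstract.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein distance between two boxes.

    Each box (cx, cy, w, h) is modelled as a 2D Gaussian with mean
    (cx, cy) and covariance diag(w^2 / 4, h^2 / 4). `c` is a
    dataset-dependent constant; 12.8 is only a placeholder here.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)


def iou(box_a, box_b):
    """Plain intersection over union for (cx, cy, w, h) boxes."""
    ax1, ax2 = box_a[0] - box_a[2] / 2, box_a[0] + box_a[2] / 2
    ay1, ay2 = box_a[1] - box_a[3] / 2, box_a[1] + box_a[3] / 2
    bx1, bx2 = box_b[0] - box_b[2] / 2, box_b[0] + box_b[2] / 2
    by1, by2 = box_b[1] - box_b[3] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def combined_loss(box_a, box_b, alpha=0.5):
    """Hypothetical weighted mix of an NWD loss and an IoU loss.

    Plain IoU stands in for CIoU to keep the sketch short.
    """
    return alpha * (1.0 - nwd(box_a, box_b)) + (1.0 - alpha) * (1.0 - iou(box_a, box_b))
```

For a 4x4 box shifted sideways by 4 pixels, the two boxes no longer overlap, so IoU falls to zero and gives no indication of how close the prediction is, while NWD still decreases smoothly with the offset.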
To address the low planning accuracy and long planning time of traditional spatial planning methods for urban landscape architecture distribution patterns, a spatial planning method based on an evolutionary algorithm is proposed. First, we acquire urban landscape remote sensing images from ETM+ and Landsat TM/OLI imagery, and use ENVI software for geometric correction, image enhancement, and other image processing. Then, we acquire spatial data on the landscape distribution pattern from aspects such as urban green space types, patch area, and patch number. We then use a differential evolution algorithm to calculate the fitness value of the initialised population and extract landscape features. The optimal solution, which is the optimal spatial planning strategy, is obtained through the three steps of mutation, crossover, and selection. The simulation results show that the proposed method has higher precision and shorter planning time in spatial planning of urban landscape architecture distribution patterns.
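The mutation, crossover, and selection steps can be sketched as follows. This is a generic DE/rand/1/bin loop for minimization, not the authors' planner: the fitness function, bounds, and all parameter values (`pop_size`, `f`, `cr`, `generations`) are hypothetical, and a real landscape-planning fitness would replace the toy objective used in the example.

```python
import random


def differential_evolution(fitness, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=100, seed=0):
    """Minimize `fitness` over the box `bounds` with DE/rand/1/bin.

    Returns (best_individual, best_score). Individuals are updated in
    place within a generation, which is one common variant.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialise the population uniformly inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][d] + f * (pop[b][d] - pop[c][d]) for d in range(dim)]
            mutant = [min(max(v, lo), hi) for v, (lo, hi) in zip(mutant, bounds)]

            # Crossover: take each gene from the mutant with probability cr,
            # forcing at least one mutant gene via j_rand.
            j_rand = rng.randrange(dim)
            trial = [
                mutant[d] if (rng.random() < cr or d == j_rand) else pop[i][d]
                for d in range(dim)
            ]

            # Selection: keep the trial only if it is no worse.
            trial_score = fitness(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def sphere(x):
    """Toy objective standing in for a planning fitness function."""
    return sum(v * v for v in x)
```

Calling `differential_evolution(sphere, [(-5.0, 5.0)] * 3)` returns a point near the origin. Because selection is greedy, the best score in the population never gets worse from one generation to the next.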
Visual impairment is one of the most significant challenges facing humanity, especially in an era where information is frequently conveyed through text rather than voice. To address this, the proposed system is designed to assist individuals with visual impairments. This paper presents the development of a real-time Text-to-Speech (TTS) embedded system based on the Raspberry Pi 4. Our system incorporates a novel approach to enhance the accuracy of text recognition using Optical Character Recognition (OCR) from images. Specifically, a series of preprocessing steps is employed, selected dynamically by a decision-making process based on the content of the image. The image processing is handled using OpenCV2, while the conversion of text to speech is achieved through the pyttsx3 Python library. The entire system is implemented and tested on a Raspberry Pi 4, connected to a USB Full HD camera for high-resolution image acquisition, and controlled via the Traffic HAT-LED module. Experimental results demonstrate that our system achieves a minimum accuracy of 88.33% in text recognition from images.
Due to the advantages of high throughput, low latency, and low power consumption, optical neural networks hold great promise in addressing the challenges of energy consumption and computational efficiency faced by cur...
Terrain identification of coastal is of great significance for coastal development activities and coastal terrain survey in overseas areas. However, due to the complex characteristics of coastal features, the use of r...
With the development of sensor technology, complementary data from different sources can be easily obtained for various applications. Despite the availability of adequate multisource observation data, for example, hyperspectral image (HSI) and light detection and ranging (LiDAR) data, existing methods may lack effective processing of structural information transmission and physical property alignment, weakening the complementary ability of multiple sources in the collaborative classification task. The way complementary information is combined and the redundancy exclusion operator need to be redesigned to strengthen the semantic relatedness of the sources. As a remedy, we propose a structural optimization transmission framework, namely, structural optimization transmission network (SOT-Net), for collaborative land-cover classification of HSI and LiDAR data. Specifically, SOT-Net is developed with three key modules: 1) a cross-attention module; 2) a dual-modes propagation module; and 3) a dynamic structure optimization module. Based on the above designs, SOT-Net can take full advantage of the reflectance-specific information of HSI and the detailed edge (structure) representations of multisource data. The inferred transmission plan, which integrates a self-alignment regularizer into the classification task, enhances the robustness of the feature extraction and classification process. Experiments show that SOT-Net consistently outperforms baselines across three benchmark remote sensing datasets, and the results also demonstrate that the proposed framework can yield satisfactory classification results even with small training sets.