ISBN: (print) 9789819785049; 9789819785056
Vision Mamba (VMamba) has recently attracted considerable research attention because it obtains a global receptive field with linear computational complexity. However, like the Vision Transformer (ViT), it divides the image into patches and therefore describes local details poorly. To address this issue, we design a dual-stream network that combines VMamba and a CNN, so that the network has both the global receptive field of VMamba and the local detail description capability of the CNN. Both characteristics are crucial for remote sensing image semantic segmentation. The two streams are supervised and trained with independent loss functions. In addition, to enable sufficient information exchange between the two branches, we introduce an auto-scaling fusion module that bridges the semantic gap between VMamba and the CNN. Experiments demonstrate that the proposed method outperforms state-of-the-art methods on multiple remote sensing semantic segmentation datasets.
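The independent supervision of the two streams described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of per-pixel cross-entropy, and the equal-weight sum of the two losses are all assumptions, since the abstract does not state how the losses are defined or combined.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy of one pixel's class logits against an integer label."""
    return -math.log(softmax(logits)[target])


def dual_stream_loss(global_logits, local_logits, targets):
    """Average each stream's per-pixel loss separately, then add the two.

    global_logits: per-pixel logits from the global (VMamba-like) stream.
    local_logits:  per-pixel logits from the local (CNN-like) stream.
    The equal-weight sum is an assumption of this sketch.
    """
    n = len(targets)
    loss_global = sum(cross_entropy(l, t) for l, t in zip(global_logits, targets)) / n
    loss_local = sum(cross_entropy(l, t) for l, t in zip(local_logits, targets)) / n
    return loss_global, loss_local, loss_global + loss_local
```

Keeping the two terms separate lets each stream be monitored on its own during training, which matches the abstract's description of independently supervised branches.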
ISBN: (print) 9798400718144
In this study, we propose a multimodal learning approach that integrates Contrastive Language-Image Pre-training (CLIP) and large language models to improve the recognition efficiency of remote sensing images and the capacity to generate related professional information. The method integrates image processing and text generation at a technical level, with application advantages in fields such as automated Geographic Information Systems construction, environmental monitoring, disaster assessment, and geographic science education. The research underscores the advances of the CLIP model in visual-textual understanding and the technical strengths of large language models in handling complex text tasks. By designing an integrated fusion layer, we combine visual features with textual information and conduct a comprehensive evaluation of the model's recognition accuracy and text generation quality on the dataset. Experimental results show that our model achieved a recognition accuracy of 73.7% and a text quality score of 26.6, validating its efficacy in dealing with the complexity and diversity of remote sensing images. Through the deep integration of CLIP and large language models, this research both advances multimodal learning technologies and opens new perspectives for the research and application of remote sensing image recognition and related information generation.
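To make the visual-textual matching idea concrete, the sketch below scores one image embedding against several label embeddings using cosine similarity followed by a softmax, which is the general style of matching used by contrastive image-text models. It is not the fusion layer from the abstract, whose design is not described. The function names, the toy embeddings, and the `temperature` value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def match_labels(image_emb, text_embs, temperature=0.07):
    """Return one probability per label from temperature-scaled similarities.

    text_embs maps a label string to its (hypothetical) text embedding.
    """
    sims = [cosine(image_emb, t) / temperature for t in text_embs.values()]
    m = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    return {label: e / total for label, e in zip(text_embs, exps)}


# Toy usage with made-up 2-D embeddings.
probs = match_labels([1.0, 0.0], {"airport": [1.0, 0.0], "farmland": [0.0, 1.0]})
```

In a real system the embeddings would come from trained image and text encoders; here they are hand-written so the example stays self-contained.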
ISBN: (print) 9798400707032
Atmospheric parameters are necessary inputs for atmospheric correction, but they are difficult to obtain. To address this challenge, a solution for atmospheric parameter acquisition based on NNAeroG and networked automatic matching is proposed. Combined with QUAAC, this solution enables the atmospheric correction of GF images and thereby automates the full atmospheric correction process. It simplifies the tedious process of obtaining AOD in existing methods and greatly improves the efficiency of atmospheric correction. The atmospheric parameters it provides can support multiple atmospheric correction methods, reduce labor-intensive operations, and offer efficient tools for large-scale atmospheric radiation production and research.
ISBN: (print) 9798350353006
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing. Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery, leading to suboptimal segmentation results. To address these challenges, we introduce the Rotated Multi-Scale Interaction Network (RMSIN), an approach designed for the unique demands of RRSIS. RMSIN incorporates an Intra-scale Interaction Module (IIM) to address the fine-grained detail required at multiple scales and a Cross-scale Interaction Module (CIM) to integrate these details coherently across the network. Furthermore, RMSIN employs an Adaptive Rotated Convolution (ARC) to account for the diverse orientations of objects, a novel contribution that significantly enhances segmentation accuracy. To assess the efficacy of RMSIN, we have curated an expansive dataset comprising 17,402 image-caption-mask triplets, which is unparalleled in terms of scale and variety. This dataset presents the model with a wide range of spatial and rotational scenarios and also establishes a stringent benchmark for the RRSIS task, ensuring a rigorous evaluation of performance. Experimental evaluations demonstrate that RMSIN surpasses existing state-of-the-art models by a significant margin. Datasets and code are available at https://***/Lsan2401/RMSIN.
This article is the first in a series of publications dedicated to the leading scientific school of Academician V.A. Soifer in the field of processing, analysis, and recognition of images and optical signals. The article briefly describes the creation and development of the Samara scientific school of computer image processing. Examples of obtained fundamental results and solved applied problems are given. The most significant publications of the scientific school are listed and analyzed.
Remote sensing image analysis, classification, and pattern recognition processes all depend on image segmentation. In this research, a search-based convolutional neural network (SBCNN) is used to identification me...
ISBN: (print) 9781510680425
The proceedings contain 263 papers. The topics discussed include: improved YOLOv5's remote sensing image detection algorithm; wavelet transform based polarized image fusion detection of underwater targets; CLIP-driven hierarchical fusion for referring image segmentation; research on continuous monitoring of subdural hematoma based on optical image mapping feature extraction method; generalized image denoising based on MLP denoiser and diffusion model; CAM: consistency adversarial model for image generation with high-frequency image details; DBE-net: double-level boundary enhanced network for temporomandibular joint CBCT images segmentation; an enhanced feature matching multi-temporal port remote sensing image registration network E-SuperGlue; and application of 3D laser scanning and tilt photography technology in digital landscape surveying and mapping.
Traditional methods for detecting plant diseases and pests are time-consuming, labor-intensive, and require specialized skills and resources, making them insufficient to meet the demands of modern agricultural development. To address these challenges, deep learning technologies have emerged as a promising solution for the accurate and timely identification of plant diseases and pests, thereby reducing crop losses and optimizing agricultural resource allocation. By leveraging its advantages in image processing, deep learning has significantly improved the accuracy of plant disease and pest detection and identification. This review provides a comprehensive overview of recent advances in applying deep learning algorithms to plant disease and pest detection. It begins by outlining the limitations of traditional methods in this domain, followed by a systematic discussion of the latest developments in applying various deep learning techniques (including image classification, object detection, semantic segmentation, and change detection) to plant disease and pest identification. Additionally, this study highlights the role of large-scale pre-trained models and transfer learning in improving detection accuracy and scalability across diverse crop types and environmental conditions. Key challenges, such as enhancing model generalization, addressing small lesion detection, and ensuring the availability of high-quality, diverse training datasets, are critically examined. Emerging opportunities for optimizing pest and disease monitoring through advanced algorithms are also emphasized. Deep learning, with its powerful capabilities in data processing and pattern recognition, has become a pivotal tool for promoting sustainable agricultural practices, enhancing productivity, and advancing precision agriculture.
High spatial resolution (HSR) remote sensing images contain complex foreground-background relationships, which makes remote sensing land cover segmentation a special semantic segmentation task. The main challenges come from large-scale variation, complex background samples and imbalanced foreground-background distribution. These issues make recent context modeling methods sub-optimal because they lack foreground saliency modeling. To handle these problems, we propose a Remote Sensing Segmentation framework (RSSFormer), including an Adaptive TransFormer Fusion Module, a Detail-aware Attention Layer and a Foreground Saliency Guided Loss. Specifically, from the perspective of relation-based foreground saliency modeling, our Adaptive Transformer Fusion Module can adaptively suppress background noise and enhance object saliency when fusing multi-scale features. Our Detail-aware Attention Layer then extracts the detail and foreground-related information via the interplay of spatial attention and channel attention, which further enhances the foreground saliency. From the perspective of optimization-based foreground saliency modeling, our Foreground Saliency Guided Loss can guide the network to focus on hard samples with low foreground saliency responses to achieve balanced optimization. Experimental results on the LoveDA, Vaihingen, Potsdam and iSAID datasets validate that our method outperforms existing general semantic segmentation methods and remote sensing segmentation methods, and achieves a good compromise between computational overhead and accuracy.
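The idea of steering optimization toward hard samples can be illustrated with a focal-loss-style weighting, sketched below. This is a swapped-in, well-known technique used for illustration only; it is not the Foreground Saliency Guided Loss itself, whose formula the abstract does not give. The function name and the `gamma` parameter are hypothetical.

```python
import math


def hard_sample_weighted_loss(fg_probs, labels, gamma=2.0):
    """Binary cross-entropy with each pixel scaled by (1 - p_t) ** gamma.

    fg_probs: predicted foreground probability per pixel, in [0, 1].
    labels:   1 for foreground, 0 for background.
    p_t is the probability assigned to the true class, so confidently
    correct pixels contribute little and poorly predicted pixels dominate.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(fg_probs, labels):
        p_t = p if y == 1 else 1.0 - p
        total += -((1.0 - p_t) ** gamma) * math.log(max(p_t, eps))
    return total / len(labels)


# A foreground pixel with a weak response is penalized far more
# than one the model already predicts well.
hard = hard_sample_weighted_loss([0.1], [1])
easy = hard_sample_weighted_loss([0.9], [1])
```

Setting `gamma=0` removes the weighting and recovers plain binary cross-entropy, which makes the effect of the hard-sample emphasis easy to compare.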
To improve the application efficiency of RGB remote sensing images in agricultural land resource surveys, a cultivated land segmentation algorithm based on kernel space non-uniform regularization classification and im...