The attention mechanism has been widely used and has achieved good results in many vision tasks. However, computing attention in vision tasks consumes substantial memory and time, which is its obvious disadvantage...
ISBN (digital): 9798350353006
ISBN (print): 9798350353013
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing. Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery, leading to suboptimal segmentation results. To address these challenges, we introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS. RMSIN incorporates an Intra-scale Interaction Module (IIM) to effectively address the fine-grained detail required at multiple scales and a Cross-scale Interaction Module (CIM) for integrating these details coherently across the network. Furthermore, RMSIN employs an Adaptive Rotated Convolution (ARC) to account for the diverse orientations of objects, a novel contribution that significantly enhances segmentation accuracy. To assess the efficacy of RMSIN, we have curated an expansive dataset comprising 17,402 image-caption-mask triplets, which is unparalleled in terms of scale and variety. This dataset not only presents the model with a wide range of spatial and rotational scenarios but also establishes a stringent benchmark for the RRSIS task, ensuring a rigorous evaluation of performance. Experimental evaluations demonstrate the exceptional performance of RMSIN, surpassing existing state-of-the-art models by a significant margin. Datasets and code are available at https://***/Lsan2401/RMSIN.
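As a rough illustration of the rotated-convolution idea mentioned above (a hedged sketch, not the RMSIN implementation), the following PyTorch snippet predicts a single rotation angle from the input, resamples the convolution kernel on a rotated grid, and applies the rotated kernel. All module names, sizes, and the per-batch angle routing are assumptions made for this sketch.

# Hypothetical sketch of an "adaptive rotated convolution": a small routing branch
# predicts a rotation angle, the kernel is resampled on a rotated grid, and the
# rotated kernel is applied as an ordinary convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveRotatedConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.02)
        self.k = k
        # Tiny routing head that predicts one rotation angle from the features.
        self.angle_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, 1)
        )

    def _rotate_kernel(self, angle):
        # Build a rotation grid over the k x k kernel and resample the weights.
        cos, sin = torch.cos(angle), torch.sin(angle)
        theta = torch.stack([
            torch.stack([cos, -sin, torch.zeros_like(cos)]),
            torch.stack([sin,  cos, torch.zeros_like(cos)]),
        ]).unsqueeze(0)                                   # (1, 2, 3)
        grid = F.affine_grid(theta, [1, 1, self.k, self.k], align_corners=False)
        o, i, k, _ = self.weight.shape
        w = self.weight.view(o * i, 1, k, k)
        grid = grid.expand(o * i, -1, -1, -1)
        return F.grid_sample(w, grid, align_corners=False).view(o, i, k, k)

    def forward(self, x):
        # One angle per batch for simplicity; a real routing scheme is richer.
        angle = self.angle_head(x).mean().squeeze()
        w = self._rotate_kernel(angle)
        return F.conv2d(x, w, padding=self.k // 2)

x = torch.randn(2, 16, 32, 32)
print(AdaptiveRotatedConv2d(16, 32)(x).shape)             # torch.Size([2, 32, 32, 32])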
ISBN (digital): 9781510644991
ISBN (print): 9781510644991; 9781510644984
Pattern recognition techniques are widely used in computer vision, classification of radio signals, and voice recognition. The fractional Fourier transform is used to recognize patterns using binary ring masks and to segment images. This technique has the characteristic of being invariant to position and rotation and finally yields a one-dimensional signature. On the other hand, neural networks are used for pattern recognition based on a deep neural network algorithm, with the characteristic of training on large datasets with millions of images. Artificial Neural Networks (ANNs) are used for several applications such as pattern recognition and classification of input data. In particular, the ANN has been used to evaluate medical images of the brain to assess whether an image corresponds to Alzheimer's disease. One disadvantage of the neural network is the large amount of time needed to learn, depending on the number of patterns to be identified or classified and on its ability to adapt and recognize patterns. In addition, the fractional Fourier transform cannot analyze a large amount of information. In this work, a comparison between the Artificial Neural Network and the fractional Fourier transform is presented to determine which is better for recognizing a batch of selected medical images. We propose a reconstruction method using both techniques for precise image recognition and evaluate their respective metrics such as accuracy, precision, sensitivity, and specificity. The medical images regarding Alzheimer's disease cover no dementia, very mild dementia, and mild dementia, which present the best performance regarding the receiver operating characteristic, while moderate dementia was the worst classified, in relation to the number of images of that class in the dataset.
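The comparison metrics named in this abstract (accuracy, precision, sensitivity, specificity) follow directly from a binary confusion matrix; the short sketch below shows the standard formulas. The labels and example arrays are purely illustrative and not taken from the paper.

# Standard binary classification metrics from a confusion matrix.
import numpy as np

def binary_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "precision":   tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # recall / true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
    }

# e.g. 1 = "class detected", 0 = "not detected"
print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1]))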
ISBN (print): 9783031064272; 9783031064265
Robustness of different pattern recognition methods is one of the key challenges in autonomous driving, especially when driving in a wide variety of road environments and weather conditions, such as gravel roads and snowfall. Although one can collect data from these adverse conditions using cars equipped with sensors, it is quite tedious to annotate the data for training. In this work, we address this limitation and propose a CNN-based method that can leverage the steering wheel angle information to improve road area semantic segmentation. As the steering wheel angle data can be easily acquired with the associated images, one can improve the accuracy of road area semantic segmentation by collecting data in new road environments without manual data annotation. We demonstrate the effectiveness of the proposed approach on two challenging datasets for autonomous driving and show that when the steering task is used in our segmentation model training, it leads to a 0.1-2.9% gain in road area mIoU (mean Intersection over Union) compared to the corresponding reference transfer learning model.
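A minimal sketch of the general idea of using the steering wheel angle as an auxiliary training signal is shown below: a shared encoder feeds a segmentation head and a steering-regression head, and the two losses are summed. The layer sizes and the 0.1 loss weight are assumptions for illustration, not the paper's configuration.

# Multi-task training sketch: road segmentation plus steering-angle regression.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RoadSegWithSteering(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(64, num_classes, 1)
        self.steer_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                        nn.Linear(64, 1))

    def forward(self, x):
        feat = self.encoder(x)
        seg = F.interpolate(self.seg_head(feat), size=x.shape[-2:],
                            mode="bilinear", align_corners=False)
        return seg, self.steer_head(feat).squeeze(-1)

model = RoadSegWithSteering()
img = torch.randn(4, 3, 128, 256)
mask = torch.randint(0, 2, (4, 128, 256))   # segmentation labels (annotated frames only)
steer = torch.randn(4)                      # steering angles, cheaply available for all frames
seg_logits, steer_pred = model(img)
loss = F.cross_entropy(seg_logits, mask) + 0.1 * F.mse_loss(steer_pred, steer)
loss.backward()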
Recently, the combination of remote sensing image processing and deep learning methods has become an increasingly popular trend. In this paper, we combine the existing instance segmentation model Mask R-CNN and the target det...
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Artificial intelligence (AI) and autonomous edge computing in space are emerging areas of interest to augment capabilities of nanosatellites, where modern sensors generate orders of magnitude more data than can typically be transmitted to mission control. Here, we present the hardware and software design of an onboard AI subsystem hosted on SpIRIT. The system is optimised for on-board computer vision experiments based on visible light and long wave infrared cameras. This paper highlights the key design choices made to maximise the robustness of the system in harsh space conditions, and their motivation relative to key mission requirements, such as limited compute resources, resilience to cosmic radiation, extreme temperature variations, distribution shifts, and very low transmission bandwidths. The payload, called Loris, consists of six visible light cameras, three infrared cameras, a camera control board and a Graphics Processing Unit (GPU) system-on-module. Loris enables the execution of AI models with on-orbit fine-tuning as well as a next-generation image compression algorithm, including progressive coding. This innovative approach not only enhances the data processing capabilities of nanosatellites but also lays the groundwork for broader applications to remote sensing from space.
Transformer architectures have become state-of-the-art models in computer vision and natural language processing. To a significant degree, their success can be attributed to self-supervised pre-training on large-scale unlabeled datasets. This work investigates the use of self-supervised masked image reconstruction to advance transformer models for hyperspectral remote sensing imagery. To facilitate self-supervised pre-training, we build a large dataset of unlabeled hyperspectral observations from the EnMAP satellite and systematically investigate modifications of the vision transformer architecture to optimally leverage the characteristics of hyperspectral data. We find significant improvements in accuracy on different land cover classification tasks over both standard vision and sequence transformers using (i) blockwise patch embeddings, (ii) spatial-spectral self-attention, (iii) spectral positional embeddings, and (iv) masked self-supervised pre-training. The resulting model outperforms standard transformer architectures by +5% accuracy on a labeled subset of our EnMAP data and by +15% on the Houston2018 hyperspectral dataset, making it competitive with a strong 3D convolutional neural network baseline. In an ablation study on label-efficiency based on the Houston2018 dataset, self-supervised pre-training significantly improves transformer accuracy when little labeled training data is available. The self-supervised model outperforms randomly initialized transformers and the 3D convolutional neural network by +7-8% when only 0.1-10% of the training labels are available.
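As a hedged sketch of what a blockwise patch embedding for a hyperspectral cube could look like (the exact EnMAP configuration used in the paper is not specified here), the snippet below splits the spectral axis into blocks and produces one token per spatial patch of each block; all sizes are illustrative assumptions.

# Blockwise patch embedding sketch for a hyperspectral cube (N, bands, H, W).
import torch
import torch.nn as nn

class BlockwisePatchEmbed(nn.Module):
    def __init__(self, bands=200, block=25, patch=8, dim=128):
        super().__init__()
        assert bands % block == 0
        self.block, self.patch = block, patch
        # A single conv shared across spectral blocks keeps the sketch compact;
        # block-specific projections would be a natural variation.
        self.proj = nn.Conv2d(block, dim, kernel_size=patch, stride=patch)

    def forward(self, x):                                 # x: (N, bands, H, W)
        n, b, h, w = x.shape
        x = x.view(n * (b // self.block), self.block, h, w)
        tokens = self.proj(x)                             # (N*blocks, dim, H/p, W/p)
        tokens = tokens.flatten(2).transpose(1, 2)        # (N*blocks, patches, dim)
        return tokens.reshape(n, -1, tokens.shape[-1])    # (N, blocks*patches, dim)

cube = torch.randn(2, 200, 64, 64)                        # e.g. a 200-band scene
print(BlockwisePatchEmbed()(cube).shape)                  # torch.Size([2, 512, 128])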
ISBN (print): 9781450397148
Image registration is a basic problem in image analysis and image processing. Image registration has important applications in aerial image fusion, pattern recognition, three-dimensional reconstruction and other fields. Aiming at the problem of low registration accuracy and mismatching in remote sensing image registration, this paper proposes to use the Involution kernel to improve the ResNext network in the feature extraction stage and combines the SPANet attention mechanism with the improved ResNext network to improve the feature extraction ability of the network. In the feature matching stage, an enhanced matching method is proposed, which uses cross-correlation and the nearest neighbor to second nearest neighbor ratio to filter out mismatched points to cope with complex images and background interference. The experimental results show that the proposed algorithm can achieve superior results on a variety of indexes compared with other algorithms, which proves that the proposed algorithm is effective.
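The match-filtering step described above combines cross-correlation with a nearest-to-second-nearest-neighbor ratio test. A minimal NumPy sketch of that filtering logic is given below; the thresholds are common defaults, not the paper's values, and the patch-based NCC check stands in for whatever correlation measure the authors use.

# Keep a putative match only if it passes the ratio test and its local patches
# are sufficiently cross-correlated.
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    # desc_a: (Na, D), desc_b: (Nb, D) descriptor arrays.
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    two_nearest = np.argsort(d, axis=1)[:, :2]
    keep = []
    for i, (j1, j2) in enumerate(two_nearest):
        if d[i, j1] < ratio * d[i, j2]:
            keep.append((i, j1))
    return keep

def ncc(patch_a, patch_b, eps=1e-8):
    # Normalized cross-correlation of two equally sized patches.
    a = (patch_a - patch_a.mean()) / (patch_a.std() + eps)
    b = (patch_b - patch_b.mean()) / (patch_b.std() + eps)
    return float((a * b).mean())

def filter_matches(desc_a, desc_b, patches_a, patches_b, ncc_thresh=0.5):
    return [(i, j) for i, j in ratio_test_matches(desc_a, desc_b)
            if ncc(patches_a[i], patches_b[j]) > ncc_thresh]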
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
Geospatial Copilots unlock unprecedented potential for performing Earth Observation (EO) applications through natural language instructions. However, existing agents rely on overly simplified single tasks and template-based prompts, creating a disconnect with real-world scenarios. In this work, we present GeoLLM-Engine, an environment for tool-augmented agents with intricate tasks routinely executed by analysts on remote sensing platforms. We enrich our environment with geospatial API tools, dynamic maps/UIs, and external multimodal knowledge bases to properly gauge an agent's proficiency in interpreting realistic high-level natural language commands and its functional correctness in task completions. By alleviating overheads typically associated with human-in-the-loop benchmark curation, we harness our massively parallel engine across 100 GPT-4-Turbo nodes, scaling to over half a million diverse multi-tool tasks and across 1.1 million satellite images. By moving beyond traditional single-task image-caption paradigms, we investigate state-of-the-art agents and prompting techniques against long-horizon prompts.
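To make the tool-augmented setting concrete, here is a deliberately minimal, hypothetical sketch of the pattern such an environment exercises: the agent emits a JSON tool call and the environment dispatches it against a registry of geospatial functions. The tool names, signatures, and canned behaviour are invented for illustration and are not the GeoLLM-Engine API.

# Hypothetical tool registry and dispatch loop for a tool-augmented agent.
import json
from typing import Callable, Dict

TOOLS: Dict[str, Callable] = {}

def tool(fn):
    # Register a function as a callable tool by name.
    TOOLS[fn.__name__] = fn
    return fn

@tool
def search_scenes(region: str, start: str, end: str) -> list:
    """Pretend catalogue query; a real tool would hit an imagery API."""
    return [f"{region}_{start}_{end}_scene_{i}" for i in range(3)]

@tool
def count_objects(scene_id: str, category: str) -> int:
    """Pretend detector; a real tool would run a detection model."""
    return hash((scene_id, category)) % 50

def run_agent_step(model_response: str):
    # The agent's response is expected to be a JSON tool call: {"tool": ..., "arguments": {...}}.
    call = json.loads(model_response)
    return TOOLS[call["tool"]](**call["arguments"])

scenes = run_agent_step(json.dumps({
    "tool": "search_scenes",
    "arguments": {"region": "houston", "start": "2023-01", "end": "2023-02"},
}))
print(scenes)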
ISBN (digital): 9798350353266
ISBN (print): 9798350353273
This study presents a novel approach for detecting the angles of rotated rectangles precisely using a hybrid architecture of Convolutional Neural Networks (CNN) with a Multi-Layer Perceptron (MLP) and Support Vector Regression (SVR). This work also shows a comparative assessment between the two hybrid models, CNN & MLP only and CNN & MLP along with SVR, for unrolling the angles of the rectangles. In the automated image analysis and pattern recognition domain, the complexity of rotated rectangles, especially in different orientations and scales, presents formidable challenges. Our study begins with a dataset comprising 10,000 images of rectangles with varying rotation angles and coordinates. Then, CNN, an effective model in the image analysis and computer vision field, captures the spatial dependencies and characteristics of rotated rectangles from the raw images by extracting and learning hierarchical feature representations. To further process this information, the MLP and SVR are used, giving the learning model more depth and improving its capacity to recognize complex patterns. Evaluation metrics such as MSE, RMSE, MAPE, MAE, and $R^2$ determine the model's accuracy. This research enhances the fields of machine learning and image processing and also potentially benefits robotics, computer vision, and remote sensing, all of which depend on precise geometric interpretation. The evaluation metrics corroborate that the algorithm based on CNN and MLP along with SVR has better accuracy compared to the model that relies only on CNN and MLP.
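For reference, the regression metrics listed in this abstract (MSE, RMSE, MAPE, MAE, R^2) can be computed as in the short sketch below; the sample angles are made up and serve only to show the formulas.

# Standard regression metrics for predicted vs. true rotation angles.
import numpy as np

def regression_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(err)),
        "MAPE": np.mean(np.abs(err / y_true)) * 100,   # assumes no zero-valued angles
        "R2": 1 - ss_res / ss_tot,
    }

true_angles = [10.0, 45.0, 90.0, 135.0]
pred_angles = [12.0, 44.0, 87.0, 140.0]
print(regression_metrics(true_angles, pred_angles))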