With rapid progress in deep learning techniques, convolutional neural network-based and transformer-based methods have yielded impressive performance on hyperspectral image (HSI) classification tasks. However, pixel-level manual annotation is time-consuming and laborious, and the small amount of labeled HSI data poses challenges for deep learning methods. Existing methods use carefully designed network architectures combined with self-supervised or semi-supervised learning to deal with the lack of training samples. These methods are designed for specific datasets and often require careful hyperparameter tuning on new datasets. To tackle this problem, a unified HSI masked autoencoder framework was proposed for HSI classification. Different from existing works, the hyperspectral image masked autoencoder (HSIMAE) framework was pretrained on a large-scale unlabeled HSI dataset, named HSIHybrid, which contained a large amount of HSI data acquired by different sensors. First, to handle the different spectral ranges of HSIs, a group-wise PCA was applied to extract features of HSI spectra and transform them into fixed-length vectors. Then, a modified masked autoencoder was proposed for large-scale pretraining. It utilized separate spatial and spectral encoders followed by fusion blocks to learn the spatial and spectral correlations of HSI data. Finally, to leverage the unlabeled data of the target dataset, a dual-branch finetuning framework that used an extra unlabeled branch for masked modeling was introduced. Extensive experiments were conducted on four HSI datasets from different hyperspectral sensors. The results demonstrate the superiority of the proposed HSIMAE framework over state-of-the-art methods, even with very few training samples.
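The group-wise PCA step described above can be illustrated with a minimal NumPy sketch. It is not the HSIMAE implementation: the function name `groupwise_pca`, the contiguous band grouping, the group count, the number of components, and the random stand-in data are all assumptions made for illustration. The point is only that sensors with different band counts end up with vectors of the same length.

```python
import numpy as np

def groupwise_pca(pixels: np.ndarray, n_groups: int, n_components: int) -> np.ndarray:
    """Split the spectral bands into contiguous groups, run PCA inside each
    group, and concatenate the projections into one fixed-length vector per pixel.

    pixels: array of shape (n_pixels, n_bands).
    Returns an array of shape (n_pixels, n_groups * n_components).
    """
    band_groups = np.array_split(np.arange(pixels.shape[1]), n_groups)
    features = []
    for band_idx in band_groups:
        block = pixels[:, band_idx]
        centered = block - block.mean(axis=0, keepdims=True)
        # The right singular vectors of the centered data are the principal axes.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        features.append(centered @ vt[:n_components].T)
    return np.concatenate(features, axis=1)

rng = np.random.default_rng(0)
sensor_a = rng.normal(size=(500, 200))  # hypothetical sensor with 200 bands
sensor_b = rng.normal(size=(500, 103))  # hypothetical sensor with 103 bands

feat_a = groupwise_pca(sensor_a, n_groups=8, n_components=4)
feat_b = groupwise_pca(sensor_b, n_groups=8, n_components=4)
print(feat_a.shape, feat_b.shape)
```

Both inputs map to 32 values per pixel (8 groups times 4 components), so a single encoder could in principle consume spectra from either sensor.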
Deep learning methods have shown significant advantages in polarimetric synthetic aperture radar (PolSAR) image classification. However, their performance relies on a large amount of labeled data. To alleviate this problem, this paper proposes a PolSAR image classification method with a masked autoencoder based on Position prediction and Memory tokens (MAPM). First, MAPM designs a transformer-based masked autoencoder (MAE) for pre-training, which can boost feature learning and improve classification results for a given number of labeled samples. Second, since the transformer is relatively insensitive to the order of the input tokens, a position prediction strategy is introduced in the encoder part of the MAE. It can effectively capture subtle differences and discriminate complex, blurry boundaries in PolSAR images. In the fine-tuning stage, the addition of learnable memory tokens can improve classification performance. In addition, L1 loss is used for MAE optimization to enhance the robustness of the model to outliers in PolSAR data. Experimental results show the effectiveness and advantages of the proposed MAPM in PolSAR image classification. Specifically, MAPM achieves performance gains of about 1% in classification accuracy compared with existing methods.
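The robustness argument for the L1 loss can be checked with a small numeric sketch. The arrays below are made-up values, not PolSAR data, and the helper names are hypothetical. The sketch only shows how a single outlier inflates each loss.

```python
import numpy as np

def l1_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error: grows linearly with the residual."""
    return float(np.mean(np.abs(pred - target)))

def mse_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error: grows quadratically with the residual."""
    return float(np.mean((pred - target) ** 2))

target = np.zeros(10)
clean = np.full(10, 0.1)       # small, uniform reconstruction error
with_outlier = clean.copy()
with_outlier[0] = 10.0         # one corrupted value (illustrative outlier)

# How much does a single outlier inflate each loss?
l1_ratio = l1_loss(with_outlier, target) / l1_loss(clean, target)
mse_ratio = mse_loss(with_outlier, target) / mse_loss(clean, target)
print(f"L1 inflated by {l1_ratio:.1f}x, MSE inflated by {mse_ratio:.1f}x")
```

In this toy case the outlier inflates the L1 loss by about 11 times and the squared loss by about 1000 times, so the outlier dominates the squared-error objective far more strongly.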
Authors: Liu, Jiaming; Wu, Yue; Gong, Maoguo; Liu, Zhixiao; Miao, Qiguang; Ma, Wenping
Xidian Univ, Sch Comp Sci & Technol, Key Lab Collaborat Intelligence Syst, Minist Educ, Xian 710071, Peoples R China
Xidian Univ, Sch Elect Engn, Key Lab Collaborat Intelligence Syst, Minist Educ, Xian 710071, Peoples R China
Harbin Engn Univ, Yantai Res Inst, Yantai 264006, Peoples R China
Xidian Univ, Sch Artificial Intelligence, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Peoples R China
The masked autoencoder (MAE) is a widely used self-supervised learning method that has recently achieved great success in NLP and computer vision. However, the potential advantages of masked pre-training for point cloud understanding have not been fully explored. Preliminary MAE-based work on point clouds uses the Transformer architecture to explore low-level geometric representations in 3D space, which is insufficient for fine-grained decoding completion and downstream tasks. Inspired by multimodality, we propose Inter-MAE, an inter-modal MAE method for self-supervised learning on point clouds. Specifically, we first use Point-MAE as a baseline to randomly partition point clouds into a low percentage of visible point patches and a high percentage of masked point patches. Then, a standard Transformer-based autoencoder is built with an asymmetric design and shifting mask operations, and latent features are learned from the visible point patches with the aim of recovering the masked point patches. In addition, we generate image features with ViT after point cloud rendering to form inter-modal contrastive learning with the decoded features of the completed point patches. Extensive experiments show that the proposed Inter-MAE generates pre-trained models that are effective and exhibit superior results in various downstream tasks. For example, an accuracy of 85.4% is achieved on ScanObjectNN and 86.3% on ShapeNetPart, outperforming other state-of-the-art self-supervised learning methods. Notably, our work establishes for the first time the feasibility of applying the image modality to masked point clouds.
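The split into a small visible set and a large masked set can be sketched as follows. This is an illustrative sketch rather than the Inter-MAE or Point-MAE code: the helper `random_patch_mask`, the 0.75 mask ratio, and the patch shapes are assumptions, since the abstract only says the masked share is high.

```python
import numpy as np

def random_patch_mask(n_patches: int, mask_ratio: float, rng: np.random.Generator):
    """Randomly split patch indices into a visible set and a masked set."""
    n_masked = int(round(n_patches * mask_ratio))
    perm = rng.permutation(n_patches)
    masked_idx = np.sort(perm[:n_masked])
    visible_idx = np.sort(perm[n_masked:])
    return visible_idx, masked_idx

rng = np.random.default_rng(0)
# 64 hypothetical point patches, each holding 32 points with (x, y, z) coordinates.
patches = rng.normal(size=(64, 32, 3))

visible_idx, masked_idx = random_patch_mask(len(patches), mask_ratio=0.75, rng=rng)
encoder_input = patches[visible_idx]          # only visible patches reach the encoder
reconstruction_target = patches[masked_idx]   # the decoder is trained to recover these
print(encoder_input.shape, reconstruction_target.shape)
```

Because the encoder only processes the visible quarter of the patches in this setup, the asymmetric design keeps pre-training comparatively cheap.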
Wafer Map Pattern Recognition (WMPR) is a critical aspect of semiconductor manufacturing. It indicates how to improve manufacturing yields as we probe into the failure issues of the processes. In the literature, researchers often use balanced datasets with ample data points to address WMPR tasks. However, novel defects often emerge with few previous observations in real-world manufacturing. Unfortunately, efforts to solve WMPR problems in few-shot scenarios remain scarce. To bridge this gap, we define a new task, Few Shot Wafer Map Pattern Recognition (FSWMPR), which attempts to learn a classifier that distinguishes unseen classes with only a few labeled instances available. In such a task, quickly learning transferable feature embeddings is extremely challenging. In this paper, we propose an innovative two-stage strategy to tackle the problem of FSWMPR. In the first stage, we leverage a masked autoencoder to obtain effective representations of defect wafer map images by reconstructing the pixel values of masked patches with a smooth-L1 loss. In the second stage, we create a novel finetuning mechanism, the "Dynamic Multi-Loss Adaptation Mechanism", which utilizes three cooperative losses to accelerate feature transfer in few-shot scenarios. Surprisingly, if the three losses are reduced to one comparative loss, we still achieve more competitive accuracy than meta-learning or finetuning methods. It is worth noting that our two stages involve no label information at all. Extensive experiments and analyses are conducted on the WM811K dataset. Compared with other algorithms, our method offers a fresh solution by integrating a self-supervised masked autoencoder with a novel finetuning mechanism that is effective for FSWMPR.
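The first-stage objective, a smooth-L1 reconstruction loss evaluated on masked patches, can be sketched in NumPy. The function names, the `beta` threshold of 1.0, and the toy arrays are assumptions for illustration; the abstract does not specify these details.

```python
import numpy as np

def smooth_l1(pred: np.ndarray, target: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Element-wise smooth-L1: quadratic for small residuals, linear for large ones."""
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)

def masked_reconstruction_loss(pred, target, mask, beta: float = 1.0) -> float:
    """Average the smooth-L1 loss over masked patches only.

    pred, target: arrays of shape (n_patches, patch_dim).
    mask: boolean array of shape (n_patches,), True where the patch was masked.
    """
    per_patch = smooth_l1(pred, target, beta).mean(axis=1)
    return float(per_patch[mask].mean())

target = np.zeros((4, 2))
pred = np.array([[0.5, 0.5], [2.0, 2.0], [0.0, 0.0], [3.0, 3.0]])
mask = np.array([True, True, False, False])  # only the first two patches were masked

loss = masked_reconstruction_loss(pred, target, mask)
print(loss)
```

Visible patches do not contribute to the loss, so the large error on the last (unmasked) patch in this toy example is ignored.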
ISBN: 9798350309430 (print)
Polysomnography (PSG) is an indispensable diagnostic tool in sleep medicine, essential for identifying various sleep disorders. By capturing physiological signals, including EEG, EOG, EMG, and cardiorespiratory metrics, PSG presents a patient's sleep architecture. However, its dependency on complex equipment and expertise confines its use to specialized clinical settings. Addressing these limitations, our study aims to perform PSG by developing a system that requires only a single EEG measurement. We propose a novel system capable of reconstructing multi-signal PSG from a single-channel EEG based on a masked autoencoder. The masked autoencoder was trained and evaluated using the Sleep-EDF-20 dataset, with mean squared error as the metric for assessing the similarity between original and reconstructed signals. The model demonstrated proficiency in reconstructing multi-signal data. Our results present promise for the development of more accessible and long-term sleep monitoring systems. This suggests the expansion of PSG's applicability, enabling its use beyond the confines of clinics.
ISBN: 9783031251146; 9783031251153 (print)
Voice conversion (VC) is an important voice forgery method that poses a serious threat to personal privacy protection, especially given its remarkable achievements in timbre modification. To support forensic research on converted speech and further enrich the sources of fake speech, it is imperative to investigate new robust VC methods. VC is also considered a typical style transfer task, where style refers to speaker identity, suggesting that achieving sufficient feature decoupling is the key to obtaining robust performance. However, mainstream decoupling methods based on information-constrained bottlenecks still fail to obtain robust content-style trade-offs. In this paper, we propose a learnable similarity-guided mask (LSGM) algorithm to address the robustness problem. First, to make feature decoupling independent of specific language constructs and more applicable to diverse content, LSGM performs inter-frame feature compression relying only on the similarity of adjacent frames instead of complex inter-frame content correlation. Second, we implement feature compression by masking instead of dimensionality reduction, so no additional modules are needed to convey the speech frame length information. Moreover, we propose MAE-VC, which uses LSGM and is an end-to-end masked autoencoder (MAE) with self-supervised representation learning. Experimental results indicate that MAE-VC performs comparably to state-of-the-art methods on speaker similarity and significantly improves performance on content consistency.
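The idea of compressing by masking frames that resemble their neighbor can be sketched with a fixed-threshold stand-in. This is a simplification and not LSGM itself: the mask in the paper is learnable, whereas the cosine-similarity rule, the threshold value, and the function name below are assumptions chosen for illustration.

```python
import numpy as np

def adjacent_similarity_mask(frames: np.ndarray, threshold: float) -> np.ndarray:
    """Return a boolean keep-mask over frames.

    A frame is masked out (False) when its cosine similarity to the previous
    frame is at or above `threshold`; the first frame is always kept.
    """
    unit = frames / np.linalg.norm(frames, axis=1, keepdims=True)
    sim = np.sum(unit[1:] * unit[:-1], axis=1)  # similarity of frame t to frame t-1
    keep = np.ones(len(frames), dtype=bool)
    keep[1:] = sim < threshold
    return keep

# Four toy 2-dimensional frame features (made-up values).
frames = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.0, 1.0]])
keep = adjacent_similarity_mask(frames, threshold=0.95)

# Masked frames are zeroed rather than removed, so the sequence length is unchanged.
masked_frames = frames * keep[:, None]
print(keep, masked_frames.shape)
```

Because the masked sequence keeps its original length, nothing extra is needed to carry frame-length information downstream, which matches the motivation given in the abstract.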
ISBN: 9798350365474 (print)
Hyperspectral imaging offers manifold opportunities for applications that may not, or only partially, be achieved within the visual spectrum. Our paper presents a novel approach for Single-Label Hyperspectral Image Classification, demonstrated through the example of a key challenge faced by agricultural seed producers: seed purity testing. We employ Self-Supervised Learning and Masked Image Modeling techniques to tackle this task. Recognizing the challenges and costs associated with acquiring hyperspectral data, we aim to develop a versatile method capable of working with visible-spectrum data, arbitrary combinations of spectral bands (multispectral data), and hyperspectral sensor data. By integrating RGB and hyperspectral data, we leverage the detailed spatial information from RGB images and the rich spectral information from hyperspectral data to enhance the accuracy of seed classification. Through evaluations in various real-life scenarios, we demonstrate the flexibility, scalability, and efficiency of our approach.
ISBN: 9798400701245 (print)
Self-supervised learning with masked autoencoders has recently gained popularity for its ability to produce effective image or textual representations, which can be applied to various downstream tasks without retraining. However, we observe that current masked autoencoder models lack good generalization ability on graph data. To tackle this issue, we propose a novel graph masked autoencoder framework called GiGaMAE. Different from existing masked autoencoders that learn node representations by explicitly reconstructing the original graph components (e.g., features or edges), in this paper, we propose to collaboratively reconstruct informative and integrated latent embeddings. By considering embeddings encompassing graph topology and attribute information as reconstruction targets, our model can capture more generalized and comprehensive knowledge. Furthermore, we introduce a mutual-information-based reconstruction loss that enables the effective reconstruction of multiple targets. This learning objective allows us to differentiate between the exclusive knowledge learned from a single target and the common knowledge shared by multiple targets. We evaluate our method on three downstream tasks with seven datasets as benchmarks. Extensive experiments demonstrate the superiority of GiGaMAE against state-of-the-art baselines. We hope our results will shed light on the design of foundation models on graph-structured data. Our code is available at: https://***/sycny/GiGaMAE.
ISBN: 9783031439063; 9783031439070 (print)
Unsupervised anomaly detection in medical images such as chest radiographs is stepping into the spotlight as it mitigates the scarcity of labor-intensive and costly expert annotation of anomaly data. However, nearly all existing methods are formulated as one-class classification trained only on representations from the normal class and discard a potentially significant portion of the unlabeled data. This paper focuses on a more practical setting, dual distribution anomaly detection for chest X-rays, using the entire training data, including both normal and unlabeled images. Inspired by a modern self-supervised vision transformer model trained using partial image inputs to reconstruct missing image regions, we propose AMAE, a two-stage algorithm for adaptation of the pre-trained masked autoencoder (MAE). Starting from MAE initialization, AMAE first creates synthetic anomalies from only normal training images and trains a lightweight classifier on frozen transformer features. Subsequently, we propose an adaptation strategy to leverage unlabeled images containing anomalies. The adaptation scheme is accomplished by assigning pseudo-labels to unlabeled images and using two separate MAE-based modules to model the normative and anomalous distributions of pseudo-labeled images. The effectiveness of the proposed adaptation strategy is evaluated with different anomaly ratios in an unlabeled training set. AMAE leads to consistent performance gains over competing self-supervised and dual distribution anomaly detection methods, setting a new state of the art on three public chest X-ray benchmarks: RSNA, NIH-CXR, and VinDr-CXR.
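The pseudo-labeling step can be illustrated with a simple confidence-threshold rule. The abstract does not state how AMAE assigns pseudo-labels, so the rule, the threshold values, and the function name below are assumptions for illustration only.

```python
import numpy as np

def assign_pseudo_labels(scores, low: float = 0.2, high: float = 0.8) -> np.ndarray:
    """Turn anomaly scores in [0, 1] into pseudo-labels.

    0 = pseudo-normal, 1 = pseudo-anomalous, -1 = too uncertain, left unlabeled.
    The thresholds are illustrative assumptions.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.full(scores.shape, -1, dtype=int)
    labels[scores <= low] = 0
    labels[scores >= high] = 1
    return labels

# Hypothetical classifier scores for five unlabeled images.
scores = [0.05, 0.5, 0.93, 0.2, 0.79]
labels = assign_pseudo_labels(scores)
print(labels)
```

Under this assumed rule, images labeled 0 and 1 could then feed two separate reconstruction modules, one for each distribution, while uncertain images are set aside.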
ISBN: 9798350365627; 9798350365610 (print)
Redshift prediction is a fundamental task in astronomy, essential for understanding the expansion of the universe and determining the distances of astronomical objects. Accurate redshift prediction plays a crucial role in advancing our knowledge of the cosmos. Machine learning (ML) methods, renowned for their precision and speed, offer promising solutions for this complex task. However, traditional ML algorithms heavily depend on labeled data and task-specific feature extraction. To overcome these limitations, we introduce AstroMAE, an innovative approach that pretrains a vision transformer encoder using a masked autoencoder method on Sloan Digital Sky Survey (SDSS) images. This technique enables the encoder to capture the global patterns within the data without relying on labels. To the best of our knowledge, AstroMAE represents the first application of a masked autoencoder to astronomical data. By ignoring labels during the pretraining phase, the encoder gathers a general understanding of the data. The pretrained encoder is subsequently fine-tuned within a specialized architecture tailored for redshift prediction. We evaluate our model against various vision transformer architectures and CNN-based models, demonstrating the superior performance of AstroMAE's pretrained model and fine-tuning architecture.