Search Results - Inner Mongolia University Library

HSIMAE: A Unified Masked Autoencoder With Large-Scale Pretraining for Hyperspectral Image Classification

IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, Vol. 17, pp. 14064-14079

Authors: Wang, Yue; Wen, Ming; Zhang, Hailiang; Sun, Jinyu; Yang, Qiong; Zhang, Zhimin; Lu, Hongmei
Affiliation: Cent South Univ Coll Chem & Chem Engn Changsha 410083 Peoples R China

With rapid progress in deep learning techniques, convolutional neural network-based and transformer-based methods have achieved impressive performance on hyperspectral image (HSI) classification tasks. However, pixel-level manual annotation is time-consuming and laborious, and the small amount of labeled HSI data poses challenges for deep learning methods. Existing methods combine carefully designed network architectures with self-supervised or semi-supervised learning to cope with the lack of training samples. These methods were designed for specific datasets, and their hyperparameters often had to be carefully tuned on new datasets. To tackle this problem, a unified HSI masked autoencoder framework was proposed for HSI classification. Unlike existing works, the hyperspectral image masked autoencoder (HSIMAE) framework was pretrained on a large-scale unlabeled HSI dataset, named HSIHybrid, which contains a large amount of HSI data acquired by different sensors. First, to handle the different spectral ranges of HSIs, a group-wise PCA was applied to extract features of HSI spectra and transform them into fixed-length vectors. Then, a modified masked autoencoder was proposed for large-scale pretraining. It uses separate spatial and spectral encoders followed by fusion blocks to learn the spatial and spectral correlations of HSI data. Finally, to leverage the unlabeled data of the target dataset, a dual-branch finetuning framework that uses an extra unlabeled branch for masked modeling was introduced. Extensive experiments were conducted on four HSI datasets from different hyperspectral sensors. The results demonstrate the superiority of the proposed HSIMAE framework over state-of-the-art methods, even with very few training samples.

Keywords: Hyperspectral image (HSI) classification; large-scale pretraining; masked autoencoder; self-supervised learning; transformer
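
The group-wise PCA step in the HSIMAE abstract, which turns spectra with different band counts into fixed-length vectors, can be illustrated with a minimal sketch. It assumes the bands are split into contiguous, near-equal groups and that PCA is fitted per image with NumPy's SVD; the function name `groupwise_pca` and the parameters `num_groups` and `comps_per_group` are hypothetical, and the paper's actual grouping and fitting procedure may differ.

```python
import numpy as np


def groupwise_pca(cube: np.ndarray, num_groups: int, comps_per_group: int) -> np.ndarray:
    """Map an (H, W, B) hyperspectral cube to (H, W, num_groups * comps_per_group).

    Illustrative sketch: bands are split into contiguous, near-equal groups
    (an assumption), and PCA is fitted on this image's pixels only.
    """
    h, w, b = cube.shape
    pixels = cube.reshape(-1, b).astype(np.float64)  # one spectrum per row
    features = []
    for group in np.array_split(np.arange(b), num_groups):  # contiguous band groups
        if len(group) < comps_per_group or pixels.shape[0] < comps_per_group:
            raise ValueError("each group needs at least comps_per_group bands and pixels")
        x = pixels[:, group]
        x = x - x.mean(axis=0)  # center each band before PCA
        # Rows of vt are principal directions, ordered by decreasing singular value.
        _, _, vt = np.linalg.svd(x, full_matrices=False)
        features.append(x @ vt[:comps_per_group].T)  # project onto the top components
    return np.concatenate(features, axis=1).reshape(h, w, -1)


rng = np.random.default_rng(0)
feats_200 = groupwise_pca(rng.random((5, 6, 200)), num_groups=4, comps_per_group=8)
feats_103 = groupwise_pca(rng.random((5, 6, 103)), num_groups=4, comps_per_group=8)
print(feats_200.shape, feats_103.shape)  # (5, 6, 32) (5, 6, 32)
```

Because the output width is always `num_groups * comps_per_group`, cubes from sensors with, for example, 200 and 103 bands come out with the same feature length, which is what allows one pretrained model to accept both.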

MAPM: PolSAR Image Classification with Masked Autoencoder Based on Position Prediction and Memory Tokens

REMOTE SENSING, 2024, Vol. 16, No. 22

Authors: Wang, Jianlong; Li, Yingying; Quan, Dou; Hou, Beibei; Wang, Zhensong; Sima, Haifeng; Sun, Junding
Affiliations: Henan Polytech Univ Sch Comp Sci & Technol Jiaozuo 454003 Peoples R China; Xidian Univ Sch Artificial Intelligence Key Lab Intelligent Percept & Image Understanding Minist Educ Xian 710071 Peoples R China; Henan Polytech Univ Sch Software Jiaozuo 454003 Peoples R China

Deep learning methods have shown significant advantages in polarimetric synthetic aperture radar (PolSAR) image classification. However, their performance relies on a large amount of labeled data. To alleviate this problem, this paper proposes a PolSAR image classification method with a masked autoencoder based on Position prediction and Memory tokens (MAPM). First, MAPM designs a transformer-based masked autoencoder (MAE) for pre-training, which can boost feature learning and improve classification results based on the number of labeled samples. Second, since the transformer is relatively insensitive to the order of the input tokens, a position prediction strategy is introduced in the encoder of the MAE. It can effectively capture subtle differences and discriminate complex, blurry boundaries in PolSAR images. In the fine-tuning stage, adding learnable memory tokens can improve classification performance. In addition, an L1 loss is used for MAE optimization to enhance the robustness of the model to outliers in PolSAR data. Experimental results show the effectiveness and advantages of the proposed MAPM in PolSAR image classification. Specifically, MAPM achieves gains of about 1% in classification accuracy compared with existing methods.

Keywords: polarimetric SAR; masked autoencoder; position prediction; L1 loss; memory tokens
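
MAPM's abstract states that an L1 loss is used for MAE optimization to make the model more robust to outliers in PolSAR data. The sketch below, assuming the common MAE convention of scoring only the masked tokens, compares an L1 and a squared (L2) reconstruction loss on toy values; `masked_recon_loss` and its arguments are hypothetical names, not the paper's code.

```python
import numpy as np


def masked_recon_loss(pred, target, mask, kind="l1"):
    """Mean reconstruction error over masked tokens only.

    pred, target: (num_tokens, dim) arrays; mask: (num_tokens,) bool, True = masked.
    """
    diff = pred[mask] - target[mask]
    if kind == "l1":
        return float(np.abs(diff).mean())
    if kind == "l2":
        return float((diff ** 2).mean())
    raise ValueError(f"unknown loss kind: {kind}")


pred = np.zeros((4, 1))
target = np.array([[1.0], [1.0], [1.0], [10.0]])  # last token acts as an outlier
all_masked = np.ones(4, dtype=bool)
print(masked_recon_loss(pred, target, all_masked, "l1"))  # 3.25
print(masked_recon_loss(pred, target, all_masked, "l2"))  # 25.75
```

The outlier accounts for 10/13 of the L1 value but 100/103 of the L2 value. In addition, the L2 gradient grows linearly with the residual while the L1 gradient has constant magnitude, which is the usual argument for L1 being less sensitive to outliers.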

Inter-Modal Masked Autoencoder for Self-Supervised Learning on Point Clouds

IEEE TRANSACTIONS ON MULTIMEDIA, 2024, Vol. 26, pp. 3897-3908

Authors: Liu, Jiaming; Wu, Yue; Gong, Maoguo; Liu, Zhixiao; Miao, Qiguang; Ma, Wenping
Affiliations: Xidian Univ Sch Comp Sci & Technol Key Lab Collaborat Intelligence Syst Minist Educ Xian 710071 Peoples R China; Xidian Univ Sch Elect Engn Key Lab Collaborat Intelligence Syst Minist Educ Xian 710071 Peoples R China; Harbin Engn Univ Yantai Res Inst Yantai 264006 Peoples R China; Xidian Univ Sch Artificial Intelligence Key Lab Intelligent Percept & Image Understanding Minist Educ Xian 710071 Peoples R China

Masked autoencoder (MAE) is a self-supervised learning method that has recently been widely used and has achieved great success in NLP and computer vision. However, the potential advantages of masked pre-training for point cloud understanding have not been fully explored. Preliminary MAE-based work on point clouds uses the Transformer architecture to explore low-level geometric representations in 3D space, which is insufficient for fine-grained decoding completion and downstream tasks. Inspired by multimodality, we propose Inter-MAE, an inter-modal MAE method for self-supervised learning on point clouds. Specifically, we first use Point-MAE as a baseline and randomly partition point clouds into a low percentage of visible point patches and a high percentage of masked point patches. Then, a standard Transformer-based autoencoder is built with an asymmetric design and shifting mask operations, and latent features are learned from the visible point patches with the aim of recovering the masked point patches. In addition, we generate image features with ViT after point cloud rendering to form inter-modal contrastive learning with the decoded features of the completed point patches. Extensive experiments show that the proposed Inter-MAE produces effective pre-trained models that achieve superior results in various downstream tasks. For example, it achieves an accuracy of 85.4% on ScanObjectNN and 86.3% on ShapeNetPart, outperforming other state-of-the-art self-supervised learning methods. Notably, our work is the first to establish the feasibility of applying the image modality to masked point clouds.

Keywords: Point cloud compression; Transformers; Task analysis; Standards; Computer architecture; Decoding; Self-supervised learning; Self-supervision; masked autoencoder; joint multimodality; point cloud understanding
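
The first step described for Inter-MAE (inherited from Point-MAE) splits the point patches at random into a small visible set and a large masked set, and the asymmetric encoder processes only the visible patches. A minimal NumPy sketch of that split follows; the 0.6 mask ratio, the 64 patches, and the 384-dimensional tokens are illustrative assumptions, and the shifting-mask operation mentioned in the abstract is not modeled.

```python
import numpy as np


def random_patch_split(num_patches, mask_ratio, rng):
    """Randomly split patch indices into (visible, masked) index arrays."""
    num_masked = int(round(num_patches * mask_ratio))
    perm = rng.permutation(num_patches)  # random order of patch indices
    masked = np.sort(perm[:num_masked])
    visible = np.sort(perm[num_masked:])
    return visible, masked


rng = np.random.default_rng(0)
tokens = rng.standard_normal((64, 384))  # 64 patch embeddings (hypothetical size)
visible, masked = random_patch_split(64, 0.6, rng)
encoder_input = tokens[visible]  # the encoder sees only the visible patches
print(len(visible), len(masked), encoder_input.shape)  # 26 38 (26, 384)
```

Feeding only the visible subset to the encoder is what makes the design asymmetric: the heavier encoder runs on roughly 40% of the tokens, and a lighter decoder is left to predict the masked patches.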

Masked autoencoder with dynamic multi-loss adaptation mechanism for few shot wafer map pattern recognition

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, Vol. 137

Authors: Liang, Qi; Zhou, Jian; Wang, Yonglin
Affiliation: Tongji Univ Sch Mech Engn Shanghai 201804 Peoples R China

Wafer Map Pattern Recognition (WMPR) is a critical aspect of semiconductor manufacturing: by probing the failure issues of the processes, it indicates how manufacturing yields can be improved. In the literature, researchers often address WMPR tasks using balanced datasets with ample data points; however, in real-world manufacturing, novel defects often emerge with few previous observations. Unfortunately, efforts to solve WMPR problems in few-shot scenarios remain scant. To bridge this gap, we define a new task, Few Shot Wafer Map Pattern Recognition (FSWMPR), which aims to learn a classifier that distinguishes unseen classes with only a few labeled instances available. In such a task, quickly learning transferable feature embeddings is extremely challenging. In this paper, we propose an innovative two-stage strategy to tackle the FSWMPR problem. In the first stage, we leverage a masked autoencoder to obtain effective representations of defective wafer map images by reconstructing the pixel values of masked patches with a smooth-L1 loss. In the second stage, we create a novel finetuning mechanism, the "Dynamic Multi-Loss Adaptation Mechanism", which uses three cooperative losses to accelerate feature transfer in few-shot scenarios. Surprisingly, even when the three losses are reduced to a single comparative loss, we still achieve more competitive accuracy than meta-learning or finetuning methods; it is worth noting that our two stages involve no label information at all. Extensive experiments and analyses are conducted on the WM811K dataset. Compared with other algorithms, our methods offer fresh solutions by creatively integrating a self-supervised masked autoencoder with a novel finetuning mechanism that is effective for FSWMPR.

Keywords: Few shot wafer map pattern recognition; masked autoencoder; Multi-loss; Few-shot learning
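
The first stage of this method reconstructs the pixel values of masked patches with a smooth-L1 loss. The sketch below writes out the standard piecewise definition with a threshold `beta` (the same form used by PyTorch's `torch.nn.SmoothL1Loss`); `beta=1.0` and the toy values are assumptions, since the abstract does not give the paper's setting.

```python
import numpy as np


def smooth_l1(pred, target, beta=1.0):
    """Mean smooth-L1 loss: quadratic for |error| < beta, linear beyond it."""
    err = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    per_elem = np.where(err < beta, 0.5 * err ** 2 / beta, err - 0.5 * beta)
    return float(per_elem.mean())


print(smooth_l1([0.0, 0.0], [0.5, 3.0]))  # (0.125 + 2.5) / 2 = 1.3125
```

Both branches equal `0.5 * beta` at `|error| = beta`, so the loss is continuous there; small errors are penalized quadratically, as with MSE, while large ones grow only linearly, as with L1.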

Multi-Signal Reconstruction Using Masked Autoencoder From EEG During Polysomnography

12th International Winter Conference on Brain-Computer Interface (BCI)

Authors: Kweon, Young-Seok; Shin, Gi-Hwan; Kwak, Heon-Gyu; Jo, Ha-Na
Affiliations: Korea Univ Dept Brain & Cognit Engn Seoul South Korea; Korea Univ Dept Artificial Intelligence Seoul South Korea

ISBN (print): 9798350309430

Polysomnography (PSG) is an indispensable diagnostic tool in sleep medicine, essential for identifying various sleep disorders. By capturing physiological signals, including EEG, EOG, EMG, and cardiorespiratory metrics, PSG presents a patient's sleep architecture. However, its dependence on complex equipment and expertise confines its use to specialized clinical settings. To address these limitations, our study aims to perform PSG with a system that requires only a single EEG measurement. We propose a novel system capable of reconstructing multi-signal PSG from a single-channel EEG based on a masked autoencoder. The masked autoencoder was trained and evaluated on the Sleep-EDF-20 dataset, with mean squared error as the metric for assessing the similarity between original and reconstructed signals. The model demonstrated proficiency in reconstructing multi-signal data. Our results show promise for the development of more accessible, long-term sleep monitoring systems and suggest that PSG's applicability can be expanded beyond the confines of clinics.

Keywords: polysomnography; electroencephalogram; masked autoencoder
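
The abstract uses mean squared error to compare original and reconstructed signals. Since several PSG channels are reconstructed from one EEG channel, one natural way to report it is per channel; the sketch below does that, with toy arrays in place of Sleep-EDF-20 recordings. The function name `per_channel_mse` and the (channels, samples) layout are assumptions.

```python
import numpy as np


def per_channel_mse(original, reconstructed):
    """MSE per channel for arrays shaped (channels, samples)."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    if original.shape != reconstructed.shape:
        raise ValueError("original and reconstructed must have the same shape")
    return ((original - reconstructed) ** 2).mean(axis=1)


# Two hypothetical channels (e.g. EOG and EMG), four samples each.
original = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
reconstructed = [[0.0, 0.0, 0.0, 2.0], [1.0, 1.0, 1.0, 1.0]]
print(per_channel_mse(original, reconstructed))  # [1. 0.]
```

Averaging over samples within each channel keeps a poorly reconstructed channel from being hidden by well-reconstructed ones, which a single pooled MSE could do.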

Voice Conversion Using Learnable Similarity-Guided Masked Autoencoder

21st International Workshop on Digital-Forensics and Watermarking (IWDW)

Authors: Gu, Yewei; Zhao, Xianfeng; Yi, Xiaowei; Xiao, Junchao
Affiliations: Chinese Acad Sci Inst Informat Engn State Key Lab Informat Secur Beijing 100195 Peoples R China; Univ Chinese Acad Sci Sch Cyber Secur Beijing 100195 Peoples R China

ISBN (print): 9783031251146; 9783031251153

Voice conversion (VC) is an important voice forgery method that poses a serious threat to personal privacy, especially given its remarkable achievements in timbre modification. To support forensic research on converted speech and further enrich the sources of fake speech, it is imperative to investigate new, robust VC methods. VC is also considered a typical style transfer task, where style refers to speaker identity, which suggests that sufficient feature decoupling is the key to robust performance. However, mainstream decoupling methods based on information-constrained bottlenecks still fail to achieve a robust content-style trade-off. In this paper, we propose a learnable similarity-guided mask (LSGM) algorithm to address the robustness problem. First, to make feature decoupling independent of specific language constructs and more applicable to diverse content, LSGM performs inter-frame feature compression relying only on the similarity of adjacent frames instead of complex inter-frame content correlation. Second, we implement feature compression by masking instead of dimensionality reduction, so no additional modules are needed to convey speech frame length information. Moreover, we propose MAE-VC, an end-to-end masked autoencoder (MAE) with self-supervised representation learning built on LSGM. Experimental results indicate that MAE-VC performs comparably to state-of-the-art methods on speaker similarity and significantly improves content consistency.

Keywords: Voice conversion; Feature decoupling; Style transfer; Learnable similarity-guided mask; masked autoencoder
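
LSGM compresses features by masking frames based only on the similarity of adjacent frames, so the sequence length is preserved without an extra module. The sketch below illustrates that idea with cosine similarity and a fixed threshold. The threshold, the zero-filling of masked frames, and the names `similarity_guided_mask` and `apply_mask` are simplifying assumptions; in the paper the mask is learnable.

```python
import numpy as np


def similarity_guided_mask(frames, threshold):
    """Return a bool mask of shape (T,); True marks a frame as redundant (masked).

    A frame is masked when its cosine similarity to the previous frame exceeds
    `threshold` (a fixed stand-in for LSGM's learned criterion).
    """
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    unit = frames / np.maximum(norms, 1e-8)  # guard against all-zero frames
    sim = np.sum(unit[1:] * unit[:-1], axis=1)  # cosine(frame t, frame t-1) for t >= 1
    mask = np.zeros(len(frames), dtype=bool)
    mask[1:] = sim > threshold  # the first frame is always kept
    return mask


def apply_mask(frames, mask):
    """Zero out masked frames; the number of frames T is unchanged."""
    out = frames.copy()
    out[mask] = 0.0
    return out


frames = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 2.0]])
mask = similarity_guided_mask(frames, threshold=0.9)
print(mask.tolist())  # [False, True, False, True]
print(apply_mask(frames, mask).shape)  # (4, 2)
```

Masked frames are zeroed rather than dropped, so a decoder still receives all T frames, which mirrors the abstract's point that masking, unlike dimensionality reduction, needs no extra module to carry frame-length information.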

BiMAE - A Bimodal Masked Autoencoder Architecture for Single-Label Hyperspectral Image Classification

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kukushkin, Maksim; Bogdan, Martin; Schmid, Thomas
Affiliations: Univ Leipzig Augustuspl 10 D-04109 Leipzig Germany; Martin Luther Univ Halle Wittenberg Univ Pl 10 Halle Saale Germany

ISBN (print): 9798350365474

Hyperspectral imaging offers manifold opportunities for applications that cannot be realized, or can be realized only partially, within the visible spectrum. Our paper presents a novel approach to Single-Label Hyperspectral Image Classification, demonstrated on a key challenge faced by agricultural seed producers: seed purity testing. We employ Self-Supervised Learning and Masked Image Modeling techniques to tackle this task. Recognizing the challenges and costs associated with acquiring hyperspectral data, we aim to develop a versatile method capable of working with visible-spectrum data, arbitrary combinations of spectral bands (multispectral data), and hyperspectral sensor data. By integrating RGB and hyperspectral data, we leverage the detailed spatial information of RGB images and the rich spectral information of hyperspectral data to improve the accuracy of seed classification. Through evaluations in various real-life scenarios, we demonstrate the flexibility, scalability, and efficiency of our approach.

Keywords: hyperspectral classification; hyperspectral imaging; masked autoencoder; masked modeling; multimodal masked autoencoder; seed purity testing; self-supervised learning

GiGaMAE: Generalizable Graph Masked Autoencoder via Collaborative Latent Space Reconstruction

32nd ACM International Conference on Information and Knowledge Management (CIKM)

Authors: Shi, Yucheng; Dong, Yushun; Tan, Qiaoyu; Li, Jundong; Liu, Ninghao
Affiliations: Univ Georgia Athens GA USA; Univ Virginia Charlottesville VA USA; Texas A&M Univ College Stn TX USA

ISBN (print): 9798400701245

Self-supervised learning with masked autoencoders has recently gained popularity for its ability to produce effective image or textual representations that can be applied to various downstream tasks without retraining. However, we observe that current masked autoencoder models lack good generalization ability on graph data. To tackle this issue, we propose a novel graph masked autoencoder framework called GiGaMAE. Unlike existing masked autoencoders that learn node representations by explicitly reconstructing the original graph components (e.g., features or edges), we propose to collaboratively reconstruct informative and integrated latent embeddings. By taking embeddings that encompass graph topology and attribute information as reconstruction targets, our model can capture more generalized and comprehensive knowledge. Furthermore, we introduce a mutual information-based reconstruction loss that enables the effective reconstruction of multiple targets. This learning objective allows us to distinguish the exclusive knowledge learned from a single target from the common knowledge shared by multiple targets. We evaluate our method on three downstream tasks using seven benchmark datasets. Extensive experiments demonstrate the superiority of GiGaMAE over state-of-the-art baselines. We hope our results will shed light on the design of foundation models for graph-structured data. Our code is available at: https://***/sycny/GiGaMAE.

Keywords: Self-supervised Learning; Graph Mining; masked autoencoder

AMAE: Adaptation of Pre-trained Masked Autoencoder for Dual-Distribution Anomaly Detection in Chest X-Rays

26th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)

Authors: Bozorgtabar, Behzad; Mahapatra, Dwarikanath; Thiran, Jean-Philippe
Affiliations: Ecole Polytech Fed Lausanne EPFL Lausanne Switzerland; Incept Inst AI IIAI Abu Dhabi U Arab Emirates; Lausanne Univ Hosp CHUV Lausanne Switzerland

ISBN (print): 9783031439063; 9783031439070

Unsupervised anomaly detection in medical images such as chest radiographs is stepping into the spotlight because it mitigates the scarcity of labor-intensive and costly expert annotations of anomaly data. However, nearly all existing methods are formulated as one-class classification trained only on representations from the normal class, and they discard a potentially significant portion of the unlabeled data. This paper focuses on a more practical setting, dual-distribution anomaly detection for chest X-rays, which uses the entire training data, including both normal and unlabeled images. Inspired by a modern self-supervised vision transformer model trained on partial image inputs to reconstruct missing image regions, we propose AMAE, a two-stage algorithm for adapting the pre-trained masked autoencoder (MAE). Starting from MAE initialization, AMAE first creates synthetic anomalies from normal training images only and trains a lightweight classifier on frozen transformer features. Subsequently, we propose an adaptation strategy to leverage unlabeled images containing anomalies. The adaptation is accomplished by assigning pseudo-labels to unlabeled images and using two separate MAE-based modules to model the normative and anomalous distributions of the pseudo-labeled images. The effectiveness of the proposed adaptation strategy is evaluated with different anomaly ratios in the unlabeled training set. AMAE yields consistent performance gains over competing self-supervised and dual-distribution anomaly detection methods, setting a new state of the art on three public chest X-ray benchmarks: RSNA, NIH-CXR, and VinDr-CXR.

Keywords: Anomaly detection; Chest X-ray; masked autoencoder

AstroMAE: Redshift Prediction Using a Masked Autoencoder with a Novel Fine-Tuning Architecture

20th IEEE International Conference on E-Science (E-Science)

Authors: Fathkouhi, Amirreza Dolatpour; Fox, Geoffrey Charles
Affiliation: Univ Virginia Dept Comp Sci Charlottesville VA 22903 USA

ISBN (print): 9798350365627; 9798350365610

Redshift prediction is a fundamental task in astronomy, essential for understanding the expansion of the universe and determining the distances of astronomical objects. Accurate redshift prediction plays a crucial role in advancing our knowledge of the cosmos. Machine learning (ML) methods, renowned for their precision and speed, offer promising solutions for this complex task. However, traditional ML algorithms heavily depend on labeled data and task-specific feature extraction. To overcome these limitations, we introduce AstroMAE, an innovative approach that pretrains a vision transformer encoder using a masked autoencoder method on Sloan Digital Sky Survey (SDSS) images. This technique enables the encoder to capture the global patterns within the data without relying on labels. To the best of our knowledge, AstroMAE represents the first application of a masked autoencoder to astronomical data. By ignoring labels during the pretraining phase, the encoder gathers a general understanding of the data. The pretrained encoder is subsequently fine-tuned within a specialized architecture tailored for redshift prediction. We evaluate our model against various vision transformer architectures and CNN-based models, demonstrating the superior performance of AstroMAE's pretrained model and fine-tuning architecture.

Keywords: masked autoencoder; Redshift prediction; SDSS; Self-supervised learning; Fine-tuning; Deep learning
