In this paper, we explore reconstructing high-quality clothed 3D humans from a single RGB-D image, assuming that virtual humans can be represented by front-view and back-view depths. Due to the scarcity of captured real RGB-D human images, we employ rendered images to train our method. However, rendered images lack backgrounds and exhibit significant depth variation at silhouettes, which leads to inaccurate and noisy shape predictions. To mitigate this issue, we introduce a pseudo-multi-task framework, which incorporates a Conditional Generative Adversarial Network (CGAN) to infer back-view RGB-D images and a self-supervised masked autoencoder (MAE) to capture latent structural information of the human body. Additionally, we propose a Multi-scale Feature Fusion (MFF) module to effectively merge structural information and conditional features at various scales. Our method surpasses many existing techniques, as demonstrated through evaluations on the Thuman, RenderPeople, and BUFF datasets. Notably, our approach excels at reconstructing high-quality human models, even under challenging conditions such as complex poses and loose clothing, on both rendered and real-world images. Code is available at https://***/Archaic-Atom/MaskRecon.
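To make the multi-scale fusion idea concrete, the following is a minimal PyTorch sketch, not the authors' implementation: feature maps from the structural (MAE) branch and the conditional (CGAN) branch are concatenated at each scale, projected by 1x1 convolutions, and summed at a common resolution. The channel counts, number of scales, and fusion rule are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Hedged sketch of a multi-scale feature fusion block (not the paper's MFF code)."""
    def __init__(self, channels=(64, 128, 256), out_channels=64):
        super().__init__()
        # one 1x1 projection per scale, applied to concatenated branch features
        self.proj = nn.ModuleList(
            [nn.Conv2d(2 * c, out_channels, kernel_size=1) for c in channels])

    def forward(self, structural_feats, conditional_feats):
        # both arguments are lists of (B, C_i, H_i, W_i) maps, finest scale first
        target_size = structural_feats[0].shape[-2:]
        fused = 0
        for proj, s, c in zip(self.proj, structural_feats, conditional_feats):
            x = proj(torch.cat([s, c], dim=1))                 # merge the two branches
            fused = fused + F.interpolate(x, size=target_size,
                                          mode="bilinear", align_corners=False)
        return fused
```

A call such as `MultiScaleFusion()([f1, f2, f3], [g1, g2, g3])` would return a single fused map at the finest resolution, which a depth-prediction head could then consume.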
Low-dose computed tomography (LDCT) offers reduced X-ray radiation exposure but at the cost of compromised image quality, characterized by increased noise and artifacts. Recently, transformer models have emerged as a promising avenue for enhancing LDCT image quality. However, the success of such models relies on a large amount of paired noisy and clean images, which are often scarce in clinical settings. In computer vision and natural language processing, masked autoencoders (MAE) have been recognized as a powerful self-pretraining method for transformers, due to their exceptional capability to extract representative features. However, the original pretraining and fine-tuning design fails to work in low-level vision tasks like denoising. In response to this challenge, we redesign the classical encoder-decoder learning model and facilitate a simple yet effective streamlined low-level vision MAE, referred to as LoMAE, tailored to the LDCT denoising problem. Moreover, we introduce an MAE-GradCAM method to shed light on the latent learning mechanisms of the MAE/LoMAE. Additionally, we explore LoMAE's robustness and generalizability across a variety of noise levels. Experimental findings show that the proposed LoMAE enhances the denoising capabilities of the transformer and substantially reduces its dependency on high-quality ground-truth data. It also demonstrates remarkable robustness and generalizability over a spectrum of noise levels. In summary, the proposed LoMAE provides promising solutions to the major issues in LDCT denoising, including interpretability, ground-truth data dependency, and model robustness/generalizability.
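A hedged sketch of the two-stage recipe described above: self-pretraining reconstructs masked LDCT slices without any clean references, and fine-tuning reuses the same encoder-decoder on whatever paired noisy/clean slices are available. `model`, `mask_patches`, and the training setup are hypothetical placeholders, not the paper's code.

```python
import torch
import torch.nn.functional as F

def pretrain_step(model, mask_patches, ldct, optimizer, mask_ratio=0.75):
    """Self-pretraining: reconstruct the original LDCT slice from a masked version of itself."""
    masked = mask_patches(ldct, mask_ratio)      # zero out random patches (placeholder fn)
    loss = F.mse_loss(model(masked), ldct)       # target is the noisy slice itself, no clean data
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def finetune_step(model, ldct, ndct, optimizer):
    """Fine-tuning: map a low-dose slice to its paired normal-dose counterpart."""
    loss = F.mse_loss(model(ldct), ndct)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```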
ISBN (print): 9798350323726
Audio classification and restoration are among the major downstream tasks in audio signal processing. However, restoration derives less benefit from pretrained models compared with the overwhelming success of pretrained models in classification tasks. Due to this imbalance, there has been rising interest in how to improve the performance of pretrained models on restoration tasks, e.g., speech enhancement (SE). Previous works have shown that features extracted by pretrained audio encoders are effective for SE tasks, but these speech-specialized encoder-only models usually require extra decoders to become compatible with SE and involve complicated pretraining procedures or complex data augmentation. Therefore, in pursuit of a universal audio model, the audio masked autoencoder (MAE), whose backbone is the autoencoder of Vision Transformers (ViT-AE), is extended from audio classification to SE, a representative restoration task with well-established evaluation standards. ViT-AE learns to restore masked audio signals via a mel-to-mel mapping during pretraining, which is similar to restoration tasks like SE. We propose variations of ViT-AE for better SE performance, where the mel-to-mel variations yield high scores on non-intrusive metrics and the STFT-oriented variation is effective on intrusive metrics such as PESQ. Different variations can be used in accordance with the scenario. Comprehensive evaluations reveal that MAE pretraining is beneficial to SE tasks and helps the ViT-AE generalize better to out-of-domain distortions. We further find that large-scale noisy data of general audio sources, rather than clean speech, is sufficiently effective for pretraining.
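To illustrate the mel-to-mel formulation, here is a small sketch under assumed front-end parameters; `vit_ae` stands in for the pretrained autoencoder and is not part of the original codebase.

```python
import torch
import torchaudio

# 16 kHz log-mel front end; window/hop sizes and mel-band count are illustrative assumptions.
to_mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=400, hop_length=160, n_mels=80)

def enhance_logmel(noisy_wave, vit_ae):
    """Map a noisy waveform's log-mel spectrogram to an enhanced log-mel estimate."""
    noisy_mel = torch.log(to_mel(noisy_wave) + 1e-6)   # (batch, n_mels, frames)
    return vit_ae(noisy_mel)                           # mel-to-mel enhancement
```

A vocoder or an STFT-oriented variant (as the abstract notes) would then be needed to return from the enhanced mel representation to a waveform.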
Audio-visual representations leverage information from both modalities to produce joint representations. Such representations have demonstrated their usefulness in a variety of tasks. However, both modalities incorporated in the learned model might not necessarily be present at all times during inference. In this work, we study whether and how we can make existing models, trained under pristine conditions, robust to partial modality loss without retraining them. We propose to use a curriculum-trained masked autoencoder to impute the features of missing input segments. We show that fine-tuning the classification heads with the imputed features makes the base models robust on multiple downstream tasks such as emotion recognition and Lombard speech recognition. Among the 12 cases evaluated, our method outperforms strong baselines in 10 instances.
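A minimal sketch of the inference-time use described above, under the assumption that segment-level joint features, a trained MAE imputer, and a fine-tuned classification head are available; all names and tensor shapes are illustrative.

```python
import torch

@torch.no_grad()
def classify_with_imputation(mae, head, features, missing_mask):
    """features: (B, T, D) segment features; missing_mask: (B, T) bool, True where a segment is lost."""
    imputed = mae(features, missing_mask)                    # MAE predicts features for missing segments
    features = torch.where(missing_mask.unsqueeze(-1), imputed, features)
    return head(features.mean(dim=1))                        # pooled features -> class logits
```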
ISBN (print): 9798350390155; 9798350390162
As an efficient self-supervised pre-training approach, the masked autoencoder (MAE) has shown promising improvements across various 3D point cloud understanding tasks. However, the pretext task of existing point-based MAEs is to reconstruct the geometry of masked points only, so they learn features at lower semantic levels that are not appropriate for high-level downstream tasks. To address this challenge, we propose a novel self-supervised approach named Locate while Reconstructing with masked autoencoders (LR-MAE). Specifically, a multi-head decoder is designed to localize the global positions of masked patches while simultaneously reconstructing the masked points, aiming to learn better semantic features that align with downstream tasks. Moreover, we design a random query patch detection strategy for 3D object detection tasks in the pre-training stage, which significantly boosts model performance with faster convergence. Extensive experiments show that our LR-MAE achieves superior performance on various point cloud understanding tasks. By fine-tuning on downstream datasets, LR-MAE outperforms the Point-MAE baseline by 3.65% classification accuracy on the ScanObjectNN dataset and significantly exceeds the 3DETR baseline by 6.1% AP50 on the ScanNetV2 dataset. Code is available at https://***/cathy-ji/LR-MAE.
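The sketch below shows one plausible shape of a multi-head decoder in this spirit: one head reconstructs the xyz coordinates of masked points, the other regresses each masked patch's global centre. Dimensions, the localization target, and the loss weighting are assumptions rather than the released LR-MAE code.

```python
import torch
import torch.nn as nn

class DualHeadDecoder(nn.Module):
    """Hedged sketch: reconstruct masked points and locate their patch centres."""
    def __init__(self, dim=384, points_per_patch=32):
        super().__init__()
        self.recon_head = nn.Linear(dim, points_per_patch * 3)   # masked point xyz
        self.locate_head = nn.Linear(dim, 3)                      # global patch centre xyz

    def forward(self, masked_tokens):                             # (B, M, dim) decoded mask tokens
        B, M, _ = masked_tokens.shape
        points = self.recon_head(masked_tokens).reshape(B, M, -1, 3)
        centres = self.locate_head(masked_tokens)                 # (B, M, 3)
        return points, centres
```

The two outputs would be supervised jointly, e.g. a Chamfer loss on the reconstructed points plus a regression loss on the predicted centres.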
ISBN (digital): 9783031200564
ISBN (print): 9783031200557; 9783031200564
We propose bootstrapped masked autoencoders (BootMAE), a new approach for vision BERT pretraining. BootMAE improves the original masked autoencoder (MAE) with two core designs: 1) a momentum encoder that provides online features as extra BERT prediction targets; 2) a target-aware decoder that tries to reduce the pressure on the encoder to memorize target-specific information during BERT pretraining. The first design is motivated by the observation that using a pretrained MAE to extract features as the BERT prediction targets for masked tokens achieves better pretraining performance. Therefore, we add a momentum encoder in parallel with the original MAE encoder, which bootstraps the pretraining performance by using its own representation as the BERT prediction target. In the second design, we feed target-specific information (e.g., pixel values of unmasked patches) from the encoder directly to the decoder to reduce the pressure on the encoder to memorize this information. Thus, the encoder focuses on semantic modeling, which is the goal of BERT pretraining, and does not need to waste its capacity memorizing information about unmasked tokens related to the prediction target. Through extensive experiments, our BootMAE achieves 84.2% Top-1 accuracy on ImageNet-1K with a ViT-B backbone, outperforming MAE by +0.8% under the same number of pre-training epochs. BootMAE also gains +1.0 mIoU on semantic segmentation on ADE20K and +1.3 box AP and +1.4 mask AP on object detection and instance segmentation on the COCO dataset. Code is released at https://***/LightDXY/BootMAE.
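A minimal sketch of the momentum-encoder ingredient described above: an exponential-moving-average copy of the online encoder provides the feature targets. The decay value and names are assumptions, not the released BootMAE code.

```python
import copy
import torch

@torch.no_grad()
def update_momentum_encoder(online_encoder, momentum_encoder, decay=0.999):
    """EMA update: momentum weights drift slowly toward the online encoder's weights."""
    for p_online, p_momentum in zip(online_encoder.parameters(),
                                    momentum_encoder.parameters()):
        p_momentum.mul_(decay).add_(p_online, alpha=1.0 - decay)

# Typical use in a pretraining loop (sketch):
#   momentum_encoder = copy.deepcopy(online_encoder)   # target branch, no gradients needed
#   ...train the online branch on one batch...
#   update_momentum_encoder(online_encoder, momentum_encoder)
```

Features produced by `momentum_encoder` on the full image would then serve as regression targets for the masked tokens, alongside the usual pixel targets.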
Unsupervised learning methods have become increasingly important in deep learning due to their demonstrated ability to make greater use of datasets and to achieve higher accuracy in computer vision and natural language processing tasks. T...
ISBN (print): 9783031705359; 9783031705366
This paper introduces Saghog, a self-supervised pretraining strategy for writer retrieval that uses HOG features of the binarized input image. Our preprocessing applies the Segment Anything technique to extract handwriting from various datasets, yielding about 24k documents, after which a vision transformer is trained to reconstruct masked patches of the handwriting. Saghog is then fine-tuned by appending NetRVLAD as an encoding layer to the pretrained encoder. Evaluation of our approach on three historical datasets, Historical-WI, HisFrag20, and GRK-Papyri, demonstrates the effectiveness of Saghog for writer retrieval. Additionally, we provide ablation studies on our architecture and evaluate unsupervised and supervised fine-tuning. Notably, on HisFrag20, Saghog outperforms related work with a mAP of 57.2%, an 11.6% margin over the current state of the art, showcasing its robustness on challenging data, and it remains competitive even on small datasets, e.g., GRK-Papyri, where we achieve a Top-1 accuracy of 58.0%.
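As a rough illustration of how per-patch HOG targets could be built from a binarized handwriting image, here is a sketch using scikit-image; the patch size and HOG parameters are assumptions and not necessarily those used by Saghog.

```python
import numpy as np
from skimage.feature import hog

def hog_targets(binary_image, patch=32):
    """Compute one HOG descriptor per non-overlapping patch of a binarized page."""
    H, W = binary_image.shape
    targets = []
    for y in range(0, H - patch + 1, patch):
        for x in range(0, W - patch + 1, patch):
            p = binary_image[y:y + patch, x:x + patch].astype(np.float32)
            targets.append(hog(p, orientations=9, pixels_per_cell=(8, 8),
                               cells_per_block=(1, 1)))
    return np.stack(targets)                  # (num_patches, hog_dim) regression targets
```

During pretraining, the vision transformer would be asked to predict these descriptors for the masked patches rather than raw pixel values.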
ISBN (print): 9783031164439; 9783031164422
Self-supervised learning methods based on image patch reconstruction have witnessed great success in training auto-encoders, whose pre-trained weights can be transferred to fine-tune other downstream tasks of image understanding. However, existing methods seldom study the varying importance of reconstructed patches or the symmetry of anatomical structures when applied to 3D medical images. In this paper we propose a novel Attentive Symmetric Auto-encoder (ASA) based on the Vision Transformer (ViT) for 3D brain MRI segmentation tasks. We conjecture that forcing the auto-encoder to recover informative image regions can harvest more discriminative representations than recovering smooth image patches. We therefore adopt a gradient-based metric to estimate the importance of each image patch. In the pre-training stage, the proposed auto-encoder pays more attention to reconstructing the informative patches according to the gradient metric. Moreover, we exploit the structural prior of the brain and develop a Symmetric Position Encoding (SPE) method to better capture the correlations between long-range but spatially symmetric regions and obtain effective features. Experimental results show that our proposed attentive symmetric auto-encoder outperforms state-of-the-art self-supervised learning methods and medical image segmentation models on three brain MRI segmentation benchmarks.
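A hedged sketch of a gradient-based patch-importance score in the spirit described above: the mean intensity-gradient magnitude inside each patch serves as its weight. The exact metric in the paper may differ; the patch size and finite-difference scheme are assumptions.

```python
import torch
import torch.nn.functional as F

def patch_importance(images, patch=16):
    """images: (B, 1, H, W). Returns one gradient-magnitude score per non-overlapping patch."""
    gx = F.pad(images[..., :, 1:] - images[..., :, :-1], (0, 1))      # horizontal differences
    gy = F.pad(images[..., 1:, :] - images[..., :-1, :], (0, 0, 0, 1))  # vertical differences
    grad = (gx.pow(2) + gy.pow(2)).sqrt()
    scores = F.avg_pool2d(grad, kernel_size=patch)                    # (B, 1, H/patch, W/patch)
    return scores.flatten(1)                                           # one score per patch
```

Patches with higher scores would then receive larger weights when the masked-reconstruction loss is accumulated, steering the auto-encoder toward informative regions.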
The development of deep learning models in medical image analysis is largely limited by the lack of large-scale, well-annotated datasets. Unsupervised learning does not require labels and is better suited to solving medical image analysis problems. However, most unsupervised learning methods still need to be applied to large datasets. To make unsupervised learning applicable to small datasets, we propose Swin MAE, a masked autoencoder with the Swin Transformer as its backbone. Even on a dataset of only a few thousand medical images, Swin MAE can still learn useful semantic features purely from the images, without using any pre-trained models. In downstream transfer learning, it equals or even slightly outperforms the supervised model obtained by training a Swin Transformer on ImageNet. Compared to MAE, Swin MAE brought roughly two-fold and five-fold improvements on downstream tasks on BTCV and our parotid dataset, respectively. The code is publicly available at https://***/Zian-Xu/Swin-MAE.
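For context, the core masking operation that a Swin-backbone MAE shares with the original MAE can be sketched as below; the masking ratio and shapes are illustrative and do not reflect the paper's exact window-based configuration.

```python
import torch

def random_masking(tokens, mask_ratio=0.75):
    """Per-sample random patch masking. tokens: (B, N, D) patch embeddings."""
    B, N, D = tokens.shape
    n_keep = int(N * (1.0 - mask_ratio))
    noise = torch.rand(B, N, device=tokens.device)
    ids_shuffle = noise.argsort(dim=1)                         # random permutation per sample
    ids_keep = ids_shuffle[:, :n_keep]
    visible = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N, device=tokens.device)
    mask.scatter_(1, ids_keep, 0.0)                            # 1 = masked, 0 = kept
    ids_restore = ids_shuffle.argsort(dim=1)                   # to unshuffle decoder outputs
    return visible, mask, ids_restore
```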