ISBN (print): 9783031731181; 9783031731198
This work explores the effectiveness of masked image modelling for learning representations of retinal OCT images. To this end, we leverage Masked Autoencoders (MAE), a simple and scalable method for self-supervised learning, to obtain a powerful and general representation for OCT images by training on 700K OCT images from 41K patients collected under real-world clinical settings. We also provide the first extensive evaluation of an OCT model on a challenging battery of 6 downstream tasks. Our model achieves strong performance when fully fine-tuned but can also serve as a versatile frozen feature extractor for many tasks using lightweight adapters. Furthermore, we propose an extension of MAE pretraining that fuses OCT with an auxiliary modality, namely IR fundus images, and learns a joint model for both. We demonstrate that our approach improves performance on a multimodal downstream application. Our experiments utilize most publicly available OCT datasets, thus enabling future comparisons. Our code and model weights are publicly available at https://***/TheoPis/MIM_OCT.
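The abstract does not include implementation details, so the following is only a minimal sketch of the MAE-style pretraining it describes: patchify a grayscale scan, drop a high fraction of patches, encode the visible ones, and reconstruct the masked pixels. The class `TinyMAE` and all hyperparameters are hypothetical illustrations, not the authors' released model.

```python
# Hedged sketch of MAE-style masked pretraining on grayscale scans (hypothetical names).
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    """Toy masked autoencoder over non-overlapping patches of a grayscale image."""
    def __init__(self, img_size=224, patch=16, dim=256, mask_ratio=0.75):
        super().__init__()
        self.patch, self.mask_ratio = patch, mask_ratio
        n_patches = (img_size // patch) ** 2
        self.embed = nn.Linear(patch * patch, dim)            # pixel patches -> tokens
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=4)
        self.decoder = nn.Linear(dim, patch * patch)          # predict the pixels of every patch

    def patchify(self, x):                                    # x: (B, 1, H, W)
        p = self.patch
        return x.unfold(2, p, p).unfold(3, p, p).reshape(x.size(0), -1, p * p)

    def forward(self, x):
        patches = self.patchify(x)                            # (B, N, p*p)
        tokens = self.embed(patches) + self.pos
        B, N, D = tokens.shape
        n_keep = int(N * (1 - self.mask_ratio))
        idx = torch.rand(B, N, device=x.device).argsort(dim=1)
        keep, masked = idx[:, :n_keep], idx[:, n_keep:]
        gather = lambda t, i: torch.gather(t, 1, i.unsqueeze(-1).expand(-1, -1, t.size(-1)))
        encoded = self.encoder(gather(tokens, keep))          # encode visible patches only
        # Re-insert learned mask tokens at the masked positions before decoding.
        full = self.mask_token.repeat(B, N, 1).scatter(
            1, keep.unsqueeze(-1).expand(-1, -1, D), encoded)
        recon = self.decoder(full)                            # (B, N, p*p)
        # Reconstruction loss is computed on the masked patches only.
        return ((gather(recon, masked) - gather(patches, masked)) ** 2).mean()

# One pretraining step on a dummy batch of grayscale B-scans.
mae = TinyMAE()
loss = mae(torch.randn(2, 1, 224, 224))
loss.backward()
```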
ISBN (print): 9783031537660; 9783031537677
Unsupervised Out-of-Distribution (OOD) detection consists of identifying anomalous regions in images using only models trained on images of healthy anatomy. An established approach is to tokenize images and model the distribution of tokens with Auto-Regressive (AR) models. AR models are used to 1) identify anomalous tokens and 2) inpaint anomalous representations with in-distribution tokens. However, AR models are slow at inference time and prone to error-accumulation issues, which negatively affect OOD detection performance. Our novel method, MIM-OOD, overcomes both the speed and error-accumulation issues by replacing the AR model with two task-specific networks: 1) a transformer optimized to identify anomalous tokens and 2) a transformer optimized to inpaint anomalous tokens using masked image modelling (MIM). Our experiments with brain MRI anomalies show that MIM-OOD substantially outperforms AR models (DICE 0.458 vs 0.301) while achieving a nearly 25x speedup (9.5 s vs 244 s).
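For illustration, the inference pipeline described above could look roughly as follows. The tokenizer, both transformers, and their interfaces are assumptions made for the sketch; the paper's exact components and thresholds may differ.

```python
# Hedged sketch of a MIM-OOD-style inference pass (hypothetical API; all models pretrained).
import torch

@torch.no_grad()
def mim_ood_inference(image, tokenizer, anomaly_net, inpaint_net, mask_id, threshold=0.5):
    """
    image       : (1, 1, H, W) brain MRI slice
    tokenizer   : VQ model with .encode(img) -> (1, h, w) token ids and .decode(ids) -> image
    anomaly_net : transformer returning per-token anomaly probability, shape (1, h*w)
    inpaint_net : MIM transformer returning logits over the codebook, shape (1, h*w, vocab)
    mask_id     : id of the learned [MASK] token
    """
    ids = tokenizer.encode(image)                       # (1, h, w)
    flat = ids.flatten(1)                               # (1, h*w)

    # 1) Identify anomalous tokens with the first network.
    p_anom = anomaly_net(flat)                          # (1, h*w), values in [0, 1]
    is_anom = p_anom > threshold

    # 2) Inpaint the flagged tokens in parallel with the MIM network
    #    (no autoregressive loop, hence the reported speed-up).
    masked = flat.masked_fill(is_anom, mask_id)
    logits = inpaint_net(masked)                        # (1, h*w, vocab)
    filled = torch.where(is_anom, logits.argmax(-1), flat)

    # 3) Pixel-level anomaly map: difference between the input and its "healthy" restoration.
    restored = tokenizer.decode(filled.view_as(ids))
    return (image - restored).abs().squeeze()           # (H, W) anomaly heat-map
```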
Masked image modelling (MIM), a form of self-supervised learning, has garnered significant success in computer vision by improving image representations using unannotated data. Traditional MIM typically employs a strategy of random sampling across the image. However, this random masking technique may not be ideally suited for medical imaging, which possesses distinct characteristics divergent from natural images. In medical imaging, particularly in pathology, disease-related features are often exceedingly sparse and localized, while the remaining regions appear normal and undifferentiated. Additionally, medical images are frequently accompanied by reports that directly pinpoint the location of pathological changes. Inspired by this, we propose Masked medical Image Modelling (MedIM), a novel approach and, to our knowledge, the first work that employs radiological reports to guide the masking and restoration of the informative areas of images, encouraging the network to learn stronger semantic representations from medical images. We introduce two complementary masking strategies: knowledge-driven masking (KDM) and sentence-driven masking (SDM). KDM uses Medical Subject Headings (MeSH) words unique to radiology reports to identify symptom clues mapped to MeSH words (e.g., cardiac, edema, vascular, pulmonary) and guide the mask generation. Recognizing that radiological reports often comprise several sentences detailing varied findings, SDM integrates sentence-level information to identify key regions for masking. MedIM reconstructs images informed by the masking from the KDM and SDM modules, promoting a comprehensive and enriched medical image representation. Our extensive experiments on seven downstream tasks covering multi-label/class image classification, pneumothorax segmentation, and medical image-report analysis demonstrate that MedIM with report-guided masking achieves competitive performance. Our method substantially outperforms ImageNet pre-training, MIM-
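One plausible way to realise the knowledge-driven masking (KDM) idea described above is to score each image patch by its similarity to MeSH keywords extracted from the report and preferentially mask the most report-relevant patches. The sketch below is an assumption-laden illustration: the function `kdm_mask`, the `text_encoder` interface, and the keyword list are all hypothetical, and the paper's actual mechanism may differ.

```python
# Hedged sketch of report-guided (KDM-style) mask generation; names are hypothetical.
import torch
import torch.nn.functional as F

MESH_TERMS = {"cardiac", "edema", "vascular", "pulmonary"}   # illustrative subset

def kdm_mask(patch_feats, report, text_encoder, mask_ratio=0.5):
    """
    patch_feats  : (N, D) embeddings of the N image patches
    report       : free-text radiology report
    text_encoder : callable mapping a list of words to (K, D) embeddings
    returns      : boolean mask of shape (N,) — True means the patch is masked
    """
    keywords = [w for w in report.lower().split() if w.strip(".,") in MESH_TERMS]
    n_mask = int(mask_ratio * patch_feats.size(0))
    if not keywords:                                    # no MeSH clue: fall back to random masking
        idx = torch.randperm(patch_feats.size(0))[:n_mask]
    else:
        kw_emb = text_encoder(keywords)                 # (K, D)
        sim = F.normalize(patch_feats, dim=-1) @ F.normalize(kw_emb, dim=-1).T
        score = sim.max(dim=-1).values                  # relevance of each patch to any keyword
        idx = score.topk(n_mask).indices                # mask the most report-relevant patches
    mask = torch.zeros(patch_feats.size(0), dtype=torch.bool)
    mask[idx] = True
    return mask
```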
Vision transformers combined with self-supervised learning have enabled the development of models which scale across large datasets for several downstream tasks, including classification, segmentation, and detection. However, the potential of these models for low-shot learning across several downstream tasks remains largely underexplored. In this work, we conduct a systematic examination of different self-supervised pretext tasks, namely contrastive learning, clustering, and masked image modelling, to assess their low-shot capabilities by comparing different pretrained models. In addition, we explore the impact of various collapse-avoidance techniques, such as centring, ME-MAX, and Sinkhorn, on these downstream tasks. Based on our detailed analysis, we introduce a framework that combines masked image modelling and clustering as pretext tasks. This framework demonstrates superior performance across all examined low-shot downstream tasks, including multi-class classification, multi-label classification, and semantic segmentation. Furthermore, when testing the model on large-scale datasets, we show performance gains in various tasks.
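The collapse-avoidance techniques mentioned above operate on a batch of cluster-assignment scores. As one concrete example, the standard Sinkhorn-Knopp normalisation (popularised by SwAV-style clustering pretext tasks) balances the assignments across prototypes so that all samples cannot collapse onto a single cluster. The snippet is a generic illustration of the technique, not code from the paper.

```python
# Standard Sinkhorn-Knopp normalisation of cluster-assignment scores.
import torch

@torch.no_grad()
def sinkhorn(scores, eps=0.05, n_iters=3):
    """
    scores : (B, K) similarity of B samples to K prototypes
    returns: (B, K) soft assignments whose prototype marginals are approximately uniform,
             preventing all samples from collapsing onto one prototype.
    """
    Q = torch.exp(scores / eps).T                      # (K, B)
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(dim=1, keepdim=True); Q /= K        # normalise rows (prototypes)
        Q /= Q.sum(dim=0, keepdim=True); Q /= B        # normalise columns (samples)
    return (Q * B).T                                   # each row sums to 1
```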
Semi-supervised segmentation is highly significant in 3D medical image segmentation. Typical solutions adopt a teacher-student dual-model architecture and constrain the two models' decision consistency on the same segmentation task. However, the scarcity of medical samples can lower the diversity of tasks, reducing the effectiveness of the consistency constraint. The issue can further worsen as the weights of the models gradually become synchronized. In this work, we propose constructing diverse joint tasks using masked image modelling to enhance the reliability of the consistency constraint, and develop a novel architecture consisting of a single teacher but multiple students to exploit the additional knowledge decoupled from the synchronized weights. Specifically, the teacher and student models 'see' varied randomly-masked versions of an input and are trained to segment the same targets while concurrently reconstructing different missing regions. Such a joint task of segmentation and reconstruction lets the two learners capture related but complementary features and derive instructive knowledge when constraining their consistency. Moreover, two extra students join the original one to perform inter-student learning. The three students share the same encoding but different decoding designs, and learn decoupled knowledge by constraining their mutual consistencies, preventing themselves from suboptimally converging to the biased predictions of the dictatorial teacher. Experiments on four medical datasets show that our approach performs better than six mainstream semi-supervised methods. In particular, our approach achieves at least 0.61% and 0.36% higher Dice and Jaccard values, respectively, than the most competitive approach on our in-house dataset. The code will be released at https://***/zxmboshi/DDL.
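To make the joint-task idea concrete, a single unlabelled training step might look as follows: teacher and student receive differently masked views of the same volume, the segmentation predictions are constrained to agree, and each learner additionally reconstructs its own missing regions. The model interface (each network returning a segmentation map and a reconstruction) and the loss weighting are assumptions for this sketch, not the paper's exact formulation.

```python
# Hedged sketch of one unlabelled teacher-student step with masked joint tasks.
import torch
import torch.nn.functional as F

def random_mask(x, ratio=0.5, block=16):
    """Zero out a random subset of non-overlapping blocks of a 2D/3D input."""
    B = x.size(0)
    mask = (torch.rand(B, 1, *[s // block for s in x.shape[2:]], device=x.device) > ratio).float()
    mask = F.interpolate(mask, size=x.shape[2:], mode="nearest")
    return x * mask, 1.0 - mask                        # masked input, missing-region indicator

def unlabelled_step(teacher, student, x):
    xt, _mt = random_mask(x)                           # teacher sees one masked view...
    xs, ms = random_mask(x)                            # ...the student sees another
    with torch.no_grad():
        seg_t, _rec_t = teacher(xt)
    seg_s, rec_s = student(xs)

    # Same segmentation target, different reconstruction targets.
    consistency = F.mse_loss(torch.softmax(seg_s, 1), torch.softmax(seg_t, 1))
    recon = F.mse_loss(rec_s * ms, x * ms)             # student restores its own missing regions
    return consistency + recon
```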
ISBN (print): 9798350365474
Detecting various types of stresses (nutritional, water, nitrogen, etc.) in agricultural fields is critical for farmers to ensure maximum productivity. However, stresses show up in different shapes and sizes across different crop types and varieties. Hence, this is posed as an anomaly detection task in agricultural images. Accurate anomaly detection in agricultural UAV images is vital for early identification of field irregularities. Traditional supervised learning faces challenges in adapting to diverse anomalies, necessitating extensive annotated data. In this work, we overcome this limitation with self-supervised learning using a masked image modeling approach. Masked Autoencoders (MAE) extract meaningful features of normal appearance from unlabeled image samples, which leads to high reconstruction error on abnormal pixels during reconstruction. To remove the need to use only "normal" data during training, we use an anomaly suppression loss mechanism that effectively minimizes the reconstruction of anomalous pixels and allows the model to learn anomalous areas without explicitly separating "normal" images for training. Evaluation on the Agriculture-Vision data challenge shows a 6.3% mIOU score improvement in comparison to the prior state of the art in unsupervised and self-supervised methods. A single model generalizes across all the anomaly categories in the Agri-Vision Challenge Dataset [5].
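One plausible instantiation of an anomaly-suppression reconstruction loss in the spirit described above is to drop the pixels with the largest reconstruction error from the objective, so likely-anomalous regions do not drive the autoencoder toward reconstructing them. This is an illustrative formulation only; the paper's exact loss may be defined differently.

```python
# Hedged sketch of an anomaly-suppression reconstruction loss (illustrative formulation).
import torch

def suppressed_recon_loss(pred, target, keep_fraction=0.9):
    """
    pred, target : (B, C, H, W) reconstruction and input UAV image
    keep_fraction: fraction of lowest-error pixels that contribute to the loss
    """
    err = ((pred - target) ** 2).mean(dim=1)            # per-pixel squared error, (B, H, W)
    flat = err.flatten(1)                               # (B, H*W)
    k = int(keep_fraction * flat.size(1))
    lowest, _ = torch.topk(flat, k, dim=1, largest=False)
    return lowest.mean()                                # gradients ignore the worst (likely anomalous) pixels
```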
Recently, there has been a growing interest in RGB-D object tracking thanks to its promising performance achieved by combining visual information with auxiliary depth cues. However, the limited volume of annotated RGB-D tracking data for offline training has hindered the development of a dedicated end-to-end RGB-D tracker design. Consequently, the current state-of-the-art RGB-D trackers mainly rely on the visual branch to support appearance modelling, with the depth map utilised for elementary information fusion or failure reasoning during online tracking. Despite the progress achieved, the current paradigms for RGB-D tracking have neither fully harnessed the inherent potential of depth information nor fully exploited the synergy of vision-depth information. Considering the availability of ample unlabelled RGB-D data and the advancement of self-supervised learning, we address the problem of self-supervised learning for RGB-D object tracking. Specifically, an RGB-D backbone network is trained on unlabelled RGB-D datasets using masked image modelling. To train the network, the masking mechanism creates a selective occlusion of the input visible image to force the corresponding aligned depth map to help with discerning and learning vision-depth cues for the reconstruction of the masked visible image. As a result, the pre-trained backbone network is capable of capturing crucial visual and depth features of the diverse objects and background in the RGB-D image. The intermediate RGB-D features output by the pre-trained network can effectively be used for object tracking. We thus embed the pre-trained RGB-D network into a transformer-based tracking framework for stable tracking. Comprehensive experiments and the analysis of the results obtained on several RGB-D tracking datasets demonstrate the effectiveness and superiority of the proposed RGB-D self-supervised learning framework and the resulting tracking approach.
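A minimal sketch of the cross-modal pretext task described above: RGB patches are masked while the aligned depth patches remain visible, so reconstructing the RGB content forces the encoder to exploit depth cues. The class name, additive fusion of the two token streams, and all hyperparameters are assumptions for illustration, not the authors' architecture.

```python
# Hedged sketch of depth-assisted masked image modelling for RGB-D pretraining.
import torch
import torch.nn as nn

class RGBDMaskedModel(nn.Module):
    def __init__(self, patch=16, dim=256, n_patches=196):
        super().__init__()
        self.patch = patch
        self.rgb_embed = nn.Linear(3 * patch * patch, dim)
        self.depth_embed = nn.Linear(patch * patch, dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=6)
        self.head = nn.Linear(dim, 3 * patch * patch)    # predict masked RGB pixels

    @staticmethod
    def patchify(x, p):                                  # (B, C, H, W) -> (B, N, C*p*p)
        B, C = x.shape[:2]
        return (x.unfold(2, p, p).unfold(3, p, p)
                 .permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p))

    def forward(self, rgb, depth, mask_ratio=0.6):
        rgb_p = self.patchify(rgb, self.patch)           # (B, N, 3*p*p)
        dep_p = self.patchify(depth, self.patch)         # (B, N, p*p)
        B, N, _ = rgb_p.shape
        mask = torch.rand(B, N, device=rgb.device) < mask_ratio   # True = masked RGB patch

        rgb_tok = self.rgb_embed(rgb_p)
        rgb_tok = torch.where(mask.unsqueeze(-1), self.mask_token.expand(B, N, -1), rgb_tok)
        tokens = rgb_tok + self.depth_embed(dep_p) + self.pos     # depth always stays visible
        pred = self.head(self.backbone(tokens))

        loss = ((pred - rgb_p) ** 2).mean(-1)            # per-patch reconstruction error
        return (loss * mask).sum() / mask.sum().clamp(min=1)      # loss on masked RGB patches only

# Example: one cross-modal pretraining step on dummy aligned RGB / depth frames.
model = RGBDMaskedModel()
loss = model(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 224, 224))
loss.backward()
```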