ISBN (print): 9783031731181; 9783031731198
This work explores the effectiveness of masked image modelling for learning representations of retinal OCT images. To this end, we leverage Masked Autoencoders (MAE), a simple and scalable method for self-supervised learning, to obtain a powerful and general representation for OCT images by training on 700K OCT images from 41K patients collected under real-world clinical settings. We also provide the first extensive evaluation of an OCT model on a challenging battery of 6 downstream tasks. Our model achieves strong performance when fully fine-tuned but can also serve as a versatile frozen feature extractor for many tasks using lightweight adapters. Furthermore, we propose an extension of MAE pretraining that fuses OCT with an auxiliary modality, namely IR fundus images, and learns a joint model for both. We demonstrate that our approach improves performance on a multimodal downstream application. Our experiments utilize most publicly available OCT datasets, thus enabling future comparisons. Our code and model weights are publicly available at https://***/TheoPis/MIM_OCT.
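The abstract does not include implementation details, so the following is only a minimal sketch of the MAE-style pretraining it describes: patchify a grayscale scan, drop a high fraction of patches, encode the visible ones, and reconstruct the masked pixels. The class `TinyMAE` and all hyperparameters are hypothetical illustrations, not the authors' released model.

```python
# Hedged sketch of MAE-style masked pretraining on grayscale scans (hypothetical names).
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    """Toy masked autoencoder over non-overlapping patches of a grayscale image."""
    def __init__(self, img_size=224, patch=16, dim=256, mask_ratio=0.75):
        super().__init__()
        self.patch, self.mask_ratio = patch, mask_ratio
        n_patches = (img_size // patch) ** 2
        self.embed = nn.Linear(patch * patch, dim)            # pixel patches -> tokens
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=4)
        self.decoder = nn.Linear(dim, patch * patch)          # predict the pixels of every patch

    def patchify(self, x):                                    # x: (B, 1, H, W)
        p = self.patch
        return x.unfold(2, p, p).unfold(3, p, p).reshape(x.size(0), -1, p * p)

    def forward(self, x):
        patches = self.patchify(x)                            # (B, N, p*p)
        tokens = self.embed(patches) + self.pos
        B, N, D = tokens.shape
        n_keep = int(N * (1 - self.mask_ratio))
        idx = torch.rand(B, N, device=x.device).argsort(dim=1)
        keep, masked = idx[:, :n_keep], idx[:, n_keep:]
        gather = lambda t, i: torch.gather(t, 1, i.unsqueeze(-1).expand(-1, -1, t.size(-1)))
        encoded = self.encoder(gather(tokens, keep))          # encode visible patches only
        # Re-insert learned mask tokens at the masked positions before decoding.
        full = self.mask_token.repeat(B, N, 1).scatter(
            1, keep.unsqueeze(-1).expand(-1, -1, D), encoded)
        recon = self.decoder(full)                            # (B, N, p*p)
        # Reconstruction loss is computed on the masked patches only.
        return ((gather(recon, masked) - gather(patches, masked)) ** 2).mean()

# One pretraining step on a dummy batch of grayscale B-scans.
mae = TinyMAE()
loss = mae(torch.randn(2, 1, 224, 224))
loss.backward()
```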
ISBN (print): 9783031537660; 9783031537677
Unsupervised Out-of-Distribution (OOD) detection consists of identifying anomalous regions in images using only models trained on images of healthy anatomy. An established approach is to tokenize images and model the distribution of tokens with Auto-Regressive (AR) models. AR models are used to 1) identify anomalous tokens and 2) inpaint anomalous representations with in-distribution tokens. However, AR models are slow at inference time and prone to error-accumulation issues, which negatively affect OOD detection performance. Our novel method, MIM-OOD, overcomes both the speed and error-accumulation issues by replacing the AR model with two task-specific networks: 1) a transformer optimized to identify anomalous tokens and 2) a transformer optimized to inpaint anomalous tokens using masked image modelling (MIM). Our experiments with brain MRI anomalies show that MIM-OOD substantially outperforms AR models (DICE 0.458 vs 0.301) while achieving a nearly 25x speedup (9.5 s vs 244 s).
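For illustration, the inference pipeline described above could look roughly as follows. The tokenizer, both transformers, and their interfaces are assumptions made for the sketch; the paper's exact components and thresholds may differ.

```python
# Hedged sketch of a MIM-OOD-style inference pass (hypothetical API; all models pretrained).
import torch

@torch.no_grad()
def mim_ood_inference(image, tokenizer, anomaly_net, inpaint_net, mask_id, threshold=0.5):
    """
    image       : (1, 1, H, W) brain MRI slice
    tokenizer   : VQ model with .encode(img) -> (1, h, w) token ids and .decode(ids) -> image
    anomaly_net : transformer returning per-token anomaly probability, shape (1, h*w)
    inpaint_net : MIM transformer returning logits over the codebook, shape (1, h*w, vocab)
    mask_id     : id of the learned [MASK] token
    """
    ids = tokenizer.encode(image)                       # (1, h, w)
    flat = ids.flatten(1)                               # (1, h*w)

    # 1) Identify anomalous tokens with the first network.
    p_anom = anomaly_net(flat)                          # (1, h*w), values in [0, 1]
    is_anom = p_anom > threshold

    # 2) Inpaint the flagged tokens in parallel with the MIM network
    #    (no autoregressive loop, hence the reported speed-up).
    masked = flat.masked_fill(is_anom, mask_id)
    logits = inpaint_net(masked)                        # (1, h*w, vocab)
    filled = torch.where(is_anom, logits.argmax(-1), flat)

    # 3) Pixel-level anomaly map: difference between the input and its "healthy" restoration.
    restored = tokenizer.decode(filled.view_as(ids))
    return (image - restored).abs().squeeze()           # (H, W) anomaly heat-map
```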
Masked image modelling (MIM), a form of self-supervised learning, has garnered significant success in computer vision by improving image representations using unannotated data. Traditional MIM typically employs a strategy of random sampling across the image. However, this random masking technique may not be ideally suited for medical imaging, which possesses distinct characteristics divergent from natural images. In medical imaging, particularly in pathology, disease-related features are often exceedingly sparse and localized, while the remaining regions appear normal and undifferentiated. Additionally, medical images are frequently accompanied by reports that directly pinpoint the location of pathological changes. Inspired by this, we propose Masked medical Image Modelling (MedIM), a novel approach and, to our knowledge, the first work that employs radiological reports to guide the masking and restoration of the informative areas of images, encouraging the network to learn stronger semantic representations from medical images. We introduce two complementary masking strategies: knowledge-driven masking (KDM) and sentence-driven masking (SDM). KDM uses Medical Subject Headings (MeSH) words unique to radiology reports to identify symptom clues mapped to MeSH words (e.g., cardiac, edema, vascular, pulmonary) and guide the mask generation. Recognizing that radiological reports often comprise several sentences detailing varied findings, SDM integrates sentence-level information to identify key regions for masking. MedIM reconstructs images informed by the masking from the KDM and SDM modules, promoting a comprehensive and enriched medical image representation. Our extensive experiments on seven downstream tasks covering multi-label/class image classification, pneumothorax segmentation, and medical image-report analysis demonstrate that MedIM with report-guided masking achieves competitive performance. Our method substantially outperforms ImageNet pre-training, MIM-
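One plausible way to realise the knowledge-driven masking (KDM) idea described above is to score each image patch by its similarity to MeSH keywords extracted from the report and preferentially mask the most report-relevant patches. The sketch below is an assumption-laden illustration: the function `kdm_mask`, the `text_encoder` interface, and the keyword list are all hypothetical, and the paper's actual mechanism may differ.

```python
# Hedged sketch of report-guided (KDM-style) mask generation; names are hypothetical.
import torch
import torch.nn.functional as F

MESH_TERMS = {"cardiac", "edema", "vascular", "pulmonary"}   # illustrative subset

def kdm_mask(patch_feats, report, text_encoder, mask_ratio=0.5):
    """
    patch_feats  : (N, D) embeddings of the N image patches
    report       : free-text radiology report
    text_encoder : callable mapping a list of words to (K, D) embeddings
    returns      : boolean mask of shape (N,) — True means the patch is masked
    """
    keywords = [w for w in report.lower().split() if w.strip(".,") in MESH_TERMS]
    n_mask = int(mask_ratio * patch_feats.size(0))
    if not keywords:                                    # no MeSH clue: fall back to random masking
        idx = torch.randperm(patch_feats.size(0))[:n_mask]
    else:
        kw_emb = text_encoder(keywords)                 # (K, D)
        sim = F.normalize(patch_feats, dim=-1) @ F.normalize(kw_emb, dim=-1).T
        score = sim.max(dim=-1).values                  # relevance of each patch to any keyword
        idx = score.topk(n_mask).indices                # mask the most report-relevant patches
    mask = torch.zeros(patch_feats.size(0), dtype=torch.bool)
    mask[idx] = True
    return mask
```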
Vision transformers combined with self-supervised learning have enabled the development of models which scale across large datasets for several downstream tasks, including classification, segmentation, and detection. However, the potential of these models for low-shot learning across several downstream tasks remains largely underexplored. In this work, we conduct a systematic examination of different self-supervised pretext tasks, namely contrastive learning, clustering, and masked image modelling, to assess their low-shot capabilities by comparing different pretrained models. In addition, we explore the impact of various collapse-avoidance techniques, such as centring, ME-MAX, and Sinkhorn, on these downstream tasks. Based on our detailed analysis, we introduce a framework that combines masked image modelling and clustering as pretext tasks. This framework demonstrates superior performance across all examined low-shot downstream tasks, including multi-class classification, multi-label classification, and semantic segmentation. Furthermore, when testing the model on large-scale datasets, we show performance gains in various tasks.
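The collapse-avoidance techniques mentioned above operate on a batch of cluster-assignment scores. As one concrete example, the standard Sinkhorn-Knopp normalisation (popularised by SwAV-style clustering pretext tasks) balances the assignments across prototypes so that all samples cannot collapse onto a single cluster. The snippet is a generic illustration of the technique, not code from the paper.

```python
# Standard Sinkhorn-Knopp normalisation of cluster-assignment scores.
import torch

@torch.no_grad()
def sinkhorn(scores, eps=0.05, n_iters=3):
    """
    scores : (B, K) similarity of B samples to K prototypes
    returns: (B, K) soft assignments whose prototype marginals are approximately uniform,
             preventing all samples from collapsing onto one prototype.
    """
    Q = torch.exp(scores / eps).T                      # (K, B)
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(dim=1, keepdim=True); Q /= K        # normalise rows (prototypes)
        Q /= Q.sum(dim=0, keepdim=True); Q /= B        # normalise columns (samples)
    return (Q * B).T                                   # each row sums to 1
```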
Semi-supervised segmentation is highly significant in 3D medical image segmentation. Typical solutions adopt a teacher-student dual-model architecture and constrain the two models' decision consistency on the same segmentation task. However, the scarcity of medical samples can lower the diversity of tasks, reducing the effectiveness of the consistency constraint. The issue can further worsen as the weights of the models gradually become synchronized. In this work, we propose constructing diverse joint tasks using masked image modelling to enhance the reliability of the consistency constraint, and develop a novel architecture consisting of a single teacher but multiple students to exploit the additional knowledge decoupled from the synchronized weights. Specifically, the teacher and student models 'see' varied randomly-masked versions of an input and are trained to segment the same targets while concurrently reconstructing different missing regions. Such a joint task of segmentation and reconstruction lets the two learners capture related but complementary features and derive instructive knowledge when constraining their consistency. Moreover, two extra students join the original one to perform inter-student learning. The three students share the same encoding but different decoding designs, and learn decoupled knowledge by constraining their mutual consistencies, preventing themselves from suboptimally converging to the biased predictions of the dictatorial teacher. Experiments on four medical datasets show that our approach performs better than six mainstream semi-supervised methods. In particular, our approach achieves at least 0.61% and 0.36% higher Dice and Jaccard values, respectively, than the most competitive approach on our in-house dataset. The code will be released at https://***/zxmboshi/DDL.
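To make the joint-task idea concrete, a single unlabelled training step might look as follows: teacher and student receive differently masked views of the same volume, the segmentation predictions are constrained to agree, and each learner additionally reconstructs its own missing regions. The model interface (each network returning a segmentation map and a reconstruction) and the loss weighting are assumptions for this sketch, not the paper's exact formulation.

```python
# Hedged sketch of one unlabelled teacher-student step with masked joint tasks.
import torch
import torch.nn.functional as F

def random_mask(x, ratio=0.5, block=16):
    """Zero out a random subset of non-overlapping blocks of a 2D/3D input."""
    B = x.size(0)
    mask = (torch.rand(B, 1, *[s // block for s in x.shape[2:]], device=x.device) > ratio).float()
    mask = F.interpolate(mask, size=x.shape[2:], mode="nearest")
    return x * mask, 1.0 - mask                        # masked input, missing-region indicator

def unlabelled_step(teacher, student, x):
    xt, _mt = random_mask(x)                           # teacher sees one masked view...
    xs, ms = random_mask(x)                            # ...the student sees another
    with torch.no_grad():
        seg_t, _rec_t = teacher(xt)
    seg_s, rec_s = student(xs)

    # Same segmentation target, different reconstruction targets.
    consistency = F.mse_loss(torch.softmax(seg_s, 1), torch.softmax(seg_t, 1))
    recon = F.mse_loss(rec_s * ms, x * ms)             # student restores its own missing regions
    return consistency + recon
```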
ISBN (print): 9798350365474
Detecting various types of stresses (nutritional, water, nitrogen, etc.) in agricultural fields is critical for farmers to ensure maximum productivity. However, stresses show up in different shapes and sizes across different crop types and varieties. Hence, this is posed as an anomaly detection task in agricultural images. Accurate anomaly detection in agricultural UAV images is vital for early identification of field irregularities. Traditional supervised learning faces challenges in adapting to diverse anomalies, necessitating extensive annotated data. In this work, we overcome this limitation with self-supervised learning using a masked image modeling approach. Masked Autoencoders (MAE) extract meaningful features of normal appearance from unlabeled image samples, which leads to high reconstruction error on abnormal pixels during reconstruction. To remove the need to use only "normal" data during training, we use an anomaly suppression loss mechanism that effectively minimizes the reconstruction of anomalous pixels and allows the model to learn anomalous areas without explicitly separating "normal" images for training. Evaluation on the Agriculture-Vision data challenge shows a 6.3% mIOU score improvement in comparison to the prior state of the art in unsupervised and self-supervised methods. A single model generalizes across all the anomaly categories in the Agri-Vision Challenge Dataset [5].
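One plausible instantiation of an anomaly-suppression reconstruction loss in the spirit described above is to drop the pixels with the largest reconstruction error from the objective, so likely-anomalous regions do not drive the autoencoder toward reconstructing them. This is an illustrative formulation only; the paper's exact loss may be defined differently.

```python
# Hedged sketch of an anomaly-suppression reconstruction loss (illustrative formulation).
import torch

def suppressed_recon_loss(pred, target, keep_fraction=0.9):
    """
    pred, target : (B, C, H, W) reconstruction and input UAV image
    keep_fraction: fraction of lowest-error pixels that contribute to the loss
    """
    err = ((pred - target) ** 2).mean(dim=1)            # per-pixel squared error, (B, H, W)
    flat = err.flatten(1)                               # (B, H*W)
    k = int(keep_fraction * flat.size(1))
    lowest, _ = torch.topk(flat, k, dim=1, largest=False)
    return lowest.mean()                                # gradients ignore the worst (likely anomalous) pixels
```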
Recently, there has been a growing interest in RGB-D object tracking thanks to its promising performance achieved by combining visual information with auxiliary depth cues. However, the limited volume of annotated RGB-D tracking data for offline training has hindered the development of a dedicated end-to-end RGB-D tracker design. Consequently, the current state-of-the-art RGB-D trackers mainly rely on the visual branch to support appearance modelling, with the depth map utilised for elementary information fusion or failure reasoning during online tracking. Despite the progress achieved, the current paradigms for RGB-D tracking have neither fully harnessed the inherent potential of depth information nor fully exploited the synergy of vision-depth information. Considering the availability of ample unlabelled RGB-D data and the advancement of self-supervised learning, we address the problem of self-supervised learning for RGB-D object tracking. Specifically, an RGB-D backbone network is trained on unlabelled RGB-D datasets using masked image modelling. To train the network, the masking mechanism creates a selective occlusion of the input visible image to force the corresponding aligned depth map to help with discerning and learning vision-depth cues for the reconstruction of the masked visible image. As a result, the pre-trained backbone network is capable of capturing crucial visual and depth features of the diverse objects and background in the RGB-D image. The intermediate RGB-D features output by the pre-trained network can effectively be used for object tracking. We thus embed the pre-trained RGB-D network into a transformer-based tracking framework for stable tracking. Comprehensive experiments and the analysis of the results obtained on several RGB-D tracking datasets demonstrate the effectiveness and superiority of the proposed RGB-D self-supervised learning framework and the resulting tracking approach.
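A minimal sketch of the cross-modal pretext task described above: RGB patches are masked while the aligned depth patches remain visible, so reconstructing the RGB content forces the encoder to exploit depth cues. The class name, additive fusion of the two token streams, and all hyperparameters are assumptions for illustration, not the authors' architecture.

```python
# Hedged sketch of depth-assisted masked image modelling for RGB-D pretraining.
import torch
import torch.nn as nn

class RGBDMaskedModel(nn.Module):
    def __init__(self, patch=16, dim=256, n_patches=196):
        super().__init__()
        self.patch = patch
        self.rgb_embed = nn.Linear(3 * patch * patch, dim)
        self.depth_embed = nn.Linear(patch * patch, dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=6)
        self.head = nn.Linear(dim, 3 * patch * patch)    # predict masked RGB pixels

    @staticmethod
    def patchify(x, p):                                  # (B, C, H, W) -> (B, N, C*p*p)
        B, C = x.shape[:2]
        return (x.unfold(2, p, p).unfold(3, p, p)
                 .permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p))

    def forward(self, rgb, depth, mask_ratio=0.6):
        rgb_p = self.patchify(rgb, self.patch)           # (B, N, 3*p*p)
        dep_p = self.patchify(depth, self.patch)         # (B, N, p*p)
        B, N, _ = rgb_p.shape
        mask = torch.rand(B, N, device=rgb.device) < mask_ratio   # True = masked RGB patch

        rgb_tok = self.rgb_embed(rgb_p)
        rgb_tok = torch.where(mask.unsqueeze(-1), self.mask_token.expand(B, N, -1), rgb_tok)
        tokens = rgb_tok + self.depth_embed(dep_p) + self.pos     # depth always stays visible
        pred = self.head(self.backbone(tokens))

        loss = ((pred - rgb_p) ** 2).mean(-1)            # per-patch reconstruction error
        return (loss * mask).sum() / mask.sum().clamp(min=1)      # loss on masked RGB patches only

# Example: one cross-modal pretraining step on dummy aligned RGB / depth frames.
model = RGBDMaskedModel()
loss = model(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 224, 224))
loss.backward()
```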