Search Results - Inner Mongolia University Library

ASiT: Local-Global Audio Spectrogram Vision Transformer for Event Classification

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, vol. 32, pp. 3684-3693

Authors: Ahmed, Sara Atito Ali; Awais, Muhammad; Wang, Wenwu; Plumbley, Mark D.; Kittler, Josef

Affiliations: Univ Surrey, CVSSP, Guildford GU2 5XH, Surrey, England; Univ Surrey, Surrey Inst People Ctr AI, Guildford GU2 7XH, Surrey, England

Transformers, which were originally developed for natural language processing, have recently generated significant interest in the computer vision and audio communities due to their flexibility in learning long-range relationships. Constrained by the data-hungry nature of transformers and the limited amount of labelled data, most transformer-based models for audio tasks are finetuned from ImageNet pretrained models, despite the huge gap between the domain of natural images and audio. This has motivated research into self-supervised pretraining of audio transformers, which reduces the dependency on large amounts of labelled data and focuses on extracting concise representations of audio spectrograms. In this paper, we propose Local-Global Audio Spectrogram vIsion Transformer, namely ASiT, a novel self-supervised learning framework that captures local and global contextual information by employing group masked model learning and self-distillation. We evaluate our pretrained models on both audio and speech classification tasks, including audio event classification, keyword spotting, and speaker identification. We further conduct comprehensive ablation studies, including evaluations of different pretraining strategies. The proposed ASiT framework significantly boosts the performance on all tasks and sets a new state-of-the-art performance in five audio and speech classification tasks, outperforming recent methods, including the approaches that use additional datasets for pretraining.

Keywords: Spectrogram; Transformers; Task analysis; Image reconstruction; Computational modeling; Context modeling; Similarity learning; Self-supervised learning; vision transformers; audio spectrogram; group masked model learning; audio classification

INVESTIGATING SELF-SUPERVISED METHODS FOR LABEL-EFFICIENT LEARNING

2024 International Conference on Image Processing

Authors: Nandam, Srinivasa; Atito, Sara; Feng, Zhenhua; Kittler, Josef; Awais, Muhammad

Affiliations: Univ Surrey, Surrey Inst People Ctr AI, Guildford GU2 7XH, Surrey, England; Univ Surrey, Ctr Vis Speech & Signal Proc CVSSP, Guildford, Surrey, England

ISBN: (print) 9798350349405; 9798350349399

Vision transformers combined with self-supervised learning have enabled the development of models which scale across large datasets for several downstream tasks like classification, segmentation and detection. The low-shot learning capability of these models, across several low-shot downstream tasks, has been largely underexplored. We perform a system-level study of different self-supervised pretext tasks, namely contrastive learning, clustering, and masked image modelling, for their low-shot capabilities by comparing the pretrained models. In addition, we study the effects of collapse avoidance methods, namely centring, ME-MAX, and Sinkhorn, on these downstream tasks. Based on our detailed analysis, we introduce a framework involving both masked image modelling and clustering as pretext tasks, which performs better across all low-shot downstream tasks, including multi-class classification, multi-label classification and semantic segmentation. Furthermore, when testing the model on full-scale datasets, we show performance gains in multi-class classification, multi-label classification and semantic segmentation.

Keywords: Self-supervised learning; Vision Transformers; group masked model learning; Deep learning

GMML IS ALL YOU NEED

30th IEEE International Conference on Image Processing (ICIP)

Authors: Atito, Sara; Awais, Muhammed; Nandam, Srinivasa; Kittler, Josef

Affiliations: Univ Surrey, Ctr Vis Speech & Signal Proc CVSSP, Guildford, Surrey, England; Univ Surrey, Surrey Inst People Centred AI, Guildford GU2 7XH, Surrey, England

ISBN: (print) 9781728198354

Vision transformers (ViTs) have generated significant interest in the computer vision community because of their flexibility in exploiting contextual information, whether it is sharply confined and local or long-range and global. However, they are known to be data hungry and are therefore often pretrained on large-scale datasets, e.g. JFT-300M or ImageNet. An ideal learning method would perform best regardless of the size of the dataset, a property that current learning methods lack, with merely a few existing works studying ViTs with limited data. We propose group masked model learning (GMML), a self-supervised learning (SSL) method that is able to train ViTs and achieve state-of-the-art (SOTA) performance when pretrained with limited data. GMML uses the information conveyed by all concepts in the image. This is achieved by randomly manipulating groups of connected tokens, successively covering different meaningful parts of the image content, and then recovering the hidden information from the visible part of the concept. Unlike most existing SSL approaches, GMML does not require a momentum encoder, nor does it rely on careful implementation details such as large batches and gradient stopping. Pretraining, finetuning, and evaluation codes are available under: https://***/GMML.

Keywords: Self-supervised learning; Vision Transformers; group masked model learning; Deep learning

SB-SSL: Slice-Based Self-supervised Transformers for Knee Abnormality Classification from MRI

1st Workshop on Medical Image Learning with Limited and Noisy Data (MILLanD)

Authors: Atito, Sara; Anwar, Syed Muhammad; Awais, Muhammad; Kittler, Josef

Affiliations: Univ Surrey, Ctr Vis Speech & Signal Proc CVSSP, Guildford, England; Childrens Natl Hosp, Washington, DC, USA; Surrey Inst People Centred AI, Guildford, England

ISBN: (print) 9783031167607; 9783031167591

The availability of large-scale data with high-quality ground truth labels is a challenge when developing supervised machine learning solutions for the healthcare domain. Although the amount of digital data in clinical workflows is increasing, most of this data is distributed across clinical sites and protected to ensure patient privacy. Radiological reading and dealing with large-scale clinical data put a significant burden on the available resources, and this is where machine learning and artificial intelligence play a pivotal role. Magnetic Resonance Imaging (MRI) for musculoskeletal (MSK) diagnosis is one example where the scans have a wealth of information, but require a significant amount of time for reading and labeling. Self-supervised learning (SSL) can be a solution for handling the lack of availability of ground truth labels, but generally requires a large amount of training data during the pretraining stage. Herein, we propose a slice-based self-supervised deep learning framework (SB-SSL), a novel slice-based paradigm for classifying abnormality using knee MRI scans. We show that for a limited number of cases (<1000), our proposed framework is capable of identifying anterior cruciate ligament tear with an accuracy of 89.17% and an AUC of 0.954, outperforming the state of the art without usage of external data during pretraining. This demonstrates that our proposed framework is suited for SSL in the limited data regime.

Keywords: Self-supervised learning; group masked model learning; masked autoencoders; Knee abnormality; Transformers; MRI
