Deep convolutional neural networks (DCNNs) are widely used in content-based image retrieval (CBIR) because of their advantages in image feature extraction. However, training deep neural networks requires a large amount of labeled data, which limits their application. Self-supervised learning is a more general approach for unlabeled scenarios. A method of fine-tuning feature extraction networks based on masked learning is proposed, in which masked autoencoders (MAE) are used to fine-tune the vision transformer (ViT) network. In addition, the scheme for extracting image descriptors is improved. The encoder of the MAE uses the ViT to extract global features and performs self-supervised fine-tuning by reconstructing masked areas. The method works well on category-level image retrieval datasets, with marked improvements on instance-level datasets. On the instance-level datasets Oxford5k and Paris6k, the retrieval accuracy of the base model is improved by 7% and 17%, respectively, compared to that of the original model.
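As a rough illustration of the pipeline this abstract describes (MAE-style masked reconstruction to fine-tune a ViT encoder, then pooled encoder tokens as a retrieval descriptor), here is a minimal PyTorch sketch. The module names (TinyViT, TinyMAE), all sizes, and the mean-pooled descriptor are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

PATCH, DIM = 16, 192

class TinyViT(nn.Module):
    """Toy ViT encoder: patch embedding + transformer blocks."""
    def __init__(self, img_size=224, depth=4, heads=4):
        super().__init__()
        self.num_patches = (img_size // PATCH) ** 2
        self.embed = nn.Conv2d(3, DIM, PATCH, PATCH)
        self.pos = nn.Parameter(torch.zeros(1, self.num_patches, DIM))
        layer = nn.TransformerEncoderLayer(DIM, heads, DIM * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)

    def tokens(self, x):
        return self.embed(x).flatten(2).transpose(1, 2) + self.pos  # (B, N, D)

    def forward(self, x):
        return self.blocks(self.tokens(x))

class TinyMAE(nn.Module):
    """MAE wrapper: encode visible patches, decode the full sequence, loss on masked ones."""
    def __init__(self, encoder, mask_ratio=0.75):
        super().__init__()
        self.enc, self.ratio = encoder, mask_ratio
        self.mask_token = nn.Parameter(torch.zeros(1, 1, DIM))
        dec_layer = nn.TransformerEncoderLayer(DIM, 4, DIM * 2, batch_first=True)
        self.dec = nn.TransformerEncoder(dec_layer, 2)
        self.head = nn.Linear(DIM, PATCH * PATCH * 3)   # predict raw pixels per patch

    def forward(self, imgs):
        B, N = imgs.size(0), self.enc.num_patches
        n_keep = int(N * (1 - self.ratio))
        ids = torch.rand(B, N, device=imgs.device).argsort(1)   # random patch shuffle
        keep, masked = ids[:, :n_keep], ids[:, n_keep:]

        tok = self.enc.tokens(imgs)
        vis = torch.gather(tok, 1, keep.unsqueeze(-1).expand(-1, -1, DIM))
        latent = self.enc.blocks(vis)                            # encode visible patches only

        # re-insert mask tokens at the hidden positions, then decode
        full = torch.cat([latent, self.mask_token.expand(B, N - n_keep, DIM)], 1)
        restore = ids.argsort(1)                                 # undo the shuffle
        full = torch.gather(full, 1, restore.unsqueeze(-1).expand(-1, -1, DIM))
        pred = self.head(self.dec(full))                         # (B, N, patch pixels)

        # ground-truth pixel patches; loss computed only on masked positions
        tgt = imgs.unfold(2, PATCH, PATCH).unfold(3, PATCH, PATCH)
        tgt = tgt.permute(0, 2, 3, 1, 4, 5).reshape(B, N, -1)
        mask = torch.zeros(B, N, device=imgs.device).scatter_(1, masked, 1.0)
        return ((pred - tgt) ** 2).mean(-1).mul(mask).sum() / mask.sum()

def descriptor(encoder, imgs):
    """Global image descriptor for retrieval: mean-pooled, L2-normalised encoder tokens."""
    return F.normalize(encoder(imgs).mean(1), dim=-1)

# usage: loss = TinyMAE(TinyViT())(torch.randn(2, 3, 224, 224))
```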
ISBN:
(Print) 9798400701085
The Multimodal Emotion Recognition (MER 2023) challenge aims to recognize emotion from audio, language, and visual signals, facilitating innovative technologies for affective computing. This paper presents our submission approach for the Semi-Supervised Learning Sub-Challenge (MER-SEMI). First, with large-scale unlabeled emotional videos, we train both image-based and video-based masked autoencoders to extract visual features, which we term expression MAE (expMAE) for simplicity. The expMAE features are found to be largely complementary to the official baseline features. Second, since only a small amount of labeled data is available, we use a classifier to generate pseudo labels for unlabeled videos that are predicted with high confidence for a certain category. In addition, we explore several advanced large models, such as CLIP, for cross-feature extraction, and apply factorized bilinear pooling (FBP) for multimodal feature fusion. Our method achieved an F1 score of 88.55% on MER-SEMI, ranking second among all participating teams.
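A hedged sketch of two components mentioned above: confidence-thresholded pseudo-labelling and factorized bilinear pooling for fusing two modality features. The feature dimensions, the 0.9 threshold, and the six-class head are assumptions for illustration, not the authors' settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FBP(nn.Module):
    """Factorized bilinear pooling: project both modalities, multiply, sum-pool over factors."""
    def __init__(self, dim_a, dim_b, factor=4, out_dim=256):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, factor * out_dim)
        self.proj_b = nn.Linear(dim_b, factor * out_dim)
        self.factor, self.out_dim = factor, out_dim

    def forward(self, a, b):
        joint = self.proj_a(a) * self.proj_b(b)                     # (B, k*o)
        joint = joint.view(-1, self.out_dim, self.factor).sum(-1)   # sum-pool over k
        joint = torch.sign(joint) * torch.sqrt(joint.abs() + 1e-8)  # signed sqrt (power norm)
        return F.normalize(joint, dim=-1)

@torch.no_grad()
def select_pseudo_labels(classifier, unlabeled_feats, threshold=0.9):
    """Keep unlabeled samples whose maximum softmax probability exceeds the threshold."""
    probs = classifier(unlabeled_feats).softmax(dim=-1)
    conf, labels = probs.max(dim=-1)
    keep = conf >= threshold
    return unlabeled_feats[keep], labels[keep]

# usage with random stand-ins for visual (expMAE-like) and audio features
fuse = FBP(dim_a=768, dim_b=512)
fused = fuse(torch.randn(8, 768), torch.randn(8, 512))              # (8, 256)
clf = nn.Linear(256, 6)                                             # 6 emotion classes assumed
feats, pseudo = select_pseudo_labels(clf, fused)
```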
ISBN:
(Print) 9781450394086
This paper presents a novel approach to representation learning in recommender systems by integrating generative self-supervised learning with a graph transformer architecture. We highlight the importance of high-quality data augmentation with relevant self-supervised pretext tasks for improving performance. Towards this end, we propose a new approach that automates the self-supervision augmentation process through rationale-aware generative SSL that distills informative user-item interaction patterns. The proposed recommender with Graph TransFormer (GFormer) offers parameterized collaborative rationale discovery for selective augmentation while preserving global-aware user-item relationships. In GFormer, we allow the rationale-aware SSL to inspire graph collaborative filtering with task-adaptive invariant rationalization in the graph transformer. The experimental results reveal that GFormer consistently improves performance over baselines on different datasets. Several in-depth experiments further investigate the invariant rationale-aware augmentation from various aspects. The source code for this work is publicly available at: https://***/HKUDS/GFormer.
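The following is a purely illustrative sketch of the selective, rationale-aware augmentation idea: score each user-item edge from its endpoint embeddings and keep only the top-scoring fraction as the rationale subgraph used for self-supervision. The EdgeRationale module and all sizes are hypothetical, not GFormer's code.

```python
import torch
import torch.nn as nn

class EdgeRationale(nn.Module):
    """Score user-item edges and keep the most informative ones for augmentation."""
    def __init__(self, dim=64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, user_emb, item_emb, edges, keep_ratio=0.5):
        # edges: (E, 2) long tensor of (user_index, item_index) pairs
        pair = torch.cat([user_emb[edges[:, 0]], item_emb[edges[:, 1]]], dim=-1)
        s = self.score(pair).squeeze(-1)              # informativeness per interaction
        k = max(1, int(keep_ratio * edges.size(0)))
        keep = s.topk(k).indices                      # collaborative rationale edges
        return edges[keep], s

# usage: 100 users, 200 items, 500 random interactions
users, items = torch.randn(100, 64), torch.randn(200, 64)
edges = torch.stack([torch.randint(0, 100, (500,)), torch.randint(0, 200, (500,))], dim=1)
kept_edges, scores = EdgeRationale()(users, items, edges)
```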
In this paper, we propose to pre-train audio encoders using synthetic patterns instead of real audio data. Our proposed framework consists of two key elements. The first one is masked autoencoder (MAE), a self-supervi...
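The abstract above is truncated, but the core idea (MAE pre-training on synthetic, spectrogram-shaped patterns rather than real audio) can be sketched with a toy pattern generator; the stripes-plus-noise generator below is purely hypothetical and not the paper's pattern design.

```python
import torch

def synthetic_spectrogram_batch(batch=8, mels=128, frames=256, max_stripes=6):
    """Random horizontal 'harmonic-like' stripes plus noise, shaped like log-mel input."""
    canvas = 0.1 * torch.rand(batch, 1, mels, frames)
    for b in range(batch):
        for _ in range(torch.randint(1, max_stripes + 1, (1,)).item()):
            row = torch.randint(0, mels, (1,)).item()
            width = torch.randint(1, 4, (1,)).item()
            canvas[b, 0, row:row + width, :] += torch.rand(1).item()
    return canvas.clamp(max=1.0)

# usage: these tensors would be fed to an image-style MAE in place of real log-mel spectrograms
patterns = synthetic_spectrogram_batch()       # (8, 1, 128, 256)
```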
ISBN:
(Digital) 9783031442230
ISBN:
(Print) 9783031442223; 9783031442230
Multi-scale learning has been demonstrated to be an excellent deblurring approach in image restoration according to recent studies, as it makes it easier for the optimization to reach the global optimum. In order to restore an image that is both incomplete and blurry, we propose a masked Scale-Recurrent Network (MSRN) in this paper, a restoration method based on multi-scale learning and an asymmetric autoencoder. It performs restoration in an end-to-end manner without any prior knowledge or other given conditions. Firstly, we process the GoPro dataset to obtain a dataset of incomplete images. Then, we perform self-supervised reconstruction pre-training on the autoencoder, with a series of resblocks that increase the quality of the input image and improve representation learning in the latent space. Finally, we train the model on the processed data and fine-tune the entire network. Compared with classical multi-scale learning, we introduce masks to help the model train more efficiently by focusing on essential regions of the image. Our experiments also show that MSRN achieves strong image restoration capability and robustness.
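A minimal sketch of the coarse-to-fine, mask-weighted restoration idea described above, with one shared restoration network applied at every scale. The architecture, scales, and loss weighting are illustrative assumptions rather than MSRN's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

class TinyRestorer(nn.Module):
    """Shared restoration net applied at every scale (scale-recurrent)."""
    def __init__(self, ch=32, blocks=3):
        super().__init__()
        self.head = nn.Conv2d(6, ch, 3, padding=1)   # degraded image + previous estimate
        self.body = nn.Sequential(*[ResBlock(ch) for _ in range(blocks)])
        self.tail = nn.Conv2d(ch, 3, 3, padding=1)
    def forward(self, degraded, prev):
        return self.tail(self.body(self.head(torch.cat([degraded, prev], dim=1))))

def multiscale_masked_loss(net, degraded, target, mask, scales=(4, 2, 1), mask_weight=2.0):
    """Run coarse-to-fine; masked (missing) pixels get extra weight in the L1 loss."""
    loss = 0.0
    prev = F.interpolate(degraded, scale_factor=1 / scales[0], mode="bilinear")
    for s in scales:
        d = F.interpolate(degraded, scale_factor=1 / s, mode="bilinear")
        t = F.interpolate(target, scale_factor=1 / s, mode="bilinear")
        m = F.interpolate(mask, scale_factor=1 / s, mode="nearest")
        prev = F.interpolate(prev, size=d.shape[-2:], mode="bilinear")
        prev = net(d, prev)                                   # refine the previous estimate
        loss = loss + ((prev - t).abs() * (1 + mask_weight * m)).mean()
    return loss, prev

# usage: batch of 2 blurred+incomplete 128x128 images, binary mask of missing regions
net = TinyRestorer()
x, y = torch.randn(2, 3, 128, 128), torch.randn(2, 3, 128, 128)
mask = (torch.rand(2, 1, 128, 128) > 0.75).float()
loss, restored = multiscale_masked_loss(net, x, y, mask)
```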
The rapid development of deepfake technology poses a formidable challenge to personal privacy and security, underscoring the urgent need for deepfake detection. Recently, the methods based on the reconstruction error,...
We propose using the masked Auto-Encoder (MAE), a transformer model trained self-supervisedly on image inpainting, for anomaly detection (AD), assuming that anomalous regions are harder to reconstruct than normal regions. MAEDAY is the first image-reconstruction-based anomaly detection method that utilizes a pre-trained model, enabling its use for Few-Shot Anomaly Detection (FSAD). We also show that the same method works surprisingly well for the novel tasks of Zero-Shot AD (ZSAD) and Zero-Shot Foreign Object Detection (ZSFOD), where no normal samples are available.
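The reconstruction-error scoring can be sketched as follows: mask random patch grids, inpaint them with a pre-trained reconstruction model, and keep the per-pixel error only where patches were hidden, averaged over several random maskings. The `reconstruct_fn` interface and the mean-colour stand-in model below are placeholders for illustration, not MAEDAY's model.

```python
import torch

def patch_mask(batch, size, patch=16, ratio=0.75, device="cpu"):
    """Binary mask (1 = hidden patch) on a patch grid, upsampled to pixel resolution."""
    grid = size // patch
    m = (torch.rand(batch, 1, grid, grid, device=device) < ratio).float()
    return m.repeat_interleave(patch, -1).repeat_interleave(patch, -2)

@torch.no_grad()
def anomaly_map(images, reconstruct_fn, repeats=8, patch=16, ratio=0.75):
    B, _, H, _ = images.shape
    score = torch.zeros(B, 1, H, H, device=images.device)
    hits = torch.zeros_like(score)
    for _ in range(repeats):
        m = patch_mask(B, H, patch, ratio, images.device)
        recon = reconstruct_fn(images * (1 - m), m)          # inpaint the hidden patches
        err = ((recon - images) ** 2).mean(1, keepdim=True)  # per-pixel reconstruction error
        score += err * m                                     # count error only where masked
        hits += m
    return score / hits.clamp(min=1)                         # high value = likely anomalous

# usage with a trivial stand-in "model" (mean-colour fill) just to make the sketch runnable
imgs = torch.rand(2, 3, 224, 224)
fill = lambda visible, m: visible * (1 - m) + visible.mean(dim=(2, 3), keepdim=True) * m
amap = anomaly_map(imgs, fill)                               # (2, 1, 224, 224) anomaly map
image_score = amap.flatten(1).max(dim=1).values              # image-level anomaly score
```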
With the development of deep learning and the increase in the amount of available data, general artificial intelligence models have become a popular research area. When facing a new application scenario, a pretrained general model can often perform better than models trained only on the new data. However, because of the band differences among hyperspectral image datasets, the hyperspectral image classification (HSIC) field has not yet established a good general model training solution, and it is difficult to exploit the information in existing hyperspectral datasets when training a model for a new scenario. To solve this problem, this article proposes a generalized hyperspectral classification model training method, which completes the training of hyperspectral classification models across datasets through an adaptive channel module and a masked self-supervised pretraining method, and can pretrain and fine-tune hyperspectral classification models using multiple datasets. The adaptive channel module solves the band-difference problem of using hyperspectral datasets across datasets, and the masked self-supervised learning method addresses the label differences and labeling difficulties of training models across datasets. Experimental results on multiple datasets show that the proposed method can effectively use a large amount of data to complete the pretraining of hyperspectral classification models, and the fine-tuning results on downstream datasets show advantages over current advanced deep learning methods.
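A hedged sketch of the two ingredients described above: an adaptive channel module that projects datasets with different band counts into a shared spectral width, and masked-band reconstruction as the self-supervised pretext task. Module names, dataset names, and band counts are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class AdaptiveChannel(nn.Module):
    """One 1x1 conv per dataset, projecting its band count to a common channel width."""
    def __init__(self, band_counts, common=64):
        super().__init__()
        self.proj = nn.ModuleDict(
            {name: nn.Conv2d(c, common, kernel_size=1) for name, c in band_counts.items()})
    def forward(self, x, dataset):
        return self.proj[dataset](x)

class SharedBackbone(nn.Module):
    """Backbone shared across datasets; recon head rebuilds masked channels."""
    def __init__(self, common=64):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(common, common, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(common, common, 3, padding=1), nn.ReLU())
        self.recon = nn.Conv2d(common, common, 1)
    def forward(self, x):
        return self.recon(self.body(x))

def masked_band_pretrain_step(adapter, backbone, patch, dataset, mask_ratio=0.5):
    """Zero out a random subset of projected channels and reconstruct them."""
    z = adapter(patch, dataset)                                  # (B, common, H, W)
    mask = (torch.rand(1, z.size(1), 1, 1, device=z.device) < mask_ratio).float()
    recon = backbone(z * (1 - mask))
    return ((recon - z) ** 2 * mask).sum() / mask.expand_as(z).sum().clamp(min=1)

# usage: two hypothetical scenes with 103 and 200 spectral bands
adapter = AdaptiveChannel({"scene_a": 103, "scene_b": 200})
backbone = SharedBackbone()
loss = masked_band_pretrain_step(adapter, backbone, torch.randn(4, 103, 9, 9), "scene_a")
```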
Cell morphology analysis is a crucial diagnostic tool for identifying blood diseases, including acute leukemia. However, the traditional analysis process is time-consuming and requires significant labor and expertise from laboratory doctors. In recent years, deep learning-based automatic blood cell classification techniques have gained popularity, but acquiring image data and annotations in the medical field is often challenging and costly. With the increasing use of deep learning techniques in clinical practice, it has become vital to ensure both accuracy and high-quality annotations. To address these challenges, this paper proposes a blood cell classification method based on masked autoencoders (MAE) and active learning (AL), namely MAE4AL. This method utilizes the self-supervised loss of the MAE and sample uncertainty to select the most valuable samples for labeling. A comprehensive comparison is conducted between our method and the state-of-the-art blood cell classification technique, which employs ResNeXt. Remarkably, our approach achieves classification performance comparable to ResNeXt while using only 20% of the labeled data. When employing half of the labeled data, our method achieves a classification accuracy of 96.36%, surpassing the ResNeXt model trained with 100% of the labeled data by 0.79%.
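A minimal sketch of the sample-selection idea: rank unlabeled cell images by a mix of MAE reconstruction loss (self-supervised difficulty) and classifier uncertainty, then send the top-k for annotation. The 0.5 weighting, the entropy measure, and the stand-in loss/classifier are assumptions, not the MAE4AL recipe.

```python
import torch

@torch.no_grad()
def select_for_labeling(recon_loss_fn, classifier, unlabeled, k=16, alpha=0.5):
    # per-sample MAE reconstruction loss (any callable returning one value per image)
    recon = torch.stack([recon_loss_fn(x.unsqueeze(0)) for x in unlabeled])
    # predictive-entropy uncertainty from the current classifier
    probs = classifier(unlabeled).softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1)
    # normalise both signals to [0, 1] before mixing
    norm = lambda v: (v - v.min()) / (v.max() - v.min() + 1e-8)
    score = alpha * norm(recon) + (1 - alpha) * norm(entropy)
    return score.topk(k).indices                     # indices of samples to annotate next

# usage with random stand-ins for images, an MAE loss, and a classifier
imgs = torch.randn(64, 3 * 32 * 32)                  # flattened toy "cell images"
fake_mae_loss = lambda x: x.pow(2).mean()            # stands in for a real MAE loss
clf = torch.nn.Linear(3 * 32 * 32, 15)               # 15 blood-cell classes assumed
to_label = select_for_labeling(fake_mae_loss, clf, imgs)
```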
The emergence of Healthcare 4.0 brings convenience to the diagnosis of patients with gastric polyps. A computer-aided gastric polyp detection model can automatically locate gastric polyps in gastroscopic images, which helps endoscopists detect polyps in time and reduces the rate of missed diagnoses. Deep learning models have achieved remarkable success on gastroscopic images; however, the following problems remain. First, models based on convolutional neural networks only analyze the underlying pixels of the gastroscopic image to locate polyps, without taking into account the spatial and positional information contained in the anatomical structure of the image. Second, although the number of gastroscopic images is huge, the number of manually annotated gastric polyp images is very small, which makes deep learning models prone to overfitting. Therefore, in this work, we propose a masked graph neural network model (MGNN) for real-time detection of polyp locations in gastroscopic images in Healthcare 4.0. The MGNN model utilizes a graph structure and graph convolution operations to extract spatial location information and semantic information from gastroscopic images. Information from masked self-training is additionally considered in the prediction stage to compensate for the limited number of manually labeled gastric polyp images. In this way, the MGNN model can automatically learn the essential features of gastroscopic images without labeled data. The effectiveness of the MGNN model has been verified on real gastroscopic images.
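An illustrative sketch of the masked graph self-training idea: treat image patches as graph nodes, run a small graph convolution, replace a random subset of node features with a mask token, and reconstruct them so the network learns spatial and semantic structure without polyp labels. The graph construction and all sizes are assumptions, not the MGNN implementation.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """Plain GCN-style layer: degree-normalised adjacency times node features, then linear."""
    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.lin = nn.Linear(dim_in, dim_out)
    def forward(self, adj, x):
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        return torch.relu(self.lin((adj @ x) / deg))

class MaskedGNN(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.gc1, self.gc2 = GraphConv(dim, dim), GraphConv(dim, dim)
        self.mask_token = nn.Parameter(torch.zeros(1, dim))
        self.recon = nn.Linear(dim, dim)

    def forward(self, adj, feats, mask_ratio=0.3):
        n = feats.size(0)
        masked = torch.rand(n, device=feats.device) < mask_ratio
        x = torch.where(masked.unsqueeze(-1), self.mask_token.expand(n, -1), feats)
        h = self.gc2(adj, self.gc1(adj, x))
        loss = ((self.recon(h) - feats) ** 2)[masked].mean()   # reconstruct masked nodes
        return loss, h                                         # h would feed a detection head

# usage: an 8x8 grid of patch nodes, each connected to its right/down neighbours
n = 64
idx = torch.arange(n).view(8, 8)
adj = torch.eye(n)
for i in range(8):
    for j in range(8):
        for di, dj in ((0, 1), (1, 0)):
            if i + di < 8 and j + dj < 8:
                a, b = idx[i, j], idx[i + di, j + dj]
                adj[a, b] = adj[b, a] = 1.0
loss, node_repr = MaskedGNN()(adj, torch.randn(n, 64))
```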