Deep convolutional neural networks (DCNNs) are widely used in content-based image retrieval (CBIR) because of their advantages in image feature extraction. However, training deep neural networks requires a large amount of labeled data, which limits their application. Self-supervised learning is a more general approach for unlabeled scenarios. A method of fine-tuning feature extraction networks based on masked learning is proposed, in which masked autoencoders (MAE) are used to fine-tune the vision transformer (ViT) network. In addition, the scheme for extracting image descriptors is improved. The encoder of the MAE uses the ViT to extract global features and performs self-supervised fine-tuning by reconstructing masked areas. The method works well on category-level image retrieval datasets, with marked improvements on instance-level datasets. On the instance-level datasets Oxford5k and Paris6k, the retrieval accuracy of the base model is improved by 7% and 17%, respectively, compared to that of the original model.
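As a rough illustration of the pipeline this abstract describes (MAE-style masked reconstruction to fine-tune a ViT encoder, then pooled encoder tokens as a retrieval descriptor), here is a minimal PyTorch sketch. The module names (TinyViT, TinyMAE), all sizes, and the mean-pooled descriptor are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

PATCH, DIM = 16, 192

class TinyViT(nn.Module):
    """Toy ViT encoder: patch embedding + transformer blocks."""
    def __init__(self, img_size=224, depth=4, heads=4):
        super().__init__()
        self.num_patches = (img_size // PATCH) ** 2
        self.embed = nn.Conv2d(3, DIM, PATCH, PATCH)
        self.pos = nn.Parameter(torch.zeros(1, self.num_patches, DIM))
        layer = nn.TransformerEncoderLayer(DIM, heads, DIM * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)

    def tokens(self, x):
        return self.embed(x).flatten(2).transpose(1, 2) + self.pos  # (B, N, D)

    def forward(self, x):
        return self.blocks(self.tokens(x))

class TinyMAE(nn.Module):
    """MAE wrapper: encode visible patches, decode the full sequence, loss on masked ones."""
    def __init__(self, encoder, mask_ratio=0.75):
        super().__init__()
        self.enc, self.ratio = encoder, mask_ratio
        self.mask_token = nn.Parameter(torch.zeros(1, 1, DIM))
        dec_layer = nn.TransformerEncoderLayer(DIM, 4, DIM * 2, batch_first=True)
        self.dec = nn.TransformerEncoder(dec_layer, 2)
        self.head = nn.Linear(DIM, PATCH * PATCH * 3)   # predict raw pixels per patch

    def forward(self, imgs):
        B, N = imgs.size(0), self.enc.num_patches
        n_keep = int(N * (1 - self.ratio))
        ids = torch.rand(B, N, device=imgs.device).argsort(1)   # random patch shuffle
        keep, masked = ids[:, :n_keep], ids[:, n_keep:]

        tok = self.enc.tokens(imgs)
        vis = torch.gather(tok, 1, keep.unsqueeze(-1).expand(-1, -1, DIM))
        latent = self.enc.blocks(vis)                            # encode visible patches only

        # re-insert mask tokens at the hidden positions, then decode
        full = torch.cat([latent, self.mask_token.expand(B, N - n_keep, DIM)], 1)
        restore = ids.argsort(1)                                 # undo the shuffle
        full = torch.gather(full, 1, restore.unsqueeze(-1).expand(-1, -1, DIM))
        pred = self.head(self.dec(full))                         # (B, N, patch pixels)

        # ground-truth pixel patches; loss computed only on masked positions
        tgt = imgs.unfold(2, PATCH, PATCH).unfold(3, PATCH, PATCH)
        tgt = tgt.permute(0, 2, 3, 1, 4, 5).reshape(B, N, -1)
        mask = torch.zeros(B, N, device=imgs.device).scatter_(1, masked, 1.0)
        return ((pred - tgt) ** 2).mean(-1).mul(mask).sum() / mask.sum()

def descriptor(encoder, imgs):
    """Global image descriptor for retrieval: mean-pooled, L2-normalised encoder tokens."""
    return F.normalize(encoder(imgs).mean(1), dim=-1)

# usage: loss = TinyMAE(TinyViT())(torch.randn(2, 3, 224, 224))
```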
ISBN:
(Print) 9798400701085
The Multimodal Emotion Recognition (MER 2023) challenge aims to recognize emotion from audio, language, and visual signals, facilitating innovative technologies for affective computing. This paper presents our submission approach for the Semi-Supervised Learning Sub-Challenge (MER-SEMI). First, with large-scale unlabeled emotional videos, we train both image-based and video-based masked autoencoders to extract visual features, which we term expression MAE (expMAE) for simplicity. The expMAE features are found to be largely complementary to the official baseline features. Second, since only a small amount of labeled data is available, we use a classifier to generate pseudo labels for unlabeled videos that are predicted with high confidence for a certain category. In addition, we explore several advanced large models, such as CLIP, for cross-feature extraction, and apply factorized bilinear pooling (FBP) for multimodal feature fusion. Our method achieved an F1 score of 88.55% on MER-SEMI, ranking second among all participating teams.
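A hedged sketch of two components mentioned above: confidence-thresholded pseudo-labelling and factorized bilinear pooling for fusing two modality features. The feature dimensions, the 0.9 threshold, and the six-class head are assumptions for illustration, not the authors' settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FBP(nn.Module):
    """Factorized bilinear pooling: project both modalities, multiply, sum-pool over factors."""
    def __init__(self, dim_a, dim_b, factor=4, out_dim=256):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, factor * out_dim)
        self.proj_b = nn.Linear(dim_b, factor * out_dim)
        self.factor, self.out_dim = factor, out_dim

    def forward(self, a, b):
        joint = self.proj_a(a) * self.proj_b(b)                     # (B, k*o)
        joint = joint.view(-1, self.out_dim, self.factor).sum(-1)   # sum-pool over k
        joint = torch.sign(joint) * torch.sqrt(joint.abs() + 1e-8)  # signed sqrt (power norm)
        return F.normalize(joint, dim=-1)

@torch.no_grad()
def select_pseudo_labels(classifier, unlabeled_feats, threshold=0.9):
    """Keep unlabeled samples whose maximum softmax probability exceeds the threshold."""
    probs = classifier(unlabeled_feats).softmax(dim=-1)
    conf, labels = probs.max(dim=-1)
    keep = conf >= threshold
    return unlabeled_feats[keep], labels[keep]

# usage with random stand-ins for visual (expMAE-like) and audio features
fuse = FBP(dim_a=768, dim_b=512)
fused = fuse(torch.randn(8, 768), torch.randn(8, 512))              # (8, 256)
clf = nn.Linear(256, 6)                                             # 6 emotion classes assumed
feats, pseudo = select_pseudo_labels(clf, fused)
```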
ISBN:
(Print) 9781450394086
This paper presents a novel approach to representation learning in recommender systems by integrating generative self-supervised learning with a graph transformer architecture. We highlight the importance of high-quality data augmentation with relevant self-supervised pretext tasks for improving performance. Towards this end, we propose a new approach that automates the self-supervision augmentation process through rationale-aware generative SSL that distills informative user-item interaction patterns. The proposed recommender with Graph TransFormer (GFormer) offers parameterized collaborative rationale discovery for selective augmentation while preserving global-aware user-item relationships. In GFormer, we allow the rationale-aware SSL to inspire graph collaborative filtering with task-adaptive invariant rationalization in the graph transformer. The experimental results reveal that GFormer consistently improves performance over baselines on different datasets. Several in-depth experiments further investigate the invariant rationale-aware augmentation from various aspects. The source code for this work is publicly available at: https://***/HKUDS/GFormer.
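The following is a purely illustrative sketch of the selective, rationale-aware augmentation idea: score each user-item edge from its endpoint embeddings and keep only the top-scoring fraction as the rationale subgraph used for self-supervision. The EdgeRationale module and all sizes are hypothetical, not GFormer's code.

```python
import torch
import torch.nn as nn

class EdgeRationale(nn.Module):
    """Score user-item edges and keep the most informative ones for augmentation."""
    def __init__(self, dim=64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, user_emb, item_emb, edges, keep_ratio=0.5):
        # edges: (E, 2) long tensor of (user_index, item_index) pairs
        pair = torch.cat([user_emb[edges[:, 0]], item_emb[edges[:, 1]]], dim=-1)
        s = self.score(pair).squeeze(-1)              # informativeness per interaction
        k = max(1, int(keep_ratio * edges.size(0)))
        keep = s.topk(k).indices                      # collaborative rationale edges
        return edges[keep], s

# usage: 100 users, 200 items, 500 random interactions
users, items = torch.randn(100, 64), torch.randn(200, 64)
edges = torch.stack([torch.randint(0, 100, (500,)), torch.randint(0, 200, (500,))], dim=1)
kept_edges, scores = EdgeRationale()(users, items, edges)
```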
In this paper, we propose to pre-train audio encoders using synthetic patterns instead of real audio data. Our proposed framework consists of two key elements. The first one is masked autoencoder (MAE), a self-supervi...
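The abstract above is truncated, but the core idea (MAE pre-training on synthetic, spectrogram-shaped patterns rather than real audio) can be sketched with a toy pattern generator; the stripes-plus-noise generator below is purely hypothetical and not the paper's pattern design.

```python
import torch

def synthetic_spectrogram_batch(batch=8, mels=128, frames=256, max_stripes=6):
    """Random horizontal 'harmonic-like' stripes plus noise, shaped like log-mel input."""
    canvas = 0.1 * torch.rand(batch, 1, mels, frames)
    for b in range(batch):
        for _ in range(torch.randint(1, max_stripes + 1, (1,)).item()):
            row = torch.randint(0, mels, (1,)).item()
            width = torch.randint(1, 4, (1,)).item()
            canvas[b, 0, row:row + width, :] += torch.rand(1).item()
    return canvas.clamp(max=1.0)

# usage: these tensors would be fed to an image-style MAE in place of real log-mel spectrograms
patterns = synthetic_spectrogram_batch()       # (8, 1, 128, 256)
```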
ISBN:
(Digital) 9783031442230
ISBN:
(Print) 9783031442223; 9783031442230
Multi-scale learning has been demonstrated to be an excellent deblurring approach in image restoration according to recent studies, as it makes it easier for the optimization to reach the global optimum. In order to restore an image that is both incomplete and blurry, we propose a masked Scale-Recurrent Network (MSRN) in this paper, a restoration method based on multi-scale learning and an asymmetric autoencoder. It performs restoration in an end-to-end manner without any prior knowledge or other given conditions. Firstly, we process the GoPro dataset to obtain a dataset of incomplete images. Then, we perform self-supervised reconstruction pre-training on the autoencoder, with a series of resblocks that increase the quality of the input image and improve representation learning in the latent space. Finally, we train the model on the processed data and fine-tune the entire network. Compared with classical multi-scale learning, we introduce masks to help the model train more efficiently by focusing on essential regions of the image. Our experiments also show that MSRN achieves strong image restoration capability and robustness.
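A minimal sketch of the coarse-to-fine, mask-weighted restoration idea described above, with one shared restoration network applied at every scale. The architecture, scales, and loss weighting are illustrative assumptions rather than MSRN's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

class TinyRestorer(nn.Module):
    """Shared restoration net applied at every scale (scale-recurrent)."""
    def __init__(self, ch=32, blocks=3):
        super().__init__()
        self.head = nn.Conv2d(6, ch, 3, padding=1)   # degraded image + previous estimate
        self.body = nn.Sequential(*[ResBlock(ch) for _ in range(blocks)])
        self.tail = nn.Conv2d(ch, 3, 3, padding=1)
    def forward(self, degraded, prev):
        return self.tail(self.body(self.head(torch.cat([degraded, prev], dim=1))))

def multiscale_masked_loss(net, degraded, target, mask, scales=(4, 2, 1), mask_weight=2.0):
    """Run coarse-to-fine; masked (missing) pixels get extra weight in the L1 loss."""
    loss = 0.0
    prev = F.interpolate(degraded, scale_factor=1 / scales[0], mode="bilinear")
    for s in scales:
        d = F.interpolate(degraded, scale_factor=1 / s, mode="bilinear")
        t = F.interpolate(target, scale_factor=1 / s, mode="bilinear")
        m = F.interpolate(mask, scale_factor=1 / s, mode="nearest")
        prev = F.interpolate(prev, size=d.shape[-2:], mode="bilinear")
        prev = net(d, prev)                                   # refine the previous estimate
        loss = loss + ((prev - t).abs() * (1 + mask_weight * m)).mean()
    return loss, prev

# usage: batch of 2 blurred+incomplete 128x128 images, binary mask of missing regions
net = TinyRestorer()
x, y = torch.randn(2, 3, 128, 128), torch.randn(2, 3, 128, 128)
mask = (torch.rand(2, 1, 128, 128) > 0.75).float()
loss, restored = multiscale_masked_loss(net, x, y, mask)
```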
The rapid development of deepfake technology poses a formidable challenge to personal privacy and security, underscoring the urgent need for deepfake detection. Recently, the methods based on the reconstruction error,...
We propose using the masked Auto-Encoder (MAE), a transformer model trained self-supervisedly on image inpainting, for anomaly detection (AD), assuming that anomalous regions are harder to reconstruct than normal regions. MAEDAY is the first image-reconstruction-based anomaly detection method that utilizes a pre-trained model, enabling its use for Few-Shot Anomaly Detection (FSAD). We also show that the same method works surprisingly well for the novel tasks of Zero-Shot AD (ZSAD) and Zero-Shot Foreign Object Detection (ZSFOD), where no normal samples are available.
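The reconstruction-error scoring can be sketched as follows: mask random patch grids, inpaint them with a pre-trained reconstruction model, and keep the per-pixel error only where patches were hidden, averaged over several random maskings. The `reconstruct_fn` interface and the mean-colour stand-in model below are placeholders for illustration, not MAEDAY's model.

```python
import torch

def patch_mask(batch, size, patch=16, ratio=0.75, device="cpu"):
    """Binary mask (1 = hidden patch) on a patch grid, upsampled to pixel resolution."""
    grid = size // patch
    m = (torch.rand(batch, 1, grid, grid, device=device) < ratio).float()
    return m.repeat_interleave(patch, -1).repeat_interleave(patch, -2)

@torch.no_grad()
def anomaly_map(images, reconstruct_fn, repeats=8, patch=16, ratio=0.75):
    B, _, H, _ = images.shape
    score = torch.zeros(B, 1, H, H, device=images.device)
    hits = torch.zeros_like(score)
    for _ in range(repeats):
        m = patch_mask(B, H, patch, ratio, images.device)
        recon = reconstruct_fn(images * (1 - m), m)          # inpaint the hidden patches
        err = ((recon - images) ** 2).mean(1, keepdim=True)  # per-pixel reconstruction error
        score += err * m                                     # count error only where masked
        hits += m
    return score / hits.clamp(min=1)                         # high value = likely anomalous

# usage with a trivial stand-in "model" (mean-colour fill) just to make the sketch runnable
imgs = torch.rand(2, 3, 224, 224)
fill = lambda visible, m: visible * (1 - m) + visible.mean(dim=(2, 3), keepdim=True) * m
amap = anomaly_map(imgs, fill)                               # (2, 1, 224, 224) anomaly map
image_score = amap.flatten(1).max(dim=1).values              # image-level anomaly score
```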
With the development of deep learning and the increase in the amount of available data, general artificial intelligence models have become a popular research area. When facing a new application scenario, a pretrained general model can often perform better than models trained only on the new data. However, because of the band differences among hyperspectral image datasets, the hyperspectral image classification (HSIC) field has not yet established a good general model training solution, and it is difficult to exploit the information in existing hyperspectral datasets when training a model for a new scenario. To solve this problem, this article proposes a generalized hyperspectral classification model training method, which completes the training of hyperspectral classification models across datasets through an adaptive channel module and a masked self-supervised pretraining method, and can pretrain and fine-tune hyperspectral classification models using multiple datasets. The adaptive channel module solves the band-difference problem of using hyperspectral datasets across datasets, and the masked self-supervised learning method addresses the label differences and labeling difficulties of training models across datasets. Experimental results on multiple datasets show that the proposed method can effectively use a large amount of data to complete the pretraining of hyperspectral classification models, and the fine-tuning results on downstream datasets show advantages over current advanced deep learning methods.
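A hedged sketch of the two ingredients described above: an adaptive channel module that projects datasets with different band counts into a shared spectral width, and masked-band reconstruction as the self-supervised pretext task. Module names, dataset names, and band counts are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class AdaptiveChannel(nn.Module):
    """One 1x1 conv per dataset, projecting its band count to a common channel width."""
    def __init__(self, band_counts, common=64):
        super().__init__()
        self.proj = nn.ModuleDict(
            {name: nn.Conv2d(c, common, kernel_size=1) for name, c in band_counts.items()})
    def forward(self, x, dataset):
        return self.proj[dataset](x)

class SharedBackbone(nn.Module):
    """Backbone shared across datasets; recon head rebuilds masked channels."""
    def __init__(self, common=64):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(common, common, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(common, common, 3, padding=1), nn.ReLU())
        self.recon = nn.Conv2d(common, common, 1)
    def forward(self, x):
        return self.recon(self.body(x))

def masked_band_pretrain_step(adapter, backbone, patch, dataset, mask_ratio=0.5):
    """Zero out a random subset of projected channels and reconstruct them."""
    z = adapter(patch, dataset)                                  # (B, common, H, W)
    mask = (torch.rand(1, z.size(1), 1, 1, device=z.device) < mask_ratio).float()
    recon = backbone(z * (1 - mask))
    return ((recon - z) ** 2 * mask).sum() / mask.expand_as(z).sum().clamp(min=1)

# usage: two hypothetical scenes with 103 and 200 spectral bands
adapter = AdaptiveChannel({"scene_a": 103, "scene_b": 200})
backbone = SharedBackbone()
loss = masked_band_pretrain_step(adapter, backbone, torch.randn(4, 103, 9, 9), "scene_a")
```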
Cell morphology analysis is a crucial diagnostic tool for identifying blood diseases, including acute leukemia. However, the traditional analysis process is time-consuming and requires significant labor and expertise from laboratory doctors. In recent years, deep learning-based automatic blood cell classification techniques have gained popularity, but acquiring image data and annotations in the medical field is often challenging and costly. With the increasing use of deep learning techniques in clinical practice, it has become vital to ensure both accuracy and high-quality annotations. To address these challenges, this paper proposes a blood cell classification method based on masked autoencoders (MAE) and active learning (AL), namely MAE4AL. This method utilizes the self-supervised loss of the MAE and sample uncertainty to select the most valuable samples for labeling. A comprehensive comparison is conducted between our method and the state-of-the-art blood cell classification technique, which employs ResNeXt. Remarkably, our approach achieves classification performance comparable to ResNeXt while using only 20% of the labeled data. When employing half of the labeled data, our method achieves a classification accuracy of 96.36%, surpassing the ResNeXt model trained with 100% of the labeled data by 0.79%.
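A minimal sketch of the sample-selection idea: rank unlabeled cell images by a mix of MAE reconstruction loss (self-supervised difficulty) and classifier uncertainty, then send the top-k for annotation. The 0.5 weighting, the entropy measure, and the stand-in loss/classifier are assumptions, not the MAE4AL recipe.

```python
import torch

@torch.no_grad()
def select_for_labeling(recon_loss_fn, classifier, unlabeled, k=16, alpha=0.5):
    # per-sample MAE reconstruction loss (any callable returning one value per image)
    recon = torch.stack([recon_loss_fn(x.unsqueeze(0)) for x in unlabeled])
    # predictive-entropy uncertainty from the current classifier
    probs = classifier(unlabeled).softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1)
    # normalise both signals to [0, 1] before mixing
    norm = lambda v: (v - v.min()) / (v.max() - v.min() + 1e-8)
    score = alpha * norm(recon) + (1 - alpha) * norm(entropy)
    return score.topk(k).indices                     # indices of samples to annotate next

# usage with random stand-ins for images, an MAE loss, and a classifier
imgs = torch.randn(64, 3 * 32 * 32)                  # flattened toy "cell images"
fake_mae_loss = lambda x: x.pow(2).mean()            # stands in for a real MAE loss
clf = torch.nn.Linear(3 * 32 * 32, 15)               # 15 blood-cell classes assumed
to_label = select_for_labeling(fake_mae_loss, clf, imgs)
```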
The emergence of Healthcare 4.0 brings convenience to the diagnosis of patients with gastric polyps. A computer-aided gastric polyp detection model can automatically locate gastric polyps in gastroscopic images, which helps endoscopists detect polyps in time and reduces the rate of missed diagnoses. Deep learning models have achieved remarkable success on gastroscopic images; however, the following problems remain. First, models based on convolutional neural networks only analyze the underlying pixels of the gastroscopic image to locate polyps, without taking into account the spatial and positional information contained in the anatomical structure of the image. Second, although the number of gastroscopic images is huge, the number of manually annotated gastric polyp images is very small, which makes deep learning models prone to overfitting. Therefore, in this work, we propose a masked graph neural network model (MGNN) for real-time detection of polyp locations in gastroscopic images in Healthcare 4.0. The MGNN model utilizes a graph structure and graph convolution operations to extract spatial location information and semantic information from gastroscopic images. Information from masked self-training is additionally considered in the prediction stage to compensate for the limited number of manually labeled gastric polyp images. In this way, the MGNN model can automatically learn the essential features of gastroscopic images without labeled data. The effectiveness of the MGNN model has been verified on real gastroscopic images.
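An illustrative sketch of the masked graph self-training idea: treat image patches as graph nodes, run a small graph convolution, replace a random subset of node features with a mask token, and reconstruct them so the network learns spatial and semantic structure without polyp labels. The graph construction and all sizes are assumptions, not the MGNN implementation.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """Plain GCN-style layer: degree-normalised adjacency times node features, then linear."""
    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.lin = nn.Linear(dim_in, dim_out)
    def forward(self, adj, x):
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        return torch.relu(self.lin((adj @ x) / deg))

class MaskedGNN(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.gc1, self.gc2 = GraphConv(dim, dim), GraphConv(dim, dim)
        self.mask_token = nn.Parameter(torch.zeros(1, dim))
        self.recon = nn.Linear(dim, dim)

    def forward(self, adj, feats, mask_ratio=0.3):
        n = feats.size(0)
        masked = torch.rand(n, device=feats.device) < mask_ratio
        x = torch.where(masked.unsqueeze(-1), self.mask_token.expand(n, -1), feats)
        h = self.gc2(adj, self.gc1(adj, x))
        loss = ((self.recon(h) - feats) ** 2)[masked].mean()   # reconstruct masked nodes
        return loss, h                                         # h would feed a detection head

# usage: an 8x8 grid of patch nodes, each connected to its right/down neighbours
n = 64
idx = torch.arange(n).view(8, 8)
adj = torch.eye(n)
for i in range(8):
    for j in range(8):
        for di, dj in ((0, 1), (1, 0)):
            if i + di < 8 and j + dj < 8:
                a, b = idx[i, j], idx[i + di, j + dj]
                adj[a, b] = adj[b, a] = 1.0
loss, node_repr = MaskedGNN()(adj, torch.randn(n, 64))
```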