ISBN (print): 9798400701085
Dynamic facial expression recognition (DFER) is essential to the development of intelligent and empathetic machines. Prior efforts in this field mainly fall into the supervised learning paradigm, which is severely restricted by the limited labeled data in existing datasets. Inspired by the recent unprecedented success of masked autoencoders (e.g., VideoMAE), this paper proposes MAE-DFER, a novel self-supervised method which leverages large-scale self-supervised pre-training on abundant unlabeled data to largely advance the development of DFER. Since the vanilla Vision Transformer (ViT) employed in VideoMAE requires substantial computation during fine-tuning, MAE-DFER develops an efficient local-global interaction Transformer (LGI-Former) as the encoder. Moreover, in addition to the standalone appearance content reconstruction in VideoMAE, MAE-DFER also introduces explicit temporal facial motion modeling to encourage LGI-Former to excavate both static appearance and dynamic motion information. Extensive experiments on six datasets show that MAE-DFER consistently outperforms state-of-the-art supervised methods by significant margins (e.g., +6.30% UAR on DFEW and +8.34% UAR on MAFW), verifying that it can learn powerful dynamic facial representations via large-scale self-supervised pre-training. Besides, it has comparable or even better performance than VideoMAE, while largely reducing the computational cost (requiring only about 38% of the FLOPs). We believe MAE-DFER has paved a new way for the advancement of DFER and can inspire more relevant research in this field and even other related tasks. Codes and models are publicly available at https://***/sunlicai/MAE-DFER.
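The joint objective described above (reconstructing masked appearance content together with temporal facial motion) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the LGI-Former encoder and decoder are omitted, and the tensor shapes, frame-difference motion target, and equal loss weighting are assumptions.

```python
import torch
import torch.nn as nn

def dual_reconstruction_loss(pred_appearance, pred_motion, frames, mask):
    """frames: (B, T, N, D) patch pixels per frame; mask: (B, T*N) bool, True = masked."""
    B, T, N, D = frames.shape
    appearance_target = frames.reshape(B, T * N, D)
    # Temporal motion target: difference between consecutive frames
    # (the last frame has no successor, so its predecessor's difference is repeated).
    diff = frames[:, 1:] - frames[:, :-1]                          # (B, T-1, N, D)
    motion_target = torch.cat([diff, diff[:, -1:]], dim=1).reshape(B, T * N, D)

    mse = nn.MSELoss(reduction="none")
    loss_app = mse(pred_appearance, appearance_target).mean(-1)    # (B, T*N)
    loss_mot = mse(pred_motion, motion_target).mean(-1)
    # Average only over masked tokens, as in masked autoencoding.
    denom = mask.float().sum().clamp(min=1.0)
    return ((loss_app + loss_mot) * mask.float()).sum() / denom

# e.g. 2 clips, 8 frames, 196 patches of 768-dim pixels, 90% masking (illustrative numbers)
frames = torch.rand(2, 8, 196, 768)
mask = torch.rand(2, 8 * 196) < 0.9
loss = dual_reconstruction_loss(torch.rand(2, 8 * 196, 768), torch.rand(2, 8 * 196, 768), frames, mask)
```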
Malware traffic classification (MTC) is one of the promising methods for ensuring cybersecurity; it involves identifying and categorizing network traffic to distinguish between benign and malicious activity. Tradi...
Accurate cancer survival prediction enables clinicians to tailor treatment regimens based on individual patient prognoses, effectively mitigating over-treatment and inefficient medical resource allocation. Recently, t...
It is difficult to establish a classification and recognition model for machinery and equipment based on labeled samples in an actual industrial environment because of incomplete fault modes and missing data. To solve this problem, a semisupervised anomaly detection method based on masked autoencoders for distribution estimation (MADE) is designed. First, the Mel-frequency cepstrum coefficient (MFCC) is employed to extract fault features from vibration signals of rolling bearings. Then, a group of mask matrices is applied to each hidden layer to overcome the perfect-reconstruction problem of autoencoders, and the full reconstruction probability, rather than the reconstruction error, is adopted as the anomaly score. Finally, the diagnostic threshold is determined according to the Youden index. Experimental results show that the MADE method can extract fault-sensitive features in a noisy industrial environment, and that introducing the mask matrices renders the network autoregressive, thus solving the perfect-reconstruction problem of autoencoders. On three rolling bearing datasets, the accuracy, precision, recall, and F1-score of the proposed method all reach 100%. Moreover, the accuracy of the proposed method is 17.19% higher than that of the memory-inhibition method on the rolling bearing dataset provided by the Center for Intelligent Maintenance Systems (IMS) at the University of Cincinnati (USA). The proposed method also achieves higher accuracy than other state-of-the-art anomaly detection methods.
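A minimal sketch of the MADE idea referenced above: binary mask matrices on the hidden layer make the network autoregressive, so it outputs per-dimension conditional densities and the negative log-likelihood can serve as the anomaly score. The Gaussian output head, layer sizes, and deterministic degree assignment are assumptions for brevity; in the described pipeline, the diagnostic threshold on this score would then be chosen via the Youden index.

```python
import math
import torch
import torch.nn as nn

class MaskedLinear(nn.Linear):
    def __init__(self, in_features, out_features, mask):
        super().__init__(in_features, out_features)
        self.register_buffer("mask", mask)                 # (out_features, in_features)

    def forward(self, x):
        return nn.functional.linear(x, self.weight * self.mask, self.bias)

def build_made(n_dims, n_hidden):
    # Degrees: input dimensions get 1..D, hidden units cycle through 1..D-1.
    deg_in = torch.arange(1, n_dims + 1)
    deg_h = torch.arange(n_hidden) % (n_dims - 1) + 1
    mask_h = (deg_h[:, None] >= deg_in[None, :]).float()   # hidden unit sees inputs up to its degree
    mask_out = (deg_in[:, None] > deg_h[None, :]).float()  # output d only sees degrees < d
    return (MaskedLinear(n_dims, n_hidden, mask_h),
            MaskedLinear(n_hidden, n_dims, mask_out),      # per-dimension mean
            MaskedLinear(n_hidden, n_dims, mask_out))      # per-dimension log-variance

def anomaly_score(x, hidden, mean_head, logvar_head):
    """Gaussian negative log-likelihood per sample; higher means more anomalous."""
    h = torch.relu(hidden(x))
    mu, logvar = mean_head(h), logvar_head(h)
    nll = 0.5 * (logvar + (x - mu) ** 2 / logvar.exp() + math.log(2 * math.pi))
    return nll.sum(dim=-1)

hidden, mean_head, logvar_head = build_made(n_dims=13, n_hidden=64)   # e.g. 13 MFCCs per frame
scores = anomaly_score(torch.randn(32, 13), hidden, mean_head, logvar_head)
```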
Accurate hyperspectral remote sensing information is essential for feature identification and detection. Nevertheless, the hyperspectral imaging mechanism poses challenges in balancing the trade-off between spatial and spectral resolution. Hardware improvements are cost-intensive and depend on strict environmental conditions and extra equipment. Recent spectral imaging methods have attempted to directly reconstruct hyperspectral information from widely available multispectral images. However, the fixed mapping approaches used in previous spectral reconstruction models limit their reconstruction quality and generalizability, especially when dealing with missing or contaminated bands. Moreover, data-hungry issues plague increasingly complex data-driven spectral reconstruction methods. This paper proposes SpectralMAE, a novel spectral reconstruction model that can take arbitrary combinations of bands as input and improve the utilization of data sources. In contrast to previous spectral reconstruction techniques, SpectralMAE explores the application of a self-supervised learning paradigm and proposes a masked autoencoder architecture for the spectral dimension. To further enhance the performance for specific sensor inputs, we propose a training strategy that combines random-masking pre-training with fixed-masking fine-tuning. Empirical evaluations on five remote sensing datasets demonstrate that SpectralMAE outperforms state-of-the-art methods in both qualitative and quantitative metrics.
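The band-masking strategy described above (random masking for pre-training, fixed masking for sensor-specific fine-tuning) can be illustrated as below. The helper names, band counts, and the convention that True marks a masked band are assumptions; the transformer encoder and decoder are omitted.

```python
import torch

def random_band_mask(n_bands, mask_ratio, batch_size):
    """Pre-training: drop a random subset of spectral bands per sample."""
    n_keep = max(1, int(n_bands * (1 - mask_ratio)))
    scores = torch.rand(batch_size, n_bands)
    keep_idx = scores.argsort(dim=1)[:, :n_keep]
    mask = torch.ones(batch_size, n_bands, dtype=torch.bool)
    mask.scatter_(1, keep_idx, False)              # True = masked (to be reconstructed)
    return mask

def fixed_band_mask(n_bands, available_bands, batch_size):
    """Fine-tuning: mask exactly the bands the target sensor does not provide."""
    mask = torch.ones(batch_size, n_bands, dtype=torch.bool)
    mask[:, available_bands] = False
    return mask

# e.g. reconstruct a 31-band hyperspectral cube from 4 multispectral bands (illustrative indices)
pretrain_mask = random_band_mask(31, mask_ratio=0.75, batch_size=8)
finetune_mask = fixed_band_mask(31, available_bands=[2, 9, 17, 26], batch_size=8)
```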
Self-supervised learning is attracting considerable attention in point cloud understanding. However, learning discriminative and transferable features remains challenging due to the irregular nature of point clouds. We propose a geometrically and adaptively masked autoencoder on point clouds for self-supervised learning, termed PointGame. PointGame contains two core components: GATE and EAT. GATE stands for the geometrical and adaptive token embedding module; it not only absorbs the conventional wisdom of geometric descriptors that capture surface shape effectively, but also exploits adaptive saliency to focus on the salient parts of a point cloud. EAT stands for the external attention-based transformer encoder with linear computational complexity, which increases the efficiency of the whole pipeline. Unlike cutting-edge unsupervised learning models, PointGame leverages geometric descriptors to perceive surface shapes and adaptively mines discriminative features from training data. PointGame shows clear advantages over its competitors on various downstream tasks under both global and local fine-tuning strategies. The code and pretrained models will be publicly available.
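As a rough illustration of the linear-complexity attention mentioned for EAT, here is a generic external-attention block: tokens attend to a small learnable external memory instead of to each other, so the cost grows linearly with the number of point tokens. The memory size and double-normalization scheme follow the generic external-attention formulation and are assumptions about this particular model.

```python
import torch
import torch.nn as nn

class ExternalAttention(nn.Module):
    def __init__(self, dim, memory_size=64):
        super().__init__()
        self.mk = nn.Linear(dim, memory_size, bias=False)     # project tokens onto the key memory
        self.mv = nn.Linear(memory_size, dim, bias=False)     # read back from the value memory

    def forward(self, x):                                     # x: (B, N, dim)
        attn = self.mk(x)                                     # (B, N, S)
        attn = attn.softmax(dim=1)                            # normalize over tokens
        attn = attn / (attn.sum(dim=2, keepdim=True) + 1e-9)  # l1-normalize over memory slots
        return self.mv(attn)                                  # (B, N, dim)

tokens = torch.randn(2, 1024, 256)                            # 1024 point-patch tokens (illustrative)
out = ExternalAttention(256)(tokens)
print(out.shape)                                              # torch.Size([2, 1024, 256])
```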
Convolutional neural networks (CNNs) may not be ideal for extracting global temporal features from nonstationary electroencephalogram (EEG) signals. The application of masking-based methods in EEG classification is not well studied, and there is a shortage of commonly accepted models for verifying inter-individual results in motor imagery classification tasks. The MAE-EEG-Transformer, a transformer with a masking mechanism, is proposed in this article. It pre-trains by randomly masking signals, forcing the model to learn semantic features. The pre-trained encoder module is then fine-tuned and transferred to the classification task to obtain the category of EEG signals. The effectiveness of features with and without pre-training is compared using t-SNE visualization to demonstrate the inter-subject efficacy of pre-training. The MAE-EEG-Transformer was extensively evaluated across three prevalent EEG-based motor imagery datasets, demonstrating performance comparable to state-of-the-art models while requiring only approximately 20% of the computational cost (results in Tables 1-4).
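The two-stage flow described above (mask-and-reconstruct pre-training, then fine-tuning the encoder with a classification head) might look roughly like this. The tiny encoder, patch size, masking ratio, and head dimensions are illustrative stand-ins, not the model's actual configuration.

```python
import torch
import torch.nn as nn

class TinyEEGEncoder(nn.Module):
    def __init__(self, patch_dim=64, d_model=128, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(patch_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, patches):                     # (B, N, patch_dim)
        return self.encoder(self.embed(patches))

def pretrain_step(encoder, decoder, patches, mask_ratio=0.5):
    """Randomly mask EEG patches and reconstruct them (self-supervised)."""
    masked = patches.clone()
    mask = torch.rand(patches.shape[:2]) < mask_ratio
    masked[mask] = 0.0                              # zero out masked patches
    recon = decoder(encoder(masked))                # (B, N, patch_dim)
    return ((recon - patches) ** 2)[mask].mean()

def finetune_logits(encoder, head, patches):
    """Reuse the pre-trained encoder; mean-pool tokens and classify."""
    return head(encoder(patches).mean(dim=1))

encoder = TinyEEGEncoder()
decoder = nn.Linear(128, 64)                        # lightweight reconstruction head
head = nn.Linear(128, 4)                            # e.g. four motor-imagery classes
loss = pretrain_step(encoder, decoder, torch.randn(8, 50, 64))
```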
High-quality data is essential for effective operation and maintenance of wind farms. However, data missing is a persistent issue in the supervisory control and data acquisition (SCADA) system, which seriously affects data quality. To tackle two limitations of current missing data imputation methods, namely the gap between training tasks and imputation tasks and the inadequate extraction of correlations within SCADA data, this work proposes a data-driven framework named multiscale-attention masked autoencoder (MAMAE) for missing data imputation of wind turbines. MAMAE employs masked autoencoding as a self-supervised training method, bridging the gap between the training and imputation tasks. Additionally, considering the importance of correlations for imputing SCADA data, a multiscale attention architecture built upon the transformer is employed. Comprising four transformer stages, each applying attention mechanisms at a distinct scale, the multiscale attention efficiently extracts feature, turbine, and temporal correlations. To mitigate the large computational cost caused by the increased sequence length at different scales, localized attention is implemented within shifted windows, reducing the computational complexity from quadratic to linear in the sequence length. Furthermore, a turbine correlation-based feature combination method is proposed to coordinate with the multiscale attention and introduce turbine correlations into the imputation process. Experiments were conducted on a SCADA dataset collected from a real-world wind farm. The results show that the proposed method achieves higher accuracy than existing methods in most cases (especially in cases of band missing and feature missing), and the ablation experiments verify the effectiveness of each proposed modification in improving accuracy or efficiency.
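The shifted-window mechanism mentioned above, which keeps attention linear in the sequence length, can be sketched as follows: attention is restricted to fixed-size windows, and alternating blocks shift the windows by half their size so information can cross window borders. Window size, shift, and tensor shapes are illustrative assumptions.

```python
import torch

def window_partition(x, window_size, shift=0):
    """x: (B, L, D) SCADA token sequence -> (B * L//window_size, window_size, D)."""
    B, L, D = x.shape
    if shift:
        x = torch.roll(x, shifts=-shift, dims=1)   # shifted windows in alternating blocks
    assert L % window_size == 0, "pad the sequence to a multiple of the window size"
    return x.reshape(B * (L // window_size), window_size, D)

def window_merge(windows, batch_size, shift=0):
    x = windows.reshape(batch_size, -1, windows.shape[-1])
    if shift:
        x = torch.roll(x, shifts=shift, dims=1)
    return x

# Attention inside each window costs O(window_size^2) and there are L/window_size windows,
# so the total cost grows linearly with L instead of quadratically.
tokens = torch.randn(4, 288, 96)                   # e.g. one day of 5-min SCADA samples (illustrative)
wins = window_partition(tokens, window_size=48, shift=24)
restored = window_merge(wins, batch_size=4, shift=24)
assert torch.allclose(restored, tokens)
```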
Pansharpening requires the fusion of a low-spatial-resolution multispectral (LRMS) image and a panchromatic (PAN) image with rich spatial details to obtain a high-spatial-resolution multispectral (HRMS) image. Recently, deep learning (DL)-based models have been proposed to tackle this problem and have made considerable progress. However, most existing methods rely on the conventional observation model, which treats LRMS as a blurred and downsampled version of HRMS. This observation model may lead to unsatisfactory performance and limited generalization ability in full-resolution evaluation, resulting in severe spectral and spatial distortion: while DL-based models show significant improvement over traditional models in reduced-resolution evaluation, their performance deteriorates significantly at full resolution. In this article, we rethink the observation model, present a novel perspective from HRMS to LRMS, and propose a pixel-wise ensembled masked autoencoder (PEMAE) to restore HRMS. Specifically, we consider LRMS as the result of pixel-wise masking on HRMS. Thus, LRMS can be seen as a natural input of a masked autoencoder. By ensembling the reconstruction results of multiple masking patterns, PEMAE obtains an HRMS image with both the spectral information of LRMS and the spatial details of PAN. In addition, we employ a linear cross-attention mechanism to replace regular self-attention, reducing the computation to linear time complexity. Extensive experiments demonstrate that PEMAE outperforms state-of-the-art (SOTA) methods in terms of quantitative and visual performance at both reduced- and full-resolution evaluations. The codes are available at https://***/yc-cui/PEMAE.
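The "LRMS as pixel-wise masked HRMS" view and the ensembling over masking patterns can be sketched as below; the reconstruction network itself is stubbed out, and the scale factor and offsets are assumptions for illustration.

```python
import torch

def place_lrms_on_hrms_grid(lrms, scale, offset):
    """Embed an LRMS image into an HRMS-sized canvas, filling one pixel per
    scale x scale block at the given (row, col) offset; the rest stay masked (0)."""
    B, C, h, w = lrms.shape
    canvas = torch.zeros(B, C, h * scale, w * scale)
    canvas[:, :, offset[0]::scale, offset[1]::scale] = lrms
    return canvas

def pixelwise_ensemble(lrms, pan, model, scale=4):
    """Average the reconstructions obtained from several pixel-wise masking patterns."""
    offsets = [(i, j) for i in range(scale) for j in range(scale)]
    preds = [model(place_lrms_on_hrms_grid(lrms, scale, o), pan) for o in offsets]
    return torch.stack(preds).mean(dim=0)

# Stub network: a real model would be a masked autoencoder with linear cross-attention.
model = lambda masked_hrms, pan: masked_hrms + 0.0 * pan.mean()
hrms = pixelwise_ensemble(torch.rand(1, 4, 64, 64), torch.rand(1, 1, 256, 256), model)
print(hrms.shape)   # torch.Size([1, 4, 256, 256])
```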
Identifying the cognitive workload of operators is crucial in complex human-automation collaboration systems. An excessive workload can lead to fatigue or accidents, while an insufficient workload may diminish situational awareness and efficiency. However, existing supervised learning-based methods for workload recognition are ineffective when dealing with imperfect input data, such as missing or noisy data, which is not practical in real applications. This study introduces a robust electroencephalogram (EEG)-enabled cognitive workload recognition model using self-supervised learning. The proposed method, DMAEEG, combines the training strategies of denoising autoencoders and masked autoencoders, demonstrating strong robustness against noisy and incomplete data. More specifically, we adopt the temporal convolutional network and multi-head self-attention mechanisms as the backbone, effectively capturing both the temporal and spatial features of EEG. Extensive experiments are conducted to verify the effectiveness and robustness of the proposed method on an open dataset and a self-collected dataset. The results indicate that DMAEEG outperforms other state-of-the-art methods across various evaluation metrics. Moreover, DMAEEG maintains high accuracy in workload inference even when EEG signals are corrupted with a high masking ratio or strong noise. This signifies its superiority in capturing robust intrinsic patterns from imperfect EEG data. The proposed method significantly contributes to decoding EEG signals for workload recognition in real-world applications, thereby enhancing the safety and reliability of human-automation interactions.
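The combined corruption strategy described above (Gaussian noise as in a denoising autoencoder plus random masking as in a masked autoencoder, with the clean signal as the reconstruction target) might be sketched as follows. The noise level, masking ratio, loss weighting, and the stand-in network are assumptions.

```python
import torch

def corrupt(eeg, mask_ratio=0.5, noise_std=0.1):
    """eeg: (B, C, T) -> corrupted copy plus the boolean mask of hidden samples."""
    noisy = eeg + noise_std * torch.randn_like(eeg)    # denoising-autoencoder corruption
    mask = torch.rand_like(eeg) < mask_ratio           # masked-autoencoder corruption (True = masked)
    noisy[mask] = 0.0
    return noisy, mask

def reconstruction_loss(model, eeg):
    corrupted, mask = corrupt(eeg)
    recon = model(corrupted)
    # Emphasize the masked positions but keep a small denoising term everywhere.
    return ((recon - eeg) ** 2)[mask].mean() + 0.1 * ((recon - eeg) ** 2).mean()

# Stand-in for the TCN + multi-head self-attention backbone described in the abstract.
model = torch.nn.Sequential(
    torch.nn.Flatten(), torch.nn.Linear(8 * 128, 8 * 128), torch.nn.Unflatten(1, (8, 128))
)
loss = reconstruction_loss(model, torch.randn(16, 8, 128))   # 16 segments, 8 channels, 128 samples
```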