Missing data can negatively impact any task performed with the available data, and it is common in real-world domains such as healthcare. One of the most common strategies to address it is imputation, where the missing values are replaced by estimates. Several approaches based on statistics and machine learning have been proposed for this purpose, including deep learning architectures such as generative adversarial networks and autoencoders. In this work, we propose a novel siamese neural network for missing data imputation, which we call Siamese Autoencoder-based Approach for Imputation (SAEI). Besides having a deep autoencoder architecture, SAEI has a custom loss function and a triplet mining strategy tailored to the missing data issue. SAEI is compared to seven state-of-the-art imputation methods in an experimental setup comprising 14 heterogeneous healthcare datasets injected with Missing Not At Random values at rates between 10% and 60%. The results show that SAEI significantly outperforms all the other imputation methods in all tested settings, achieving an average improvement of 35%. This work is an extension of the article "Siamese Autoencoder-Based Approach for Missing Data Imputation" [1], presented at the International Conference on Computational Science 2023. It includes new experiments focused on runtime, generalization capabilities, and the impact of imputation on classification tasks, where the results show that SAEI is the imputation method that induces the best classification results, improving the F1 scores for 50% of the datasets used.
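To make the evaluation setup concrete, here is a minimal sketch of injecting Missing Not At Random (MNAR) values and scoring an imputer on the hidden entries. It is not SAEI: the imputer is plain mean imputation, used only as a stand-in baseline, and the function names, the "hide the largest values" MNAR rule, and the toy column are all assumptions made for illustration.

```python
from statistics import mean


def inject_mnar(column, rate):
    """Hide the largest `rate` fraction of values.

    Missingness depends on the unobserved value itself, which is what
    makes this mechanism MNAR (an illustrative rule, not the paper's).
    """
    n_missing = round(len(column) * rate)
    # Indices sorted from the largest value to the smallest.
    order = sorted(range(len(column)), key=lambda i: column[i], reverse=True)
    hidden = set(order[:n_missing])
    return [None if i in hidden else v for i, v in enumerate(column)]


def mean_impute(column):
    """Replace each missing entry with the mean of the observed entries."""
    fill = mean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


def mae_on_missing(original, masked, imputed):
    """Mean absolute error, measured only where values were hidden."""
    return mean(
        abs(o - p) for o, m, p in zip(original, masked, imputed) if m is None
    )


original = [1.0, 2.0, 3.0, 4.0, 10.0]  # hypothetical feature column
masked = inject_mnar(original, rate=0.2)
imputed = mean_impute(masked)
error = mae_on_missing(original, masked, imputed)
```

Because the hidden value is the largest one, the mean of the remaining values underestimates it badly. This is the kind of bias that makes MNAR harder than randomly scattered missingness.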
Electroencephalogram (EEG)-based neurofeedback has been widely studied for tinnitus therapy in recent years. Most existing research relies on experts' cognitive prediction, and studies based on machine learning and deep learning are either data-hungry or do not generalize well to new subjects. In this paper, we propose a robust, data-efficient model for distinguishing tinnitus from the healthy state based on EEG-based tinnitus neurofeedback. We propose a trend descriptor, a feature extractor with lower fineness, to reduce the effect of electrode noise on EEG signals, and a siamese encoder-decoder network boosted in a supervised manner to learn accurate alignment and to acquire high-quality transferable mappings across subjects and EEG signal channels. Our experiments show that the proposed method significantly outperforms state-of-the-art algorithms when analyzing subjects' EEG neurofeedback to 90 dB and 100 dB sound, achieving an accuracy of 91.67%-94.44% in distinguishing tinnitus from control subjects in a subject-independent setting. Our ablation studies on mixed subjects and parameters show that the method's performance is stable.
Although deep autoencoders excel at extracting intricate features, their application in process monitoring is limited by the requirement for large sample sizes and by the interpretability of latent representations. This work presents a special deep learning structure, the siamese network, to detect abnormal deviations in nonlinear dynamic processes. By leveraging the capability of the siamese architecture to process multiple inputs simultaneously, the training sample size expands exponentially, which enhances the learning potential of the model. Furthermore, a long short-term memory unit is integrated to capture long-term process dynamics. To refine the distribution of latent features extracted from diverse data types, a contrastive loss function is proposed, which strengthens the model's fault detection capabilities and enhances its interpretation of latent representations. A T² statistic is then established on the latent space to perform fault detection. The effectiveness of the method is demonstrated through case studies on simulation processes and an industrial process.
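The fault detection step can be illustrated with a small sketch of a Hotelling-style T² statistic computed on latent features. This assumes the statistic takes the standard form (z - mu)^T S^-1 (z - mu), which the abstract does not spell out. The two-dimensional latent space, the sample vectors, and the function names are hypothetical, and the encoder that would produce the latent vectors is omitted.

```python
from statistics import mean


def fit_t2(latents):
    """Estimate the mean and inverse covariance of 2-D latent vectors
    collected under normal operation."""
    n = len(latents)
    mu = (mean(z[0] for z in latents), mean(z[1] for z in latents))
    centered = [(z[0] - mu[0], z[1] - mu[1]) for z in latents]
    # Unbiased sample covariance entries.
    s00 = sum(a * a for a, _ in centered) / (n - 1)
    s11 = sum(b * b for _, b in centered) / (n - 1)
    s01 = sum(a * b for a, b in centered) / (n - 1)
    det = s00 * s11 - s01 * s01
    # Closed-form inverse of a 2x2 symmetric matrix.
    inv = ((s11 / det, -s01 / det), (-s01 / det, s00 / det))
    return mu, inv


def t2(z, mu, inv):
    """T^2 = (z - mu)^T S^-1 (z - mu): squared Mahalanobis distance."""
    d0, d1 = z[0] - mu[0], z[1] - mu[1]
    return (
        d0 * (inv[0][0] * d0 + inv[0][1] * d1)
        + d1 * (inv[1][0] * d0 + inv[1][1] * d1)
    )


# Hypothetical latent vectors from normal operation.
normal_latents = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
mu, inv = fit_t2(normal_latents)

t2_normal = t2((0.0, 0.0), mu, inv)  # at the centre of normal data
t2_fault = t2((2.0, 2.0), mu, inv)   # far from normal data
```

In practice a sample would be flagged as faulty when its T² value exceeds a control limit estimated from normal-operation data. How that limit is chosen is not stated in the abstract and is left out here.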
ISBN: 9781450387354 (print)
Word embeddings are used as building blocks for a wide range of natural language processing and information retrieval tasks. These embeddings are usually represented as continuous vectors, requiring significant memory capacity and computationally expensive similarity measures. In this study, we introduce a novel method for semantically hashing continuous vector representations into a lower-dimensional Hamming space while explicitly preserving semantic information between words. This is achieved by introducing a siamese autoencoder combined with a novel semantic-preserving loss function. We show that our quantization model induces only a 4% loss of semantic information relative to continuous representations and outperforms the baseline models on several word similarity and sentence classification tasks. Finally, we show through cluster analysis that our method learns binary representations in which individual bits hold interpretable semantic information. In conclusion, binary quantization of word embeddings significantly decreases time and space requirements while offering new possibilities through exploiting the semantic information of individual bits in downstream information retrieval tasks.
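The time and space savings of binary codes can be seen in a minimal sketch: each vector becomes a bit string packed into an integer, and similarity becomes a Hamming distance computed with one XOR and a bit count. The sign-threshold binarization below is a simple stand-in, not the learned siamese autoencoder described above, and the four-dimensional toy embeddings are invented for illustration.

```python
def binarize(vector):
    """Map each coordinate to one bit: 1 if positive, else 0.
    (A naive stand-in for a learned hashing model.)"""
    return [1 if x > 0 else 0 for x in vector]


def pack(bits):
    """Pack a list of bits into a single integer code."""
    code = 0
    for bit in bits:
        code = (code << 1) | bit
    return code


def hamming(code_a, code_b):
    """Number of differing bits between two integer codes."""
    return bin(code_a ^ code_b).count("1")


# Hypothetical 4-dimensional embeddings.
embeddings = {
    "cat": [0.9, -0.2, 0.4, -0.7],
    "dog": [0.8, -0.1, 0.3, 0.5],
    "car": [-0.6, 0.7, -0.3, 0.2],
}
codes = {word: pack(binarize(vec)) for word, vec in embeddings.items()}

cat_dog = hamming(codes["cat"], codes["dog"])
cat_car = hamming(codes["cat"], codes["car"])
```

With these toy vectors, "cat" and "dog" differ in one bit while "cat" and "car" differ in all four, so the ordering of similarities survives binarization. A learned model aims to preserve such orderings far more reliably than a fixed sign threshold.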