An effective approach for voice conversion (VC) is to disentangle linguistic content from other components in the speech signal. The effectiveness of variational autoencoder (VAE) based VC (VAE-VC), for instance, strongly relies on this principle. In our prior work, we proposed a cross-domain VAE-VC (CDVAE-VC) framework, which utilized acoustic features of different properties, to improve the performance of VAE-VC. We believed that the success came from more disentangled latent representations. In this article, we extend the CDVAE-VC framework by incorporating the concept of adversarial learning, in order to further increase the degree of disentanglement, thereby improving the quality and similarity of converted speech. More specifically, we first investigate the effectiveness of incorporating generative adversarial networks (GANs) into CDVAE-VC. Then, we consider the concept of domain adversarial training and add an explicit constraint to the latent representation, realized by a speaker classifier, to explicitly eliminate the speaker information that resides in the latent code. Experimental results confirm that the degree of disentanglement of the learned latent representation can be enhanced by both GANs and the speaker classifier. Meanwhile, subjective evaluation results in terms of quality and similarity scores demonstrate the effectiveness of our proposed methods.
Detecting anomalies accurately in time series data has been receiving considerable attention due to its enormous potential for a wide array of applications. Numerous unsupervised anomaly detection methods for time series have been developed because of the difficulty of obtaining accurate labels. However, most existing unsupervised approaches suffer from the problem of anomaly contamination, which prevents models from learning the normal pattern well and thus degrades detection performance. To this end, a novel unsupervised method, called Self-adversarial variational autoencoder with Spectral Residual (SaVAE-SR), is introduced for time series anomaly detection in this paper. SaVAE-SR first produces labels for unlabeled training data using the spectral residual technique to identify the most critical anomalies. A VAE model with a modified loss that can leverage label information to remove the influence of anomalous points is then trained in a self-adversarial manner, enabling the model to self-evaluate its learning of the complex data distribution and improve itself accordingly. Specifically, the encoder serves both as an encoder that approximates the posterior of the latent variables and as a discriminator that evaluates the generative ability of the generator. The generator is trained to capture the underlying data distribution and attempts to produce realistic samples to deceive the discriminator. The encoder and generator compete with each other as in GANs but work together under the theoretical framework of VAEs. As a result, the SaVAE-SR model combines the respective strengths of the VAE and adversarial training but does not require an additional discriminator, which makes the whole model very compact. Extensive experiments on five datasets demonstrate the superiority of the proposed method over the existing state-of-the-art methods. (c) 2021 Elsevier B.V. All rights reserved.
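The pseudo-labeling step mentioned in the abstract above relies on the spectral residual technique. The following is a minimal sketch of the generic spectral residual saliency computation, not the authors' implementation: the function names, the circular moving average, the window size, and the mean-based threshold in `label_anomalies` are all assumptions made for illustration, and a plain O(n^2) DFT is used to keep the example dependency-free.

```python
import cmath
import math
from typing import List, Sequence


def dft(values: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive discrete Fourier transform; fine for short illustrative series."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            values[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
            for t in range(n)
        )
        out.append(total / n if inverse else total)
    return out


def spectral_residual_saliency(series: Sequence[float], window: int = 3) -> List[float]:
    """Return a saliency map; `window` (assumed odd) smooths the log spectrum."""
    n = len(series)
    spectrum = dft(series)
    # Small epsilon guards against log(0); its value is an arbitrary choice.
    log_amp = [math.log(abs(c) + 1e-8) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]
    half = window // 2
    # Circular moving average of the log amplitude (an assumption of this sketch).
    smoothed = [
        sum(log_amp[(k + j) % n] for j in range(-half, half + 1)) / window
        for k in range(n)
    ]
    residual = [a - s for a, s in zip(log_amp, smoothed)]
    # Recombine the residual with the original phase and go back to the time domain.
    restored = dft([cmath.exp(r + 1j * p) for r, p in zip(residual, phase)], inverse=True)
    return [abs(c) for c in restored]


def label_anomalies(saliency: Sequence[float], ratio: float = 3.0) -> List[int]:
    """Hypothetical labeling rule: flag points whose saliency exceeds ratio * mean."""
    mean = sum(saliency) / len(saliency)
    return [i for i, s in enumerate(saliency) if s > ratio * mean]


if __name__ == "__main__":
    series = [1.0] * 32
    series[20] = 10.0  # a single injected spike
    print(label_anomalies(spectral_residual_saliency(series)))
```

In a pipeline like the one described, indices flagged this way would serve as provisional anomaly labels for the downstream model; the thresholding rule actually used in the paper is not stated in the abstract.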
The embedding representation of a case text represents the text as a vector that retains much of the information in the original text. Text embedding methods usually use either statistical features or content features alone. However, case texts are characterized by similar structure, repeated words, and varying lengths, so neither statistical features nor content features alone can represent them well. In this paper, we propose a joint variational autoencoder (VAE) to learn embedding representations of case texts. We consider the statistical features and content features of case texts together and use the VAE to align the two kinds of features in the same space. We compare our representations with existing methods in terms of quality, relationship, and efficiency. The experimental results show that our method outperforms models that use a single kind of feature.
In many industries, statistical process monitoring techniques play a key role in improving processes through variation reduction and defect prevention. Modern large-scale industrial processes require appropriate monitoring techniques that can efficiently address high-dimensional nonlinear processes. Such processes have been successfully monitored with several latent variable-based methods. However, because these monitoring methods use Hotelling's T^2 statistics in the reduced space, a normality assumption underlies the construction of these tools. This assumption has limited the use of latent variable-based monitoring charts in both nonlinear and nonnormal situations. In this study, we propose a variational autoencoder (VAE) as a monitoring method that can address both nonlinear and nonnormal situations in high-dimensional processes. The VAE is appropriate for T^2 charts because it causes the reduced space to follow a multivariate normal distribution. The effectiveness and applicability of the proposed VAE-based chart were demonstrated through experiments on simulated data and real data from a thin-film-transistor liquid-crystal display process.
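To make the monitoring statistic in the abstract above concrete, here is a minimal sketch of Hotelling's T^2 computed on latent codes. It assumes the latent vectors of an in-control reference set have already been produced by some encoder (not shown), and all function names are hypothetical. The control limit is deliberately left out because the abstract does not say how the authors set it.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Column-wise mean of the reference latent codes."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows: Sequence[Sequence[float]], mu: Sequence[float]) -> Matrix:
    """Unbiased sample covariance matrix (divides by n - 1)."""
    n, p = len(rows), len(mu)
    cov = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - m for x, m in zip(row, mu)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j]
    return [[c / (n - 1) for c in r] for r in cov]


def solve(a: Matrix, b: Sequence[float]) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def hotelling_t2(z: Sequence[float], mu: Sequence[float], cov: Matrix) -> float:
    """T^2 = (z - mu)^T S^{-1} (z - mu), computed without forming the inverse."""
    d = [zi - mi for zi, mi in zip(z, mu)]
    w = solve(cov, d)
    return sum(di * wi for di, wi in zip(d, w))


if __name__ == "__main__":
    # Toy 2-D "latent codes" standing in for encoder outputs.
    reference = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    mu = mean_vector(reference)
    cov = covariance(reference, mu)
    print(hotelling_t2([2.0, 0.0], mu, cov))
```

A new observation would be flagged when its T^2 value exceeds a chosen control limit; solving the linear system instead of inverting the covariance matrix is a numerical convenience of this sketch.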
User attributes, such as gender and education, face severe incompleteness in social networks. Attribute inference aims to infer users' missing attribute labels based on observed data to make this valuable data usable for downstream tasks like user profiling and personalized recommendation. Recently, the variational autoencoder (VAE), an end-to-end deep generative model, has shown promising performance by handling the problem in a semi-supervised way. However, VAEs can easily suffer from over-fitting and over-smoothing when applied to attribute inference. Specifically, a VAE implemented with a multi-layer perceptron (MLP) can only reconstruct input data and fails to infer missing parts, while using graph neural networks (GNNs) as the encoder leads to over-smoothing: GNNs aggregate redundant information from the neighborhood and generate indistinguishable user representations. In this paper, we propose an attribute Inference model based on Adversarial VAE (Infer-AVAE) to cope with these issues. Specifically, to overcome over-smoothing, Infer-AVAE unifies the MLP and GNNs in the encoder to learn positive and negative latent representations, respectively. Meanwhile, an adversarial network is trained to distinguish the two representations, and the GNNs are trained to aggregate less noise for more robust representations through adversarial training. Finally, to relieve over-fitting, a mutual information constraint is introduced as a regularizer for the decoder to make better use of auxiliary information in the representations and generate outputs not limited by observations. We evaluate our model on four real-world social network datasets, and experimental results demonstrate that our model outperforms baselines by 7.0% in accuracy on average. (c) 2022 Elsevier B.V. All rights reserved.
This article presents an emotion-regularized conditional variational autoencoder (Emo-CVAE) model for generating emotional conversation responses. In conventional CVAE-based emotional response generation, emotion labels are simply used as additional conditions in prior, posterior and decoder networks. Considering that emotion styles are naturally entangled with semantic contents in the language space, the Emo-CVAE model utilizes emotion labels to regularize the CVAE latent space by introducing an extra emotion prediction network. In the training stage, the estimated latent variables are required to predict the emotion labels and token sequences of the input responses simultaneously. Experimental results show that our Emo-CVAE model can learn a more informative and structured latent space than a conventional CVAE model and output responses with better content and emotion performance than baseline CVAE and sequence-to-sequence (Seq2Seq) models.
Predicting the remaining useful life (RUL) is a critical step before the decision-making process and the development of maintenance strategies. In a practical context, however, it is frequently impacted by uncertainty, which may cause issues. To overcome this problem, this article proposes a new hybrid deep architecture that predicts when an in-service machine will fail. The architecture allows for improved data analysis and dimensionality reduction, provides better spatial distributions of features, and increases interpretability. A deep convolutional variational autoencoder with an attention mechanism (ACVAE) has been developed and tested using the aero-engine C-MAPSS dataset. We defined two adapted threshold settings (alpha 1, alpha 2) by analyzing the spatial distribution and minimizing the overlapping area between the degradation classes. To reduce the conflict zone, we used a soft voting classifier. Our visually explainable deep learning model reached a higher level of accuracy than existing models.
We propose a topic-word-constrained sentence-generation model with a variational autoencoder and convolutional neural network. It can generate sentences conditioned on a given topic distribution and a certain word. Unlike the vanilla variational autoencoder that assumes a standard Gaussian prior for the latent code, our model specifies the prior for the topic latent code as multiple Gaussian distributions, where each Gaussian distribution corresponds to a topic vector parameterized by a convolutional neural topic model. For word constraints, the decoder in the variational autoencoder generates sentences backward and forward starting from a given word. The topic latent space is arranged by the similarity of topic vectors, and the topic latent code restricts the sentence latent code through a loss term, through which expanded semantically meaningful latent spaces can be learned and provide topic guidance while generating sentences. Experimental results show that our model can generate coherent and diverse sentences related to given topics and words, while also avoiding the Kullback-Leibler divergence collapse problem. Moreover, it outperforms alternative approaches in terms of sentence reconstruction, latent space property and the quality, diversity, and topic controllability of generated sentences. (c) 2022 Elsevier B.V. All rights reserved.
The selection and training of aircraft pilots involve high standards, long training cycles, high resource consumption, high risk, and a high elimination rate. Increasing efficiency, speeding up all aspects of pilot training, shortening the training cycle, and reducing the elimination rate are particularly urgent and important requirements of current national and military talent training strategies. To this end, this paper uses a deep variational autoencoder network and an adaptive dynamic time warping algorithm to explore the establishment of an integrated evaluation system for flight maneuver recognition and quality evaluation, to address the difficulties the industry currently faces in mining flight training data, and to achieve accurate recognition and reliable quality evaluation of flight regimes under high-mobility conditions. The system is intended to make full use of existing airborne flight data for military trainee pilots, support personalized and accurate pilot training, and reduce the elimination rate.
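The abstract above names an adaptive dynamic time warping algorithm but does not describe the adaptation. As background only, the following is a minimal sketch of the classic (non-adaptive) dynamic time warping distance between two one-dimensional sequences, using absolute difference as the local cost. The function name and the toy inputs are assumptions for illustration and are not taken from the paper.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW: minimum cumulative |a_i - b_j| over monotone alignments."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


if __name__ == "__main__":
    # The same shape flown at different speeds aligns with zero cost.
    print(dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]))
```

Because the alignment can stretch or compress time, a recorded maneuver can be compared with a reference template even when the two were performed at different speeds, which is the property that makes DTW a natural fit for scoring flight data.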
Single-cell RNA sequencing is used to analyze the gene expression data of individual cells, thereby adding to existing knowledge of biological phenomena. Accordingly, this technology is widely used in numerous biomedical studies. Recently, the variational autoencoder has emerged and has been adopted for the analysis of single-cell data owing to its high capacity to manage large-scale data. Many different variants of the variational autoencoder have been applied, and have yielded superior results. However, because it is nonlinear, the model does not provide parameters that can be used to explain the underlying biological patterns. In this paper, we propose an interpretable nonnegative matrix factorization method that decomposes parameters into those shared across cells and those that are cell-specific. Effective nonlinear dimension reduction was achieved via a variational autoencoder applied to the cell-specific parameters. In addition to achieving nonlinear dimension reduction, our model could estimate the cell-type-specific gene expression. To improve the estimation accuracy, we introduced log-regularization, which reflects the single-cell property. Overall, our approach displayed excellent performance in a simulation study and in real data analyses, while maintaining good biological interpretability.