ISBN: (print) 9781665405409
Recently, the variational autoencoder (VAE), a deep representation learning (DRL) model, has been used to perform speech enhancement (SE). However, to the best of our knowledge, current VAE-based SE methods apply the VAE only to model the speech signal, while noise is modeled using the traditional non-negative matrix factorization (NMF) model. One of the most important reasons for using NMF is that these VAE-based methods cannot disentangle the speech and noise latent variables from the observed signal. Based on Bayesian theory, this paper derives a novel variational lower bound for the VAE, which ensures that the VAE can be trained in a supervised manner and can disentangle speech and noise latent variables from the observed signal. This means that the proposed method can apply the VAE to model both speech and noise signals, unlike previous VAE-based SE work. More specifically, the proposed DRL method can learn to impose speech and noise signal priors on different sets of latent variables for SE. The experimental results show that the proposed method can not only disentangle speech and noise latent variables from the observed signal, but also obtain a higher scale-invariant signal-to-distortion ratio and speech quality score than a comparable deep neural network (DNN)-based SE method.
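The scale-invariant signal-to-distortion ratio reported in this abstract can be illustrated with a minimal sketch. It uses the commonly cited definition of the metric, not code from the paper. The function name `si_sdr`, the `eps` stabilizer, and the synthetic sine-plus-noise signals are all assumptions made for illustration.

```python
import numpy as np

def si_sdr(reference, estimate, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB for 1-D signals."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    # Remove the mean so a DC offset counts as neither signal nor distortion.
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to obtain the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    distortion = estimate - target
    return 10.0 * np.log10((np.sum(target**2) + eps) / (np.sum(distortion**2) + eps))

# Synthetic example (assumed data): a sine wave with additive Gaussian noise.
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 20.0 * np.pi, 1000))
noise = rng.normal(size=clean.shape)
mild = clean + 0.1 * noise     # lightly corrupted estimate
heavy = clean + 0.5 * noise    # heavily corrupted estimate

print(f"mild:  {si_sdr(clean, mild):.2f} dB")
print(f"heavy: {si_sdr(clean, heavy):.2f} dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is what distinguishes this metric from a plain signal-to-distortion ratio.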
Inverse design is an efficient and powerful design tool in the aircraft industry. However, most methods require physically meaningful pressure distributions as input, which relies heavily on designer expertise. This paper proposes using a variational autoencoder (VAE) to reduce the dimensionality of the airfoil's two-dimensional coordinate data and pressure distribution data. The model maps high-dimensional data to a low-dimensional space and extracts the low-dimensional manifold structure of the high-dimensional data. Test cases of a low-speed airfoil and a transonic airfoil were used for pressure distribution prediction. The results show that the VAE can achieve high accuracy for pressure distribution prediction. A framework for inverse design of airfoils was also established, with the difference between the target pressure and the design pressure as the objective function. A global optimization algorithm searches the low-dimensional space, and the trained model then produces a physically meaningful aerodynamic shape and pressure distribution. The VAE acts like a surrogate model, and because the latent space dimension is low, the global optimum can be found efficiently even with a small population size and few iteration steps. In this method, the target pressure distribution is defined without a strong dependence on the designer's experience, achieving rapid inverse design on the order of minutes.
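The idea of optimizing in the low-dimensional latent space against a target pressure can be sketched as follows. This is a toy illustration under heavy assumptions, not the authors' framework. The trained VAE decoder is replaced by a hypothetical fixed linear map (`decode_pressure`). A simple elitist evolutionary search stands in for the unspecified global optimization algorithm. The latent dimension, population size and iteration count are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained VAE decoder: a fixed linear map from a
# 4-D latent vector to a 50-point pressure distribution.
LATENT_DIM, N_POINTS = 4, 50
W = rng.normal(size=(N_POINTS, LATENT_DIM))

def decode_pressure(z):
    return W @ z

def objective(z, target):
    # Mean squared difference between design pressure and target pressure.
    return float(np.mean((decode_pressure(z) - target) ** 2))

def latent_search(target, pop_size=20, n_iter=50, sigma=0.5):
    """Elitist evolutionary search in latent space; returns (best_z, loss history)."""
    pop = rng.normal(size=(pop_size, LATENT_DIM))
    losses = np.array([objective(z, target) for z in pop])
    best = pop[np.argmin(losses)].copy()
    best_loss = float(losses.min())
    history = [best_loss]
    for _ in range(n_iter):
        # Sample candidates around the current best; keep the best if none improves.
        pop = best + sigma * rng.normal(size=(pop_size, LATENT_DIM))
        losses = np.array([objective(z, target) for z in pop])
        if losses.min() < best_loss:
            best = pop[np.argmin(losses)].copy()
            best_loss = float(losses.min())
        sigma *= 0.95  # shrink the search radius over time
        history.append(best_loss)
    return best, history

# Target generated from a known latent point, so a perfect match exists.
target_pressure = decode_pressure(np.array([0.5, -1.0, 0.3, 0.8]))
best_z, history = latent_search(target_pressure)
print(f"initial loss {history[0]:.4f}, final loss {history[-1]:.6f}")
```

Because the search runs over only four latent coordinates rather than the full set of surface points, a small population and few iterations are enough in this toy setting, which mirrors the efficiency argument made in the abstract.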
ISBN: (print) 9781538653845
In this paper, we propose the Squeezed Convolutional Variational Autoencoder (SCVAE) for anomaly detection in time series data for edge computing in the Industrial Internet of Things (IIoT). The proposed model is applied to labeled time series data from UCI datasets for exact performance evaluation, and to real-world data for indirect model performance comparison. In addition, by comparing the models before and after applying Fire Modules from SqueezeNet, we show that model size and inference time are reduced while a similar level of performance is maintained.
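The reduction in model size from Fire Modules can be made concrete with a small parameter count. The sketch below assumes 2-D convolutions with square kernels, as in SqueezeNet. The channel sizes (128 input channels, a 16-channel squeeze layer, 64 + 64 expand channels) are illustrative and are not taken from the SCVAE paper. The helper names are hypothetical.

```python
def conv_params(c_in, c_out, kernel_size):
    """Weights plus biases of a 2-D convolution with a square kernel."""
    return kernel_size * kernel_size * c_in * c_out + c_out

def fire_params(c_in, squeeze, expand1x1, expand3x3):
    """Parameters of a SqueezeNet-style Fire module."""
    # The 1x1 squeeze layer shrinks the channel count before the costly 3x3 filters.
    squeeze_layer = conv_params(c_in, squeeze, 1)
    # Two parallel expand layers whose outputs are concatenated along channels.
    expand_layers = conv_params(squeeze, expand1x1, 1) + conv_params(squeeze, expand3x3, 3)
    return squeeze_layer + expand_layers

plain = conv_params(128, 128, 3)       # ordinary 3x3 convolution, 128 -> 128 channels
fire = fire_params(128, 16, 64, 64)    # Fire module with the same input/output widths
print(plain, fire, round(plain / fire, 1))
```

With these assumed sizes the Fire module needs 12,432 parameters against 147,584 for the plain 3x3 convolution, roughly a twelvefold reduction for the same input and output channel counts.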
ISBN: (print) 9781728173030
The curse of dimensionality is a fundamental difficulty in anomaly detection for high-dimensional data. The autoencoder-based approach is an elegant solution to this problem. However, existing works require a clean training dataset, which is not always guaranteed in real scenarios. In this paper, we propose a novel anomaly detection method named RVAE-ABFA (robust variational autoencoder with attention-based feature adaptation for high-dimensional data anomaly detection), which significantly improves anomaly detection performance when the training data is contaminated. Rather than utilizing only the reconstruction error, we also take into consideration the learned low-dimensional embeddings generated by the variational autoencoder. In RVAE-ABFA, the learned low-dimensional embeddings help to detect anomalies in contaminated data because of the ability of variational inference. We also propose an ABFA (attention-based feature adaptation) mechanism to adjust the weights of the low-dimensional embeddings and the reconstruction error. Furthermore, we adopt the adversarial training criterion to perform variational inference with an adversarial network, named RAAE-ABFA (robust adversarial autoencoder with attention-based feature adaptation for high-dimensional data anomaly detection), with which we can generate extra samples when the training data is insufficient. Experimental results on several benchmark datasets show that the proposed method significantly outperforms state-of-the-art unsupervised anomaly detection methods and is more robust when the training data is contaminated.
ISBN: (print) 9781665405409
This paper proposes an autoregressive speech synthesis model based on the variational autoencoder, incorporating a latent sequence representation for acoustic and linguistic features and the structure of a hidden semi-Markov model (HSMM). Although autoregressive models can provide efficient and accurate modeling of acoustic features, they suffer from exposure bias, i.e., the mismatch between training (teacher forcing) and inference (free running). To overcome this problem, we introduce an autoregressive latent variable sequence rather than using autoregressive generation of observations. A latent representation of alignment using an HSMM-based structured attention mechanism enables the use of a completely consistent training algorithm for acoustic modeling with explicit duration models. Experimental results indicate that the proposed model outperformed baselines in subjective naturalness.
ISBN: (print) 9781665405409
To address the issue of one-to-many mapping from phoneme sequences to acoustic features in expressive speech synthesis, this paper proposes a method of discourse-level prosody modeling with a variational autoencoder (VAE) based on the non-autoregressive architecture of FastSpeech. In this method, phone-level prosody codes are extracted from prosody features by combining the VAE with FastSpeech, and are predicted using discourse-level text features together with BERT embeddings. The continuous wavelet transform (CWT) used in FastSpeech2 for F0 representation is no longer necessary. Experimental results on a Chinese audiobook dataset show that the proposed method can effectively take advantage of discourse-level linguistic information and outperforms FastSpeech2 in the naturalness and expressiveness of synthetic speech.
ISBN: (print) 9781538650356
Paradigm-shifting systems such as cyber-physical systems collect tremendous amounts of high- or ultrahigh-dimensional data. Detecting outliers in this type of system provides indicative understanding in wide-ranging domains such as system health monitoring and information security. Previous dimensionality-reduction-based outlier detection methods fail to preserve the critical information well in the low-dimensional latent space, mainly because they generally assume an isotropic Gaussian distribution as the prior and fail to mine the intrinsic multimodality in high-dimensional data. Moreover, most of the schemes decouple the model learning process, resulting in suboptimal performance. To tackle these challenges, in this paper, we propose a unified Unsupervised Gaussian Mixture Variational Autoencoder for outlier detection. Specifically, a variational autoencoder first trains a generative distribution and extracts reconstruction-based features. Then we adopt a deep belief network to estimate the component mixture probabilities from the latent distribution and the extracted features, which are further used by the Gaussian mixture model to estimate sample densities with the Expectation-Maximization (EM) algorithm. The inference model is optimized jointly with the variational autoencoder, the deep belief network, and the Gaussian mixture model. Afterwards, the proposed detector identifies outliers when the estimated sample density exceeds a learned threshold. Extensive simulations on six public benchmark datasets show that the proposed framework outperforms state-of-the-art outlier detection schemes and achieves, on average, a 27% improvement in F1 score.
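For context, the isotropic Gaussian prior that this abstract identifies as a limitation enters a standard VAE through a closed-form Kullback-Leibler term. The sketch below shows only that textbook term for a diagonal Gaussian posterior. It is not the Gaussian mixture objective proposed in the paper, and the function name `kl_to_standard_normal` is an assumption.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the last axis."""
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    # Closed form per dimension: 0.5 * (mu^2 + sigma^2 - 1 - log sigma^2).
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - 1.0 - log_var, axis=-1)

# A posterior equal to the prior incurs no penalty; shifting the mean does.
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))
print(kl_to_standard_normal([1.0, 0.0], [0.0, 0.0]))
```

This term pulls every encoded sample toward a single mode at the origin, which is why a unimodal prior can struggle with multimodal data of the kind the abstract describes.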
ISBN: (print) 9798350349122; 9798350349115
Training generative adversarial networks (GANs) relies on the game between the generator and the discriminator, so improving the discriminator can promote improvement of the generator. A variational autoencoder (VAE) is useful for classification because it can learn the probability distribution of an image and provide latent variables as output features. Therefore, we propose a new network structure that uses a variational autoencoder in the GAN discriminator for super-resolution (SR). This network uses the latent variables generated by the VAE to extract the probability distributions of a reconstructed image and an original high-resolution (HR) image, so that the latent variables serve as features for discrimination. In addition, we propose training the whole GAN and the VAE network alternately to optimize the network parameters. We verify the proposed method on five GAN-based image super-resolution methods. Experimental results show that the proposed algorithm leads to state-of-the-art results when plugged into existing GAN-based SR methods.
ISBN: (print) 9783030041915; 9783030041908
Reinforcement learning has shown great potential in generalizing over raw sensory data using only a single neural network for value optimization. Several challenges in current state-of-the-art reinforcement learning algorithms prevent them from converging towards the global optimum. It is likely that the solution to these problems lies in short- and long-term planning, exploration, and memory management for reinforcement learning algorithms. Games are often used to benchmark reinforcement learning algorithms because they provide a flexible, reproducible, and easy-to-control environment. Nevertheless, few games feature a state space in which results in exploration, memory, and planning are easily perceived. This paper presents the Dreaming Variational Autoencoder (DVAE), a neural-network-based generative modeling architecture for exploration in environments with sparse feedback. We further present Deep Maze, a novel and flexible maze engine that challenges DVAE in partially and fully observable state spaces, long-horizon tasks, and deterministic and stochastic problems. We show initial findings and encourage further work in reinforcement learning driven by generative exploration.
ISBN: (print) 9781450383127
Sequential recommendation, as an emerging topic, has attracted increasing attention due to its practical significance. Models based on deep learning and attention mechanisms have achieved good performance in sequential recommendation. Recently, generative models based on the variational autoencoder (VAE) have shown unique advantages in collaborative filtering. In particular, the sequential VAE model, a recurrent version of the VAE, can effectively capture temporal dependencies among items in a user sequence and perform sequential recommendation. However, VAE-based models share a common limitation: the representational ability of the obtained approximate posterior distribution is limited, resulting in lower-quality generated samples. This is especially true for generating sequences. To solve this problem, we propose a novel method called Adversarial and Contrastive Variational Autoencoder (ACVAE) for sequential recommendation. Specifically, we first introduce adversarial training for sequence generation under the Adversarial Variational Bayes (AVB) framework, which enables our model to generate high-quality latent variables. Then, we employ a contrastive loss; by minimizing it, the latent variables learn more personalized and salient characteristics. In addition, when encoding the sequence, we apply a recurrent and convolutional structure to capture global and local relationships in the sequence. Finally, we conduct extensive experiments on four real-world datasets. The experimental results show that our proposed ACVAE model outperforms other state-of-the-art methods.