In this paper, we present a novel technique for non-parallel voice conversion (VC) that uses cyclic variational autoencoder (CycleVAE)-based spectral modeling. In a variational autoencoder (VAE) framework, a latent space, usually with a Gaussian prior, is used to encode a set of input features. In VAE-based VC, the encoded latent features are fed into a decoder, along with speaker-coding features, to generate estimated spectra with either the original speaker identity (reconstructed) or another speaker identity (converted). Because of the non-parallel modeling condition, the converted spectra cannot be directly optimized, which heavily degrades the performance of VAE-based VC. To overcome this problem, we propose a CycleVAE-based spectral model that indirectly optimizes the conversion flow by recycling the converted features back into the system to obtain corresponding cyclic reconstructed spectra, which can be directly optimized. The cyclic flow can be continued by using the cyclic reconstructed features as input for the next cycle. The experimental results demonstrate the effectiveness of the proposed CycleVAE-based VC, which yields more accurate converted spectra, generates latent features with a higher degree of correlation, and significantly improves the quality and conversion accuracy of the converted speech.
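The cyclic flow described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the squared-error loss, and the toy encoder and decoder are all assumptions made for the example. It only shows which outputs have a ground truth to compare against (the reconstructed and cyclic reconstructed features) and which do not (the converted features).

```python
from typing import Callable, List, Tuple

Vector = List[float]


def squared_error(a: Vector, b: Vector) -> float:
    """Sum of squared differences (an assumed stand-in for the real spectral loss)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def cycle_losses(
    x: Vector,
    src: int,
    tgt: int,
    encode: Callable[[Vector], Vector],
    decode: Callable[[Vector, int], Vector],
    n_cycles: int = 1,
) -> List[Tuple[float, float]]:
    """Return (reconstruction loss, cyclic reconstruction loss) for each cycle."""
    losses = []
    current = x
    for _ in range(n_cycles):
        z = encode(current)
        reconstructed = decode(z, src)   # source speaker code: comparable to x
        converted = decode(z, tgt)       # target speaker code: no ground truth
        # Recycle the converted features and decode with the source speaker code.
        cyclic = decode(encode(converted), src)
        losses.append((squared_error(reconstructed, x), squared_error(cyclic, x)))
        current = cyclic                 # input for the next cycle
    return losses


# Hypothetical toy encoder/decoder, used only to make the sketch runnable.
def toy_encode(features: Vector) -> Vector:
    return [0.5 * v for v in features]


def toy_decode(latent: Vector, speaker: int) -> Vector:
    return [2.0 * v + speaker for v in latent]


if __name__ == "__main__":
    print(cycle_losses([1.0, 2.0], 0, 1, toy_encode, toy_decode, n_cycles=2))
```

In a real system both losses would be minimized jointly over the encoder and decoder parameters; the toy functions here are fixed and exist only to trace the data flow.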
This paper proposes a Group Latent Embedding for Vector Quantized variational autoencoders (VQ-VAE) used in non-parallel Voice Conversion (VC). Previous studies have shown that VQ-VAE can generate high-quality VC syntheses when it is paired with a powerful decoder. However, in a conventional VQ-VAE, adjacent atoms in the embedding dictionary can represent entirely different phonetic content. Therefore, the VC syntheses can have mispronunciations and distortions whenever the output of the encoder is quantized to an atom representing entirely different phonetic content. To address this issue, we propose an approach that divides the embedding dictionary into groups and uses the weighted average of atoms in the nearest group as the latent embedding. We conducted both objective and subjective experiments on the non-parallel CSTR VCTK corpus. Results show that the proposed approach significantly improves the acoustic quality of the VC syntheses compared to the traditional VQ-VAE (13.7% relative improvement) while retaining the voice identity of the target speaker.
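The grouping idea in this abstract can be sketched as follows. The abstract does not say how the nearest group is chosen or how the weights are computed, so both are assumptions here: the nearest group is taken to be the one with the smallest mean Euclidean distance to the encoder output, and the weights are a softmax over negative distances. The function name `group_latent_embedding` is likewise hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def group_latent_embedding(z: Vector, groups: Sequence[Sequence[Vector]]) -> List[float]:
    """Replace encoder output z with a weighted average of atoms in its nearest group."""
    # Assumption: "nearest group" means the smallest mean distance to its atoms.
    nearest = min(groups, key=lambda g: sum(euclidean(z, a) for a in g) / len(g))
    distances = [euclidean(z, atom) for atom in nearest]
    # Assumption: weights are a softmax over negative distances, so closer atoms
    # contribute more. Subtracting the minimum keeps the exponentials stable.
    d_min = min(distances)
    weights = [math.exp(-(d - d_min)) for d in distances]
    total = sum(weights)
    return [
        sum(w * atom[k] for w, atom in zip(weights, nearest)) / total
        for k in range(len(z))
    ]


if __name__ == "__main__":
    dictionary = [
        [[0.0, 0.0], [2.0, 0.0]],      # group 0
        [[10.0, 10.0], [12.0, 10.0]],  # group 1
    ]
    print(group_latent_embedding([1.0, 0.0], dictionary))
```

Unlike hard quantization to a single atom, the output here moves smoothly within a group as the encoder output moves, which is the property the abstract credits with reducing mispronunciations.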
ISBN: 9780996452786 (print)
This work presents the novel multi-modal variational autoencoder approach M²VAE, which is derived from the complete marginal joint log-likelihood. This allows the end-to-end training of Bayesian information fusion on raw data for all subsets of a sensor setup. Furthermore, we introduce the concept of in-place fusion, applicable to distributed sensing where latent embeddings of observations need to be fused with new data. To facilitate in-place fusion even on raw data, we introduce the concept of a re-encoding loss that stabilizes the decoding and makes visualization of latent statistics possible. We also show that the M²VAE finds a coherent latent embedding, such that a single naive Bayes classifier performs equally well on all permutations of a bi-modal Mixture-of-Gaussians signal. Finally, we show that our approach outperforms current VAE approaches on a bi-modal MNIST & fashion-MNIST data set and works sufficiently well as a preprocessing step on a tri-modal simulated camera & LiDAR data set from the Gazebo simulator.
ISBN: 9781479981311 (print)
Detecting anomalies using a variational autoencoder (VAE) suffers from catastrophic forgetting when trained on a continually growing set of normal data where only the most recently added data is available. Solving this problem would allow the use of the VAE for anomaly detection in settings where it is difficult or even impossible to retain all normal data at the same time. We propose an efficient extension of a method for continual learning which alleviates catastrophic forgetting for anomaly detection using a VAE. We show on some anomaly detection problems that the definition of normal data can be continually expanded without requiring all previously seen data.
ISBN: 9781728108728 (print)
Software defect prediction (SDP) helps save limited resources in the software testing stage and improve software quality. However, the imbalanced class distribution in defect datasets is a challenge for many machine learning algorithms and degrades their performance. To overcome this issue, techniques that oversample the minority class have been adopted. In this work, we suggest a new oversampling method that trains a variational autoencoder (VAE) to generate synthesized samples that mimic minority samples; these are then combined with the training dataset to form an augmented training dataset. In the experiments, we explored ten SDP datasets from the freely accessible PROMISE repository. We measured the performance of the proposed method by comparing it with state-of-the-art oversampling techniques including Random Over-Sampling, SMOTE, Borderline-SMOTE, and ADASYN. Based on the investigation results, the proposed method provides the best mean performance of SDP models among all examined techniques.
ISBN: 9781510625969 (print)
Track geometry is one of the most important health indices in the maintenance of rail tracks. Visual inspection and inspection using a track-geometry car are two common approaches to inspecting track geometry. Recently, using accelerations from in-service trains has become a popular track inspection approach, because it is a low-cost way to monitor the rail tracks more frequently. However, due to the noise present in the collected accelerations, detecting anomalies using manually designed features often results in many false alarms. In this paper, we propose a learning-based anomaly detection approach for monitoring the longitudinal elevation of track geometry from the dynamic response of an in-service train. We consider track geometry with a sudden change to be an anomaly, measured by the signal energy of the slopes of the track geometry. The proposed approach uses a variational autoencoder (VAE) to detect the anomaly. The VAE takes accelerations as input and learns a mapping from the frequency-domain representation of acceleration signals to a low-dimensional latent space that represents the distribution of the observed data. The reconstruction probability, which measures the variability of the distribution of the input data, is used as an anomaly score indicating how well the input follows the normal pattern. Compared to distance- and density-based anomaly detection methods, such as K-nearest neighbor and clustering, VAE-based anomaly detection is robust to measurement noise and prevents overfitting because it captures the underlying distribution of the data in a low-dimensional space. Furthermore, the VAE-based method does not require model-specific thresholds for detecting anomalies because it uses a probabilistic measure instead of reconstruction error as the anomaly score. We validate the proposed VAE-based approach on the vibration dataset from an in-service train. We show that this approach outperforms a baseline model (an autoencoder-based anomaly de
This paper describes a deep generative approach to joint chord and key estimation for music signals. The limited amount of music signals with complete annotations has been the major bottleneck in supervised multi-task learning of a classification model. To overcome this limitation, we integrate the supervised multi-task learning approach with the unsupervised autoencoding approach in a mutually complementary manner. Considering the typical process of music composition, we formulate a hierarchical latent variable model that sequentially generates keys, chords, and chroma vectors. The keys and chords are assumed to follow a language model that represents their relationships and dynamics. In the framework of amortized variational inference (AVI), we introduce a classification model that jointly infers discrete chord and key labels and a recognition model that infers continuous latent features. These models are combined to form a variational autoencoder (VAE) and are trained jointly in a (semi-)supervised manner, where the generative and language models act as regularizers for the classification model. We comprehensively investigate three different architectures for the chord and key classification model, and three different architectures for the language model. Experimental results demonstrate that the VAE-based multi-task learning improves chord estimation as well as key estimation.
ISBN: 9781728108582 (print)
Development of an efficient peptide design method is crucial for tackling medical problems, such as designing antimicrobial peptides for combating drug-resistant pathogens and anticancer peptides for various cancers. Here, we present a variational autoencoder (VAE) coupled with a Softmax function having a temperature factor (T) for high-throughput design of novel functional peptides. The VAE is a generative machine learning model that has proved useful for generating peptide sequences. In this study, we additionally use a Softmax function with T to facilitate determining the most probable amino acids at each position of the peptide sequences to be generated, which is difficult to achieve using a conventional VAE. In particular, by manipulating T in the Softmax function, we select the most biologically feasible peptides with a desired function. This method is demonstrated by designing novel antimicrobial and anticancer peptides in this study. The method presented herein should be useful for designing various peptides with a desired function, provided that relevant datasets are available.
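The effect of the temperature factor T can be illustrated with a short sketch. The function name and the example logits are assumptions made for illustration; the abstract does not give the authors' implementation. The sketch uses the common formulation in which each logit is divided by T before the Softmax, so a small T concentrates probability on the most likely amino acid and a large T flattens the distribution.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax over logits scaled by 1/temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


if __name__ == "__main__":
    # Hypothetical decoder logits for three candidate amino acids at one position.
    position_logits = [2.0, 1.0, 0.5]
    for t in (0.1, 1.0, 10.0):
        probs = softmax_with_temperature(position_logits, t)
        print(t, [round(p, 3) for p in probs])
```

Lowering T makes the choice at each position closer to a deterministic argmax, while raising T yields more diverse sampled sequences.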
ISBN: 9781728141923 (print)
This paper proposes a learning-based approach to reveal the diverse possible appearances under the missing area of an occluded, unseen image. In general, there are many possible facial appearances for the missing area; for example, for a male wearing a scarf, it is difficult to predict whether he has a beard in the covered area. In this paper, we propose a novel method for facial image inpainting, which generates the missing facial appearance by conditioning on the observable appearance. Specifically, given a standard variational autoencoder (VAE) trained for un-occluded face generation, we search for the possible set of VAE coding vectors for the current occluded input image, where the predicted coding should be robust to the missing area. The possible facial appearance set is then recovered through the decoder of the VAE model. Experiments show that our method successfully recovers large missing regions; the results are diverse, and all are reasonably consistent with the observable facial area, i.e., both the facial geometry and the personal characteristics are preserved.
Anomalies in data often convey critical information that can be leveraged in a variety of applications. For the military engaged in combat, this can amount to identifying threats early and preserving a lethal edge over an adversary. In other, more benign cases, anomalies can corrupt data integrity and lead to ineffective application of other data analysis techniques. To tackle the problem of anomaly detection, the statistics and machine learning literature provides several common methods, including variational autoencoders (VAEs). Using a VAE, we develop a novel objective function to improve its performance in detecting anomalies. Additionally, we introduce a modeling pipeline that works in the fully unsupervised context, where one does not know the true proportion of anomalies present in the data. To construct this pipeline, we fit reconstruction errors using a Gaussian mixture model (GMM) and select the model whose characteristics best match our performance metrics. Using our approach, we observe an increase in anomalies detected compared with a standard objective function, and we measure an average improvement of 0.4021 in F1 scores. We show our findings using four labeled benchmark data sets and apply our conclusions to an open-source, unlabeled data set taken from ***.
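The step of fitting reconstruction errors with a GMM can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: it fits a one-dimensional, two-component mixture with plain expectation-maximization, treats the component with the larger mean as the anomalous one, and omits the model selection step mentioned in the abstract. All function names and the example errors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mean: float, variance: float) -> float:
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def fit_two_component_gmm(
    errors: Sequence[float], n_iter: int = 100
) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1-D, two-component GMM by EM; returns (weights, means, variances)."""
    n = len(errors)
    overall_mean = sum(errors) / n
    overall_var = sum((e - overall_mean) ** 2 for e in errors) / n or 1e-6
    # Deterministic initialization: one component at each extreme of the data.
    means = [min(errors), max(errors)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each error.
        resp = []
        for e in errors:
            p = [weights[k] * normal_pdf(e, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / n
            means[k] = sum(r[k] * e for r, e in zip(resp, errors)) / nk
            var_k = sum(r[k] * (e - means[k]) ** 2 for r, e in zip(resp, errors)) / nk
            variances[k] = max(var_k, 1e-6)  # floor avoids a collapsed component
    return weights, means, variances


def flag_anomalies(errors, weights, means, variances) -> List[bool]:
    """Flag errors that are more likely under the higher-mean component (an assumption)."""
    high = 0 if means[0] > means[1] else 1
    low = 1 - high
    flags = []
    for e in errors:
        p_high = weights[high] * normal_pdf(e, means[high], variances[high])
        p_low = weights[low] * normal_pdf(e, means[low], variances[low])
        flags.append(p_high > p_low)
    return flags


if __name__ == "__main__":
    # Hypothetical reconstruction errors: six typical samples and two outliers.
    recon_errors = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 2.0, 2.1]
    w, m, v = fit_two_component_gmm(recon_errors)
    print(flag_anomalies(recon_errors, w, m, v))
```

Because the mixture weights are estimated from the data, this kind of thresholding does not require knowing the proportion of anomalies in advance, which matches the fully unsupervised setting the abstract describes.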