This paper proposes a non-parallel cross-lingual voice conversion (CLVC) model that can mimic voice while continuously controlling speaker individuality on the basis of the variational autoencoder (VAE) and star generative adversarial network (StarGAN). Most studies on CLVC focus only on mimicking a particular speaker's voice and cannot arbitrarily modify the speaker individuality. In practice, the ability to generate speaker individuality may be more useful than just mimicking voice. Therefore, the proposed model reliably extracts the speaker embedding from different languages using a VAE. An F0 injection method is also introduced into our model to enhance the F0 modeling in the cross-lingual setting. To avoid the over-smoothing degradation problem of the conventional VAE, the adversarial training scheme of the StarGAN is adopted to improve the training-objective function of the VAE in a CLVC task. Objective and subjective measurements confirm the effectiveness of the proposed model and F0 injection method. Furthermore, speaker-similarity measurements on fictitious voices reveal a strong linear relationship between speaker individuality and interpolated speaker embedding, which indicates that speaker individuality can be controlled with our proposed model.
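The idea of an "interpolated speaker embedding" can be illustrated with a minimal sketch. It assumes plain linear interpolation between two embedding vectors; the abstract does not say which interpolation scheme the authors use, and the function name, the 3-dimensional vectors, and the five-step grid are all hypothetical.

```python
from typing import List


def interpolate_embedding(emb_a: List[float], emb_b: List[float], alpha: float) -> List[float]:
    """Linearly blend two speaker embeddings (alpha=0 gives emb_a, alpha=1 gives emb_b)."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(emb_a, emb_b)]


# Hypothetical embeddings for two real speakers (assumed values, not from the paper).
speaker_a = [0.0, 1.0, -2.0]
speaker_b = [2.0, 1.0, 2.0]

# Five "fictitious" voices moving from speaker A to speaker B in equal steps.
fictitious = [interpolate_embedding(speaker_a, speaker_b, k / 4) for k in range(5)]
```

In a full system each interpolated vector would be passed to the decoder in place of a real speaker's embedding; that step is outside the scope of this sketch.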
Anomaly detection is indispensable for ensuring the reliable operation of grid-connected photovoltaic (PV) systems. This study introduces a semi-supervised deep learning approach for fault detection in such systems. The method leverages a variational autoencoder (VAE) to extract features and identify anomalies. By training the VAE on normal operation data, a compact latent space representation is created. Abnormal observations, indicating faults, exhibit distinct feature vectors in this latent space. Multiple anomaly detection algorithms, including Isolation Forest, Elliptic Envelope, Local Outlier Factor, and One-Class SVM, are employed to discern normal and abnormal observations. This semi-supervised approach only requires fault-free data for training, without labeled faults, making it attractive in practice. A publicly available dataset, the Grid-connected PV System Faults (GPVS-Faults) dataset, which includes data from a PV plant operating in both maximum power point tracking (MPPT) and intermediate power point tracking (IPPT) switching modes, is used for evaluation. The proposed approach is assessed across various fault scenarios, such as partial shading, inverter faults, and MPPT/IPPT controller faults in boost converters. The outcomes underscore the effectiveness of VAE-based techniques in accurately identifying these faults, with accuracy rates reaching up to 92.90% for MPPT mode and 92.99% for IPPT mode, thus contributing to the robustness of fault detection in grid-connected PV systems.
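The second stage of this pipeline, fitting a detector on fault-free latent vectors and flagging observations that fall far from them, can be sketched as follows. This is not one of the four detectors named in the abstract: it is a deliberately simplified stand-in (a Gaussian with diagonal covariance, in the spirit of an elliptic envelope) written with the standard library only. The class name, the 0.99 quantile, and the synthetic 2-D latent grid are assumptions for illustration; in practice the latent vectors would come from the trained VAE encoder.

```python
import statistics
from typing import List, Sequence


class DiagonalGaussianDetector:
    """Flags latent vectors whose standardized squared distance exceeds a threshold."""

    def fit(self, normal_latents: Sequence[Sequence[float]], quantile: float = 0.99):
        # Per-dimension mean and standard deviation of the fault-free latents.
        columns = list(zip(*normal_latents))
        self.means = [statistics.fmean(c) for c in columns]
        # A small floor avoids division by zero for constant dimensions.
        self.stds = [max(statistics.pstdev(c), 1e-8) for c in columns]
        # Threshold: an upper quantile of the scores seen on fault-free data.
        scores = sorted(self.score(z) for z in normal_latents)
        index = min(int(quantile * len(scores)), len(scores) - 1)
        self.threshold = scores[index]
        return self

    def score(self, z: Sequence[float]) -> float:
        return sum(((v - m) / s) ** 2 for v, m, s in zip(z, self.means, self.stds))

    def is_anomaly(self, z: Sequence[float]) -> bool:
        return self.score(z) > self.threshold


# Synthetic stand-in for encoder outputs on fault-free data (assumed values).
normal_latents: List[List[float]] = [
    [x * 0.1, y * 0.1] for x in range(-5, 6) for y in range(-5, 6)
]
detector = DiagonalGaussianDetector().fit(normal_latents)

near_center = detector.is_anomaly([0.0, 0.0])   # typical fault-free point
far_away = detector.is_anomaly([3.0, -3.0])     # far outside the training cloud
```

Only fault-free data is used in `fit`, which mirrors the semi-supervised setting described above: no labeled faults are needed to set the decision threshold.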
In practical engineering, obtaining labeled high-quality fault samples poses challenges. Conventional fault diagnosis methods based on deep learning struggle to discern the underlying causes of mechanical faults from a fine-grained perspective, due to the scarcity of annotated data. To tackle these issues, we propose a novel semi-supervised Gaussian Mixed variational autoencoder method, SeGMVAE, aimed at acquiring unsupervised representations that can be transferred across fine-grained fault diagnostic tasks, enabling the identification of previously unseen faults using only a small number of labeled samples. Initially, Gaussian mixtures are introduced as a multimodal prior distribution for the variational autoencoder. This distribution is dynamically optimized for each task through an expectation-maximization (EM) algorithm, constructing a latent representation of the bridging task and unlabeled samples. Subsequently, a set variational posterior approach is presented to encode each task sample into the latent space, facilitating meta-learning. Finally, semi-supervised EM integrates the posterior of labeled data by acquiring task-specific parameters for diagnosing unseen faults. Results from two experiments demonstrate that SeGMVAE excels in identifying new fine-grained faults and exhibits outstanding performance in cross-domain fault diagnosis across different machines. Our code is available at https://***/zhiqan/SeGMVAE.
Deep learning has gained significant attention in medical image segmentation. However, the limited availability of annotated training data presents a challenge to achieving accurate results. In efforts to overcome this challenge, data augmentation techniques have been proposed. However, the majority of these approaches primarily focus on image generation. For segmentation tasks, providing both images and their corresponding target masks is crucial, and the generation of diverse and realistic samples remains a complex task, especially when working with limited training datasets. To this end, we propose a new end-to-end hybrid architecture based on Hamiltonian variational autoencoders (HVAE) and a discriminative regularization to improve the quality of generated images. Our method provides an accurate estimation of the joint distribution of the images and masks, resulting in the generation of realistic medical images with reduced artifacts and off-distribution instances. As generating 3D volumes requires substantial time and memory, our architecture operates on a slice-by-slice basis to segment 3D volumes, capitalizing on the richly augmented dataset. Experiments conducted on two public datasets, BRATS (MRI modality) and HECKTOR (PET modality), demonstrate the efficacy of our proposed method on different medical imaging modalities with limited data.
ISBN: 9781665405409 (print)
In this paper, we investigate two algorithms for variational autoencoder (VAE)-based underdetermined multichannel source separation. We previously extended the multichannel VAE (MVAE) method for determined multichannel source separation and proposed the generalized MVAE (GMVAE) method for underdetermined multichannel source separation. The GMVAE method employs a conditional VAE (CVAE) as the source model representing the power spectrograms of the underlying sources present in a mixture. While we developed a convergence-guaranteed parameter estimation algorithm using a majorization-minimization/minorization-maximization (MM) algorithm, an expectation-maximization (EM) algorithm also allows us to design another algorithm with the same property. However, the MM-based and EM-based algorithms have not yet been compared. To fill this gap, we investigate the MM-based and EM-based algorithms for the GMVAE method, using an improved CVAE variant called auxiliary classifier VAE (ACVAE). The experimental results suggest that the EM-based algorithm incurs a lower computational cost while achieving separation performance comparable to that of the MM-based algorithm.
ISBN: 9781665485555 (print)
With the emergence of AI (artificial intelligence), it is becoming more and more critical for organizations to utilize it to their advantage. However, organizations that possess a decent amount of data might not have the technical competence to perform machine learning, and vice versa. Hence, it is reasonable for the two kinds of organizations to work together to realize the value of the data. With the increasing concern over data privacy, regulations such as GDPR (General Data Protection Regulation) prevent an organization from sharing data with another unless the data is processed to the point that the individuals in the data are not identifiable. Various ways of data anonymization have been proposed and developed, including ones that utilize neural networks to achieve the goal, like AE, VAE, and GAN. With the addition of a differential privacy framework like TensorFlow Privacy, privacy can be guaranteed, but data still needs to be usable after privacy protection measures are deployed. The present study aims to integrate TensorFlow Privacy into the synthetic data generation process and evaluate its usefulness for everyday use in industry. Since TensorFlow Privacy brings a provable privacy guarantee to synthetic data, the present study focuses on the evaluation of data utility. TensorFlow is widely used for machine learning in industry and academia. TensorFlow Privacy, which is also developed by Google, can prove to be a valuable addition to the synthetic data generation pipeline. The results show that, for a VAE with TensorFlow Privacy, 1) the synthetic data has good utility in most cases in terms of descriptive statistics and machine learning classification tasks, and 2) the customizable TensorFlow Privacy parameters work as intended in terms of the privacy-utility trade-off.
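The privacy-utility trade-off mentioned above comes from the mechanism behind differentially private training (DP-SGD): each example's gradient is clipped to a maximum L2 norm, and Gaussian noise scaled to that norm is added before averaging. The following is a standard-library sketch of that single step, not the TensorFlow Privacy API. The function names are hypothetical; the parameter names `l2_norm_clip` and `noise_multiplier` are chosen to match the usual names of the corresponding knobs in that library, and the gradients are made-up values.

```python
import math
import random
from typing import List, Sequence


def clip_by_l2_norm(grad: Sequence[float], l2_norm_clip: float) -> List[float]:
    """Scale a gradient down so its L2 norm is at most l2_norm_clip."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, l2_norm_clip / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(
    per_example_grads: Sequence[Sequence[float]],
    l2_norm_clip: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Clip each per-example gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_by_l2_norm(g, l2_norm_clip) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise standard deviation is proportional to the clipping norm.
    stddev = noise_multiplier * l2_norm_clip
    noisy = [s + rng.gauss(0.0, stddev) for s in summed]
    return [v / len(clipped) for v in noisy]


# Made-up per-example gradients; the first has norm 5 and will be clipped to norm 1.
grads = [[3.0, 4.0], [0.6, 0.8]]
no_noise = private_mean_gradient(grads, l2_norm_clip=1.0, noise_multiplier=0.0,
                                 rng=random.Random(0))
with_noise = private_mean_gradient(grads, l2_norm_clip=1.0, noise_multiplier=1.1,
                                   rng=random.Random(0))
```

A larger `noise_multiplier` gives a stronger privacy guarantee but a noisier update, which is one way to read the trade-off the study evaluates.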
ISBN: 9783031001260 (digital)
ISBN: 9783031001260; 9783031001253 (print)
The prediction of port container throughput has a significant impact on many of the port's operations. However, accurate prediction of throughput is a difficult problem due to the complexity of the port environment and the uncertainty of port operations. In this paper, we propose an approach combining a self-attention mechanism and a variational autoencoder to forecast the operating time of each container. First, we use the self-attention mechanism to capture the features between adjacent containers. Then, to reduce the influence of missing data, we design a variational autoencoder (VAE) module to model the latent variables in the port. Finally, the output layer combines the results of these two parts to obtain the final forecast of the loading and discharging time of containers. The throughput of the entire port can be inferred from the forecasted container operation time. Furthermore, we propose dynamic programming algorithms to estimate the distribution of the throughput with the help of the variational autoencoder module. Experimental results on port throughput prediction with real-world datasets show that our approach achieves superior prediction accuracy. Moreover, experiments conducted at different time intervals demonstrate the effectiveness of our approach on various time scales. The effectiveness of the dynamic programming algorithms is demonstrated through a case study.
Recent advances in synthetic speech quality have enabled us to train text-to-speech (TTS) systems by using synthetic corpora. However, merely increasing the amount of synthetic data is not always advantageous for improving training efficiency. Our aim in this study is to selectively choose synthetic data that are beneficial to the training process. In the proposed method, we first adopt a variational autoencoder whose posterior distribution is utilized to extract latent features representing acoustic similarity between the recorded and synthetic corpora. By using those learned features, we then train a ranking support vector machine (RankSVM) that is well known for effectively ranking relative attributes among binary classes. By setting the recorded and synthetic corpora as two opposite classes, RankSVM is used to determine how acoustically similar the synthesized speech is to the recorded data. Then, synthetic TTS data whose distribution is close to that of the recorded data are selected from large-scale synthetic corpora. By using these data for retraining the TTS model, the synthetic quality can be significantly improved. Objective and subjective evaluation results show the superiority of the proposed method over the conventional methods.
An objective of mechanical design is to obtain a shape that satisfies specific requirements. In the present work, we achieve this goal using a conditional variational autoencoder (CVAE). The method enables us to analyze the relationship between aerodynamic performance and the shape of aerodynamic parts, and to explore new designs for the parts. In the CVAE model, a shape is fed as an input and the corresponding aerodynamic performance index is fed as a continuous label. Then, shapes are generated by specifying the continuous label and latent vector. When CVAE is applied to mechanical design, the goal is to generate shapes that reproduce the specified aerodynamic performance. In an ordinary CVAE, the model is trained to minimize reconstruction loss and latent loss, and it is usually optimized considering the sum of these losses. However, the present study shows that the network that is optimal in terms of this sum is not always optimal in terms of reproducing the aerodynamic performance. The proposed method is verified using two numerical examples: a two-dimensional (2D) airfoil and a turbine blade. In the airfoil example, we demonstrate the effects of latent dimension, and in the turbine design example, we demonstrate that the proposed method can be applied to a real turbine design problem and reduce the design time.
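The two loss terms mentioned above can be written out in a short sketch. It assumes a mean-squared-error reconstruction loss and the usual closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior; the abstract does not state which forms the authors use. The function names and the weight `beta` on the latent term are hypothetical. The continuous performance label does not appear here because it enters through the encoder and decoder inputs, not through the loss itself.

```python
import math
from typing import Sequence


def reconstruction_loss(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between the input shape and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def latent_loss(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL divergence from N(mu, exp(log_var)) to the standard normal prior."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def total_loss(x, x_hat, mu, log_var, beta: float = 1.0) -> float:
    """Sum of the two terms; beta (an assumed knob) reweights the latent term."""
    return reconstruction_loss(x, x_hat) + beta * latent_loss(mu, log_var)


# Made-up values: a 2-point "shape", its reconstruction, and a 1-D latent posterior.
loss_at_prior = total_loss([1.0, 2.0], [1.0, 4.0], mu=[0.0], log_var=[0.0])
loss_shifted = total_loss([1.0, 2.0], [1.0, 4.0], mu=[1.0], log_var=[0.0])
```

Because the total is a single scalar, two networks with the same sum can balance the two terms differently, which is consistent with the abstract's point that the best sum does not necessarily give the best reproduction of the target performance.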
The quantification of uncertainty in civil structures poses a significant challenge in contemporary research due to the substantial computational demands involved. This study introduces an innovative approach for updating the finite element model (FEM) and quantifying uncertainties in civil structures through the synergistic use of variational autoencoder (VAE) and polynomial chaos expansion (PCE). Within this framework, the unknown parameters inherent to the structural FEM are represented as latent variables and can be effectively inferred through the VAE. These latent variables are modeled using a multivariate Gaussian distribution. In the proposed methodology, the PCE serves to approximate the log-likelihood function associated with the latent variables, facilitating the derivation of the analytic expression for the variational lower bound. By maximizing this variational lower bound, both the mean and standard deviation can be readily determined. To assess the accuracy and computational efficiency of the proposed technique, numerical analyses are performed on a cantilever beam and a steel pedestrian bridge. Furthermore, the effectiveness of the proposed approach is validated through its application to damage identification within a benchmark model. Significantly, the results indicate that the proposed method offers superior computational efficiency compared to the conventional VAE approach. Notably, the findings reveal that employing a high-order PCE is unnecessary; rather, a low-order PCE suffices for precise parameter identification. Consequently, the proposed methodology necessitates only a limited dataset for training to ascertain the PCE coefficients, thereby enhancing its practical applicability and efficiency.