Text-based speech editing systems are developed to enable users to modify speech based on the transcript. Existing state-of-the-art editing systems based on neural networks do partial inferences with no exception, tha...
详细信息
Text-based speech editing systems are developed to enable users to modify speech based on the transcript. Existing state-of-the-art editing systems based on neural networks do partial inferences with no exception, that is, only generate new words that need to be replaced or inserted. This manner usually leads to the prosody of the edited part being inconsistent with the surrounding speech and a failure to handle the alteration of intonation. To address these problems, we propose a cross-utterance conditioned coherent speech editing system, that first does the entire reasoning at the inference time. Our proposed system can generate speech by utilizing speaker information, context, acoustic features, and the mel-spectrogram from the original audio. Experiments conducted on subjective and objective metrics demonstrate that our approach outperforms the baseline on various editing operations regarding naturalness and prosody consistency.
Personalized federated learning (PFL) jointly trains a variety of local models through balancing between knowledge sharing across clients and model personalization per client. This paper addresses PFL via explicit dis...
详细信息
ISBN:
(纸本)9781665468916
Personalized federated learning (PFL) jointly trains a variety of local models through balancing between knowledge sharing across clients and model personalization per client. This paper addresses PFL via explicit disentangling latent representations into two parts to capture the shared knowledge and client-specific personalization, which leads to more reliable and effective PFL. The disentanglement is achieved by a novel Federated Dual variational autoencoder (FedDVA), which employs two encoders to infer the two types of representations. FedDVA can produce a better understanding of the trade-off between global knowledge sharing and local personalization in PFL. Moreover, it can be integrated with existing FL methods and turn them into personalized models for heterogeneous downstream tasks. Extensive experiments validate the advantages caused by disentanglement and show that models trained with disentangled representations substantially outperform those vanilla methods.
Much research has been devoted to the problem of learning fair representations;however, they do not explicitly state the relationship between latent representations. In many real-world applications, there may be causa...
详细信息
ISBN:
(纸本)9783031333736;9783031333743
Much research has been devoted to the problem of learning fair representations;however, they do not explicitly state the relationship between latent representations. In many real-world applications, there may be causal relationships between latent representations. Furthermore, most fair representation learning methods focus on group-level fairness and are based on correlation, ignoring the causal relationships underlying the data. In this work, we theoretically demonstrate that using the structured representations enables downstream predictive models to achieve counterfactual fairness, and then we propose the Counterfactual Fairness variational autoencoder (CF-VAE) to obtain structured representations with respect to domain knowledge. The experimental results show that the proposed method achieves better fairness and accuracy performance than the benchmark fairness methods.
Electronic health records (EHRs) provide rich clinical information and the opportunities to extract epidemiological patterns to understand and predict patient disease risks with suitable machine learning methods such ...
详细信息
ISBN:
(纸本)9781450393850
Electronic health records (EHRs) provide rich clinical information and the opportunities to extract epidemiological patterns to understand and predict patient disease risks with suitable machine learning methods such as topic models. However, existing topic models do not generate identifiable topics each predicting a unique phenotype. One promising direction is to use known phenotype concepts to guide topic inference. We present a seed-guided Bayesian topic model called MixEHR-Seed with 3 contributions: (1) for each phenotype, we infer a dual-form of topic distribution: a seed-topic distribution over a small set of key EHR codes and a regular topic distribution over the entire EHR vocabulary;(2) we model age-dependent disease progression as Markovian dynamic topic priors;(3) we infer seed-guided multi-modal topics over distinct EHR data types. For inference, we developed a variational inference algorithm. Using MixEHR-Seed, we inferred 1569 PheCode-guided phenotype topics from an EHR database in Quebec, Canada covering 1.3 million patients for up to 20-year follow-up with 122 million records for 8539 and 1126 unique diagnostic and drug codes, respectively. We observed (1) accurate phenotype prediction by the guided topics, (2) clinically relevant PheCode-guided disease topics, (3) meaningful age-dependent disease prevalence. Source code is available at GitHub.
In this paper we propose a method for predicting the status of MGMT promoter methylation in high-grade gliomas. From the available MR images, we segment the tumor using deep convolutional neural networks and extract b...
详细信息
ISBN:
(纸本)9783031090028;9783031090011
In this paper we propose a method for predicting the status of MGMT promoter methylation in high-grade gliomas. From the available MR images, we segment the tumor using deep convolutional neural networks and extract both radiomic features and shape features learned by a variational autoencoder. We implemented a standard machine learning workflow to obtain predictions, consisting of feature selection followed by training of a random forest classification model. We trained and evaluated our method on the RSNA-ASNR-MICCAI BraTS 2021 challenge dataset and submitted our predictions to the challenge.
Generalized zero-shot learning (GZSL) is a technique to train a deep learning model to identify unseen classes. Conventionally, conditional generative models have been employed to generate training data for unseen cla...
详细信息
ISBN:
(纸本)9781665405409
Generalized zero-shot learning (GZSL) is a technique to train a deep learning model to identify unseen classes. Conventionally, conditional generative models have been employed to generate training data for unseen classes from the attribute. In this paper, we propose a new conditional generative model that improves the GZSL performance greatly. In a nutshell, the proposed model, called conditional Wasserstein autoencoder (CWAE), minimizes the Wasserstein distance between the real and generated image feature distributions using an encoder-decoder architecture. From the extensive experiments on various benchmark datasets, we show that the proposed CWAE outperforms conventional generative models in terms of the GZSL classification performance.
With the demand for autonomous control and personalized speech generation, the style control and transfer in Text-to-Speech (TTS) is becoming more and more important. In this paper, we propose a new TTS system that ca...
详细信息
With the demand for autonomous control and personalized speech generation, the style control and transfer in Text-to-Speech (TTS) is becoming more and more important. In this paper, we propose a new TTS system that can perform style transfer with interpretability and high fidelity. Firstly, we design a TTS system that combines variational autoencoder (VAE) and diffusion refiner to get refined mel-spectrograms. Specifically, a two-stage and a one-stage system are designed respectively, to improve the audio quality and the performance of style transfer. Secondly, a diffusion bridge of quantized VAE is designed to efficiently learn complex discrete style representations and improve the performance of style transfer. To have a better ability of style transfer, we introduce ControlVAE to improve the reconstruction quality and have good interpretability simultaneously. Experiments on LibriTTS dataset demonstrate that our method is more effective than baseline models
Topic modeling is a dominant method for exploring document collections on the web and in digital libraries. Recent approaches to topic modeling use pretrained contextualized language models and variational autoencoder...
详细信息
ISBN:
(数字)9783031282386
ISBN:
(纸本)9783031282379;9783031282386
Topic modeling is a dominant method for exploring document collections on the web and in digital libraries. Recent approaches to topic modeling use pretrained contextualized language models and variational autoencoders. However, large neural topic models have a considerable memory footprint. In this paper, we propose a knowledge distillation framework to compress a contextualized topicmodel without loss in topic quality. In particular, the proposed distillation objective is to minimize the cross-entropy of the soft labels produced by the teacher and the student models, as well as to minimize the squared 2-Wasserstein distance between the latent distributions learned by the two models. Experiments on two publicly available datasets show that the student trained with knowledge distillation achieves topic coherence much higher than that of the original student model, and even surpasses the teacher while containing far fewer parameters than the teacher. The distilled model also outperforms several other competitive topic models on topic coherence.
The maritime industry generally anticipates having semi-autonomous ferries in commercial use on the west coast of Norway by the end of this decade. In order to schedule maintenance operations of critical components in...
详细信息
ISBN:
(纸本)9781728116976
The maritime industry generally anticipates having semi-autonomous ferries in commercial use on the west coast of Norway by the end of this decade. In order to schedule maintenance operations of critical components in a secure and cost-effective manner, a reliable prognostics and health management system is essential during autonomous operations. Any remaining useful life prediction obtained from such system should depend on an automatic fault detection algorithm. In this study, an unsupervised reconstruction-based fault detection algorithm is used to predict faults automatically in a simulated autonomous ferry crossing operation. The benefits of the algorithm are confirmed on data sets of real-operational data from a marine diesel engine collected from a hybrid power lab. During the ferry crossing operation, the engine is subjected to drastic changes in operational loads. This increases the difficulty of the algorithm to detect faults with high accuracy. Thus, to support the algorithm, three different feature selection processes on the input data is compared. The results suggest that the algorithm achieves the highest prediction accuracy when the input data is subjected to feature selection based on sensitivity analysis.
In the digitization of energy systems, sensors and smart meters are increasingly being used to monitor production, operation and demand. Detection of anomalies based on smart meter data is crucial to identify potentia...
详细信息
ISBN:
(纸本)9783031105258;9783031105241
In the digitization of energy systems, sensors and smart meters are increasingly being used to monitor production, operation and demand. Detection of anomalies based on smart meter data is crucial to identify potential risks and unusual events at an early stage, which can serve as a reference for timely initiation of appropriate actions and improving management. However, smart meter data from energy systems often lack labels and contain noise and various patterns without distinctively cyclical. Meanwhile, the vague definition of anomalies in different energy scenarios and highly complex temporal correlations pose a great challenge for anomaly detection. Many traditional unsupervised anomaly detection algorithms such as cluster-based or distance-based models are not robust to noise and not fully exploit the temporal dependency in a time series as well as other dependencies amongst multiple variables (sensors). This paper proposes an unsupervised anomaly detection method based on a variational Recurrent autoencoder with attention mechanism. with "dirty" data from smart meters, our method pre-detects missing values and global anomalies to shrink their contribution while training. This paper makes a quantitative comparison with the VAE-based baseline approach and four other unsupervised learning methods, demonstrating its effectiveness and superiority. This paper further validates the proposed method by a real case study of detecting the anomalies of water supply temperature from an industrial heating plant.
暂无评论