ISBN: 9781713820697 (print)
In this paper, we present a novel flow metric learning architecture in a parametric multispeaker expressive text-to-speech (TTS) system. We propose inverse autoregressive flow (IAF) as a way to perform variational inference, thus providing a flexible approximate posterior distribution. The proposed approach conditions the text-to-speech system on speaker embeddings so that the latent space represents emotion as a semantic characteristic. To represent the speaker, we extracted speaker embeddings from an x-vector-based speaker recognition model trained on speech data from many speakers. To predict the vocoder features, we used an acoustic model conditioned on the textual features as well as on the speaker embedding. We transferred expressivity by using the mean of the latent variables for each emotion to generate expressive speech in the voices of different speakers for whom no expressive speech data is available. We compared the results obtained using flow-based variational inference with a variational autoencoder as the baseline model. The performance measured by mean opinion score (MOS), speaker MOS, and expressive MOS shows that N-pair-loss-based deep metric learning combined with the IAF model improves the transfer of expressivity to the desired speaker's voice in synthesized speech.
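The N-pair loss named in the abstract above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `n_pair_loss`, the array shapes, and the toy inputs are assumptions made for illustration, and the embedding regularization term used in some formulations of the loss is omitted. Row i of `anchors` and row i of `positives` are assumed to share a class (for example, the same emotion), while every other row of `positives` acts as a negative.

```python
import numpy as np

def n_pair_loss(anchors, positives):
    """Hypothetical helper: mean N-pair loss over N (anchor, positive) pairs.

    anchors, positives: arrays of shape (N, D); row i of each is a matching pair.
    """
    logits = anchors @ positives.T                # (N, N) pairwise similarities
    # Similarity to each candidate minus similarity to the true positive.
    diffs = logits - np.diag(logits)[:, None]
    # log(1 + sum_{j != i} exp(diff_ij)) equals a log-sum-exp over the full row,
    # because the diagonal entry diff_ii is exactly 0.
    m = diffs.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(diffs - m).sum(axis=1))
    return float(lse.mean())

# Well-separated pairs give a loss near zero.
loss_aligned = n_pair_loss(10.0 * np.eye(4), 10.0 * np.eye(4))
# Identical (uninformative) embeddings give log(N).
loss_uninformative = n_pair_loss(np.zeros((4, 3)), np.zeros((4, 3)))
```

Minimizing this quantity pulls each anchor toward its own positive and away from the positives of the other classes in the batch, which is the behaviour a metric-learning objective for separating emotions in the latent space would need.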
ISBN: 9781728169262 (print)
The transparent cornea is the window of the eye, facilitating the entry of light rays and controlling the focus and movement of light within the eye. The cornea is critical, contributing 75% of the refractive power of the eye. Keratoconus (KC) is a progressive and multifactorial corneal degenerative disease affecting 1 in 2000 individuals worldwide. Currently, there is no cure for keratoconus other than corneal transplantation for advanced-stage keratoconus or corneal cross-linking, which can only halt KC progression. The ability to accurately identify subtle KC or KC progression is of vital clinical significance. To date, there has been little consensus on a useful model to classify KC patients, which inhibits the ability to predict disease progression accurately. In this paper, we used machine learning to analyse data from 124 KC patients, including topographical and clinical variables. Both a supervised multilayer perceptron and an unsupervised variational autoencoder were used to classify KC patients with reference to the existing Amsler-Krumeich (A-K) classification system. Both methods achieve high accuracy, with the unsupervised method performing better. The results show that the unsupervised method with a selection of 29 variables could provide clinicians with a powerful automatic classification tool. These outcomes provide a platform for further analysis of the progression and treatment of keratoconus.
ISBN: 9783030611651; 9783030611668 (print)
To date, most instance segmentation approaches are based on supervised learning that requires a considerable amount of annotated object contours as training ground truth. Here, we propose a framework that searches for the target object based on a shape prior. The shape prior model is learned with a variational autoencoder that requires only a very limited amount of training data: in our experiments, a few dozen object shape patches from the target dataset, as well as purely synthetic shapes, were sufficient to achieve results on par with supervised methods with full access to training data on two out of three cell segmentation datasets. Our method with a synthetic shape prior was superior to pre-trained supervised models with access to limited domain-specific training data on all three datasets. Since learning the prior model requires shape patches, whether real or synthetic, we call this framework semi-supervised learning. The code is available to the public (https://***/looooongChen/shape_prior_seg).
Estimating the effect of a given medical treatment on individual patients involves evaluating how clinical outcomes are affected by the treatment in question. Robust estimates of the treatment effect for a given patient with a pre-specified set of clinical characteristics can be obtained when there is sufficient common support for these features. Essentially, features having the greatest common support correspond to regions of significant overlap between the distributions of the different treatment groups. In observational datasets, however, the possible treatment options may not all be uniformly represented, and therefore robust estimation of their effect may only be possible for the patients in the overlapping region. In this work, we propose a Contrastive variational autoencoder (Contrastive-VAE) to estimate where there is significant overlap between patient distributions corresponding to different treatment options. A Contrastive-VAE exploits information shared between different groups by modeling it as arising from a shared set of latent variables, which allows it to approximate the distributions for treatment options that are not well represented in observational datasets. The result is an improved estimate of the distribution of groups with a small number of data points. By estimating the likelihood for each group with annealed importance sampling, we are able to quantitatively identify the area of overlap between multiple treatment groups and obtain an effective confidence interval for the estimated individual treatment effect.
ISBN: 9781450371261 (print)
The increasing use of machine-learning (ML) enabled systems in critical tasks fuels the quest for novel verification and validation techniques that are nonetheless grounded in accepted system assurance principles. In traditional system development, model-based techniques have been widely adopted, where the central premise is that abstract models of the required system provide a sound basis for judging its implementation. We posit an analogous approach for ML systems, using an ML technique that extracts a low-dimensional underlying structure, a manifold, from the high-dimensional training data that implicitly describes the required system. The manifold is then harnessed for a range of quality assurance tasks such as test adequacy measurement, test input generation, and runtime monitoring of the target ML system. The approach is built on the variational autoencoder, an unsupervised method for learning a pair of mutually near-inverse functions between a given high-dimensional dataset and a low-dimensional representation. Preliminary experiments establish that the proposed manifold-based approach drives diversity in test data for test adequacy, yields fault-revealing yet realistic test cases for test generation, and provides an independent means to assess the trustability of the target system's output for runtime monitoring.
ISBN: 9781728158761 (print)
In this paper, we propose "trace data analytics" for classifying fault conditions from multivariate time series sensor signals using well-known deep CNN models. In our approach, multiple sensor signals are converted into two-dimensional representations using the proposed conversion methods to optimize the classification performance. Many studies on predicting manufacturing results from sensor signals have been conducted in the field of fault detection and classification for display and semiconductor manufacturing processes. It is challenging to apply machine learning to real-life manufacturing problems due to practical limitations, class imbalance, and data insufficiency, which also make it difficult to produce a generalized model. To overcome these challenges, we propose using omni-supervised learning with a new approach to knowledge distillation that ensembles predictions from multiple instantiations of a CNN model on data samples synthetically generated by a deep generative model. Our experimental results show that fault classification accuracy improves substantially when trace data analytics is applied to manufacturing data from display fabrication lines. The results also show that the quality of CNN models trained using the proposed knowledge distillation remains stable.
ISBN: 9783030617059; 9783030617042 (print)
Adversarial variational Bayes (AVB) can infer the parameters of a generative model from the data using approximate maximum likelihood. The likelihood of deep generative models is intractable. However, it can be approximated by a lower bound obtained in terms of an approximate posterior distribution q of the latent variables given the data. The closer q is to the actual posterior, the tighter the lower bound is. Therefore, by maximizing the lower bound one should expect to also maximize the likelihood. Traditionally, the approximate distribution q is Gaussian. AVB relaxes this limitation and allows for flexible distributions that may lack a closed-form probability density function. Implicit distributions obtained by passing a source of Gaussian noise through a deep neural network are examples of such distributions. Here, we combine AVB with the importance weighted autoencoder, a technique that has been shown to provide a tighter lower bound on the marginal likelihood. This is expected to lead to more accurate parameter estimation of the generative model via approximate maximum likelihood. We have evaluated the proposed method on three datasets: MNIST, Fashion MNIST, and Omniglot. The experiments show that the proposed method improves the test log-likelihood of a generative model trained using AVB.
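The two claims in the abstract above (the bound tightens as q approaches the true posterior, and importance weighting tightens it further) can be checked on a toy model where the marginal likelihood is known in closed form. The sketch below is an illustration under stated assumptions, not the paper's method: it uses an explicit Gaussian q rather than the implicit distribution that AVB allows, and the model p(z) = N(0, 1), p(x|z) = N(z, 1), the helper names `log_normal` and `iw_bound`, and all numeric settings are hypothetical choices.

```python
import numpy as np

def log_normal(x, mean, std):
    """Log density of a univariate normal distribution."""
    return -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2

def iw_bound(x, q_mean, q_std, k, n_rep, rng):
    """Monte Carlo estimate of the k-sample importance weighted bound.

    Toy model (assumed): p(z) = N(0, 1), p(x|z) = N(z, 1), q(z|x) = N(q_mean, q_std^2).
    With k = 1 this is the standard evidence lower bound (ELBO).
    """
    z = rng.normal(q_mean, q_std, size=(n_rep, k))
    # Log importance weights: log p(z) + log p(x|z) - log q(z|x).
    log_w = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0) - log_normal(z, q_mean, q_std)
    # Numerically stable log of the mean weight within each group of k samples.
    m = log_w.max(axis=1, keepdims=True)
    log_mean_w = m[:, 0] + np.log(np.exp(log_w - m).mean(axis=1))
    return float(log_mean_w.mean())

rng = np.random.default_rng(0)
x = 1.5
log_px = float(log_normal(x, 0.0, np.sqrt(2.0)))     # exact: p(x) = N(0, 2)

# A deliberately poor q (the prior) leaves a gap that more samples shrink.
elbo = iw_bound(x, 0.0, 1.0, k=1, n_rep=2000, rng=rng)
iwae = iw_bound(x, 0.0, 1.0, k=50, n_rep=2000, rng=rng)

# With q equal to the true posterior N(x/2, 1/2), the bound is exact for any k.
exact = iw_bound(x, x / 2, np.sqrt(0.5), k=5, n_rep=10, rng=rng)

print(f"log p(x)={log_px:.3f}  ELBO={elbo:.3f}  IW bound (k=50)={iwae:.3f}  exact-q={exact:.3f}")
```

In this toy setting the importance weights are constant when q matches the posterior, which is why the bound coincides with log p(x); a more flexible q and a larger number of importance samples are two complementary ways of closing the gap.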
Detecting out-of-distribution (OOD) samples in image applications plays an important role in safeguarding the reliability of machine learning model deployment. In this article, we present a software tool that supports our OOD detector CVAD, a self-supervised Cascade variational autoencoder-based Anomaly Detector, which can be easily applied to various image applications without any assumptions. The corresponding open-source software is published to support public research and tool use.
In recent years, internal attacks have posed a serious threat to the security of individuals, companies, and even countries. Machine learning is currently a common method of insider threat detection. However, it requires complex feature engineering, which limits its use in practical applications. This paper jointly considers users' business operation behavior data and internal psychological data, and establishes an insider threat detection model to analyze their potential associations. The main tasks are as follows: to capture fine-grained features of heterogeneous behavior log data and accurately reflect user behavior attributes, a session-based full feature extraction method is proposed. Building on this method and the variational autoencoder, a long short-term memory variational autoencoder (LVE) model is proposed. To account for the temporal characteristics of user behavior, a long short-term memory network is used in the encoder-decoder part: it takes the input data, generates hidden variables, and then reconstructs the output data from those hidden variables. The results show that this method improves the recall rate compared with other algorithms. Finally, the main work and prospects for improvement are summarized.
ISBN: 9781728166179 (print)
The channels of wireless body area networks (WBANs) are affected by human motion. Focusing on this characteristic of the WBAN channel, human motion classification and transmission power control have been investigated. Feature extraction from the WBAN channels is an important step in human motion classification. It is desirable for the features to be determined automatically, because it is difficult to select appropriate features by hand given the various factors affecting the WBAN channels, such as the positions of transceivers, antennas, and the surrounding environment. In this paper, automatic feature extraction from the WBAN channel gains using convolutional neural networks (CNNs) is investigated. First, a human motion classifier is constructed using a CNN. The accuracy of the classifier is evaluated, and the relationship between the vector extracted by the CNN and the features used in previous research is examined. Next, feature extraction from the channel gains using variational autoencoders (VAEs) is performed. The relationship between the latent variables extracted by the VAE and human motion is examined. Through these considerations, automatic feature extraction from the WBAN channel gains based on CNNs is demonstrated.