In settings requiring synthetic data generation based on a clinical cohort, e.g., due to data protection regulations, heterogeneity across individuals might be a nuisance that we need to control or faithfully preserve. The sources of such heterogeneity might be known, e.g., as indicated by sub-group labels, or might be unknown and thus reflected only in properties of distributions, such as bimodality or skewness. We investigate how such heterogeneity can be preserved and controlled when obtaining synthetic data from variational autoencoders (VAEs), i.e., a generative deep learning technique that utilizes a low-dimensional latent representation. To faithfully reproduce unknown heterogeneity reflected in marginal distributions, we propose to combine VAEs with pre-transformations. For dealing with known heterogeneity due to sub-groups, we complement VAEs with models for group membership, specifically from propensity score regression. The evaluation is performed with a realistic simulation design that features sub-groups and challenging marginal distributions. The proposed approach faithfully recovers the latter, compared to synthetic data approaches that focus purely on marginal distributions. Propensity scores add complementary information, e.g., when visualized in the latent space, and enable sampling of synthetic data with or without sub-group specific characteristics. We also illustrate the proposed approach with real data from an international stroke trial that exhibits considerable distribution differences between study sites, in addition to bimodality. These results indicate that describing heterogeneity by statistical approaches, such as propensity score regression, might be more generally useful for complementing generative deep learning for obtaining synthetic data that faithfully reflects structure from clinical cohorts.
In high-stakes applications of data-driven decision-making such as healthcare, it is of paramount importance to learn a policy that maximizes the reward while avoiding potentially dangerous actions when there is uncertainty. There are two main challenges usually associated with this problem. First, learning through online exploration is not possible due to the critical nature of such applications. Therefore, we need to resort to observational datasets with no counterfactuals. Second, such datasets are usually imperfect, additionally cursed with missing values in the attributes of features. In this article, we consider the problem of constructing personalized policies using logged data when there are missing values in the attributes of features in both training and test data. The goal is to recommend an action (treatment) when $\tilde{X}$, a degraded version of $X$ with missing values, is observed. We consider three strategies for dealing with missingness. In particular, we introduce the conservative strategy where the policy is designed to safely handle the uncertainty due to missingness. In order to implement this strategy, we need to estimate the posterior distribution $p(X \mid \tilde{X})$ and use a variational autoencoder to achieve this. In particular, our method is based on partial variational autoencoders (PVAEs) that are designed to capture the underlying structure of features with missing values.
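One possible reading of the conservative strategy can be sketched as follows: draw several completions of the degraded observation from the estimated posterior, then choose the action whose worst-case reward over those completions is largest. The abstract does not specify the decision rule, so the max-min criterion, the function name `conservative_action`, and the toy reward are all assumptions made for illustration; the posterior samples here are hard-coded rather than produced by a PVAE.

```python
from typing import Callable, Sequence


def conservative_action(
    posterior_samples: Sequence[Sequence[float]],
    actions: Sequence[str],
    reward: Callable[[Sequence[float], str], float],
) -> str:
    """Return the action with the largest worst-case reward over the samples.

    Assumption: "conservative" is interpreted as a max-min rule; the source
    abstract does not state the exact criterion.
    """
    def worst_case(action: str) -> float:
        return min(reward(x, action) for x in posterior_samples)

    return max(actions, key=worst_case)


def toy_reward(x: Sequence[float], action: str) -> float:
    # Hypothetical reward: treating helps only if the second feature is small.
    if action == "treat":
        return 1.0 - 2.0 * x[1]
    return 0.0


# The first feature is observed; the second is missing and has two plausible
# completions (stand-ins for draws from the estimated posterior).
uncertain_samples = [[1.0, 0.2], [1.0, 0.9]]
confident_samples = [[1.0, 0.1], [1.0, 0.2]]

choice_uncertain = conservative_action(uncertain_samples, ["treat", "wait"], toy_reward)
choice_confident = conservative_action(confident_samples, ["treat", "wait"], toy_reward)
```

With widely spread completions the rule falls back to the safe action, and with tightly clustered completions it recommends treatment, which is the qualitative behaviour the abstract describes.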
ISBN: 9783981926361 (print)
The de novo design of drug molecules is recognized as a time-consuming and costly process, and computational approaches have been applied in each stage of the drug discovery pipeline. The variational autoencoder is one of the computer-aided design methods that explore the chemical space based on an existing molecular dataset. Quantum machine learning has emerged as an atypical learning method that may speed up some classical learning tasks because of its strong expressive power. However, near-term quantum computers suffer from a limited number of qubits, which hinders representation learning in high-dimensional spaces. We present a scalable quantum generative autoencoder (SQ-VAE) for simultaneously reconstructing and sampling drug molecules, and a corresponding vanilla variant (SQ-AE) for better reconstruction. Architectural strategies for hybrid quantum-classical networks, such as adjustable quantum layer depth, heterogeneous learning rates, and patched quantum circuits, are proposed to learn high-dimensional datasets such as ligand-targeted drugs. Extensive experimental results are reported for different dimensions, including 8x8 and 32x32, after choosing suitable architectural strategies. The performance of the quantum generative autoencoder is compared with the corresponding classical counterpart throughout all experiments. The results show that quantum computing advantages can be achieved for normalized low-dimension molecules, and that high-dimension molecules generated from quantum generative autoencoders have better drug properties within the same learning period.
We present a novel deep clustering algorithm that utilizes a variational autoencoder (VAE) framework with an entangled multi encoder-decoder neural architecture. Our model enforces a complementary structure that guides the learned latent representations towards a better space arrangement. It differs from previous VAE-based clustering algorithms by employing a new generative model that uses multiple encoder-decoders that are entangled to provide a joint clustering decision. The optimal clustering is found by optimizing a lower bound of the model likelihood function. Both the reconstruction component and the regularization component of the ELBO objective function are explicitly involved in the clustering procedure. We show that this modeling results in both better clustering capabilities and improved data generation. The proposed method is evaluated on standard datasets and is shown to significantly outperform state-of-the-art deep clustering methods. (c) 2023 Elsevier B.V. All rights reserved.
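For reference, the two ELBO components mentioned above can be written in their standard single-encoder form. This is the textbook bound, not the entangled multi encoder-decoder objective of the paper, whose exact form is not given in the abstract; the symbols $q_\phi$ (encoder), $p_\theta$ (decoder) and $p(z)$ (prior) are conventional notation assumed here.

```latex
\log p_\theta(x) \;\ge\;
\underbrace{\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]}_{\text{reconstruction}}
\;-\;
\underbrace{\mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)}_{\text{regularization}}
```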
While single-cell multimodal datasets allow for the measurement of individual cells to understand cellular and molecular mechanisms, generating multimodal data for many cells is costly and challenging. Cohen Kalafut and colleagues develop a machine learning model capable of imputing single-cell modalities and prioritizing multimodal features, such as gene expression, chromatin accessibility and electrophysiology. Single-cell multimodal datasets have measured various characteristics of individual cells, enabling a deep understanding of cellular and molecular mechanisms. However, multimodal data generation remains costly and challenging, and missing modalities happen frequently. Recently, machine learning approaches have been developed for data imputation but typically require fully matched multimodalities to learn common latent embeddings that potentially lack modality specificity. To address these issues, we developed an open-source machine learning model, Joint Variational Autoencoders for Multimodal Imputation and Embedding (JAMIE). JAMIE takes single-cell multimodal data that can have partially matched samples across modalities. Variational autoencoders learn the latent embeddings of each modality. Then, embeddings from matched samples across modalities are aggregated to identify joint cross-modal latent embeddings before reconstruction. To perform cross-modal imputation, the latent embeddings of one modality can be used with the decoder of the other modality. For interpretability, Shapley values are used to prioritize input features for cross-modal imputation and known sample labels. We applied JAMIE to both simulation data and emerging single-cell multimodal data including gene expression, chromatin accessibility, and electrophysiology in human and mouse brains. JAMIE significantly outperforms existing state-of-the-art methods in general and prioritizes multimodal features for imputation, providing potentially novel mechanistic insights at cellular resolution.
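The cross-modal imputation step described above (encode with one modality, decode with the other) can be sketched with toy components. This is a minimal illustration, not JAMIE's implementation: the encoder and decoder are fixed deterministic linear maps rather than trained variational networks, and the names `make_linear` and `impute_cross_modal` as well as all weights are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def make_linear(weights: List[List[float]]) -> Callable[[Vector], Vector]:
    """Build a toy linear map x -> W x from a row-major weight matrix."""
    def apply(x: Vector) -> Vector:
        return [sum(w * v for w, v in zip(row, x)) for row in weights]
    return apply


def impute_cross_modal(
    x_source: Vector,
    encode_source: Callable[[Vector], Vector],
    decode_target: Callable[[Vector], Vector],
) -> Vector:
    """Embed a source-modality sample, then decode it as the target modality."""
    latent = encode_source(x_source)
    return decode_target(latent)


# Stand-ins for a trained expression encoder (3 features -> 1 latent)
# and a trained accessibility decoder (1 latent -> 2 features).
encode_expression = make_linear([[1.0, 1.0, 0.0]])
decode_accessibility = make_linear([[2.0], [-1.0]])

imputed = impute_cross_modal([0.5, 1.5, 4.0], encode_expression, decode_accessibility)
```

The design point is that both modalities must share one latent space for the swap to be meaningful, which is why the abstract aggregates embeddings from matched samples before reconstruction.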
There has been an increasing interest in utilizing machine learning methods in inverse problems and imaging. Most of the work has, however, concentrated on image reconstruction problems, and the number of studies regarding the full solution of the inverse problem is limited. In this work, we study a machine learning-based approach for the Bayesian inverse problem of photoacoustic tomography. We develop a method, based on the variational autoencoder, for estimating the posterior distribution in photoacoustic tomography. The method is evaluated with numerical simulations and compared to the solution of the inverse problem using a Bayesian approach.
ISBN: 9783031745607; 9783031745614 (print)
Neuroimaging-derived brain age has been identified as a promising biomarker for accelerated brain age; however, the ageing process is highly heterogeneous and there is a need to further study the different brain ageing trajectories. In this study, we implemented a variational autoencoder (VAE) based model coupled with regression to identify different age-related patterns. Additionally, we correlated the patterns obtained, using a linear regression approach, with dementia-related risk factors. The model was evaluated in different cohorts, UK Biobank and ALFA+, to assess the robustness of the approach. The results showed a feasible strategy for detecting and validating brain age-related trajectories to identify possible early deviations using morphological brain data.
We propose a hybrid method for generating arbitrage-free implied volatility (IV) surfaces consistent with historical data by combining model-free variational autoencoders (VAEs) with continuous-time stochastic differential equation (SDE) driven models. We focus on two classes of SDE models: regime switching models and Lévy additive processes. By projecting historical surfaces onto the space of SDE model parameters, we obtain a distribution on the parameter subspace faithful to the data on which we then train a VAE. Arbitrage-free IV surfaces are then generated by sampling from the posterior distribution on the latent space, decoding to obtain SDE model parameters, and finally mapping those parameters to IV surfaces. We further refine the VAE model by including conditional features and demonstrate its superior generative out-of-sample performance. Finally, we showcase how our method can be used as a data augmentation tool to help practitioners manage the tail risk of option portfolios.
In this paper, we present Period Singer, a novel end-to-end singing voice synthesis (SVS) model that utilizes variational inference for periodic and aperiodic components, aimed at producing natural-sounding waveforms. Recent end-to-end SVS models have demonstrated the capability of synthesizing high-fidelity singing voices. However, owing to deterministic pitch conditioning, they do not fully address the one-to-many problem. To address this problem, we present the Period Singer architecture, which integrates variational autoencoders for the periodic and aperiodic components. Additionally, our methodology eliminates the dependency on an external aligner by estimating the phoneme alignment through a monotonic alignment search within note boundaries. Our empirical evaluations show that Period Singer outperforms existing end-to-end SVS models on Mandarin and Korean datasets. The efficacy of the proposed method was further corroborated by ablation studies.
ISBN: 9798350359329; 9798350359312 (print)
We present the new bidirectional variational autoencoder (BVAE) network architecture. The BVAE uses a single neural network both to encode and decode instead of an encoder-decoder network pair. The network encodes in the forward direction and decodes in the backward direction through the same synaptic web. Simulations compared BVAEs and ordinary VAEs on the four image tasks of image reconstruction, classification, interpolation, and generation. The image datasets included MNIST handwritten digits, Fashion-MNIST, CIFAR-10, and CelebA-64 face images. The bidirectional structure of BVAEs cut the parameter count by almost 50% and still slightly outperformed the unidirectional VAEs.
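The idea of encoding and decoding through the same weights can be sketched with a single linear layer. This is only an illustration under stated assumptions: the backward pass is taken to apply the transpose of the forward weight matrix, the layer is deterministic and has no nonlinearity, and the helper names `matvec` and `transpose` and the weights are hypothetical. The abstract does not give the BVAE layer equations.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def transpose(weights: Matrix) -> Matrix:
    """Return the transpose of a row-major matrix."""
    return [list(column) for column in zip(*weights)]


# One shared 2x3 weight matrix: 3 input features, 2 latent units.
shared_weights = [[0.5, -1.0, 0.0],
                  [2.0, 0.0, 1.0]]

x = [1.0, 2.0, 3.0]
latent = matvec(shared_weights, x)                          # forward: encode
reconstruction = matvec(transpose(shared_weights), latent)  # backward: decode

# Parameter counts for this toy layer: shared weights versus a separate
# encoder matrix plus decoder matrix of the same shapes.
shared_count = sum(len(row) for row in shared_weights)
separate_count = 2 * shared_count
```

In this toy layer the shared design needs exactly half the weights of a separate encoder-decoder pair, in line with the "almost 50%" reduction in parameter count that the abstract reports.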