ISBN (Print): 9781538654880
Lung cancer causes over one million deaths each year worldwide. DNA methylation is a well-characterized epigenetic factor, and genome-wide methylation profiles provide rich input data for model training. In this article, we explore the application of an unsupervised deep learning method, the variational autoencoder, to DNA methylation data of lung cancer samples downloaded from the GDC TCGA project, and perform further analysis on the latent features. We show that a logistic regression classifier trained on the encoded latent features accurately classifies cancer subtypes.
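The two-stage pipeline this abstract describes can be sketched as follows; every shape, weight, and label here is an illustrative stand-in (a random linear encoder in place of the trained VAE, toy data in place of the GDC TCGA methylation matrix), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for DNA methylation beta values (hypothetical shapes).
n, d, k = 200, 50, 8                 # samples, CpG features, latent dims
X = rng.random((n, d))
y = (X[:, 0] > 0.5).astype(float)    # toy "subtype" label

# "Trained" VAE encoder parameters (random here; learned in practice).
W_mu = rng.normal(0, 0.1, (d, k))
Z = X @ W_mu                         # latent means mu(x) used as features

# Logistic regression on the latent features via gradient descent.
w, b = np.zeros(k), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(Z @ w + b)))
    g = p - y
    w -= 0.1 * Z.T @ g / n
    b -= 0.1 * g.mean()

acc = ((1 / (1 + np.exp(-(Z @ w + b))) > 0.5) == y).mean()
```

In the actual workflow the encoder weights would come from a VAE trained on the methylation matrix, and the classifier would be evaluated on held-out samples rather than the training set.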
ISBN (Print): 9784907764739
At the heart of a deep neural network is representation learning with complex latent variables. This representation learning has been improved by disentangled representations and the idea of regularization terms. However, adversarial samples show that tasks with DNNs can easily fail due to slight perturbations or transformations of the input. A variational autoencoder (VAE) learns P(z|x), the distribution of the latent variable z, rather than P(y|x), the distribution of the output y for the input x. Therefore, the VAE is considered a good model for learning representations from input data: the input x is mapped not directly to y, but to the latent variable z. In this paper, we propose an evaluation method to characterize the latent variables that a VAE learns. Specifically, latent variables extracted from VAEs trained on two well-known data sets are analyzed with the k-nearest neighbor method (kNN). In doing so, we propose an interpretation of what kind of representation the VAE learns, and share clues about the high-dimensional space to which the latent variables are mapped.
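A minimal version of the proposed kNN analysis of latent codes might look like this; the latent vectors are simulated clusters rather than codes from a trained VAE, and the label-agreement score is one simple way to quantify how well classes separate in latent space:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical latent codes z from a trained VAE for two classes;
# here simulated as two Gaussian clusters in a 4-d latent space.
z0 = rng.normal(0.0, 1.0, (100, 4))
z1 = rng.normal(3.0, 1.0, (100, 4))
Z = np.vstack([z0, z1])
labels = np.array([0] * 100 + [1] * 100)

def knn_agreement(Z, labels, k=5):
    """Fraction of points whose k nearest neighbours share their label."""
    D = np.linalg.norm(Z[:, None] - Z[None, :], axis=-1)
    np.fill_diagonal(D, np.inf)          # exclude each point itself
    nn = np.argsort(D, axis=1)[:, :k]    # indices of k nearest neighbours
    votes = labels[nn].mean(axis=1)      # neighbour label average
    return ((votes > 0.5) == labels).mean()

score = knn_agreement(Z, labels)
```

A high score suggests the latent space groups same-class inputs together, which is the kind of structure the evaluation method probes.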
Albeit of crucial interest for financial researchers, market-implied volatility data of European swaptions often exhibit large portions of missing quotes due to illiquidity of the underlying swaption instruments. In t...
ISBN (Print): 9789813235533; 9789813235526
The Cancer Genome Atlas (TCGA) has profiled over 10,000 tumors across 33 different cancer types for many genomic features, including gene expression levels. Gene expression measurements capture substantial information about the state of each tumor. Certain classes of deep neural network models are capable of learning a meaningful latent space. Such a latent space could be used to explore and generate hypothetical gene expression profiles under various types of molecular and genetic perturbation. For example, one might wish to use such a model to predict a tumor's response to specific therapies or to characterize complex gene expression activations existing in differential proportions in different tumors. Variational autoencoders (VAEs) are a deep neural network approach capable of generating meaningful latent spaces for image and text data. In this work, we sought to determine the extent to which a VAE can be trained to model cancer gene expression, and whether or not such a VAE would capture biologically relevant features. In the following report, we introduce a VAE trained on TCGA pan-cancer RNA-seq data, identify specific patterns in the VAE-encoded features, and discuss potential merits of the approach. We name our method "Tybalt" after an instigative, cat-like character who sets a cascading chain of events in motion in Shakespeare's "Romeo and Juliet". From a systems biology perspective, Tybalt could one day aid in cancer stratification or predict specific activated expression patterns that would result from genetic changes or treatment effects.
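One forward pass of the kind of VAE objective used here can be sketched as follows, assuming inputs scaled to [0, 1] with a sigmoid decoder and binary cross-entropy reconstruction; the single-layer weights are random stand-ins for trained parameters, and the shapes are toy values, not the pan-cancer data dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in for zero-one scaled RNA-seq expression values.
n_genes, n_latent, batch = 20, 5, 16
x = rng.random((batch, n_genes))

# Randomly initialised encoder/decoder weights (learned in practice).
W_mu = rng.normal(0, 0.1, (n_genes, n_latent))
W_lv = rng.normal(0, 0.1, (n_genes, n_latent))
W_dec = rng.normal(0, 0.1, (n_latent, n_genes))

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

# One forward pass with the reparameterisation trick.
mu, logvar = x @ W_mu, x @ W_lv
eps = rng.normal(size=mu.shape)
z = mu + np.exp(0.5 * logvar) * eps
x_hat = sigmoid(z @ W_dec)

# Negative ELBO = binary cross-entropy reconstruction + KL(q(z|x) || N(0, I)).
bce = -np.sum(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat)) / batch
kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar)) / batch
loss = bce + kl
```

Minimising this loss over many batches is what shapes the latent space whose encoded features the report analyses.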
ISBN (Digital): 9783030368029
ISBN (Print): 9783030368029; 9783030368012
Deep generative models for graphs are promising for their ability to sidestep expensive search procedures in the huge space of chemical compounds. However, incorporating complex and non-differentiable property metrics into a generative model remains a challenge. In this work, we formulate a differentiable objective to regularize a variational autoencoder model that we design for graphs. Experiments demonstrate that the regularization is effective for molecule generation: it not only improves the objective-optimization task but also generates molecules of high quality in terms of validity and novelty.
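A differentiable property regularizer of the kind described can be illustrated with a soft adjacency matrix; the valence constraint and the exact penalty form below are illustrative assumptions, not the paper's objective:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical decoder output: a soft (probabilistic) adjacency matrix
# for a 6-node molecular graph, entries in (0, 1).
A = rng.random((6, 6))
A = (A + A.T) / 2            # symmetrise: bonds are undirected
np.fill_diagonal(A, 0.0)     # no self-bonds

max_valence = np.array([4, 4, 3, 2, 1, 1])  # assumed per-node bond capacity

# Differentiable validity penalty: expected degree exceeding valence,
# squared and summed, so it can be added to the VAE loss and backpropagated.
degree = A.sum(axis=1)
penalty = np.sum(np.maximum(degree - max_valence, 0.0) ** 2)

lam = 0.1
# total_loss = elbo_loss + lam * penalty   # added to the usual VAE objective
```

Because the penalty is a smooth function of the decoder's soft output, gradient descent can push generated graphs toward chemical validity without a non-differentiable validity check in the loop.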
ISBN (Print): 9781665448703
Recently, the standard variational autoencoder has been successfully used to learn a probabilistic prior over speech signals, which is then used to perform speech enhancement. Variational autoencoders have since been conditioned on a label describing a high-level speech attribute (e.g. speech activity) that allows for more explicit control of speech generation. However, the label is not guaranteed to be disentangled from the other latent variables, which results in limited performance improvements compared to the standard variational autoencoder. In this work, we propose an adversarial training scheme for variational autoencoders to disentangle the label from the other latent variables. During training, we use a discriminator that competes with the encoder of the variational autoencoder. Simultaneously, we use an additional encoder that estimates the label for the decoder of the variational autoencoder, which proves crucial for learning disentanglement. We show the benefit of the proposed disentanglement learning when a voice activity label, estimated from visual data, is used for speech enhancement.
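The adversarial pairing of losses can be sketched as follows; the linear discriminator and random codes are illustrative stand-ins, but the sign relation is the core idea: the discriminator minimizes its classification loss while the encoder is trained to maximize it, removing label information from the latent code:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical batch of latent codes z from the VAE encoder and the
# speech-activity label y that should be disentangled from z.
z = rng.normal(size=(32, 16))
y = rng.integers(0, 2, 32).astype(float)

# Linear discriminator trying to predict y from z (weights random here).
w = rng.normal(0, 0.1, 16)
p = 1 / (1 + np.exp(-(z @ w)))

# Discriminator objective: standard binary cross-entropy, minimised in w.
d_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Adversarial term for the encoder: maximise the discriminator's loss,
# i.e. minimise its negative, so z carries no information about y.
enc_adv_loss = -d_loss
```

In the full model these two updates alternate each step, with the label re-supplied to the decoder by the additional encoder so that reconstruction quality is preserved.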
ISBN (Print): 9789897583063
Since the beginning of neural networks, different mechanisms have been required to provide a sufficient number of examples to avoid overfitting. Data augmentation, the most common one, focuses on generating new instances by applying different distortions to the real samples. Usually, these transformations are problem-dependent, and they result in a synthetic set of likely unseen examples. In this work, we have studied a generative model, based on the encoder-decoder paradigm, that works directly in the data space, that is, with images. This model encodes the input into a latent space where different transformations are applied; the altered latent vectors are then decoded to obtain new samples. We have analysed various procedures according to the distortions that can be carried out, as well as the effectiveness of this process in improving the accuracy of different classification systems. To this end, we can use both the latent space and the original space after reconstructing the altered version of these vectors. Our results show that this pipeline (encoding-altering-decoding) helps the generalisation of the selected classifiers.
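The encoding-altering-decoding pipeline reduces to a few lines; the autoencoder weights below are random stand-ins for a trained model, and Gaussian latent noise is only one of the possible latent-space distortions:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy autoencoder with random weights (learned in practice); images are
# flattened 8x8 patches, latent space is 10-d. All names are illustrative.
d, k = 64, 10
W_enc = rng.normal(0, 0.1, (d, k))
W_dec = rng.normal(0, 0.1, (k, d))

def augment(x, sigma=0.1, n_new=5):
    """Encode x, perturb the latent vector, decode back to the data space."""
    z = x @ W_enc
    z_new = z + rng.normal(0, sigma, (n_new, k))   # latent-space distortion
    return z_new @ W_dec                           # synthetic samples

x = rng.random(d)
samples = augment(x)
```

Either `z_new` (latent space) or `samples` (reconstructed data space) can then be fed to the downstream classifier, matching the two options the abstract mentions.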
ISBN (Print): 9783319700878; 9783319700861
Variational autoencoders (VAEs) are known to easily suffer from the KL-vanishing problem when combined with powerful autoregressive models such as recurrent neural networks (RNNs), which prohibits their wide application in natural language processing. In this paper, we tackle this problem by splitting the training procedure into two steps: learning effective mechanisms to encode and decode discrete tokens (wake step), and generalizing meaningful latent variables by reconstructing dreamed encodings (sleep step). The training pattern is similar to the wake-sleep algorithm: the two steps are trained alternately until an equilibrium is reached. We test our model on a language modeling task. The results demonstrate significant improvement over current state-of-the-art latent variable models.
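The two-step alternation can be caricatured with linear maps standing in for the paper's RNN encoder and decoder; all shapes, learning rates, and update rules here are illustrative, and only the overall wake/sleep structure reflects the abstract:

```python
import numpy as np

rng = np.random.default_rng(6)

d, k, lr = 8, 3, 0.05
W_enc = rng.normal(0, 0.1, (d, k))   # encoder
W_dec = rng.normal(0, 0.1, (k, d))   # decoder
X = rng.random((64, d))              # "real" training encodings

for step in range(200):
    if step % 2 == 0:
        # Wake step: learn to encode and decode real data (reconstruction).
        Xh = X @ W_enc @ W_dec
        G = X.T @ (Xh - X) / len(X)        # shared gradient factor
        W_dec -= lr * W_enc.T @ G
        W_enc -= lr * G @ W_dec.T
    else:
        # Sleep step: dream latent codes from the prior, decode them, and
        # train the encoder to recover the dreamed codes from the decodings.
        z = rng.normal(size=(64, k))
        dreamed = z @ W_dec
        z_hat = dreamed @ W_enc
        W_enc -= lr * dreamed.T @ (z_hat - z) / len(z)

recon_err = np.mean((X @ W_enc @ W_dec - X) ** 2)
```

The sleep step gives the encoder a training signal even when the decoder alone could explain the data, which is how the scheme counteracts KL vanishing.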
ISBN (Print): 9781450356183
In the past, evolutionary algorithms (EAs) that use probabilistic modeling of the best solutions have incorporated latent or hidden variables into the models as a more accurate way to represent the search distributions. Recently, a number of neural-network models that compute approximations of posterior (latent variable) distributions have been introduced. In this paper, we investigate the use of the variational autoencoder (VAE), a class of neural-network-based generative models, for modeling and sampling search distributions as part of an estimation of distribution algorithm (EDA). We show that the VAE can capture dependencies between decision variables and objectives, a feature shown to improve the sampling capacity of model-based EAs. Furthermore, we extend the original VAE model by adding a new, fitness-approximating network component. We show that it is possible to adapt the architecture of these models, and we present evidence of how to extend VAEs to better fulfill the requirements of probabilistic modeling in EAs. While our results are not yet competitive with state-of-the-art probabilistic optimizers, they represent a promising direction for the application of generative models within EDAs.
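The EDA skeleton into which a VAE would be plugged looks like this; a diagonal Gaussian stands in for the VAE as the probabilistic model, and the sphere function is a toy minimization objective, neither taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(7)

def sphere(x):
    """Toy objective: minimise the sum of squares per individual."""
    return np.sum(x**2, axis=1)

# Estimation-of-distribution loop: fit a model to the best solutions,
# then sample the next population from it. A VAE would replace the
# Gaussian fit/sample pair below.
pop = rng.uniform(-5, 5, (100, 10))
for gen in range(30):
    fit = sphere(pop)
    elites = pop[np.argsort(fit)[:25]]         # truncation selection
    mu = elites.mean(axis=0)                   # "train the model"
    sigma = elites.std(axis=0) + 1e-6
    pop = rng.normal(mu, sigma, (100, 10))     # "sample the model"

best = sphere(pop).min()
```

The appeal of a VAE here is that, unlike this diagonal Gaussian, its latent variables can capture dependencies between decision variables when modeling the elite set.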
ISBN (Print): 9781665473583
Lung adenocarcinoma is a type of non-small cell lung cancer that accounts for about 40% of all lung cancers and is divided into different molecular and histological subtypes associated with particular prognoses and treatments. Pathologists stratify it for diagnosis mainly by its histo-morphological visual features and patterns, which tends to be challenging because of the nature of lung tissue: a mixture of histologically complex patterns without a specialized grading system. Here, we present an unsupervised computational approach based on an ensemble of tissue-specialized variational autoencoders, trained per histopathology subtype, to build an unsupervised embedded tissue-image representation. This representation was used to train a Random Forest classifier of three lung adenocarcinoma histology subtypes (lepidic, papillary, and solid), and a 2D, visually interpretable projection of the learned embedded representation. Experimental results achieve an average F-score of 0.72 +/- 0.05 on the test dataset and a well-separated 2D visual mapping of tissue subtypes.
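The ensemble embedding step can be sketched as follows; the encoder weights, tile size, and latent width are illustrative stand-ins for the trained tissue-specialized autoencoders, and the concatenated vector is what would feed the Random Forest classifier:

```python
import numpy as np

rng = np.random.default_rng(8)

# Three tissue-specialised encoders (one per subtype: lepidic, papillary,
# solid), each mapping a flattened tissue tile to an 8-d code. The weights
# are random stand-ins for the trained variational autoencoders.
d, k = 256, 8
encoders = {s: rng.normal(0, 0.1, (d, k))
            for s in ("lepidic", "papillary", "solid")}

def embed(tile):
    """Concatenate the latent codes of all specialised encoders (24-d)."""
    return np.concatenate([tile @ W for W in encoders.values()])

tile = rng.random(d)
feature = embed(tile)   # input row for the downstream Random Forest
```

Because each encoder specialises in one subtype, the concatenation lets the classifier see how well a tile is explained by every subtype's model at once.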