ISBN (digital): 9781510650534
ISBN (print): 9781510650534; 9781510650527
This work presents supervised learning of subtype differentiation for lung cancer features in a latent space constructed with a variational autoencoder (VAE). In this space, complicated patterns are quantified by estimating a differentiation grade for typical encoded features of lung cancer subtypes. Specifically, selected tissue samples of non-small cell lung cancer are mapped to a latent space, and a logistic regression model assigns a subtype differentiation grade to the embedded tissue samples. The latent representation captures the invariant features of the most representative tissue samples for both well-differentiated adenocarcinoma and squamous cell carcinoma, as well as confusing cases of poorly differentiated, complex mixtures of tissue pattern subtypes. This approach builds a subtype differentiation grade of non-small cell lung cancer among complex structures, which is fully interpretable and can be integrated into a pathology workflow. Typical tissue samples of well-differentiated lung cancer subtypes are grouped close together in the latent space with high confidence in the differentiation grade, while poorly differentiated tissue samples, with lower confidence in the differentiation grade, are located in other regions of the latent space. The VAE was trained to learn the latent space representation from representative tissue samples picked from well-differentiated adenocarcinoma (five cases) and squamous cell carcinoma (five cases) lung cancer subtypes. Validation was performed by selecting six cases for training and evaluating the location in the latent space of tissue samples from four different cases. Two metrics, mean absolute error (MAE) and root mean squared error (RMSE), estimated the location of these patches with respect to the patches belonging to the six cases. The best model, under cross-validation, achieves an average performance of MAE = 0.072 ± 0.0004 and RMSE = 0.2654 ± 0.0019. In addition, for ten different cases (five adenocarcinoma and five of squamous cell), performance
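The two error metrics named in this abstract, MAE and RMSE, can be computed as in the minimal sketch below. The abstract does not state exactly which quantities are compared, so the function names `mae` and `rmse` and the plain-list inputs are assumptions made for illustration only.

```python
import math


def mae(predicted, reference):
    """Mean absolute error between two equal-length, non-empty sequences."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def rmse(predicted, reference):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))
```

RMSE penalizes large deviations more strongly than MAE, so for the same data RMSE is never smaller than MAE, which is consistent with the ordering of the two values reported above.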
Purpose: Prior studies on the application of deep learning techniques have focused on enhancing computation algorithms. However, the amount of data, whose importance is often underestimated in practice, is also a key element when pursuing a goal with a quantitative approach. The problem of sparse sales data is well known in the valuation of commercial properties. This study aims to expand the limited data available in order to exploit the capability inherent in deep learning techniques. Design/methodology/approach: A deep learning approach is used. First, Seoul, the capital of South Korea, is selected as the case study area. Second, data augmentation is performed for properties with low trade volume in the market using a variational autoencoder (VAE), a generative deep learning technique. Third, the generated samples are added to the original dataset of commercial properties to alleviate data insufficiency. Finally, the accuracy of the price estimation is analyzed for the original and augmented datasets to assess model performance. Findings: Using sales datasets of commercial properties in Seoul, South Korea, as a case study, the results show that the dataset augmented by a VAE consistently yields higher price-estimation accuracy in all 30 trials, and that the capabilities inherent in deep learning techniques can be fully exploited, promoting the rapid adoption of artificial intelligence skills in the real estate industry. Originality/value: Although deep learning-based algorithms are gaining popularity, they are likely to show limited performance when data are insufficient. This study suggests an alternative approach to overcoming the lack of data in property valuation.
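The augmentation step described in this abstract can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes a trained VAE is available through two hypothetical callables, `encode` (returning a latent mean and log-variance) and `decode`, and it assumes new samples are drawn around each real record with the standard reparameterization z = mu + sigma * eps. The abstract does not say whether samples come from the posterior or the prior, and the toy encoder and decoder are stand-ins so the code runs.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization: z = mu + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def augment(records, encode, decode, copies_per_record, seed=0):
    """Return the original records followed by VAE-generated synthetic ones."""
    rng = random.Random(seed)
    synthetic = []
    for record in records:
        mu, log_var = encode(record)  # hypothetical trained encoder
        for _ in range(copies_per_record):
            synthetic.append(decode(sample_latent(mu, log_var, rng)))
    return list(records) + synthetic


# Toy stand-ins for a trained model (assumptions, for illustration only).
def toy_encode(record):
    return list(record), [-4.0] * len(record)  # small latent variance


def toy_decode(z):
    return list(z)
```

Calling `augment(rows, toy_encode, toy_decode, 3)` returns the original rows followed by three synthetic rows per original, which mirrors adding the generated samples to the original dataset.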
ISBN (print): 9781713836902
It is increasingly thought that human speech perception and production both rely on articulatory representations. In this paper, we investigate whether this type of representation can improve the performance of a deep generative model (here, a variational autoencoder) trained to encode and decode acoustic speech features. First, we develop an articulatory model that associates articulatory parameters describing the jaw, tongue, lip, and velum configurations with vocal tract shapes and spectral features. Then, we incorporate these articulatory parameters into a variational autoencoder applied to spectral features, using a regularization technique that constrains part of the latent space to represent articulatory trajectories. We show that this articulatory constraint improves model training by decreasing both the time to convergence and the reconstruction loss at convergence, and yields better performance in a speech denoising task.
ISBN (print): 9781728176055
Recently, variational autoencoders have been successfully used to learn a probabilistic prior over speech signals, which is then used to perform speech enhancement. However, variational autoencoders are trained on clean speech only, which limits their ability to extract the speech signal from noisy speech compared to supervised approaches. In this paper, we propose to guide the variational autoencoder with a supervised classifier trained separately on noisy speech. The estimated label is a high-level categorical variable describing the speech signal (e.g., speech activity), allowing for a more informed latent distribution than in the standard variational autoencoder. We evaluate our method with different types of labels on real recordings from different noisy environments. Provided that the label better informs the latent distribution and that the classifier achieves good performance, the proposed approach outperforms the standard variational autoencoder and a conventional neural network-based supervised approach.
ISBN (print): 9781728183923
We explore an approach to behavioral cloning in video games. We are motivated to pursue a learning architecture that is data efficient and provides an opportunity for interpreting player strategies and replicating player actions in unseen situations. To this end, we have developed a generative model that learns latent features of a game that can be used for training an action predictor. Specifically, our architecture combines a variational autoencoder with a discriminator that maps the latent space to action predictions (the predictor). We compare the performance of our model to two different behavioral cloning architectures: a discriminative model (a convolutional neural network) mapping game states directly to actions, and a variational autoencoder with a predictor trained separately. Finally, we demonstrate how the advantage of generative modeling can be used to sample new states from the latent space of the variational autoencoder in order to analyze player actions and give meaning to certain latent features.
ISBN (print): 9781728176055
Sensor data from wearable devices have been used to analyze differences between experts and novices. Previous studies attempted to classify the expert-novice level from sensor data with supervised learning methods. However, these approaches require collecting enough training data to cover the varied sensor patterns of novices. In this paper, we propose a semi-supervised anomaly detection approach that requires only the sensor data of experts for training and identifies those of novices as anomalies. Our proposed anomaly detection model, named the conditional multimodal variational autoencoder (CMVAE), makes two technical contributions: (i) it considers the action information of persons, and (ii) it uses multimodal sensor data, i.e., eye tracking data and motion data in this case. The proposed method is evaluated on sensor data measured while expert and novice soccer players were shooting, dribbling, and juggling a soccer ball. Experimental results show that CMVAE classifies the expert-novice level more accurately than previous supervised learning methods and anomaly detection methods using other VAEs.
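The idea of training only on expert data and flagging novices as anomalies can be illustrated with the minimal sketch below. It is not the CMVAE itself: it assumes that each recording has already been reduced to one scalar anomaly score (for example a reconstruction error from a model trained on experts), and that a threshold is taken as a quantile of the expert scores. The function names and the 0.95 quantile are hypothetical choices.

```python
def fit_threshold(expert_scores, quantile=0.95):
    """Pick a score threshold from expert-only data (assumed scoring scheme)."""
    if not expert_scores:
        raise ValueError("expert_scores must not be empty")
    ordered = sorted(expert_scores)
    # Index of the requested quantile, clamped to the last element.
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def is_novice(score, threshold):
    """Flag a recording as anomalous (novice-like) if its score exceeds the threshold."""
    return score > threshold
```

Because the threshold is derived from expert recordings alone, no labeled novice data is needed at training time, which is the property the abstract emphasizes.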
ISBN (print): 9781728176055
Deep probabilistic generative models have achieved remarkable success in many fields of application. Among such models, variational autoencoders (VAEs) have proved their ability to model a generative process by learning a latent representation of the input. In this paper, we propose a novel VAE defined in the quaternion domain, which exploits the properties of quaternion algebra to improve performance while significantly reducing the number of parameters required by the network. The success of the proposed quaternion VAE with respect to traditional VAEs relies on its ability to leverage the internal relations between quaternion-valued input features and on the properties of second-order statistics, which allow the latent variables to be defined in the augmented quaternion domain. To show the advantages due to these properties, we define a plain convolutional VAE in the quaternion domain and evaluate its performance against its real-valued counterpart on the CelebA face dataset.
Over the past decade, deep learning has achieved unprecedented success in a wide range of application domains, given large-scale datasets. However, particular domains, such as healthcare, inherently suffer from data paucity and imbalance. Moreover, datasets can be largely inaccessible due to privacy concerns or a lack of data-sharing incentives. Such challenges have lent significance to the application of generative modeling and data augmentation in that domain. In this context, this study explores a machine learning-based approach for generating synthetic eye-tracking data. We explore a novel application of variational autoencoders (VAEs) in this regard. More specifically, a VAE model is trained to generate an image-based representation of the eye-tracking output, the so-called scanpaths. Overall, our results validate that the VAE model can generate plausible output from a limited dataset. Finally, it is empirically demonstrated that such an approach can be employed as a mechanism for data augmentation to improve performance in classification tasks.
ISBN (print): 9780738133669
Text generation is one of the essential yet challenging tasks in natural language processing. However, the input text alone usually does not provide enough information to generate the desired output. Previous work attempts to incorporate syntactic information into generative models based on the variational autoencoder (VAE), but these methods have difficulty adequately modeling the tree structure of syntactic data. In this paper, we formulate the syntactic structure as a graph and introduce a syntax encoder based on a graph neural network (GNN) to model the syntactic information of sentences. Based on the syntax encoder, we propose a novel syntax-enhanced variational autoencoder (SEVAE) with two variants. The variant SEVAE-m merges sentence information and syntactic information into one latent space to enrich the fine-grained syntactic information of latent representations. The variant SEVAE-s, with two separate latent spaces, allows the sentence decoder to dynamically attend to semantic and syntactic information from two latent variables. Experiments on two benchmark datasets show that our methods achieve significant and consistent improvements over previous work.
ISBN (print): 9781713836902
Whispering is the natural choice of communication when one wants to interact quietly and privately. Due to the vast differences in the acoustic characteristics of whispered and natural speech, recognition performance degrades drastically when whispered speech is decoded by an Automatic Speech Recognition (ASR) system trained on neutral speech. Recently, denoising autoencoders (DA) have been used to handle this mismatched train and test scenario, which gives some improvement. To improve over DA performance, we propose another method to map speech from the whisper domain to the neutral speech domain via a Joint Variational Auto-Encoder (JVAE). The proposed method requires time-aligned parallel data, which is not available, so we developed an algorithm to convert parallel data to time-aligned parallel data. JVAE jointly learns the characteristics of whispered and neutral speech in a common latent space, which significantly improves whisper recognition accuracy and outperforms traditional autoencoder-based techniques. We benchmarked our method against two baselines: the first is an ASR system trained on neutral speech and tested on the whisper dataset, and the second is the whisper test set mapped using DA and tested on the same neutral ASR system. We achieved an absolute improvement of 22.31% in Word Error Rate (WER) over the first baseline and an absolute improvement of 5.52% over DA.
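Word Error Rate and the notion of an absolute improvement can be made concrete with the minimal sketch below. It uses the common definition of WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words; the abstract does not describe its scoring tool, so the function names and the whitespace tokenization are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def absolute_improvement(baseline_wer, new_wer):
    """Difference in WER, in the same units as the inputs (not a ratio)."""
    return baseline_wer - new_wer
```

An absolute improvement is a plain difference between two WER values in percentage points, as opposed to a relative improvement, which would divide that difference by the baseline WER.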