Over the past decade, deep learning has achieved unprecedented success across diverse application domains, given large-scale datasets. However, some domains, such as healthcare, inherently suffer from data paucity and imbalance. Moreover, datasets may be largely inaccessible because of privacy concerns or a lack of data-sharing incentives. These challenges have increased the importance of generative modeling and data augmentation in such domains. In this context, this study explores a machine learning-based approach for generating synthetic eye-tracking data, using variational autoencoders (VAEs) in a novel application. More specifically, a VAE model is trained to generate image-based representations of the eye-tracking output, so-called scanpaths. Overall, our results show that the VAE model can generate plausible output from a limited dataset. Finally, it is empirically demonstrated that such an approach can be employed as a mechanism for data augmentation to improve performance in classification tasks.
ISBN: 9781713836902 (print)
Whispering is the natural choice of communication when one wants to interact quietly and privately. Because the acoustic characteristics of whispered and neutral speech differ greatly, recognition performance degrades drastically when whispered speech is decoded by an Automatic Speech Recognition (ASR) system trained on neutral speech. Recently, denoising autoencoders (DA) have been used to handle this mismatched train and test scenario, giving some improvement. To improve on DA performance, we propose another method that maps speech from the whisper domain to the neutral speech domain via a Joint Variational Auto-Encoder (JVAE). The proposed method requires time-aligned parallel data, which is not available, so we developed an algorithm to convert parallel data to time-aligned parallel data. JVAE jointly learns the characteristics of whisper and neutral speech in a common latent space, which significantly improves whisper recognition accuracy and outperforms traditional autoencoder-based techniques. We benchmarked our method against two baselines: the first is an ASR system trained on neutral speech and tested on the whisper dataset, and the second is the whisper test set mapped using DA and tested on the same neutral ASR system. We achieved an absolute improvement of 22.31% in Word Error Rate (WER) over the first baseline and an absolute improvement of 5.52% over DA.
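To make the reported metric concrete, the following is a minimal sketch of how Word Error Rate and an absolute improvement between two systems are commonly computed. It is not the authors' evaluation code; the function names `word_error_rate` and `absolute_improvement` are hypothetical, and whitespace tokenization is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


def absolute_improvement(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER between two systems (not a relative reduction)."""
    return baseline_wer - new_wer
```

An absolute improvement is a plain difference between two WER values, which is how the 22.31% and 5.52% figures above are described; a relative improvement would instead divide that difference by the baseline WER.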
Multispectral image compression needs to remove both spatial and spectral redundancy, and recent models learnt end to end have shown promising performance. However, most of them ignore the characteristics of multispectral images, i.e., the non-stationarity of spectral correlation and the scale diversity of spatial features. Moreover, they directly use a fully factorized entropy model, which renders compression performance suboptimal. This paper proposes a Multi-Scale Spatial-Spectral Attention Network (MSSSA-Net) based on the variational autoencoder (VAE). Our MSSSA-Net (1) incorporates a simple neuroscience-based non-local attention module into the attention mechanism to capture tiny features in adjacent pixels and large-scale features in the spatial domain simultaneously, and (2) proposes a multi-scale spectral attention block to extract the non-stationary correlation of adjacent spectra at different scales. We demonstrate that our MSSSA-Net offers state-of-the-art performance in comparison with classical algorithms, including JPEG2000 and 3D-SPIHT, and with recent learnt image compression models, on 7-band and 8-band datasets from the Landsat-8 and WorldView-3 satellites, when measured by PSNR, MS-SSIM and Mean Spectral Angle. Extensive ablation experiments have verified the effectiveness of each component and have demonstrated that, for multispectral image compression, the Scale-only Hyperprior achieves a better trade-off between compression performance and complexity than the Mean & Scale Hyperprior and the Joint Autoregressive model.
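Two of the quality metrics named above can be sketched in a few lines. This is an illustrative NumPy version, not the paper's evaluation code: the function names, the `(height, width, bands)` array layout, and the `max_value` normalization are assumptions.

```python
import numpy as np


def psnr(original, reconstructed, max_value=1.0):
    """Peak signal-to-noise ratio in dB over all pixels and bands."""
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((original - reconstructed) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


def mean_spectral_angle(original, reconstructed, eps=1e-12):
    """Mean angle (radians) between per-pixel spectra.

    Assumes both inputs are shaped (height, width, bands).
    """
    a = np.asarray(original, dtype=np.float64)
    b = np.asarray(reconstructed, dtype=np.float64)
    a = a.reshape(-1, a.shape[-1])
    b = b.reshape(-1, b.shape[-1])
    dot = np.sum(a * b, axis=1)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    # Clip to guard against rounding pushing the cosine outside [-1, 1].
    cosine = np.clip(dot / np.maximum(norms, eps), -1.0, 1.0)
    return float(np.mean(np.arccos(cosine)))
```

PSNR measures per-value reconstruction error, while the spectral angle ignores per-pixel brightness scaling and only penalizes changes in the shape of each spectrum, which is why the two are usually reported together for multispectral data.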
In recent years, semi-supervised learning has been investigated to take full advantage of the growing amount of unlabeled data. Although pretrained deep learning models have been successfully applied to massive amounts of unlabeled data, they may not be applicable in specific domains where data is limited. In this paper, we propose a model, termed Semi-supervised Variational AutoEncoder (SVAE), which uses Gated Convolutional Neural Networks (GCNN) as both the encoder and the decoder. Since the canonical VAE suffers from the Kullback-Leibler (KL) vanishing problem, we attach a layer named Scalar after Batch Normalization (BN) to scale the output of the BN. We conduct experiments on two domain-specific datasets with a small amount of data. The results show that SVAE outperforms alternative baselines in language modeling and semi-supervised learning. In particular, the language modeling results validate the effect of combining BN and Scalar for tackling the KL vanishing problem. Moreover, the visualization of the latent representations verifies the performance of SVAE on less data.
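The abstract does not say where the BN and Scalar layers sit or how Scalar is parameterized. As a hedged illustration only, the sketch below assumes they are applied to the posterior mean of a diagonal-Gaussian encoder and that Scalar is a single multiplicative factor; all function names are hypothetical. Under those assumptions, the batch-averaged KL term cannot fall below `0.5 * latent_dim * scale**2` (up to the BN epsilon), which is one way such a layer can counter KL vanishing.

```python
import numpy as np


def batch_norm(x, eps=1e-5):
    """Normalize each latent dimension to zero mean, unit variance over the batch."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)  # population variance, as in BN
    return (x - mean) / np.sqrt(var + eps)


def scaled_bn_mean(mu, scale, eps=1e-5):
    """Assumed 'Scalar after BN': rescale the normalized posterior mean."""
    return scale * batch_norm(mu, eps=eps)


def gaussian_kl(mu, log_var):
    """Per-sample KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - log_var - 1.0, axis=1)
```

Because the variance part of the KL, `exp(log_var) - log_var - 1`, is never negative, and the normalized mean has a batch-average square of `scale**2` per dimension, the KL stays bounded away from zero even when the raw posterior means have collapsed toward the prior.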
ISBN: 9781450397629 (print)
Recent research has shown that pre-trained context-independent word embeddings display biases such as racial bias, gender bias, etc. Using a novel, tunable algorithm, this study attempts to mitigate the hidden gender bias in static embeddings. In order to train the model, an enhanced variational autoencoder (E-VAN) is used to learn the latent space of the embedding. Then the latent distributions are used while adaptively resampling and re-weighting the rare/under-represented data. While the word embeddings retain semantic information, E-VAN effectively mitigates unwanted biased gendered associations. Our method E-VAN outperforms previous state-of-the-art methods in both quantitative and human evaluation.
ISBN: 9780738133669 (print)
Text generation is one of the essential yet challenging tasks in natural language processing. However, the input text alone usually does not provide enough information to generate the desired output. Previous work attempts to incorporate syntactic information into generative models based on the variational autoencoder (VAE), but these methods have difficulty adequately modeling the tree structure of syntactic data. In this paper, we formulate the syntactic structure as a graph and introduce a syntax encoder based on a graph neural network (GNN) to model the syntactic information of sentences. Based on the syntax encoder, we propose a novel syntax-enhanced variational autoencoder (SEVAE) with two variants. The variant SEVAE-m merges sentence information and syntactic information into one latent space to enrich the fine-grained syntactic information of latent representations. The variant SEVAE-s, with two separate latent spaces, allows the sentence decoder to dynamically attend to semantic and syntactic information from two latent variables. Experiments on two benchmark datasets show that our methods achieve significant and consistent improvements over previous work.
In this paper, the authors present an Artificial Intelligence (AI) based variational autoencoder (VAE) technique for detecting rotor faults in a large hydrogenerator. The proposed technique is applied to assess health...
The proportion of buildings occupying underground space has increased with three-dimensional urban development. Thermal comfort is crucial to the design of underground spaces and plays an important role in the optimization of building environment controls. Owing to limitations in recording various practical environmental parameters, it is difficult to obtain large datasets and, in turn, to establish an accurate forecasting model for the thermal comfort of an underground space. This paper addresses the problem from the perspective of data enhancement. A model for generating underground space data based on a variational autoencoder is proposed. The model maps thermal comfort data of an underground space to a highly compressed latent space and generates data in an unsupervised manner. Forecasting models trained using the generated data showed accuracy improvements of 41.34%-45.31%, indicating that the proposed generative model can learn effective features of the real data. The results also demonstrate that adjusting ventilation is more effective than adjusting temperature and relative humidity in improving the thermal comfort of an underground space. The findings of this research will support better thermal comfort evaluation for the operational management of the building environment in underground spaces.
ISBN: 9781510640412 (print)
The paper proposes an approach for matching digitized copies of business documents. This task arises when comparing two versions of the same document, a genuine one and a forgery, to find possible modifications, for example in the banking sector when contracts are concluded in paper form and fraud must be avoided. The method matches two documents by comparing images of text lines using a variational autoencoder (VAE) trained on genuine images and by calculating the Fisher information metric to find modifications. Experiments were conducted on the public Payslips dataset (in French). The results show high quality and reliability in finding document forgeries and are compared to the results of a method that applies OCR and image matching.
Applying data-driven methods such as deep learning in material mechanics is challenging because producing a sufficiently large, labeled dataset is resource-intensive. This paper outlines a new approach to overcoming this difficulty by transferring knowledge from a source domain of finite-element-analysis data to a target domain of real-world test-specimen images, so that a model capable of accurate and robust predictions in both domains can be constructed. To achieve this transfer of knowledge, discrepancy-based unsupervised domain adaptation is incorporated into a convolutional variational autoencoder structure. To evaluate the proposed approach, a four-point bending experiment was conducted on 6061 aluminum alloy and 316 stainless steel to produce 550 unlabeled target-domain data images. The same bending situation was analyzed using the finite-element method implemented in the commercial software package ABAQUS to produce 6000 labeled source-domain data images. The proposed domain-adaptive convolutional variational autoencoder was trained using the maximum mean discrepancy method on the target- and source-domain data. The predictions of the domain-adapted convolutional variational autoencoder were more accurate than those of the model trained only on the source domain. It is expected that the proposed approach can address the scarcity of labeled data in various applications of material mechanics and provide a base technology for the development of various data-driven approaches. (C) 2022 Elsevier B.V. All rights reserved.
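The maximum mean discrepancy mentioned above can be estimated directly from two batches of feature vectors. The sketch below is a generic biased estimator with a Gaussian (RBF) kernel; the abstract does not state which kernel, bandwidth, or feature layer the authors used, so `rbf_kernel`, `mmd_squared`, and the `bandwidth` parameter are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    sq_dist = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # remove tiny negatives from rounding
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of the squared MMD between two samples.

    source, target: arrays shaped (n_samples, n_features), e.g. latent
    codes of source-domain and target-domain images (an assumption here).
    """
    k_ss = rbf_kernel(source, source, bandwidth).mean()
    k_tt = rbf_kernel(target, target, bandwidth).mean()
    k_st = rbf_kernel(source, target, bandwidth).mean()
    return float(k_ss + k_tt - 2.0 * k_st)
```

In discrepancy-based domain adaptation, a term like this is typically added to the training loss so that minimizing it pulls the source and target feature distributions together; no labels from the target domain are needed to compute it.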