Variational autoencoders (VAEs) and generative adversarial networks (GANs) have been widely used in hyperspectral image classification (HSIC) tasks. However, the HSI virtual samples generated by VAEs are often ambiguous, and GANs are prone to mode collapse, both of which ultimately lead to poor generalization. Moreover, most of these models consider only spectral or only spatial feature extraction. They fail to combine the two branches interactively and ignore the correlation between them. Consequently, this paper proposes the variational generative adversarial network with crossed spatial and spectral interactions (CSSVGAN), which includes a dual-branch variational Encoder that maps spectral and spatial information to different latent spaces, a crossed interactive Generator that improves the quality of the generated virtual samples, and a Discriminator stacked with a classifier that enhances classification performance. Combining these three subnetworks, the proposed CSSVGAN achieves excellent classification by ensuring sample diversity and letting spectral and spatial features interact in a crossed manner. Superior experimental results on three datasets verify the effectiveness of the method.
In spatial blind source separation, the observed multivariate random fields are assumed to be mixtures of latent spatially dependent random fields. The objective is to recover the latent random fields by estimating the unmixing transformation. Currently, the algorithms for spatial blind source separation can only estimate linear unmixing transformations, and nonlinear blind source separation methods for spatial data are scarce. In this paper, we extend an identifiable variational autoencoder that can estimate nonlinear unmixing transformations to spatially dependent data, and demonstrate its performance for both stationary and nonstationary spatial data using simulations. In addition, we introduce scaled mean absolute Shapley additive explanations for interpreting the latent components through the nonlinear mixing transformation. The spatial identifiable variational autoencoder is applied to a geochemical dataset to find the latent random fields, which are then interpreted using the scaled mean absolute Shapley additive explanations. Finally, we illustrate how the proposed method can be used as a pre-processing step when making multivariate predictions.
The labyrinth of the inner ear is an important sensory organ for hearing and balance and is closely related to tinnitus, hearing loss, vertigo, and Meniere's disease. Quantitative description and measurement of the labyrinth is a challenging task in both clinical practice and medical research. A data-driven labyrinth morphological modeling method is proposed for extracting simple, low-dimensional representations (feature vectors) that quantify the morphology of normal and abnormal labyrinths. First, a two-stage pose alignment strategy is introduced to align the segmented inner ear labyrinths. Then, an energy-adaptive spatial and inter-slice dimensionality reduction strategy is adopted to extract compact morphological features via a variational autoencoder (VAE). Finally, a statistical model of the compact features in the latent space is established to represent the morphology distribution of the labyrinths. As one application of the model, a reference-free quality evaluation for labyrinth segmentation is explored. The experimental results show that the consistency between the proposed method and the Dice similarity coefficient (DSC) reaches 0.78. Further analysis shows that the model also has high potential for morphological analysis of the labyrinths, such as anomaly detection.
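For readers unfamiliar with the Dice similarity coefficient used as the reference metric above, the following is a minimal sketch of its standard definition, DSC = 2|A ∩ B| / (|A| + |B|), on flattened binary masks. The function name `dice_coefficient` and the convention for two empty masks are assumptions made for illustration; this is not the paper's evaluation code.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two equally sized binary masks.

    Masks are flat sequences of truthy/falsy values (e.g. flattened voxels).
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    if size_a + size_b == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    return 2.0 * overlap / (size_a + size_b)


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 0, 0]
    print(dice_coefficient(predicted, reference))
```

A reference-free quality score, as described in the abstract, would be compared against this quantity, which itself requires a ground-truth segmentation.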
Technologies in healthcare, smart homes, security, ecology, and entertainment all deploy audio event detection (AED) in order to detect sound events in an audio recording. Effective AED techniques rely heavily on supervised or semi-supervised models to capture the wide range of dynamics spanned by sound events in order to achieve temporally precise boundaries and accurate event classification. These methods require extensive collections of labeled or weakly labeled in-domain data, which is costly and labor-intensive. Importantly, these approaches do not fully leverage the inherent variability and range of dynamics across sound events, aspects that can be effectively identified through unsupervised methods. The present work proposes an approach based on multi-rate autoencoders that are pretrained in an unsupervised way to leverage unlabeled audio data and ultimately learn the rich temporal dynamics inherent in natural sound events. This approach utilizes parallel autoencoders that achieve decompositions of the modulation spectrum along different bands. In addition, we introduce a rate-selective temporal contrastive loss to align the training objective with event detection metrics. Optimizing the configuration of multi-rate encoders and the temporal contrastive loss leads to notable improvements in domestic sound event detection in the context of the DCASE challenge.
This paper introduces the Wasserstein Adversarially Regularized Graph Autoencoder (WARGA), an implicit generative algorithm that directly regularizes the latent distribution of node embeddings to a target distribution via the Wasserstein metric. To ensure Lipschitz continuity, we propose two approaches: WARGA-WC, which uses weight clipping, and WARGA-GP, which uses a gradient penalty. The proposed models have been validated by link prediction and node clustering on real-world graphs, with visualizations of node embeddings; WARGA generally outperforms other state-of-the-art models based on the Kullback-Leibler (KL) divergence and the typical adversarial framework. (c) 2023 Elsevier B.V. All rights reserved.
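The two Lipschitz strategies named above can be written out using the standard formulations from the Wasserstein GAN literature. This is a sketch under assumptions, not necessarily the exact objective of WARGA: $f$ denotes a critic with weights $w$, $q$ the distribution of encoded node embeddings, $p$ the target distribution, $c$ the clipping bound, $\lambda$ the penalty weight, and $\hat z$ a point sampled uniformly on the segment between a sample from $p$ and a sample from $q$.

```latex
\begin{align}
W(q, p) &= \sup_{\|f\|_{L} \le 1}
  \mathbb{E}_{z \sim p}\big[f(z)\big] - \mathbb{E}_{\tilde z \sim q}\big[f(\tilde z)\big]
  && \text{(Kantorovich-Rubinstein dual)} \\
w &\leftarrow \operatorname{clip}(w,\, -c,\, c)
  && \text{(weight clipping, after each critic update)} \\
\mathcal{L}_{\text{critic}} &=
  \mathbb{E}_{\tilde z \sim q}\big[f(\tilde z)\big] - \mathbb{E}_{z \sim p}\big[f(z)\big]
  + \lambda\, \mathbb{E}_{\hat z}\Big[\big(\|\nabla_{\hat z} f(\hat z)\|_2 - 1\big)^2\Big]
  && \text{(gradient penalty)}
\end{align}
```

Weight clipping enforces the constraint crudely by bounding every parameter, whereas the gradient penalty softly pushes the critic's gradient norm toward 1 at the interpolated points.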
In this paper, we present an incremental learning framework for efficient and accurate facial performance tracking. Our approach alternates between a modeling step, which takes tracked meshes and texture maps to train our deep learning-based statistical model, and a tracking step, which takes the predictions of geometry and texture that our model infers from measured images and optimizes the predicted geometry by minimizing image, geometry, and facial landmark errors. Our Geo-Tex VAE model extends the convolutional variational autoencoder for face tracking, and jointly learns and represents deformations and variations in geometry and texture from tracked meshes and texture maps. To accurately model variations in facial geometry and texture, we introduce a decomposition layer in the Geo-Tex VAE architecture, which decomposes the facial deformation into global and local components. We train the global deformation with a fully connected network and the local deformations with convolutional layers. Although the model runs on each frame independently, which enables a high degree of parallelization, we validate that our framework achieves sub-millimeter accuracy on synthetic data and outperforms existing methods. We also qualitatively demonstrate high-fidelity, long-duration facial performance tracking on several actors.
Current image generative models have achieved remarkably realistic image quality, offering numerous academic and industrial applications. However, to ensure these models are used for benign purposes, it is essential to develop tools that reliably detect whether an image has been synthetically generated. Consequently, several detectors with excellent performance in computer vision applications have been developed. However, these detectors cannot be applied directly, as they are, to multi-spectral satellite images, which necessitates the training of new models. While two-class classifiers generally achieve high detection accuracies, they struggle to generalize to image domains and generative architectures different from those encountered during training. In this paper, we propose a one-class classifier based on Vector Quantized Variational Autoencoder 2 (VQ-VAE 2) features to overcome the limitations of two-class classifiers. We start by highlighting the generalization problem faced by binary classifiers, which we demonstrate by training and testing an EfficientNet-B4 architecture on multiple multi-spectral datasets. We then show that the VQ-VAE 2-based classifier, trained exclusively on pristine images, can detect images from different domains and images generated by architectures not encountered during training. Finally, we conduct a head-to-head comparison between the two classifiers on the same generated datasets, which emphasizes the superior generalization capability of the VQ-VAE 2-based detector: at a false alarm rate of 0.05, we obtain a probability of detection of 1 for the blue and red channels with the VQ-VAE 2-based detector, and 0.72 with the EfficientNet-B4 classifier.
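The metric reported above, probability of detection at a fixed false alarm rate, can be computed from detector scores as in the minimal sketch below. The function name `detection_rate_at_far` and the score convention (a higher score means "more likely synthetic") are assumptions made for illustration; how the paper's detectors produce their scores is not shown here.

```python
import math


def detection_rate_at_far(pristine_scores, synthetic_scores, far=0.05):
    """Fraction of synthetic images detected when the threshold is set so that
    at most a fraction `far` of pristine images are flagged.

    Assumed convention: an image is flagged when its score is strictly above
    the threshold, and higher scores mean "more likely synthetic".
    """
    n = len(pristine_scores)
    # Number of pristine images we are allowed to flag by mistake.
    allowed = math.floor(far * n + 1e-9)
    ranked = sorted(pristine_scores, reverse=True)
    # Threshold is the first pristine score that must NOT be flagged.
    threshold = ranked[allowed] if allowed < n else float("-inf")
    detected = sum(1 for s in synthetic_scores if s > threshold)
    return detected / len(synthetic_scores)


if __name__ == "__main__":
    pristine = list(range(100))        # toy scores for pristine images
    synthetic = [90, 95, 100, 200]     # toy scores for generated images
    print(detection_rate_at_far(pristine, synthetic, far=0.05))
```

Fixing the false alarm rate first makes the two detectors comparable, since each one is then evaluated at the same cost in wrongly flagged pristine images.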
Domain generalization aims at generalizing a network trained on multiple domains to unknown but related domains. Under the assumption that different domains share the same classes, previous works can build relationships across domains. However, in realistic scenarios, a change of domain is usually accompanied by a change of categories, which makes it difficult to collect sufficient aligned categories across domains. Bearing this in mind, this article introduces union domain generalization (UDG) as a new domain generalization scenario, in which the label space varies across domains and the categories in unknown domains belong to the union of all given domain categories. The absence of categories in given domains is the main obstacle to aligning different domain distributions and obtaining domain-invariant information. To address this problem, we propose category-stitch learning (CSL), which jointly learns the domain-invariant information and completes the missing categories in all domains through an improved variational autoencoder and generators. Domain-invariant information extraction and sample generation promote each other to improve generalizability. Additionally, we decouple category and domain information and propose explicitly regularizing the semantic information through the classification loss on transferred samples. Our method can thus break through the category limit and generate samples of missing categories in each domain. Extensive experiments and visualizations on the MNIST, VLCS, PACS, Office-Home, and DomainNet datasets demonstrate the effectiveness of the proposed method.
This paper applies a generative deep learning model, namely a variational autoencoder, to probabilistic optimal power flows. The model uses Gaussian approximations to adequately represent the distributions of the results of a system under uncertainty. These approximations are realized by applying several techniques from Bayesian deep learning, most notably stochastic variational inference. Using the reparameterization trick and batch sampling, the proposed model allows for the training of a probabilistic optimal power flow similar to a possibilistic process. The results are shown by applying a reformulation of the Kullback-Leibler divergence, a distance measure between distributions. Not only is the resulting model simple in appearance, it also performs well and accurately. Furthermore, the paper explores potential pathways for future research and gives insights for practitioners using such or similar generative models.
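The reparameterization trick and the Kullback-Leibler term mentioned above can be illustrated with a minimal sketch for a diagonal Gaussian posterior and a standard normal prior. The function names are hypothetical, and the KL expression is the textbook closed form, not the reformulation the paper refers to.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), per dimension.

    Sampling eps separately keeps z a deterministic, differentiable
    function of mu and log_var, which is the point of the trick.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


if __name__ == "__main__":
    mu, log_var = [0.5, -1.0], [0.0, -2.0]
    print(reparameterize(mu, log_var))
    print(kl_to_standard_normal(mu, log_var))
```

In a VAE training loop, the KL value would be added to a reconstruction loss, and batch sampling amounts to calling `reparameterize` once per example in the batch.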
Face aging and rejuvenating aim to generate an individual's face with an aging or rejuvenating effect while retaining identity information. We can analyze a given face image to estimate a past look or predict a future look of the person. Research on face aging and rejuvenating has important application value in cross-age recognition [1], public security, and entertainment, for example, changing the appearance of actors at different ages in a movie or finding missing persons in forensic applications. Although this area has attracted much attention from researchers, many challenges remain, especially the lack of accurate and sufficient datasets, weak aging effects, and poor identity preservation. Previous face aging and rejuvenating methods fall into two main categories: physical model-based methods [2,3] and prototype-based methods [4-6]. The physical model-based methods describe changes in muscle, wrinkles, skin, etc.; they can produce good results but suffer from complex modeling. The prototype-based methods try to learn the transformation between different age groups. Owing to the development of the generative adversarial network (GAN) [7], state-of-the-art methods [8-10] that use deep learning show impressive success in this field. Face aging and rejuvenating work effectively in public security criminal investigation, cross-age recognition, and entertainment. However, three main problems still exist: the lack of accurate and sufficient datasets, weak aging effects, and poor preservation of personal information. We propose a semi-supervised method for face aging and rejuvenating. In particular, a conditional encoder maps an input face into a latent vector, which is used by the generator network with age conditions to produce a new face. The latent vector preserves identity information, whereas the age label controls face aging or rejuvenating. To make generated features closer to prior features, the d