Recently, multimodal knowledge distillation-based methods for RGB-T semantic segmentation have been developed to enhance segmentation performance and inference speed. Technically, the crux of these models lies in feature imitative distillation-based strategies, where the student models imitate the working principles of the teacher models through loss functions. Unfortunately, due to the significant gaps in representation capability between the student and teacher models, such feature imitative distillation-based strategies may not achieve the anticipated knowledge transfer performance efficiently. In this paper, we propose a novel feature generative distillation strategy for efficient RGB-T semantic segmentation, embodied in the Feature Generative Distillation-based Network (FGDNet), which includes a teacher model (FGDNet-T) and a student model (FGDNet-S). This strategy bridges the gaps between multimodal feature extraction and complementary information excavation by using a conditional variational auto-encoder (CVAE) to generate teacher features from student features. Additionally, Multimodal Complementarity Separation modules (MCS-L and MCS-H) are introduced to separate complementary features at different levels. Comprehensive experimental results on four public benchmarks demonstrate that, compared with mainstream RGB-T semantic segmentation methods, our FGDNet-S achieves competitive segmentation performance with fewer parameters and lower computational complexity.
We propose an automatic layout method for indoor scenes that effectively satisfies specific constraints. Our approach involves enhancing the existing scene representation method to accommodate complex constraints, including the precise placement of doors, windows, and user-specified furniture. To achieve this, we construct a conditional vector that encapsulates the necessary constraints. Moreover, our automatically constrained layout approach is implemented by training a conditional variational autoencoder model. Given the constraints and randomly sampled vectors, the decoder module can generate diverse, reasonable indoor layouts. Evaluations show that our model outperforms the existing methods. Furthermore, our model has a lower parameter count and faster execution speed than the existing approaches.
As one of the world's most prevalent mental illnesses, depression is not easy to detect since it affects different people in different ways. Recently, linguistic features extracted from transcribed texts have been widely explored in depression detection because they contain a variety of cues about psychological activities. However, the detection performance is limited for the following two reasons: 1) the dialogue structure is ignored, which causes the Inconsistent Context problem; and 2) Imbalanced Regression occurs due to the long-tailed distribution of depression datasets. To this end, in this paper we investigate the relationship between the local topic and global context in interview transcripts, and bridge the gap between depression symptoms and depression severity. In particular, we propose a model called the Conditional Variational Topic-enriched Auto-Encoder (CVTAE), which can capture the spatial features from local topics via variational inference, and the temporal features from the global context with an attention mechanism. In addition, we apply re-weighting strategies to assign weights to the depression labels with different values. Extensive experiments on the DAIC-WOZ dataset in English and a self-constructed database NCUDID in Chinese demonstrate the effectiveness and robustness of CVTAE, while the comprehensive ablation study and case study show its interpretability.
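The abstract above does not say which re-weighting scheme is used for the imbalanced regression problem. As a rough illustration only, the sketch below assumes one common choice, inverse-frequency weighting over binned labels. The function names, the bin width, and the example scores are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def inverse_frequency_weights(labels, bin_width=5):
    """Return one weight per sample, inversely proportional to how many
    samples share its label bin. Weights are rescaled to average 1."""
    # Assumption: labels are non-negative severity scores grouped into
    # fixed-width bins; the paper may bin or weight differently.
    bins = [int(y // bin_width) for y in labels]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def weighted_mse(predictions, labels, weights):
    """Mean squared error in which each sample contributes by its weight."""
    total = sum(w * (p - y) ** 2 for p, y, w in zip(predictions, labels, weights))
    return total / sum(weights)


# Hypothetical long-tailed scores: most are low, a few are high.
scores = [2, 3, 4, 1, 0, 3, 12, 21]
weights = inverse_frequency_weights(scores)
```

With these example scores, the two rare high-severity samples receive larger weights than the six common low-severity ones, so errors on the tail of the distribution count for more in the loss.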
The market for urban distributed photovoltaics (DPV) is expected to take off in the next decade. However, these systems are often subject to complex urban contexts and sub-optimal conditions, requiring scalable and comprehensive solutions to detect their underperformance. In recent years, deep generative models (DGMs) have exhibited outstanding performance in the anomaly detection domain, dealing with generic high-dimensional time series data. Nevertheless, the existing applications of DGMs in the photovoltaic (PV) sector are still unable to account for environmental information, limiting their performance under various environmental conditions. This study proposes the Sequential Conditional Variational Autoencoder (SCVAE), which can cope with the sequential impacts of the environment on PV power generation. Using real-world data collected from 30 rooftop PV sites located across China, a data processing pipeline is developed to construct the training datasets, which contain mostly normal samples for unsupervised SCVAE model training. This work also constructs a synthetic dataset with a wide variety of artificial anomalies, based on domain insights and the engineering practice of DPV systems. After being checked and refined by experts, the synthetic dataset is used to validate the anomaly detection models. The results demonstrate that the SCVAE model outperforms existing state-of-the-art unsupervised anomaly detection models and can be effectively generalized to unseen PV sites. Moreover, the latent variables of SCVAE could be used to identify the type of DPV failure, thereby enabling more targeted diagnostics of anomaly mechanisms.
Current deep supervised learning methods typically require large amounts of labeled data for training. Since there is a significant cost associated with clinical data acquisition and labeling, medical datasets used for training these models are relatively small in size. In this paper, we aim to alleviate this limitation by proposing a variational generative model along with an effective data augmentation approach that utilizes the generative model to synthesize data. In our approach, the model learns the probability distribution of image data conditioned on a latent variable and the corresponding labels. The trained model can then be used to synthesize new images for data augmentation. We demonstrate the effectiveness of the approach on two independent clinical datasets consisting of ultrasound images of the spine and magnetic resonance images of the brain. For the spine dataset, a baseline and a residual model achieve an accuracy of 85% and 92%, respectively, using our method compared to 78% and 83% using a conventional training approach for the image classification task. For the brain dataset, a baseline and a U-net network achieve 84% and 88%, respectively, in Dice coefficient for tumor segmentation compared to 80% and 83% for the conventional training approach.
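For readers unfamiliar with the Dice coefficient reported above, here is a minimal sketch of the standard definition, 2|A and B| / (|A| + |B|), applied to flat binary masks. The function name, the empty-mask convention, and the example masks are assumptions made for illustration and do not come from the paper.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count positions where both masks are foreground.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if size_sum == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / size_sum


# Hypothetical predicted and ground-truth masks, flattened to 1-D.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
score = dice_coefficient(pred_mask, true_mask)
```

Here the masks share two foreground positions and each has three foreground positions, so the score is 2 * 2 / (3 + 3), about 0.667.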
Geotechnical testing serves to assess the strength and stiffness of in-situ soils, for purposes such as informing foundation design. Despite its importance, time constraints, financial considerations, and site-specific limitations often restrict testing to isolated locations with limited horizontal resolution. Therefore, this paper presents a novel hybrid generative deep learning model designed to approximate soil properties across sites based on sparsely sampled geotechnical data. The model uses geological subsurface samples derived from random field theory as 'a priori' data for a conditional variational auto-encoder (CVAE) model. By doing so, it attempts to map the relationship between in-situ data and the corresponding spatial coordinates, as well as the inherent link between in-situ data and spatial distribution. Then, in the post-processing phase, a Kriging model interpolates minor discrepancies between the measured and predicted values. To demonstrate its practical application, this paper focuses on cone penetration testing (CPT) as the geotechnical test method. The model's development is thoroughly discussed, followed by the validation using in-situ data and an analysis conducted with synthetic data. It is shown that the uncertainty associated with CVAE-Kriging depends upon both the distance from the sample point and the site's inherent complexity. The proposed methodology not only offers refined subsurface modeling but also expands the understanding of uncertainty in geotechnical testing. Practically, it can assist geotechnical engineers with insights during the survey phase.
Background: In medical imaging, images are usually treated as deterministic, while their uncertainties are largely underexplored. Purpose: This work aims at using deep learning to efficiently estimate posterior distributions of imaging parameters, which in turn can be used to derive the most probable parameters as well as their uncertainties. Methods: Our deep learning-based approaches are based on a variational Bayesian inference framework, which is implemented using two different deep neural networks based on the conditional variational auto-encoder (CVAE): CVAE-dual-encoder and CVAE-dual-decoder. The conventional CVAE framework, that is, CVAE-vanilla, can be regarded as a simplified case of these two neural networks. We applied these approaches to a simulation study of dynamic brain PET imaging using a reference region-based kinetic model. Results: In the simulation study, we estimated posterior distributions of PET kinetic parameters given a measurement of the time-activity curve. Our proposed CVAE-dual-encoder and CVAE-dual-decoder yield results that are in good agreement with the asymptotically unbiased posterior distributions sampled by Markov Chain Monte Carlo (MCMC). The CVAE-vanilla can also be used for estimating posterior distributions, although it performs worse than both CVAE-dual-encoder and CVAE-dual-decoder. Conclusions: We have evaluated the performance of our deep learning approaches for estimating posterior distributions in dynamic brain PET. Our deep learning approaches yield posterior distributions that are in good agreement with unbiased distributions estimated by MCMC. All these neural networks have different characteristics and can be chosen by the user for specific applications. The proposed methods are general and can be adapted to other problems.
The 110 kV oil-immersed transformer is a key part of the power transmission and transformation system, which determines the power quality and transmission efficiency. Its fault diagnosis can greatly reduce maintenance cost and improve economic efficiency. At present, transformer fault diagnosis methods depend strongly on the original data, and the size of the original data directly affects the quality of fault diagnosis. To change this situation and achieve higher accuracy in transformer fault diagnosis, this paper first uses a conditional variational auto-encoder (CVAE) composed of fully connected layers to expand the original samples under each fault category. After data augmentation, a convolutional neural network (CNN) with strong feature extraction ability is selected as the classifier. Finally, the CVAE-CNN model is validated using a public dataset and the result is compared with other machine learning algorithms. (c) 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://***/licenses/by/4.0/).
Anomaly detection in surveillance videos aims to identify frames where abnormal events happen. Existing approaches assume that the training and testing videos are from the same scene, and they generalize poorly when encountering an unseen scene. In this paper, we propose a Variational Anomaly Detection Network (VADNet), which is characterized by its high scene adaptability: it can identify abnormal events in a new scene by referring to only a few normal samples, without fine-tuning. Our model embodies two major innovations. First, a novel Variational Normal Inference (VNI) module is proposed to formulate image reconstruction in a conditional variational auto-encoder (CVAE) framework, which learns a probabilistic decision model instead of a traditional deterministic one. Second, a Margin Learning Embedding (MLE) module is leveraged to boost the variational inference and aid in distinguishing normal events. We theoretically demonstrate that minimizing the triplet loss in the MLE module facilitates maximizing the evidence lower bound (ELBO) of the CVAE, which promotes the convergence of VNI. By combining variational inference with margin learning, VADNet becomes more generative and is able to handle the uncertainty caused by the changed scene and limited reference data. Extensive experiments on several datasets demonstrate that the proposed VADNet can adapt to a new scene effectively without fine-tuning and achieve remarkable performance, significantly outperforming other methods and establishing a new state of the art in few-shot scene-adaptive anomaly detection. We believe our method is closer to real-world application due to its strong generalization ability. All code is released at https://***/huangxx156/VADNet.
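For reference, the ELBO mentioned in the abstract above has the following standard form for a generic CVAE. The notation is assumed here rather than taken from the paper: x is the data to reconstruct, c is the conditioning input, z is the latent variable, q_phi is the encoder (approximate posterior), and p_theta denotes the decoder and the conditional prior.

```latex
\log p_\theta(x \mid c)
\;\ge\;
\mathbb{E}_{q_\phi(z \mid x, c)}\!\left[ \log p_\theta(x \mid z, c) \right]
\;-\;
\mathrm{KL}\!\left( q_\phi(z \mid x, c) \,\middle\|\, p_\theta(z \mid c) \right)
```

The first term rewards accurate reconstruction of x from z and c, and the KL term keeps the encoder's posterior close to the conditional prior. How the paper's triplet loss relates to this bound is part of its own derivation and is not reproduced here.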
We propose a conditional variational auto-encoder within Gibbs sampling (CVAE-within-Gibbs) for Bayesian linear inverse problems where the prior or the likelihood function depends on ambiguous hyperparameters. The method builds on ideas from classical sampling theory and recent advances in deep generative models to approximate complicated probability distributions. Specifically, we use a CVAE model which is trained with a large amount of data to learn the conditional density of hyperparameters in the original Gibbs sampler. The learned property of the conditional posterior provides more flexibility than classical Gibbs sampling because it avoids manually or experimentally determining the hyperpriors and their hyperparameters. We demonstrate the performance of the proposed method for three linear inverse problems, i.e., image deblurring, signal denoising, and boundary heat flux identification in a heat conduction problem.