ISBN:
(Print) 9798350320107
The computer vision community is increasingly interested in exploring hyperbolic space for image representation, as hyperbolic approaches have demonstrated outstanding results in efficiently representing data with an underlying hierarchy. This interest arises from the intrinsic hierarchical nature of images. However, despite the hierarchical nature of remote sensing (RS) images, the investigation of hyperbolic spaces within the RS community has been relatively limited. The objective of this study is therefore to examine the relevance of hyperbolic embeddings of RS data, focusing on scene embedding. Using a variational autoencoder, we project the data into a hyperbolic latent space while ensuring numerical stability with a feature clipping technique. Experiments conducted on the NWPU-RESISC45 image dataset demonstrate the superiority of hyperbolic embeddings over their Euclidean counterparts in a classification task. Our study highlights the potential of operating in hyperbolic space as a promising approach for embedding RS data.
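The abstract above does not specify the exact clipping rule, but the standard idea is to bound the norm of features before mapping them onto the Poincaré ball, since points near the boundary cause numerical overflow. A minimal numpy sketch, with the clipping radius `max_norm` chosen arbitrarily for illustration:

```python
import numpy as np

def clip_features(x, max_norm=0.9):
    """Rescale feature vectors whose Euclidean norm exceeds max_norm.

    Clipping before the exponential map keeps embeddings away from the
    Poincare ball boundary, where distances and gradients blow up.
    """
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    scale = np.minimum(1.0, max_norm / np.maximum(norm, 1e-12))
    return x * scale

def expmap0(v, c=1.0):
    """Exponential map at the origin of a Poincare ball with curvature -c."""
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), 1e-12)
    return np.tanh(np.sqrt(c) * norm) * v / (np.sqrt(c) * norm)
```

Since tanh saturates at 1, every clipped feature lands strictly inside the unit ball, which is what keeps the hyperbolic latent space numerically stable.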
Visual surveillance has become indispensable in the evolution of Intelligent Transportation Systems (ITS). Video object trajectories are key to many visual surveillance applications, but classifying varying-length time series data such as video object trajectories with conventional neural networks can be challenging. In this paper, we propose trajectory classification and anomaly detection using a hybrid Convolutional Neural Network (CNN) and variational autoencoder (VAE) architecture. First, we introduce high-level features for varying-length object trajectories using a color gradient representation. In the next stage, a semi-supervised approach is used to annotate moving object trajectories extracted with the Temporally Incremental Gravitational Model (TIGM) for class labeling. For training, anomalous trajectories are identified using t-Distributed Stochastic Neighbor Embedding (t-SNE). Finally, a hybrid CNN-VAE architecture is proposed for trajectory classification and anomaly detection. The results obtained on publicly available surveillance video datasets reveal that the proposed method can successfully identify traffic anomalies such as lane-driving violations, sudden speed variations, abrupt termination of vehicle movement, and vehicles moving in wrong directions. The accuracy of trajectory classification improves by a margin of 1-6% over popular neural network-based classifiers across various datasets using the proposed high-level features. The gradient representation also improves anomaly detection accuracy significantly (30-35%). Code and dataset can be found at https://***/santhoshkelathodi/CNN-VAE.
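The color gradient idea solves the varying-length problem: a trajectory of any length is rasterized into a fixed-size image in which color encodes normalized time, so a CNN can consume it. The sketch below illustrates the idea only; the image size, color scheme, and normalization are assumptions, not the paper's exact encoding.

```python
import numpy as np

def trajectory_to_image(traj, size=32):
    """Render a variable-length (x, y) trajectory as a fixed-size RGB image.

    Each plotted point's color encodes normalized time, so the direction
    and ordering of motion survive the conversion to a CNN-friendly input.
    """
    traj = np.asarray(traj, dtype=float)
    img = np.zeros((size, size, 3))
    span = np.maximum(traj.max(axis=0) - traj.min(axis=0), 1e-9)
    xy = ((traj - traj.min(axis=0)) / span * (size - 1)).astype(int)
    n = len(traj)
    for i, (x, y) in enumerate(xy):
        t = i / max(n - 1, 1)            # normalized time in [0, 1]
        img[y, x] = (t, 1.0 - t, 0.5)    # early points green-ish, late red-ish
    return img
```

Two trajectories covering the same pixels in opposite directions produce different images, which is exactly what a purely spatial rasterization would lose.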
Objective: This work investigates the possibility of disentangled representation learning of inter-subject anatomical variations within electrocardiographic (ECG) data. Methods: Since ground truth anatomical factors are generally not known in clinical ECG for assessing the disentangling ability of the models, the presented work first proposes the SimECG dataset, a 12-lead ECG dataset procedurally generated with a controlled set of anatomical generative factors. Second, to perform such disentanglement, the presented method evaluates and compares deep generative models with latent density modeled by a nonparametric Indian Buffet Process to account for the complex generative process of ECG data. Results: In the simulated data, the experiments demonstrate, for the first time, concrete evidence of the possibility to disentangle key generative anatomical factors within ECG data in separation from task-relevant generative factors. We achieve a disentanglement score of 92.1% while disentangling five anatomical generative factors and the task-relevant generative factor. In both simulated- and real-data experiments, this work further provides quantitative evidence for the benefit of disentanglement learning on the downstream clinical task of localizing the origin of ventricular activation. Overall, the presented method achieves improvements of around 18.5% and 11.3% on the simulated dataset, and around 7.2% and 3.6% on the real dataset, over a baseline CNN and a standard generative model, respectively. Conclusion: These results demonstrate the importance as well as the feasibility of disentangled representation learning of inter-subject anatomical variations within ECG data. Significance: This work suggests an important research direction for dealing with the well-known challenge posed by significant inter-subject variations during automated analysis of ECG data.
Deep learning-based classification algorithms offer no performance guarantees when deployed on testing data not generated by the same process as the training data. Such out-of-distribution (OOD) data often cause classification errors that are hard to detect since they do not generate explicit errors in the model. In real-world applications, there is no way to ensure that the testing data and the training data are drawn from the same or sufficiently similar distributions. This problem is especially challenging in wireless communications applications. Because the radio propagation channel is highly dynamic, it is very difficult to ensure that a deep learning model is not tested on OOD data. In this paper, we propose a novel deep learning model called FOOD (Feature representation for detecting OOD data) to detect OOD data in wireless communications applications. FOOD incorporates a new model architecture to detect OOD data accurately and minimizes the instances of normal data being recognized as OOD. We evaluated the performance of FOOD extensively using transmitter classification and modulation recognition tasks, with both experimental datasets and simulation-generated datasets. As far as we know, this is the first systematic study on the impact and detection of OOD data in deep learning-based wireless communications applications.
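The FOOD architecture itself is not described in this abstract. A common baseline that any OOD detector of this kind is measured against is score thresholding: fit a threshold on scores (e.g. reconstruction or feature-distance errors) from in-distribution data, chosen so that only a small fraction of normal data is falsely flagged. A minimal sketch under that assumption:

```python
import numpy as np

def fit_threshold(in_dist_scores, q=99.0):
    """Set the OOD threshold at the q-th percentile of in-distribution scores,
    so roughly (100 - q)% of normal samples are falsely flagged as OOD."""
    return np.percentile(in_dist_scores, q)

def is_ood(scores, threshold):
    """Flag samples whose score (e.g. reconstruction error) exceeds the threshold."""
    return np.asarray(scores) > threshold
```

The percentile `q` directly trades off the two failure modes the abstract mentions: a higher `q` reduces normal data being recognized as OOD, at the cost of missing more genuine OOD samples.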
Facial expression retargeting from humans to virtual characters is a useful technique in computer graphics and animation. Traditional methods use markers or blendshapes to construct a mapping between the human and avatar faces. However, these approaches require a tedious 3D modeling process, and their performance relies on the modeler's experience. In this article, we propose a brand-new solution to this cross-domain expression transfer problem via nonlinear expression embedding and expression domain translation. We first build low-dimensional latent spaces for human and avatar facial expressions with variational autoencoders. Then we construct correspondences between the two latent spaces guided by geometric and perceptual constraints. Specifically, we design geometric correspondences to reflect geometric matching and utilize a triplet data structure to express users' perceptual preferences among avatar expressions. A user-friendly method is proposed to automatically generate triplets, allowing users to easily and efficiently annotate the correspondences. Using both geometric and perceptual correspondences, we train a network for expression domain translation from human to avatar. Extensive experimental results and user studies demonstrate that even nonprofessional users can apply our method to generate high-quality facial expression retargeting results with less time and effort.
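A triplet data structure of this kind is typically trained with the standard triplet hinge loss. In the retargeting setting, a triplet would encode a user's judgment that avatar expression `positive` matches the human expression `anchor` better than `negative` does (the variable names and the squared-distance choice are illustrative, not the paper's exact formulation):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss pushing the anchor closer to the positive than to the negative.

    The loss is zero once the positive is closer than the negative by at
    least `margin` (in squared Euclidean distance in the latent space).
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0)
```

Because the loss only depends on relative distances, users never have to rate expressions on an absolute scale; ranking two candidates per query is enough, which is what makes the annotation user-friendly.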
This article considers the task of text style transfer: transforming a sentence of one style into another style while preserving its style-independent content. A dominant approach to text style transfer is to learn a good content factor of the text, define a fixed vector for every style, and recombine them to generate text in the required style. In fact, there are a large number of different words that convey the same style from different aspects. Thus, using a fixed vector to represent one style is very inefficient: it weakens the representation power of the style vector and limits the diversity of text in the same style. To address this problem, we propose a novel neural generative model called Adversarial Separation Network (ASN), which learns the content and style vectors jointly; the learnt vectors have strong representation power and good interpretability. In our method, adversarial learning is employed to enhance the model's capability of disentangling the two factors. To evaluate our method, we conduct experiments on two benchmark datasets. Experimental results show our method performs style transfer better than strong comparison systems. We also demonstrate the strong interpretability of the learnt latent vectors.
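The adversarial disentanglement objective typically has two opposing terms: a classifier should recover the style from the style vector, while an adversary should fail to recover it from the content vector. The sketch below shows only the loss computation, with hypothetical names; ASN's actual architecture and training procedure are not specified in this abstract.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def separation_losses(style_logits, leak_logits, style_labels):
    """Two terms of an adversarial separation objective (illustrative names).

    style_logits: style predicted from the style vector -- should be accurate,
    so this term is minimized.
    leak_logits: style predicted by an adversary from the content vector --
    the encoder is trained to defeat it, i.e. to maximize this term.
    """
    l_style = cross_entropy(style_logits, style_labels)
    l_leak = cross_entropy(leak_logits, style_labels)
    return l_style, l_leak
```

At convergence, `l_style` is near zero (style is fully captured by the style vector) while `l_leak` sits at the chance-level entropy (no style information leaks into the content vector).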
Speech-based interfaces provide convenient methods for controlling various smart devices. For these interfaces to work reliably, considerable speech data with various noise and speaker characteristics must be collected to train the associated speech-processing models. Gathering spoken commands from actual users of devices can improve those devices' performance by familiarizing each device with the individual acoustic characteristics of its particular user's speech. However, the direct acquisition of spoken commands could threaten the privacy of users, as the spoken data would contain sensitive speaker-specific information. Speaker anonymization algorithms can be applied to suppress such sensitive information while preserving the linguistic content of a user's speech. Previous speaker anonymization algorithms could handle only the voices of speakers who contributed to the training datasets. As speaker anonymization algorithms are typically applied to new speakers (who are absent from the training datasets), a method of handling such speakers (commonly referred to as unseen speakers) should be developed. In this paper, we propose a novel method that can effectively suppress the individual characteristics in an unseen speaker's voice while retaining the linguistic content of the speech. It adopts zero-shot voice conversion methods for unseen speaker anonymization. Since the proposed method utilizes speaker identity vectors commonly used in many-to-many voice conversion algorithms and does not modify the conversion algorithm itself, it can easily be combined with many other voice conversion algorithms. The proposed method is evaluated using the VCC2018 and VCTK corpora. Speaker identification rate and speech recognition rate are used for quantitative analysis. The experimental results showed that the average speaker identification accuracy decreased by 92.3 percentage points and the average speech recognition accuracy decreased by 17.7 percentage points afte
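One widely used recipe for producing an anonymized identity vector, sketched here as an assumption rather than as this paper's exact method, is to replace the unseen speaker's identity vector with the mean of several randomly drawn vectors from an external speaker pool, then feed that pseudo-identity to the voice conversion model:

```python
import numpy as np

def pseudo_speaker(speaker_pool, k=5, seed=None):
    """Build an anonymized identity vector by averaging k pool speakers.

    speaker_pool: (N, d) array of speaker identity vectors (e.g. from a
    many-to-many voice conversion system). The returned pseudo-identity
    matches no single real speaker, which is the anonymization goal.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(speaker_pool), size=k, replace=False)
    return speaker_pool[idx].mean(axis=0)
```

Because the conversion algorithm only consumes an identity vector, this swap needs no change to the converter itself, which is why the approach composes with many voice conversion systems.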
Anomaly detection based on generative models usually uses the reconstruction loss of samples for anomaly discrimination. However, there are two problems in semi-supervised or unsupervised learning. One is that the generalization ability of the generator is too strong, which may reduce the reconstruction loss of some outliers. The other is that background statistics interfere with the reconstruction loss of outliers. Both reduce the effectiveness of anomaly detection. In this paper, we propose an anomaly detection method called MHMA (Multi-Headed Memory Autoencoder). A variational autoencoder is used as the generative model, and the vector in the latent space is constrained by a memory module, which increases the reconstruction error of abnormal samples. Moreover, the MHMA uses a multi-head structure to divide the last layer of the decoder into multiple branches to learn and generate a diverse sample distribution, which keeps the generalization capability of the model within a reasonable range. When scoring outliers, a likelihood-ratio method is employed to obtain correct background statistics from the background model, thus enhancing the specific features in the reconstructed samples. The effectiveness and universality of MHMA are tested on different types of datasets, and the results show that the model achieves 99.5% recall, 99.9% precision, 99.69% F1 and 98.12% MCC on the image dataset, and 98.61% recall, 98.73% precision, 98.67% F1 and 95.82% MCC on the network security dataset.
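The memory module works by rewriting each latent vector as an attention-weighted combination of stored prototype latents: normal latents land close to a prototype and are reproduced well, while anomalous latents are pulled toward the prototypes, inflating the downstream reconstruction error. The addressing below is a generic sketch (dimensions and temperature are assumptions), not MHMA's exact module:

```python
import numpy as np

def memory_read(z, memory, temperature=0.1):
    """Replace latent z with a softmax-weighted combination of memory items.

    z: (d,) latent vector; memory: (M, d) matrix of prototype latents.
    A low temperature makes the read nearly hard, snapping z to its
    closest prototype.
    """
    z = np.asarray(z, dtype=float)
    sims = memory @ z / temperature   # (M,) similarity to each memory item
    w = np.exp(sims - sims.max())
    w = w / w.sum()                   # softmax addressing weights
    return w @ memory
```

This is what "limiting the vector in the latent space" buys: the decoder can only ever see combinations of prototypes learned from normal data, so it cannot accidentally reconstruct an outlier well.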
With the capability of capturing high-resolution imagery data and the ease of accessing remote areas, aerial robots are becoming increasingly popular for forest health monitoring applications. For example, forestry tasks such as field surveys and foliar sampling, which are generally manual and labour intensive, can be automated with remotely controlled aerial robots. In this study, we propose two new online frameworks to quantify and rank the severity of individual tree crown loss. The real-time crown loss estimation (RTCLE) model localises and classifies individual trees into their respective crown loss percentage bins. Experiments are conducted to investigate whether synthetically generated tree images can be used to train the RTCLE model, as real images with diverse viewpoints are generally expensive to collect. Results show that synthetic data training helps to achieve a satisfactory baseline mean average precision (mAP), which can be further improved with just some additional real imagery data. We showed that the mAP can be increased from approximately 60% to 78% by mixing the real dataset with the generated synthetic data. For individual tree crown loss ranking, a two-step crown loss ranking (TSCLR) framework is developed to handle the inconsistently labelled crown loss data. The TSCLR framework detects individual trees before ranking them based on relative crown loss severity measures. The tree detection model is trained with the combined dataset used in the RTCLE model training, where we achieved an mAP of approximately 95%, suggesting that the model generalises well to unseen datasets. The relative crown loss severity of each tree is estimated, with deep representation learning, by a probabilistic encoder from a fully trained variational autoencoder (VAE) model. The VAE is trained end-to-end to reconstruct tree images in a background-agnostic way. Based on a conservative evaluation, the estimated crown loss severity from the probabilistic encoder generally
In industrial processes, different operating conditions and ratios of ingredients are used to produce multi-grade products on the same production line. Yet the production grade changes quickly, as customer demand varies from time to time. As a result, the process data collected in certain operating regions are often scarce. Process dynamics, nonlinearity, and process uncertainty make it harder to develop a reliable model to monitor the process status. In this paper, the source-aided variational state-space autoencoder (SA-VSSAE) is proposed. It integrates a variational state-space autoencoder with a Gaussian mixture. With the additional information from the source grades, SA-VSSAE can be used for monitoring processes with sparse target data by performing information sharing to enhance the reliability of the target model. Unlike past works, which perform information sharing and modeling in a two-step procedure, the proposed model is designed for information sharing and modeling in a one-step procedure without causing information loss. In contrast to the traditional state-space model, which is linear and deterministic, the variational state-space autoencoder (VSSAE) extracts the dynamic and nonlinear features in the process variables using neural networks. Also, by taking process uncertainty into consideration, VSSAE describes the features in a probabilistic form. Probability density estimates of the residual and latent variables are used to design the monitoring indices for fault detection. A numerical example and an industrial polyvinyl chloride drying process are presented to show the advantages of the proposed method over the comparative methods. (c) 2022 Elsevier Ltd. All rights reserved.
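A residual-based monitoring index of the kind described is, in its simplest form, the squared prediction error (SPE) with an empirical control limit fitted on normal-operation data. The sketch below shows that generic scheme only; SA-VSSAE's actual indices are built from probability density estimates rather than a raw quantile:

```python
import numpy as np

def spe(x, x_hat):
    """Squared prediction error: a residual-based monitoring index."""
    return np.sum((np.asarray(x) - np.asarray(x_hat)) ** 2, axis=-1)

def control_limit(normal_spe, alpha=0.99):
    """Empirical control limit: the alpha-quantile of SPE under normal operation.

    A new sample whose SPE exceeds this limit is declared a fault.
    """
    return np.quantile(normal_spe, alpha)
```

Fitting the limit only on healthy data is what makes the scheme usable with sparse target-grade data: the model never needs labeled fault examples, only a reliable reconstruction of normal operation.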