ISBN:
(Print) 9798350320107
The computer vision community is increasingly interested in exploring hyperbolic space for image representation, as hyperbolic approaches have demonstrated outstanding results in efficiently representing data with an underlying hierarchy. This interest arises from the intrinsic hierarchical nature of images. However, despite the hierarchical nature of remote sensing (RS) images, the investigation of hyperbolic spaces within the RS community has been relatively limited. The objective of this study is therefore to examine the relevance of hyperbolic embeddings of RS data, focusing on scene embedding. Using a variational autoencoder, we project the data into a hyperbolic latent space while ensuring numerical stability with a feature clipping technique. Experiments conducted on the NWPU-RESISC45 image dataset demonstrate the superiority of hyperbolic embeddings over their Euclidean counterparts in a classification task. Our study highlights the potential of operating in hyperbolic space as a promising approach for embedding RS data.
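The abstract above does not specify the exact clipping rule, but the standard idea is to bound the norm of features before mapping them onto the Poincaré ball, since points near the boundary cause numerical overflow. A minimal numpy sketch, with the clipping radius `max_norm` chosen arbitrarily for illustration:

```python
import numpy as np

def clip_features(x, max_norm=0.9):
    """Rescale feature vectors whose Euclidean norm exceeds max_norm.

    Clipping before the exponential map keeps embeddings away from the
    Poincare ball boundary, where distances and gradients blow up.
    """
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    scale = np.minimum(1.0, max_norm / np.maximum(norm, 1e-12))
    return x * scale

def expmap0(v, c=1.0):
    """Exponential map at the origin of a Poincare ball with curvature -c."""
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), 1e-12)
    return np.tanh(np.sqrt(c) * norm) * v / (np.sqrt(c) * norm)
```

Since tanh saturates at 1, every clipped feature lands strictly inside the unit ball, which is what keeps the hyperbolic latent space numerically stable.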
Visual surveillance has become indispensable in the evolution of Intelligent Transportation Systems (ITS). Video object trajectories are key to many visual surveillance applications, but classifying varying-length time series data such as video object trajectories with conventional neural networks can be challenging. In this paper, we propose trajectory classification and anomaly detection using a hybrid Convolutional Neural Network (CNN) and variational autoencoder (VAE) architecture. First, we introduce high-level features for varying-length object trajectories using a color gradient representation. In the next stage, a semi-supervised approach is used to annotate moving object trajectories extracted with the Temporally Incremental Gravitational Model (TIGM) for class labeling. For training, anomalous trajectories are identified using t-Distributed Stochastic Neighbor Embedding (t-SNE). Finally, a hybrid CNN-VAE architecture is proposed for trajectory classification and anomaly detection. The results obtained on publicly available surveillance video datasets reveal that the proposed method can successfully identify traffic anomalies such as lane-driving violations, sudden speed variations, abrupt termination of vehicle movement, and vehicles moving in wrong directions. The accuracy of trajectory classification improves by a margin of 1-6% over popular neural network-based classifiers across various datasets using the proposed high-level features. The gradient representation also improves anomaly detection accuracy significantly (30-35%). Code and dataset can be found at https://***/santhoshkelathodi/CNN-VAE.
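The color gradient idea solves the varying-length problem: a trajectory of any length is rasterized into a fixed-size image in which color encodes normalized time, so a CNN can consume it. The sketch below illustrates the idea only; the image size, color scheme, and normalization are assumptions, not the paper's exact encoding.

```python
import numpy as np

def trajectory_to_image(traj, size=32):
    """Render a variable-length (x, y) trajectory as a fixed-size RGB image.

    Each plotted point's color encodes normalized time, so the direction
    and ordering of motion survive the conversion to a CNN-friendly input.
    """
    traj = np.asarray(traj, dtype=float)
    img = np.zeros((size, size, 3))
    span = np.maximum(traj.max(axis=0) - traj.min(axis=0), 1e-9)
    xy = ((traj - traj.min(axis=0)) / span * (size - 1)).astype(int)
    n = len(traj)
    for i, (x, y) in enumerate(xy):
        t = i / max(n - 1, 1)            # normalized time in [0, 1]
        img[y, x] = (t, 1.0 - t, 0.5)    # early points green-ish, late red-ish
    return img
```

Two trajectories covering the same pixels in opposite directions produce different images, which is exactly what a purely spatial rasterization would lose.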
Objective: This work investigates the possibility of disentangled representation learning of inter-subject anatomical variations within electrocardiographic (ECG) data. Methods: Since ground truth anatomical factors are generally not known in clinical ECG for assessing the disentangling ability of the models, the presented work first proposes the SimECG dataset, a 12-lead ECG dataset procedurally generated with a controlled set of anatomical generative factors. Second, to perform such disentanglement, the presented method evaluates and compares deep generative models with latent density modeled by a nonparametric Indian Buffet Process to account for the complex generative process of ECG data. Results: In the simulated data, the experiments demonstrate, for the first time, concrete evidence of the possibility to disentangle key generative anatomical factors within ECG data in separation from task-relevant generative factors. We achieve a disentanglement score of 92.1% while disentangling five anatomical generative factors and the task-relevant generative factor. In both simulated- and real-data experiments, this work further provides quantitative evidence for the benefit of disentanglement learning on the downstream clinical task of localizing the origin of ventricular activation. Overall, the presented method achieves improvements of around 18.5% and 11.3% on the simulated dataset, and around 7.2% and 3.6% on the real dataset, over a baseline CNN and a standard generative model, respectively. Conclusion: These results demonstrate the importance as well as the feasibility of disentangled representation learning of inter-subject anatomical variations within ECG data. Significance: This work suggests an important research direction for dealing with the well-known challenge posed by significant inter-subject variations during automated analysis of ECG data.
Deep learning-based classification algorithms offer no performance guarantees when deployed on testing data not generated by the same process as the training data. Such out-of-distribution (OOD) data often cause classification errors that are hard to detect since they do not generate explicit errors in the model. In real-world applications, there is no way to ensure that the testing data and the training data are drawn from the same or sufficiently similar distributions. This problem is especially challenging in wireless communications applications. Because the radio propagation channel is highly dynamic, it is very difficult to ensure that a deep learning model is not tested on OOD data. In this paper, we propose a novel deep learning model called FOOD (Feature representation for detecting OOD data) to detect OOD data in wireless communications applications. FOOD incorporates a new model architecture to detect OOD data accurately and minimizes the instances of normal data being recognized as OOD. We evaluated the performance of FOOD extensively using transmitter classification and modulation recognition tasks, with both experimental datasets and simulation-generated datasets. As far as we know, this is the first systematic study on the impact and detection of OOD data in deep learning-based wireless communications applications.
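The FOOD architecture itself is not described in this abstract. A common baseline that any OOD detector of this kind is measured against is score thresholding: fit a threshold on scores (e.g. reconstruction or feature-distance errors) from in-distribution data, chosen so that only a small fraction of normal data is falsely flagged. A minimal sketch under that assumption:

```python
import numpy as np

def fit_threshold(in_dist_scores, q=99.0):
    """Set the OOD threshold at the q-th percentile of in-distribution scores,
    so roughly (100 - q)% of normal samples are falsely flagged as OOD."""
    return np.percentile(in_dist_scores, q)

def is_ood(scores, threshold):
    """Flag samples whose score (e.g. reconstruction error) exceeds the threshold."""
    return np.asarray(scores) > threshold
```

The percentile `q` directly trades off the two failure modes the abstract mentions: a higher `q` reduces normal data being recognized as OOD, at the cost of missing more genuine OOD samples.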
Facial expression retargeting from humans to virtual characters is a useful technique in computer graphics and animation. Traditional methods use markers or blendshapes to construct a mapping between the human and avatar faces. However, these approaches require a tedious 3D modeling process, and their performance relies on the modeler's experience. In this article, we propose a brand-new solution to this cross-domain expression transfer problem via nonlinear expression embedding and expression domain translation. We first build low-dimensional latent spaces for human and avatar facial expressions with variational autoencoders. Then we construct correspondences between the two latent spaces guided by geometric and perceptual constraints. Specifically, we design geometric correspondences to reflect geometric matching and utilize a triplet data structure to express users' perceptual preferences among avatar expressions. A user-friendly method is proposed to automatically generate triplets, allowing users to easily and efficiently annotate the correspondences. Using both geometric and perceptual correspondences, we train a network for expression domain translation from human to avatar. Extensive experimental results and user studies demonstrate that even nonprofessional users can apply our method to generate high-quality facial expression retargeting results with less time and effort.
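A triplet data structure of this kind is typically trained with the standard triplet hinge loss. In the retargeting setting, a triplet would encode a user's judgment that avatar expression `positive` matches the human expression `anchor` better than `negative` does (the variable names and the squared-distance choice are illustrative, not the paper's exact formulation):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss pushing the anchor closer to the positive than to the negative.

    The loss is zero once the positive is closer than the negative by at
    least `margin` (in squared Euclidean distance in the latent space).
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0)
```

Because the loss only depends on relative distances, users never have to rate expressions on an absolute scale; ranking two candidates per query is enough, which is what makes the annotation user-friendly.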
This article considers the task of text style transfer: transforming a sentence of one style into another style while preserving its style-independent content. A dominant approach to text style transfer is to learn a good content factor of the text, define a fixed vector for every style, and recombine them to generate text in the required style. In fact, there are a large number of different words that convey the same style from different aspects. Thus, using a fixed vector to represent one style is very inefficient: it weakens the representation power of the style vector and limits the diversity of text in the same style. To address this problem, we propose a novel neural generative model called Adversarial Separation Network (ASN), which learns the content and style vectors jointly; the learnt vectors have strong representation power and good interpretability. In our method, adversarial learning is employed to enhance the model's capability of disentangling the two factors. To evaluate our method, we conduct experiments on two benchmark datasets. Experimental results show our method performs style transfer better than strong comparison systems. We also demonstrate the strong interpretability of the learnt latent vectors.
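The adversarial disentanglement objective typically has two opposing terms: a classifier should recover the style from the style vector, while an adversary should fail to recover it from the content vector. The sketch below shows only the loss computation, with hypothetical names; ASN's actual architecture and training procedure are not specified in this abstract.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def separation_losses(style_logits, leak_logits, style_labels):
    """Two terms of an adversarial separation objective (illustrative names).

    style_logits: style predicted from the style vector -- should be accurate,
    so this term is minimized.
    leak_logits: style predicted by an adversary from the content vector --
    the encoder is trained to defeat it, i.e. to maximize this term.
    """
    l_style = cross_entropy(style_logits, style_labels)
    l_leak = cross_entropy(leak_logits, style_labels)
    return l_style, l_leak
```

At convergence, `l_style` is near zero (style is fully captured by the style vector) while `l_leak` sits at the chance-level entropy (no style information leaks into the content vector).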
Speech-based interfaces provide convenient methods for controlling various smart devices. For these interfaces to work reliably, considerable speech data with various noise and speaker characteristics must be collected to train the associated speech-processing models. Gathering spoken commands from actual users of devices can improve those devices' performance by familiarizing each device with the individual acoustic characteristics of its particular user's speech. However, the direct acquisition of spoken commands could threaten the privacy of users, as the spoken data would contain sensitive speaker-specific information. Speaker anonymization algorithms can be applied to suppress such sensitive information while preserving the linguistic content of a user's speech. Previous speaker anonymization algorithms could handle only the voices of speakers who contributed to the training datasets. As speaker anonymization algorithms are typically applied to new speakers (who are absent from the training datasets), a method of handling such speakers (commonly referred to as unseen speakers) should be developed. In this paper, we propose a novel method that can effectively suppress the individual characteristics in an unseen speaker's voice while retaining the linguistic content of the speech. It adopts zero-shot voice conversion methods for unseen speaker anonymization. Since the proposed method utilizes speaker identity vectors commonly used in many-to-many voice conversion algorithms and does not modify the conversion algorithm itself, it can easily be combined with many other voice conversion algorithms. The proposed method is evaluated using the VCC2018 and VCTK corpora. Speaker identification rate and speech recognition rate are used for quantitative analysis. The experimental results showed that the average speaker identification accuracy decreased by 92.3 percentage points and the average speech recognition accuracy decreased by 17.7 percentage points afte
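One widely used recipe for producing an anonymized identity vector, sketched here as an assumption rather than as this paper's exact method, is to replace the unseen speaker's identity vector with the mean of several randomly drawn vectors from an external speaker pool, then feed that pseudo-identity to the voice conversion model:

```python
import numpy as np

def pseudo_speaker(speaker_pool, k=5, seed=None):
    """Build an anonymized identity vector by averaging k pool speakers.

    speaker_pool: (N, d) array of speaker identity vectors (e.g. from a
    many-to-many voice conversion system). The returned pseudo-identity
    matches no single real speaker, which is the anonymization goal.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(speaker_pool), size=k, replace=False)
    return speaker_pool[idx].mean(axis=0)
```

Because the conversion algorithm only consumes an identity vector, this swap needs no change to the converter itself, which is why the approach composes with many voice conversion systems.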
Anomaly detection based on generative models usually uses the reconstruction loss of samples for anomaly discrimination. However, there are two problems in semi-supervised or unsupervised learning. One is that the generalization ability of the generator is too strong, which may reduce the reconstruction loss of some outliers. The other is that background statistics interfere with the reconstruction loss of outliers. Both reduce the effectiveness of anomaly detection. In this paper, we propose an anomaly detection method called MHMA (Multi-Headed Memory Autoencoder). A variational autoencoder is used as the generative model, and the vector in the latent space is constrained by a memory module, which increases the reconstruction error of abnormal samples. Moreover, the MHMA uses a multi-head structure to divide the last layer of the decoder into multiple branches to learn and generate a diverse sample distribution, which keeps the generalization capability of the model within a reasonable range. When scoring outliers, a likelihood-ratio method is employed to obtain correct background statistics from the background model, thus enhancing the specific features in the reconstructed samples. The effectiveness and universality of MHMA are tested on different types of datasets, and the results show that the model achieves 99.5% recall, 99.9% precision, 99.69% F1 and 98.12% MCC on the image dataset, and 98.61% recall, 98.73% precision, 98.67% F1 and 95.82% MCC on the network security dataset.
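The memory module works by rewriting each latent vector as an attention-weighted combination of stored prototype latents: normal latents land close to a prototype and are reproduced well, while anomalous latents are pulled toward the prototypes, inflating the downstream reconstruction error. The addressing below is a generic sketch (dimensions and temperature are assumptions), not MHMA's exact module:

```python
import numpy as np

def memory_read(z, memory, temperature=0.1):
    """Replace latent z with a softmax-weighted combination of memory items.

    z: (d,) latent vector; memory: (M, d) matrix of prototype latents.
    A low temperature makes the read nearly hard, snapping z to its
    closest prototype.
    """
    z = np.asarray(z, dtype=float)
    sims = memory @ z / temperature   # (M,) similarity to each memory item
    w = np.exp(sims - sims.max())
    w = w / w.sum()                   # softmax addressing weights
    return w @ memory
```

This is what "limiting the vector in the latent space" buys: the decoder can only ever see combinations of prototypes learned from normal data, so it cannot accidentally reconstruct an outlier well.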
With the capability of capturing high-resolution imagery data and the ease of accessing remote areas, aerial robots are becoming increasingly popular for forest health monitoring applications. For example, forestry tasks such as field surveys and foliar sampling, which are generally manual and labour intensive, can be automated with remotely controlled aerial robots. In this study, we propose two new online frameworks to quantify and rank the severity of individual tree crown loss. The real-time crown loss estimation (RTCLE) model localises and classifies individual trees into their respective crown loss percentage bins. Experiments are conducted to investigate whether synthetically generated tree images can be used to train the RTCLE model, as real images with diverse viewpoints are generally expensive to collect. Results show that synthetic data training helps to achieve a satisfactory baseline mean average precision (mAP), which can be further improved with just some additional real imagery data. We showed that the mAP can be increased from approximately 60% to 78% by mixing the real dataset with the generated synthetic data. For individual tree crown loss ranking, a two-step crown loss ranking (TSCLR) framework is developed to handle the inconsistently labelled crown loss data. The TSCLR framework detects individual trees before ranking them based on relative crown loss severity measures. The tree detection model is trained with the combined dataset used in the RTCLE model training, where we achieved an mAP of approximately 95%, suggesting that the model generalises well to unseen datasets. The relative crown loss severity of each tree is estimated, with deep representation learning, by a probabilistic encoder from a fully trained variational autoencoder (VAE) model. The VAE is trained end-to-end to reconstruct tree images in a background-agnostic way. Based on a conservative evaluation, the estimated crown loss severity from the probabilistic encoder generally
In industrial processes, different operating conditions and ratios of ingredients are used to produce multi-grade products on the same production line. Yet the production grade changes quickly, as customer demand varies from time to time. As a result, the process data collected in certain operating regions are often scarce. Process dynamics, nonlinearity, and process uncertainty make it harder to develop a reliable model to monitor the process status. In this paper, the source-aided variational state-space autoencoder (SA-VSSAE) is proposed. It integrates a variational state-space autoencoder with a Gaussian mixture. With the additional information from the source grades, SA-VSSAE can be used for monitoring processes with sparse target data by performing information sharing to enhance the reliability of the target model. Unlike past works, which perform information sharing and modeling in a two-step procedure, the proposed model is designed for information sharing and modeling in a one-step procedure without causing information loss. In contrast to the traditional state-space model, which is linear and deterministic, the variational state-space autoencoder (VSSAE) extracts the dynamic and nonlinear features in the process variables using neural networks. Also, by taking process uncertainty into consideration, VSSAE describes the features in a probabilistic form. Probability density estimates of the residual and latent variables are used to design the monitoring indices for fault detection. A numerical example and an industrial polyvinyl chloride drying process are presented to show the advantages of the proposed method over the comparative methods. (c) 2022 Elsevier Ltd. All rights reserved.
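A residual-based monitoring index of the kind described is, in its simplest form, the squared prediction error (SPE) with an empirical control limit fitted on normal-operation data. The sketch below shows that generic scheme only; SA-VSSAE's actual indices are built from probability density estimates rather than a raw quantile:

```python
import numpy as np

def spe(x, x_hat):
    """Squared prediction error: a residual-based monitoring index."""
    return np.sum((np.asarray(x) - np.asarray(x_hat)) ** 2, axis=-1)

def control_limit(normal_spe, alpha=0.99):
    """Empirical control limit: the alpha-quantile of SPE under normal operation.

    A new sample whose SPE exceeds this limit is declared a fault.
    """
    return np.quantile(normal_spe, alpha)
```

Fitting the limit only on healthy data is what makes the scheme usable with sparse target-grade data: the model never needs labeled fault examples, only a reliable reconstruction of normal operation.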