The Kalman filter, a classical and effective inference algorithm based on state-space models (SSMs), has been extensively applied in navigation and mapping. However, its performance degrades when model assumptions are violated, for example under nonlinear dynamics or non-Gaussian, correlated noise. Model-based deep learning methods overcome these mismatches by combining the domain knowledge of model-based methods with the expressiveness of data-driven deep learning, and thus offer a promising solution to high-dimensional, nonlinear problems. This paper presents a succinct overview of the principles, inference models, and training methodologies of model-based deep learning, with particular focus on KalmanNet and the dynamical variational autoencoder (DVAE). Furthermore, it applies KalmanNet to a robust, high-precision navigation and positioning problem. The experimental results demonstrate that navigation and positioning accuracy comparable to that of the extended Kalman filter (EKF) can be achieved, with enhanced robustness, albeit at the cost of some computational overhead.
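To make the classical baseline concrete, here is a minimal scalar Kalman filter sketch, i.e. the predict/update recursion whose gain computation KalmanNet replaces with a learned network. The coefficients `F`, `H` and the noise variances `Q`, `R` below are illustrative assumptions, not values from the paper.

```python
def kalman_step(x, P, z, F=1.0, H=1.0, Q=1e-3, R=0.1):
    """One predict/update cycle of a scalar Kalman filter.

    x, P : prior state estimate and its variance
    z    : new measurement
    F, H : state-transition and observation coefficients (assumed here)
    Q, R : process and measurement noise variances (assumed here)
    """
    # Predict step: propagate the state and its uncertainty
    x_pred = F * x
    P_pred = F * P * F + Q
    # Update step: blend prediction and measurement via the Kalman gain
    K = P_pred * H / (H * P_pred * H + R)
    x_new = x_pred + K * (z - H * x_pred)
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new

# Track a constant true state of 1.0 from noisy measurements
x, P = 0.0, 1.0
for z in [1.02, 0.98, 1.01, 0.99, 1.00]:
    x, P = kalman_step(x, P, z)
```

After a few measurements the estimate converges toward the true state and the posterior variance shrinks; it is exactly this hand-specified gain `K` that breaks down under the model mismatches discussed above.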
Dynamical variational autoencoders (DVAEs) are a class of deep generative models with latent variables, dedicated to modeling time series of high-dimensional data. DVAEs can be considered extensions of the variational autoencoder (VAE) that include temporal dependencies between successive observed and/or latent vectors. Previous work has shown the benefit of using DVAEs over the VAE for modeling speech spectrograms. Independently, the VAE has been successfully applied to speech enhancement in noise, in an unsupervised, noise-agnostic set-up that requires neither noise samples nor noisy speech samples at training time, but only clean speech signals. In this paper, we extend these works to DVAE-based single-channel unsupervised speech enhancement, hence exploiting both unsupervised representation learning and dynamics modeling of speech signals. We propose an unsupervised speech enhancement algorithm that combines a DVAE speech prior pre-trained on clean speech signals with a noise model based on non-negative matrix factorization (NMF), and we derive a variational expectation-maximization (VEM) algorithm to perform speech enhancement. The algorithm is presented with the most general DVAE formulation and is then applied with three specific DVAE models to illustrate the versatility of the framework. Experimental results show that the proposed DVAE-based approach outperforms its VAE-based counterpart, as well as several supervised and unsupervised noise-dependent baselines, especially when the noise type is unseen during training.
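For reference, the NMF noise model mentioned above rests on standard multiplicative updates. A minimal pure-Python sketch is given below; it uses the Euclidean cost for simplicity (speech work typically uses the Itakura-Saito divergence instead), and the rank `K`, iteration count, and test matrix are illustrative assumptions.

```python
import random

def matmul(A, B):
    """Naive matrix product of two lists-of-lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(V, K, n_iter=300, eps=1e-9, seed=0):
    """Factor a non-negative F x N matrix V as W (F x K) @ H (K x N).

    Lee-Seung multiplicative updates for the Euclidean cost; the updates
    keep W and H non-negative by construction.
    """
    rng = random.Random(seed)
    F, N = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(K)] for _ in range(F)]
    H = [[rng.random() + 0.1 for _ in range(N)] for _ in range(K)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[k][n] * num[k][n] / (den[k][n] + eps) for n in range(N)] for k in range(K)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(matmul(W, H), Ht)
        W = [[W[f][k] * num[f][k] / (den[f][k] + eps) for k in range(K)] for f in range(F)]
    return W, H

# Recover a rank-1 non-negative matrix from its entries
V = [[1.0, 1.0, 2.0], [2.0, 2.0, 4.0]]
W, H = nmf(V, K=1)
```

In the VEM algorithm described above, updates of this kind are interleaved with inference under the DVAE speech prior rather than run to convergence in isolation.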
ISBN (print): 9781713836902
The variational autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, input data vectors are processed independently. In recent years, a series of papers have presented different extensions of the VAE to process sequential data, which model not only the latent space but also the temporal dependencies within a sequence of data vectors and the corresponding latent vectors, relying on recurrent neural networks. We recently performed a comprehensive review of those models and unified them into a general class called dynamical variational autoencoders (DVAEs). In the present paper, we report the results of an experimental benchmark comparing six of those DVAE models on a speech analysis-resynthesis task, as an illustration of the high potential of DVAEs for speech modeling.
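Two ingredients shared by the VAE and all the DVAE variants above are the closed-form Gaussian KL term of the evidence lower bound and the reparameterization trick for differentiable sampling. A minimal sketch, assuming a diagonal-Gaussian posterior and a standard-normal prior (the inputs below are hypothetical):

```python
import math
import random

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions.

    Closed form: 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2).
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    Writing the sample this way keeps it differentiable w.r.t. mu and log_var,
    which is what allows the encoder to be trained by gradient descent.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
```

For example, `gaussian_kl([0.0, 0.0], [0.0, 0.0])` is exactly 0 (posterior equals prior), and shifting one mean to 1.0 raises it to 0.5. DVAEs replace the fixed standard-normal prior with a learned dynamical prior over the latent sequence, but this per-step KL/reparameterization machinery is unchanged.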
This work builds on previous work on unsupervised speech enhancement using a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model. We propose to replace the NMF noise model with a deep dynamical generative model (DDGM) depending either on the DVAE latent variables, on the noisy observations, or on both. This DDGM can be trained in three configurations: noise-agnostic, noise-dependent, and noise adaptation after noise-dependent training. Experimental results show that the proposed method achieves competitive performance compared to state-of-the-art unsupervised speech enhancement methods, while the noise-dependent training configuration yields a much more time-efficient inference process. Index Terms: unsupervised speech enhancement, dynamical variational autoencoders, deep dynamical generative model.
Modeling nonlinear dynamical systems is a challenging task in fields such as speech processing, music generation, and video prediction. This paper introduces a hierarchical framework for Deep State Space Models (DSSMs...
Structured sequences are widely used to describe graph data with time-evolving node features and edges. A typical real-world scenario for structured sequences is that unknown class labels continuously arrive, so training and testing often occur across different class spaces. This scenario is also referred to as the open-world learning problem on structured sequences. In this paper, we present a new Dense Open-world Structured Sequence Learning model (DOSSL for short) for learning graph streams in the open-world setting. To capture both structural and temporal information, DOSSL uses a GNN-based stochastic recurrent neural network to learn node representations in graph streams, a truncated Laplacian distribution to describe the latent distribution of graph nodes, and a sampling function to generate node representations. Further, DOSSL learns dense target embeddings for the known classes to improve the compactness of the known-class distribution and to reserve enough space for open-world unknown classes. The final open-world classifier is optimized to detect samples from unknown classes under the constraints of a DVAE loss, a label loss, a class-uncertainty loss, and a dense-target loss. Empirical analysis on real-world datasets demonstrates that DOSSL can learn accurate node classifiers from graph streams.