The Kalman filter, a classical and effective inference algorithm based on state-space models (SSMs), has been extensively applied in navigation and mapping. However, its performance degrades when model assumptions are violated, for example under nonlinear dynamics or non-Gaussian, correlated noise. Model-based deep learning methods overcome these mismatches by combining the domain knowledge of model-based methods with the expressiveness of data-driven deep learning, and thus offer a promising solution to high-dimensional, nonlinear problems. This paper presents a succinct overview of the principles, inference models, and training methodologies of model-based deep learning, with particular focus on KalmanNet and the dynamical variational autoencoder (DVAE). Furthermore, it applies KalmanNet to a robust, high-precision navigation and positioning problem. The experimental results demonstrate that navigation and positioning accuracy comparable to that of the extended Kalman filter (EKF) can be achieved, with enhanced robustness, albeit at the cost of some computational overhead.
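To make the classical baseline concrete, here is a minimal scalar Kalman filter sketch, i.e. the predict/update recursion whose gain computation KalmanNet replaces with a learned network. The coefficients `F`, `H` and the noise variances `Q`, `R` below are illustrative assumptions, not values from the paper.

```python
def kalman_step(x, P, z, F=1.0, H=1.0, Q=1e-3, R=0.1):
    """One predict/update cycle of a scalar Kalman filter.

    x, P : prior state estimate and its variance
    z    : new measurement
    F, H : state-transition and observation coefficients (assumed here)
    Q, R : process and measurement noise variances (assumed here)
    """
    # Predict step: propagate the state and its uncertainty
    x_pred = F * x
    P_pred = F * P * F + Q
    # Update step: blend prediction and measurement via the Kalman gain
    K = P_pred * H / (H * P_pred * H + R)
    x_new = x_pred + K * (z - H * x_pred)
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new

# Track a constant true state of 1.0 from noisy measurements
x, P = 0.0, 1.0
for z in [1.02, 0.98, 1.01, 0.99, 1.00]:
    x, P = kalman_step(x, P, z)
```

After a few measurements the estimate converges toward the true state and the posterior variance shrinks; it is exactly this hand-specified gain `K` that breaks down under the model mismatches discussed above.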
Dynamical variational autoencoders (DVAEs) are a class of deep generative models with latent variables, dedicated to modeling time series of high-dimensional data. DVAEs can be considered extensions of the variational autoencoder (VAE) that include temporal dependencies between successive observed and/or latent vectors. Previous work has shown the benefit of using DVAEs over the VAE for modeling speech spectrograms. Independently, the VAE has been successfully applied to speech enhancement in noise, in an unsupervised, noise-agnostic set-up that requires neither noise samples nor noisy speech samples at training time, but only clean speech signals. In this paper, we extend these works to DVAE-based single-channel unsupervised speech enhancement, hence exploiting both unsupervised representation learning and dynamics modeling of speech signals. We propose an unsupervised speech enhancement algorithm that combines a DVAE speech prior pre-trained on clean speech signals with a noise model based on non-negative matrix factorization (NMF), and we derive a variational expectation-maximization (VEM) algorithm to perform speech enhancement. The algorithm is presented with the most general DVAE formulation and is then applied with three specific DVAE models to illustrate the versatility of the framework. Experimental results show that the proposed DVAE-based approach outperforms its VAE-based counterpart, as well as several supervised and unsupervised noise-dependent baselines, especially when the noise type is unseen during training.
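For reference, the NMF noise model mentioned above rests on standard multiplicative updates. A minimal pure-Python sketch is given below; it uses the Euclidean cost for simplicity (speech work typically uses the Itakura-Saito divergence instead), and the rank `K`, iteration count, and test matrix are illustrative assumptions.

```python
import random

def matmul(A, B):
    """Naive matrix product of two lists-of-lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(V, K, n_iter=300, eps=1e-9, seed=0):
    """Factor a non-negative F x N matrix V as W (F x K) @ H (K x N).

    Lee-Seung multiplicative updates for the Euclidean cost; the updates
    keep W and H non-negative by construction.
    """
    rng = random.Random(seed)
    F, N = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(K)] for _ in range(F)]
    H = [[rng.random() + 0.1 for _ in range(N)] for _ in range(K)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[k][n] * num[k][n] / (den[k][n] + eps) for n in range(N)] for k in range(K)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(matmul(W, H), Ht)
        W = [[W[f][k] * num[f][k] / (den[f][k] + eps) for k in range(K)] for f in range(F)]
    return W, H

# Recover a rank-1 non-negative matrix from its entries
V = [[1.0, 1.0, 2.0], [2.0, 2.0, 4.0]]
W, H = nmf(V, K=1)
```

In the VEM algorithm described above, updates of this kind are interleaved with inference under the DVAE speech prior rather than run to convergence in isolation.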
ISBN (print): 9781713836902
The variational autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, input data vectors are processed independently. In recent years, a series of papers have presented different extensions of the VAE to process sequential data, which model not only the latent space but also the temporal dependencies within a sequence of data vectors and the corresponding latent vectors, relying on recurrent neural networks. We recently performed a comprehensive review of those models and unified them into a general class called dynamical variational autoencoders (DVAEs). In the present paper, we report the results of an experimental benchmark comparing six of those DVAE models on a speech analysis-resynthesis task, as an illustration of the high potential of DVAEs for speech modeling.
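Two ingredients shared by the VAE and all the DVAE variants above are the closed-form Gaussian KL term of the evidence lower bound and the reparameterization trick for differentiable sampling. A minimal sketch, assuming a diagonal-Gaussian posterior and a standard-normal prior (the inputs below are hypothetical):

```python
import math
import random

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions.

    Closed form: 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2).
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    Writing the sample this way keeps it differentiable w.r.t. mu and log_var,
    which is what allows the encoder to be trained by gradient descent.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
```

For example, `gaussian_kl([0.0, 0.0], [0.0, 0.0])` is exactly 0 (posterior equals prior), and shifting one mean to 1.0 raises it to 0.5. DVAEs replace the fixed standard-normal prior with a learned dynamical prior over the latent sequence, but this per-step KL/reparameterization machinery is unchanged.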
This work builds on previous work on unsupervised speech enhancement using a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model. We propose to replace the NMF noise model with a deep dynamical generative model (DDGM) depending either on the DVAE latent variables, on the noisy observations, or on both. This DDGM can be trained in three configurations: noise-agnostic, noise-dependent, and noise adaptation after noise-dependent training. Experimental results show that the proposed method achieves competitive performance compared to state-of-the-art unsupervised speech enhancement methods, while the noise-dependent training configuration yields a much more time-efficient inference process. Index Terms: unsupervised speech enhancement, dynamical variational autoencoders, deep dynamical generative model.
Modeling nonlinear dynamical systems is a challenging task in fields such as speech processing, music generation, and video prediction. This paper introduces a hierarchical framework for Deep State Space Models (DSSMs...
Structured sequences are widely used to describe graph data with time-evolving node features and edges. A typical real-world scenario for structured sequences is that unknown class labels continuously arrive, so training and testing often occur across different class spaces. This scenario is also referred to as the open-world learning problem on structured sequences. In this paper, we present a new Dense Open-world Structured Sequence Learning model (DOSSL for short) for learning graph streams in the open-world setting. To capture both structural and temporal information, DOSSL uses a GNN-based stochastic recurrent neural network to learn node representations in graph streams, a truncated Laplacian distribution to describe the latent distribution of graph nodes, and a sampling function to generate node representations. Further, DOSSL learns dense target embeddings for the known classes to improve the compactness of the known-class distribution and to reserve enough space for open-world unknown classes. The final open-world classifier is optimized to detect samples from unknown classes under the constraints of a DVAE loss, a label loss, a class-uncertainty loss, and a dense-target loss. Empirical analysis on real-world datasets demonstrates that DOSSL can learn accurate node classifiers from graph streams.