Search Results - Inner Mongolia University Library

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Linke, Julian; Steger, Sophie; Steinwender, Philipp; Kubin, Gernot; Pernkopf, Franz; Schuppler, Barbara (Signal Processing and Speech Communication Laboratory, Graz University of Technology, Austria)

ISBN: (print) 9798350368741

This paper presents methods for prominence classification in conversational speech. Most existing tools rely on prosodic features extracted at the syllable or phone level and perform well on read speech. This is not the case for conversational speech, where the quality of automatic segmentation is significantly worse. We introduce entropy-based chroma features, which require only word-level segmentations. They perform as well as a random forest classifier using prosodic features (which requires phone-level segmentation), with accuracies within the range of human inter-rater agreement. We further use Bayesian deep learning to quantify the epistemic and aleatoric uncertainty of the prediction for prosodic and chroma features. Whereas the aleatoric uncertainty is, as expected, consistent with inter-rater agreement and similarly high for both feature sets, the epistemic uncertainty is lower for the classifier based on chroma features, indicating higher classification consistency across the corpus. © 2025 IEEE.

Keywords: Austrian German; chromagram; conversational speech; prosodic prominence; uncertainty prediction

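The abstract above separates epistemic from aleatoric uncertainty. A common way to do this with a Bayesian neural network (assumed here; this record does not give the paper's exact estimator) is to draw several stochastic forward passes, for example with Monte Carlo dropout, and split the entropy of the mean prediction into the expected per-sample entropy (aleatoric) and the mutual information between prediction and weights (epistemic). The sketch takes the sampled class probabilities as a given array; `decompose_uncertainty` is a hypothetical helper name.

```python
import numpy as np

def entropy(p, axis=-1):
    """Shannon entropy in nats; 0 * log 0 is treated as 0."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(np.clip(p, 1e-12, 1.0)), axis=axis)

def decompose_uncertainty(probs):
    """probs: (num_samples, num_classes), one row per stochastic forward pass
    (e.g. MC dropout) for a single input.

    Returns (total, aleatoric, epistemic) with total = aleatoric + epistemic.
    """
    probs = np.asarray(probs, dtype=float)
    total = entropy(probs.mean(axis=0))          # entropy of the mean prediction
    aleatoric = entropy(probs, axis=-1).mean()   # expected entropy of each sample
    epistemic = total - aleatoric                # mutual information
    return total, aleatoric, epistemic

# Passes agree on a 50/50 split: the item itself is ambiguous (aleatoric).
print(decompose_uncertainty([[0.5, 0.5]] * 10))
# Passes disagree but are each confident: the model is unsure (epistemic).
print(decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]]))
```

Under this decomposition, prominence labels on which human raters disagree would show up mainly as aleatoric uncertainty, while inconsistent model behavior across the corpus would show up as epistemic uncertainty.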

Reliable Belief Propagation: Recent Theoretical and Practical Advances

33rd IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2023

Authors: Knoll, Christian; Pernkopf, Franz (Graz University of Technology, Signal Processing and Speech Communication Laboratory, Austria)

ISBN: (print) 9798350324112

Belief propagation (BP) is an effective approximate inference method but lacks theoretical guarantees for loopy graphs. We discuss the optimization landscape and the message dynamics and how they help to understand the behavior of message passing algorithms. These insights suggest several improvements. Specifically, we consider iterative initialization strategies, optimized message scheduling methods, and structural modifications to improve convergence behavior and accuracy while maintaining the model's interpretability. We then evaluate the different modifications on signal detection problems in MIMO systems, a particularly challenging application for message passing algorithms. Our experimental results show consistent improvements over standard BP with minimal increase in computational burden. © 2023 IEEE.

Keywords: Belief propagation

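For readers unfamiliar with the message dynamics the abstract above refers to, here is a minimal sum-product BP sketch for binary pairwise Markov random fields with a synchronous (flooding) schedule. The optional `damping` parameter is a standard convergence aid chosen for illustration; it is not one of the specific modifications evaluated in the paper, and the function names are hypothetical. On a tree the beliefs equal the exact marginals, which `brute_marginals` checks by enumeration; on loopy graphs they are only approximations and need not converge.

```python
import itertools
import numpy as np

def loopy_bp(unary, pairwise, edges, iters=50, damping=0.0):
    """Sum-product BP on a binary pairwise MRF (flooding schedule).

    unary:    (n, 2) array of node potentials phi_i(x_i)
    pairwise: dict mapping (i, j) -> 2x2 array psi_ij[x_i, x_j]
    edges:    list of (i, j) pairs, each present in `pairwise`
    damping:  weight of the old message in each update (0 = no damping)
    """
    n = unary.shape[0]
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def psi(i, j):
        # Potential oriented as [x_i, x_j], whichever way it was stored.
        return pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T

    msgs = {(i, j): np.full(2, 0.5) for i in range(n) for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for i, j in msgs:
            # Local potential times all incoming messages except the one from j.
            h = unary[i].astype(float)
            for k in nbrs[i]:
                if k != j:
                    h = h * msgs[(k, i)]
            m = psi(i, j).T @ h  # sum over x_i
            m = m / m.sum()
            new[(i, j)] = (1.0 - damping) * m + damping * msgs[(i, j)]
        msgs = new

    beliefs = unary.astype(float)
    for i in range(n):
        for k in nbrs[i]:
            beliefs[i] = beliefs[i] * msgs[(k, i)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)

def brute_marginals(unary, pairwise):
    """Exact marginals by enumerating all 2**n joint states (small n only)."""
    n = unary.shape[0]
    marg = np.zeros((n, 2))
    for x in itertools.product((0, 1), repeat=n):
        p = np.prod([unary[i, x[i]] for i in range(n)])
        for (i, j), table in pairwise.items():
            p *= table[x[i], x[j]]
        for i in range(n):
            marg[i, x[i]] += p
    return marg / marg.sum(axis=1, keepdims=True)

# A 3-node chain (a tree), where BP is exact.
unary = np.array([[0.7, 0.3], [0.4, 0.6], [0.5, 0.5]])
pairwise = {(0, 1): np.array([[2.0, 1.0], [0.5, 3.0]]),
            (1, 2): np.array([[3.0, 1.0], [1.0, 3.0]])}
print(loopy_bp(unary, pairwise, [(0, 1), (1, 2)]))
print(brute_marginals(unary, pairwise))
```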

Weighting Function Modification Used for Phase Transform-Based Time Delay Estimation

China Communications, 2022, Vol. 19, No. 11, pp. 241-256

Authors: Xue Yang; Changchun Bao; Zihao Cui (Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China)

Generalized cross-correlation (GCC) is considered the most straightforward time delay estimation method. Different methods have been derived based on various weighting functions, and a straightforward method named phase transform (PHAT) has been widely used. PHAT is well known for its robustness to reverberation and its sensitivity to noise, which is partly due to the fact that PHAT assigns the same weight to frequencies dominated by signal or by noise. To alleviate this problem, two weighting functions are proposed in this paper. By taking the a posteriori signal-to-noise ratio (SNR) into account to classify reliable and unreliable frequencies, different weights can be assigned. The first proposed weighting function borrows the idea of the binary mask and assigns the same weight to all frequencies in the same set, whereas the second assigns weights based on the coherence function. Results showed that the proposed methods are robust to reverberation and noise and improve time delay estimation performance under various criteria.

Keywords: time delay estimation; generalized cross-correlation; PHAT; a posteriori SNR; coherence function

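The PHAT weighting described in the abstract above can be sketched as follows: the cross-power spectrum is divided by its magnitude so only phase remains, and the peak of the inverse transform gives the delay. `weight_mask` is a hypothetical hook showing where a per-frequency reweighting, such as an SNR-based binary mask, would enter; the paper's two weighting functions are not reproduced here.

```python
import numpy as np

def gcc_phat(x, y, fs=1.0, max_shift=None, weight_mask=None, eps=1e-12):
    """Estimate the delay of y relative to x with GCC-PHAT.

    Returns the delay in seconds (in samples when fs=1); positive means y lags x.
    weight_mask: optional array of length n // 2 + 1 (n = len(x) + len(y)) that
    multiplies the PHAT weights per frequency bin (hypothetical hook).
    """
    n = len(x) + len(y)                     # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)                  # cross-power spectrum
    weight = 1.0 / (np.abs(cross) + eps)    # PHAT: keep phase, discard magnitude
    if weight_mask is not None:
        weight = weight * weight_mask
    cc = np.fft.irfft(cross * weight, n=n)  # generalized cross-correlation
    if max_shift is None:
        max_shift = n // 2
    # Reorder so index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[n - max_shift:], cc[: max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = np.concatenate((np.zeros(5), x[:-5]))  # y is x delayed by 5 samples
print(gcc_phat(x, y))
```

Because PHAT normalizes every bin to unit magnitude, a noise-dominated bin counts as much as a signal-dominated one; a mask that down-weights unreliable bins is the kind of change the abstract proposes.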

Variational Signal Separation for Automotive Radar Interference Mitigation

IEEE Transactions on Radar Systems, 2024, Vol. 2, pp. 1007-1026

Authors: Toth, Mate; Leitinger, Erik; Witrisal, Klaus (Graz University of Technology, Institute of Communication Networks and Satellite Communications, Graz, Austria; Graz University of Technology, Laboratory of Signal Processing and Speech Communication, Graz, Austria)

Algorithms for mutual interference mitigation and object parameter estimation are a key enabler for automotive applications of frequency-modulated continuous wave (FMCW) radar. In this paper, we introduce a signal separation method to detect and estimate radar object parameters while jointly estimating and successively canceling the interference signal. The underlying signal model poses a challenge, since both the coherent radar echo and the non-coherent interference influenced by individual multipath propagation channels must be considered. Under certain assumptions, the model is described as a superposition of multipath channels weighted by parametric interference chirp envelopes. Inspired by sparse Bayesian learning (SBL), we employ an augmented probabilistic model that uses a hierarchical Gamma-Gaussian prior model for each multipath channel. Based on this, an iterative inference algorithm is derived using the variational expectation-maximization (EM) methodology. The algorithm is statistically evaluated in terms of object parameter estimation accuracy and robustness, indicating that it is fundamentally capable of achieving the Cramér-Rao lower bound (CRLB) with respect to the accuracy of object estimates and that it closely follows the radar performance achieved when no interference is present. © 2023 IEEE.

Keywords: Radar interference

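The method above builds on sparse Bayesian learning with a hierarchical Gamma-Gaussian prior. As a much-simplified illustration (real-valued, a single linear model, known noise precision, plain EM rather than the paper's variational EM, and none of its interference chirp envelopes or multipath channels), the classic EM form of SBL is sketched below. It shows what the hierarchical prior provides: coefficients whose learned precision `alpha_i` grows large are effectively pruned. `sbl_em` is a hypothetical name.

```python
import numpy as np

def sbl_em(Phi, y, beta, iters=200):
    """Sparse Bayesian learning with EM updates (real-valued toy version).

    Model: y = Phi @ w + noise, noise ~ N(0, 1/beta) with beta assumed known.
    Hierarchical prior: w_i ~ N(0, 1/alpha_i), alpha_i ~ Gamma(a, b), a, b -> 0.
    Returns the posterior mean of w and the learned precisions alpha.
    """
    num_coeffs = Phi.shape[1]
    alpha = np.ones(num_coeffs)
    PtP = Phi.T @ Phi
    Pty = Phi.T @ y
    for _ in range(iters):
        # E-step: Gaussian posterior of w given the current precisions.
        Sigma = np.linalg.inv(beta * PtP + np.diag(alpha))
        mu = beta * Sigma @ Pty
        # M-step: a large alpha_i shrinks w_i towards zero (pruning).
        alpha = 1.0 / (mu ** 2 + np.diag(Sigma))
    return mu, alpha

rng = np.random.default_rng(1)
Phi = rng.standard_normal((50, 20))
w_true = np.zeros(20)
support = [2, 7, 15]
w_true[support] = [1.0, -0.8, 0.5]
y = Phi @ w_true + 0.01 * rng.standard_normal(50)
mu, alpha = sbl_em(Phi, y, beta=1e4)
print(np.round(mu, 3))
```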

Single Channel Source Separation in the Wild – Conversational Speech in Realistic Environments

15th ITG Conference on Speech Communication

Authors: Berger, Emil; Schuppler, Barbara; Hagmüller, Martin; Pernkopf, Franz (Signal Processing and Speech Communication Laboratory, Graz University of Technology, Graz, Austria)

ISBN: (print) 9783800761654

Recent progress in Single Channel Source Separation (SCSS) using deep neural networks has led to impressive performance gains while also increasing model sizes, which require tremendous data resources. This demand is met by artificially composed speech and noise mixtures that do not capture the real-life characteristics of conversations taking place in noisy environments. This paper introduces a new dataset containing task-oriented dialogues spoken in a realistic environment and presents experimental results for two SCSS architectures: Conv-TasNet and the transformer-based MossFormer. Overall, we observe a severe drop in performance of up to 4.3 dB (SI-SDR improvement) for the 8 kHz variant of Conv-TasNet. For same-sex speaker pairs, the drop is even larger, up to 6 dB. Only the model using a 16 kHz sample rate performs at a comparable level for mixed-sex speaker pairs. Our findings illustrate the need to use realistic data for both training and evaluation. © VDE VERLAG GMBH Berlin Offenbach.

Keywords: Deep neural networks

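SI-SDR improvement, the metric quoted in the abstract above, is the SI-SDR of the separated signal minus that of the unprocessed mixture, both measured against the clean reference. The sketch follows the common definition (zero-mean signals, optimal scaling of the reference); it is not taken from the paper's evaluation code, and the function names are hypothetical.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    distortion = estimate - target
    return 10 * np.log10((np.dot(target, target) + eps)
                         / (np.dot(distortion, distortion) + eps))

def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: gain of the separated output over the input mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)

# Toy signals: an interferer orthogonal to the reference.
ref = np.array([1.0, -1.0, 1.0, -1.0])
interf = np.array([1.0, 1.0, -1.0, -1.0])
print(si_sdr_improvement(ref + 0.5 * interf, ref + interf, ref))
```

In this toy case the mixture sits at 0 dB and halving the interferer amplitude raises SI-SDR by 10·log10(4), about 6.02 dB; a "drop of up to 4.3 dB" in the abstract is a reduction of this improvement value.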

Exploring Phonetic Features in Language Embeddings for Unseen Language Varieties of Austrian German

20th Conference on Natural Language Processing, KONVENS 2024

Authors: Gutscher, Lorenz; Pucher, Michael (Signal Processing and Speech Communication Laboratory, Graz University of Technology; Austrian Research Institute for Artificial Intelligence, Vienna, Austria)

Vectorized language embeddings of raw audio data improve tasks such as language recognition, automatic speech recognition, and machine translation. Although embeddings are highly effective in their respective tasks, unraveling the explicit information or meaning encapsulated within them proves challenging. This study investigates a multilingual model's ability to capture phonetic, articulatory, variety, and speaker features from brief audio segments comprising five consecutive phones spoken by Austrian speakers. German is one of the languages on which the extraction model was pre-trained; how the model processes Austrian varieties, however, is an intriguing question for investigation. A k-nearest neighbor classifier is used to test whether the encoded features are prominent in the embedding. Characteristics such as variety are classified effectively, while phone classification accuracy is particularly high for specific phones that are characteristic of the respective dialect/sociolect. ©2024 Association for Computational Linguistics.

Keywords: Embeddings

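The probing setup in the abstract above, a k-nearest neighbor classifier over embedding vectors, can be sketched as follows. Euclidean distance, majority voting, and the value of `k` are assumptions (the record does not state the paper's choices), `knn_predict` is a hypothetical name, and the 2-D toy vectors stand in for real embeddings.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query_x, k=5):
    """Majority-vote k-NN over embedding vectors (Euclidean distance).

    train_x: (n_train, dim) embeddings, train_y: n_train labels,
    query_x: (n_query, dim) embeddings. Returns one label per query.
    """
    train_x = np.asarray(train_x, dtype=float)
    query_x = np.asarray(query_x, dtype=float)
    # Pairwise distances between every query and every training embedding.
    dists = np.linalg.norm(query_x[:, None, :] - train_x[None, :, :], axis=-1)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return [Counter(train_y[i] for i in row).most_common(1)[0][0] for row in nearest]

# Toy "embeddings" labeled by variety.
train_x = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
train_y = ["variety_a"] * 3 + ["variety_b"] * 3
print(knn_predict(train_x, train_y, np.array([[0.2, 0.3], [10.5, 10.2]]), k=3))
```

The logic of the probe: if a property (phone, variety, speaker) is prominent in the embedding space, nearby vectors tend to share that label, so even this simple non-parametric classifier reaches high accuracy.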

Multi-Source Localization Method Based on the Log-Mel Spectrum Augmented Noise Subspace

2023 IEEE International Conference on Signal Processing, Communications and Computing, ICSPCC 2023

Authors: Duan, Haiwei; Bao, Changchun; Zhou, Jing (Beijing University of Technology, Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing, China)

ISBN: (print) 9798350316728

Deep learning (DL) based direction-of-arrival (DOA) estimation is a research hotspot, and many methods have been proposed recently. However, most of these methods suffer serious performance degradation due to the adverse effects of overlapping sources, noise, and reverberation. A primary cause is that performance depends on pre-extracted features, which often suffer from spectral aliasing and peak confusion in complex scenarios. In this paper, a new feature that stacks the log-Mel spectrum with the noise subspace of the covariance matrix of the relative sound pressure is proposed for DL-based DOA estimation; it is referred to as the log-Mel spectrum augmented noise subspace (LMNS). LMNS is more robust than conventional features because it represents both spectral and spatial information effectively. LMNS is fed as the input feature to a Conformer-based residual network that maps it to the spatial pseudo-spectrum, from which the DOAs of the sound sources are obtained. The experimental results show that the proposed method performs better on DOA estimation, verifying that LMNS is more robust and effective in scenarios with multiple sources, noise, and reverberation. © 2023 IEEE.

Keywords: Reverberation

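The noise-subspace half of the LMNS feature described above rests on a standard step: eigendecompose a spatial covariance matrix and keep the eigenvectors belonging to the smallest eigenvalues. The sketch uses an ordinary array covariance with a uniform linear array model, which is an assumption; the paper uses the covariance of the relative sound pressure and stacks the result with a log-Mel spectrum, neither of which is reproduced. The classic MUSIC pseudo-spectrum is included only to show why the noise subspace carries spatial information; in the paper, a Conformer-based network maps the features to the pseudo-spectrum instead. All function names are hypothetical.

```python
import numpy as np

def steering_vector(num_mics, theta, spacing=0.5):
    """Far-field steering vector of a uniform linear array.
    spacing is in wavelengths; theta is in radians from broadside."""
    m = np.arange(num_mics)
    return np.exp(-2j * np.pi * spacing * m * np.sin(theta))

def noise_subspace(R, num_sources):
    """Eigenvectors of the Hermitian covariance R for the smallest
    (num_mics - num_sources) eigenvalues."""
    _, eigvecs = np.linalg.eigh(R)  # eigenvalues in ascending order
    return eigvecs[:, : R.shape[0] - num_sources]

def music_spectrum(En, num_mics, grid, eps=1e-12):
    """Classic MUSIC pseudo-spectrum 1 / ||En^H a(theta)||^2 over an angle grid."""
    return np.array([
        1.0 / (np.linalg.norm(En.conj().T @ steering_vector(num_mics, t)) ** 2 + eps)
        for t in grid
    ])

# One source at 0.3 rad, six microphones, white sensor noise.
a = steering_vector(6, 0.3)
R = np.outer(a, a.conj()) + 0.01 * np.eye(6)
En = noise_subspace(R, num_sources=1)
grid = np.linspace(-1.2, 1.2, 241)
print(grid[np.argmax(music_spectrum(En, 6, grid))])
```

The steering vector of a true source is orthogonal to the noise subspace, so the pseudo-spectrum peaks at the source direction; overlapping sources, noise, and reverberation blur this orthogonality, which is the degradation the abstract describes.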

On the Role of Priors in Bayesian Causal Learning

IEEE Transactions on Artificial Intelligence, 2025, Vol. 6, No. 5, pp. 1439-1445

Authors: Geiger, Bernhard C.; Kern, Roman (Graz University of Technology, Signal Processing and Speech Communication Laboratory, Graz 8010, Austria; Know Center Research GmbH, Graz 8010, Austria; Graz University of Technology, Institute of Machine Learning and Neural Computation, Graz 8010, Austria)

In this work, we investigate causal learning of independent causal mechanisms (ICMs) from a Bayesian perspective. Confirming previous claims from the literature, we show in a didactically accessible manner that unlabeled data (i.e., cause realizations) do not improve the estimation of the parameters defining the mechanism. Furthermore, we observe the importance of choosing an appropriate prior for the cause and mechanism parameters, respectively. Specifically, we show that a factorized prior results in a factorized posterior, which resonates with Janzing and Schölkopf's definition of ICMs via the Kolmogorov complexity of the involved distributions and with the concept of parameter independence of Heckerman et al. © 2020 IEEE.

Keywords: Contrastive Learning

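The claim in the abstract above, that a factorized prior yields a factorized posterior and that unlabeled cause realizations then leave the mechanism unchanged, can be checked in a toy Beta-Bernoulli model of our own choosing (not the paper's derivation). With a binary cause X and effect Y, the parameters are P(X=1) and P(Y=1 | X=x); with independent Beta priors, the posterior splits into one Beta factor per parameter, and cause-only samples update only the cause factor. `Beta` and `factorized_posterior` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Beta:
    """Beta(a, b) distribution over a Bernoulli parameter."""
    a: float
    b: float

    def update(self, ones, zeros):
        # Conjugate update with counts of observed 1s and 0s.
        return Beta(self.a + ones, self.b + zeros)

def factorized_posterior(pairs, unlabeled_x, prior_cause, prior_mech):
    """Posterior for a toy binary cause -> effect model.

    pairs:       labeled samples (x, y) with x, y in {0, 1}
    unlabeled_x: cause realizations without an observed effect
    prior_cause: Beta prior on P(X=1)
    prior_mech:  {0: Beta, 1: Beta}, independent priors on P(Y=1 | X=x)
    """
    # Cause factor: every observed x counts, labeled or not.
    xs = [x for x, _ in pairs] + list(unlabeled_x)
    cause = prior_cause.update(sum(xs), len(xs) - sum(xs))
    # Mechanism factors: only labeled pairs carry information about Y given X.
    mech = {}
    for x in (0, 1):
        ys = [y for xx, y in pairs if xx == x]
        mech[x] = prior_mech[x].update(sum(ys), len(ys) - sum(ys))
    return cause, mech

prior_c = Beta(1, 1)
prior_m = {0: Beta(1, 1), 1: Beta(2, 2)}
pairs = [(0, 0), (0, 1), (1, 1), (1, 1)]
print(factorized_posterior(pairs, [], prior_c, prior_m))
print(factorized_posterior(pairs, [1, 1, 0, 1], prior_c, prior_m))
```

If the prior coupled the cause and mechanism parameters, the posterior would not split this way, and cause-only data could shift the mechanism estimate; this is why the abstract stresses the choice of prior.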

Predictive Packet Loss Concealment Method Based on Conformer and Temporal Convolution Module

2023 IEEE International Conference on Signal Processing, Communications and Computing, ICSPCC 2023

Authors: Zhao, Yunhao; Bao, Changchun; Yang, Xue (Beijing University of Technology, Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing 100124, China)

ISBN: (print) 9798350316728

To enhance the perceptual quality of speech signals, Packet Loss Concealment (PLC) techniques recover speech lost due to network latency and jitter. In practical applications, PLC methods typically employ a predictive process that relies on previously received speech to recover the lost speech without introducing additional delay. In this paper, we propose a predictive PLC network that employs the Conformer and a temporal convolution module to fully exploit contextual dependencies and better predict the lost speech. In addition, the proposed network can be used directly as the generator and combined with appropriate discriminative networks, forming a Generative Adversarial Network (GAN) paradigm that can further enhance the perceptual quality of the recovered speech. Experimental results show that, even without a discriminative network, the proposed method performs well on PLC. Under the GAN paradigm, further improvement is observed, and the proposed method outperforms several baseline methods at different packet loss rates. © 2023 IEEE.

Keywords: Generative adversarial networks

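The predictive constraint in the abstract above means each lost frame must be filled using only frames that have already been output, so no lookahead delay is added. The loop below sketches that causal structure. `conceal` and `repeat_with_fade` are hypothetical names, and the fade-out repetition predictor is a simple placeholder standing in for a learned model such as the paper's Conformer and temporal convolution network; it is not the proposed method.

```python
import numpy as np

def conceal(frames, received, predictor, history=4):
    """Causal packet loss concealment loop.

    frames:    sequence of equal-length frames (contents of lost frames are ignored)
    received:  booleans, False where the packet was lost
    predictor: callable(context, frame_len) -> predicted frame, where context
               holds at most `history` previously output frames
    """
    out = []
    for frame, ok in zip(frames, received):
        if ok:
            out.append(np.asarray(frame, dtype=float))
        else:
            # Only past output is used, so no future frames (and no extra delay).
            out.append(np.asarray(predictor(out[-history:], len(frame)), dtype=float))
    return np.stack(out)

def repeat_with_fade(context, frame_len, gain=0.5):
    """Hypothetical stand-in predictor: repeat the last frame, attenuated."""
    if not context:
        return np.zeros(frame_len)
    return gain * context[-1]

frames = [np.ones(4), np.full(4, 2.0), np.full(4, 3.0), np.full(4, 4.0)]
print(conceal(frames, [True, False, False, True], repeat_with_fade))
```

Because concealed frames are appended to the output, consecutive losses are predicted from earlier predictions, which is why prediction quality matters most at high packet loss rates.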

A Time-domain Packet Loss Concealment Method by Designing Transformer-based Convolutional Recurrent Network

2023 IEEE International Conference on Signal Processing, Communications and Computing, ICSPCC 2023

Authors: Li, Wenwen; Bao, Changchun; Zhou, Jing; Yang, Xue (Beijing University of Technology, Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing 100124, China)

ISBN: (print) 9798350316728

Due to network constraints, packet loss is inevitable in real-time speech communication. Packet loss often causes short interruptions in voice communication, which seriously degrade speech quality. In this paper, a transformer-based convolutional recurrent network (CRN-Trans) is designed for packet loss concealment (PLC) in the time domain. In CRN-Trans, convolutional layers extract high-level features of the speech signal in each frame. A transformer and long short-term memory (LSTM) are combined to model sequential information along the time dimension. Using the self-attention mechanism, the transformer computes the similarities between different positions in the long sequence and dynamically generates weights for those positions. Thus, lost packets can be better predicted by explicitly providing attention weights for the previous frames. The experimental results show that the proposed method achieves better perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) scores. © 2023 IEEE.

Keywords: Long short-term memory

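The self-attention step in the abstract above, computing similarities between positions and turning them into weights, can be sketched as single-head scaled dot-product attention over frame features. The causal mask is an assumption motivated by the abstract's focus on previous frames; the paper's transformer configuration (heads, dimensions, masking) is not given in this record, the projection matrices here are random placeholders, and `causal_self_attention` is a hypothetical name.

```python
import numpy as np

def causal_self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over frames.

    x: (num_frames, dim) frame features; wq, wk, wv: (dim, d_head) projections.
    A causal mask lets frame t attend only to frames 0..t.
    Returns the attended features and the (num_frames, num_frames) weights.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])               # pairwise similarities
    t = x.shape[0]
    future = np.triu(np.ones((t, t)), k=1).astype(bool)
    scores = np.where(future, -np.inf, scores)            # block future frames
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax per row
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))                    # 5 frames, 8-dim features
wq, wk, wv = (rng.standard_normal((8, 4)) for _ in range(3))
out, w = causal_self_attention(x, wq, wk, wv)
print(np.round(w, 3))
```

Each row of `w` is a distribution over the current and earlier frames, so the output for a frame is a data-dependent mixture of past context, which is what lets the model lean on the most relevant earlier frames when a packet is missing.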
