Simultaneous Speech Extraction for Multiple Target Speakers under the Meeting Scenarios

Authors: Zeng, Bang; Suo, Hongbing; Wan, Yulong; Li, Ming

Affiliations: School of Computer Science, Wuhan University, Wuhan, China; Suzhou Municipal Key Laboratory of Multimodal Intelligent Systems, Duke Kunshan University, Kunshan, China; Data & AI Engineering System, OPPO, Beijing, China

Publication: arXiv

Year: 2022

Core indexing:

Subject: Machine learning

Abstract: Common target speech separation methods directly estimate the target source, ignoring the interrelationship between different speakers at each frame. We propose a multiple-target speech separation model (MTSS) to simultaneously extract each speaker's voice from the mixed speech rather than just optimally estimating the target source. Moreover, we propose a speaker diarization (SD) aware MTSS system (SD-MTSS), which consists of an SD module and an MTSS module. By exploiting the TSVAD decision and the estimated mask, our SD-MTSS model can extract the speech signal of each speaker concurrently in a conversational recording without additional enrollment audio in advance. Experimental results show that our MTSS model achieves 1.38 dB SDR, 1.34 dB SI-SDR, and 0.13 PESQ improvements over the baseline on the WSJ0-2mix-extr dataset, respectively. The SD-MTSS system achieves a 19.2% relative reduction in speaker-dependent character error rate (CER) on the Alimeeting dataset. Copyright © 2022, The Authors. All rights reserved.
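Below is a minimal sketch of the multi-target masking idea the abstract describes: a single network estimates one mask per speaker and applies every mask to the shared mixture spectrogram, so all speakers are extracted in one forward pass instead of one enrollment-conditioned pass per target. This is an illustrative toy model only, not the authors' MTSS architecture; the class name `ToyMultiTargetSeparator`, the LSTM backbone, the two-speaker setting, and all tensor shapes are assumptions chosen for demonstration.

```python
# Illustrative sketch of simultaneous multi-speaker mask estimation.
# NOT the paper's implementation; sizes and layers are assumptions.
import torch
import torch.nn as nn

class ToyMultiTargetSeparator(nn.Module):
    """Estimates one spectral mask per speaker from the mixture,
    extracting all speakers in a single forward pass."""
    def __init__(self, n_freq=257, n_speakers=2, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(n_freq, hidden, num_layers=2, batch_first=True)
        # One mask head covering all speakers; sigmoid keeps masks in [0, 1].
        self.mask_head = nn.Linear(hidden, n_freq * n_speakers)
        self.n_speakers = n_speakers
        self.n_freq = n_freq

    def forward(self, mix_mag):
        # mix_mag: (batch, time, freq) magnitude spectrogram of the mixture
        h, _ = self.rnn(mix_mag)
        masks = torch.sigmoid(self.mask_head(h))
        masks = masks.view(mix_mag.size(0), mix_mag.size(1),
                           self.n_speakers, self.n_freq)
        # Apply each speaker's mask to the same mixture spectrogram.
        return masks * mix_mag.unsqueeze(2)  # (batch, time, n_speakers, freq)

# Usage: separate a 2-speaker mixture (random data stands in for audio).
model = ToyMultiTargetSeparator()
mix = torch.rand(1, 100, 257)   # 100 frames, 257 frequency bins
est = model(mix)                # (1, 100, 2, 257): one stream per speaker
print(est.shape)
```

In the SD-MTSS system described above, the diarization (TSVAD) decisions would additionally tell the system which speakers are active at each frame, removing the need for enrollment audio; that conditioning step is omitted from this toy sketch.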
