Speaker Extraction with Verification of Present and Absent Target Speakers

Authors: Zhang, Ke; Borsdorf, Marvin; Liu, Tianchi; Wang, Shuai; Wei, Yangjie; Li, Haizhou

Affiliations: Key Laboratory of Intelligent Computing in Medical Image, Northeastern University, Shenyang 110819, China; Shenzhen, Guangdong 518000, China; Machine Listening Lab, University of Bremen, Bremen 28359, Germany; Department of Electrical and Computer Engineering, National University of Singapore, Singapore 119077, Singapore

Publication: Journal of Shanghai Jiaotong University (Science) (J. Shanghai Jiaotong Univ. Sci.)

Year, volume, issue: 2025

Pages: 1-6

Funding: the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy (University Allowance, EXC 2077, University of Bremen); the National Natural Science Foundation of China (Nos. 62401377 and 62271432); and the Internal Project of Shenzhen Research Institute of Big Data (No. T00120220002)

Subject: Speech recognition

Abstract: Target speaker extraction (TSE) models are expected to extract the target speech from a cocktail-party mixture signal. When trained only on samples in which the target speaker is present (PT), these models output noise when the target speaker is absent (AT). TSE quality can be improved by telling the model whether a sample is PT or AT. However, detection of the target speaker is not perfect. In this paper, we propose a new model, TSEV, which performs target speaker extraction and speaker verification simultaneously. The TSEV model outputs the extracted speech and generates two speaker embeddings per inference to detect the target speaker. Because the speaker encoder and low-level modules are shared, the speaker verification task can be performed in low signal-to-noise ratio scenarios. We train the TSEV model on multi-talker PT and AT conditions with fully overlapped speech. Experiments verify the superiority of performing the two tasks jointly in the proposed model. Compared with the baseline, the TSEV model achieves better verification performance without degrading extraction performance. © Shanghai Jiao Tong University 2025.
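
For readers unfamiliar with embedding-based detection, the following is a minimal sketch of how two speaker embeddings could be compared to decide whether the target speaker is present. It is not the TSEV model: the abstract does not say how the two embeddings are scored, so cosine similarity, the function names, and the threshold value of 0.5 are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def target_is_present(enroll_emb, output_emb, threshold=0.5):
    """Hypothetical decision rule: declare the target speaker present (PT)
    when the two embeddings are similar enough, otherwise absent (AT).
    The threshold is an assumed value, not one reported by the paper."""
    return cosine_similarity(enroll_emb, output_emb) >= threshold


if __name__ == "__main__":
    enrollment = [1.0, 0.0]      # toy embedding of the enrollment utterance
    similar = [0.9, 0.1]         # toy embedding close to the enrollment
    dissimilar = [0.0, 1.0]      # toy embedding orthogonal to the enrollment
    print(target_is_present(enrollment, similar))
    print(target_is_present(enrollment, dissimilar))
```

In practice a threshold like this would be tuned on held-out PT and AT trials rather than fixed in advance.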
