Author affiliations: Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210, USA; Mitsubishi Elect Res Labs, Cambridge, MA 02139, USA; Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210, USA
Publication: IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING (IEEE ACM Trans. Audio Speech Lang. Process.)
Year/Volume: 2021, Vol. 29
Pages: 2001-2014
Core indexing:
Subject classification: 0808 [Engineering - Electrical Engineering]; 08 [Engineering]; 0702 [Science - Physics]
Funding: NIDCD [R01 DC012048]; NSF [ECCS-1808932]; Ohio Supercomputer Center
Keywords: Geometry; Array signal processing; Speech processing; Microphone arrays; Covariance matrices; Deep learning; Training; Complex spectral mapping; Speaker separation; Microphone array processing; Deep learning
Abstract: We propose multi-microphone complex spectral mapping, a simple way of applying deep learning for time-varying non-linear beamforming, for speaker separation in reverberant conditions. We aim at both speaker separation and dereverberation. Our study first investigates offline utterance-wise speaker separation and then extends to block-online continuous speech separation (CSS). Assuming a fixed array geometry between training and testing, we train deep neural networks (DNNs) to predict the real and imaginary (RI) components of target speech at a reference microphone from the RI components of multiple microphones. We then integrate multi-microphone complex spectral mapping with minimum variance distortionless response (MVDR) beamforming and post-filtering to further improve separation, and combine it with frame-level speaker counting for block-online CSS. Although our system is trained on simulated room impulse responses (RIRs) based on a fixed number of microphones arranged in a given geometry, it generalizes well to a real array with the same geometry. State-of-the-art separation performance is obtained on the simulated two-talker SMS-WSJ corpus and the real-recorded LibriCSS dataset.
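The MVDR beamforming step mentioned in the abstract can be sketched with NumPy. This is a minimal illustration of the standard trace-normalized MVDR formulation computed from per-frequency spatial covariance matrices, not the paper's exact implementation; the function name `mvdr_weights` and its interface are assumptions for illustration.

```python
import numpy as np

def mvdr_weights(phi_s, phi_n, ref=0):
    """Compute per-frequency MVDR beamforming weights.

    phi_s: (F, M, M) target-speech spatial covariance matrices
    phi_n: (F, M, M) noise/interference spatial covariance matrices
    ref:   index of the reference microphone
    Returns w with shape (F, M); the beamformed output per
    frequency is y = w^H x for a multi-channel STFT vector x.
    """
    F, M, _ = phi_s.shape
    u = np.zeros(M)
    u[ref] = 1.0  # one-hot selector for the reference microphone
    w = np.zeros((F, M), dtype=complex)
    for f in range(F):
        # Phi_n^{-1} Phi_s via a linear solve (avoids explicit inversion)
        num = np.linalg.solve(phi_n[f], phi_s[f])
        # Trace-normalized MVDR: w = (Phi_n^{-1} Phi_s / tr(Phi_n^{-1} Phi_s)) u
        w[f] = (num / (np.trace(num) + 1e-8)) @ u
    return w
```

In a DNN-supported pipeline, `phi_s` and `phi_n` would typically be estimated from the network's separated target and interference spectra; the beamformed output is then distortionless with respect to the target at the chosen reference microphone.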