ISBN (digital): 9786165904773
ISBN (print): 9786165904773
Simultaneous denoising and dereverberation of single-channel speech mixtures in complex acoustic environments is a challenging task. In this paper, we propose a denoising and dereverberation network named D(2)Net, in which a two-branch encoder (TBE) is designed to extract and selectively fuse features of different granularity. In addition, we design a global-local dual-path transformer (GLDPT), which introduces local dense synthesizer attention (LDSA) into the dual-path transformer to improve the perception of local information. We evaluated the proposed D(2)Net and conducted ablation studies on the VoiceBank+DEMAND and WHAMR! datasets. We also selected three types of data from the WHAMR! dataset to verify the ability of D(2)Net on the denoising-only, dereverberation-only, and simultaneous denoising and dereverberation tasks, respectively. Experimental results show that the proposed model outperforms the comparison models on all three tasks while keeping a small number of network parameters.