A new algorithm based on non-negative matrix factorization (NMF) is proposed for single-channel speech separation with speakers known a priori. It aims to better model both the spectral structure and the temporal continuity of speech signals. First, for each speaker, NMF and k-means clustering are used to obtain several small dictionaries together with a state sequence that describes the temporal dynamics among these dictionaries. Then, a factorial conditional random field (FCRF) model is trained on the state sequences and dictionaries to jointly model the temporal continuity of the two speakers in the mixed signal, and this model is used for separation. Experiments show that the proposed algorithm outperforms all baselines on every metric (signal-to-distortion ratio, SDR; signal-to-interference ratio, SIR; signal-to-artifacts ratio, SAR; mean opinion score, MOS): sparse NMF by +1.12 dB SDR, +2.37 dB SIR, +0.40 dB SAR and +0.2 MOS; the non-negative factorial hidden Markov model by +2.04 dB SDR, +4.26 dB SIR, +0.62 dB SAR and +1.0 MOS; and standard NMF by +2.8 dB SDR, +5.08 dB SIR, +1.06 dB SAR and +1.2 MOS.
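The per-speaker training step can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the abstract does not say whether k-means clusters spectrogram frames or NMF activations, which NMF cost function is used, or how the state dynamics are parameterized. Here we assume that magnitude-spectrogram frames are clustered by spectral shape, that each cluster gets its own small dictionary learned with Euclidean multiplicative-update NMF, and that the "temporal dynamics" are summarized as a smoothed transition matrix over the state sequence. The function names (`nmf`, `kmeans`, `train_speaker_model`) and all parameters are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Standard NMF, V ~ W @ H, via Lee-Seung multiplicative updates (Euclidean cost)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd k-means on the rows of X; returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # fancy indexing copies
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # keep a center in place if its cluster empties
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def train_speaker_model(V, n_states, atoms_per_state, seed=0):
    """Hypothetical per-speaker training: k-means states, one small NMF dictionary
    per state, and a smoothed state-transition matrix from the state sequence."""
    # Cluster frames by spectral shape (unit-norm columns of the spectrogram V).
    frames = V.T / (np.linalg.norm(V.T, axis=1, keepdims=True) + 1e-9)
    labels = kmeans(frames, n_states, seed=seed)
    # Relabel to 0..K-1 so an emptied cluster does not leave a gap.
    _, states = np.unique(labels, return_inverse=True)
    n_used = states.max() + 1
    # One small dictionary per state, learned only from that state's frames.
    dicts = [nmf(V[:, states == s], atoms_per_state, seed=seed)[0] for s in range(n_used)]
    # Temporal dynamics between dictionaries: add-one-smoothed transition probabilities.
    trans = np.ones((n_used, n_used))
    for a, b in zip(states[:-1], states[1:]):
        trans[a, b] += 1
    trans /= trans.sum(axis=1, keepdims=True)
    return dicts, states, trans
```

In this sketch the dictionaries and transition statistics would then be the inputs to the FCRF training step, which is not shown here.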
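In recent years, non-negative matrix factorization (NMF) has been widely applied to single-channel speech separation. However, the standard NMF algorithm assumes that adjacent frames of speech are mutually independent, so it cannot represent the temporal continuity of speech signals. To address this, this paper proposes a new speech separation algorithm. NMF and k-means clustering are first combined to model the spectral structure and temporal continuity of clean speech; the resulting models are then used to train a factorial conditional random field (FCRF), which is used to separate the mixed speech signal. Results show that, compared with NMF-based algorithms that do not account for the temporal continuity of speech, such as the Active-Set Newton Algorithm (ASNA), the proposed algorithm achieves clear improvements on objective metrics.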
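The frame-independence assumption of standard NMF can be seen directly in supervised NMF. With the dictionary W held fixed, each multiplicative update of column t of H uses only column t of the spectrogram V, so the model ignores frame order entirely: shuffling the frames only shuffles the activations. The toy sketch below checks this numerically, assuming random data and a Euclidean cost; `nmf_activations` is a hypothetical name, and this illustrates the claim rather than reproducing code from the paper.

```python
import numpy as np

def nmf_activations(V, W, n_iter=300, eps=1e-9):
    """Supervised NMF: estimate H >= 0 with the dictionary W held fixed (Euclidean cost)."""
    H = np.ones((W.shape[1], V.shape[1]))  # identical starting point for every frame
    for _ in range(n_iter):
        # Column t of this update depends only on V[:, t] and H[:, t].
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

rng = np.random.default_rng(0)
W = rng.random((16, 4))        # toy dictionary: 16 frequency bins, 4 atoms
V = W @ rng.random((4, 10))    # toy "spectrogram" with 10 frames
perm = rng.permutation(10)     # shuffle the frame order

H = nmf_activations(V, W)
H_shuffled = nmf_activations(V[:, perm], W)
print(np.allclose(H[:, perm], H_shuffled))  # True: frame order carries no information
```

Because the estimate for each frame never looks at its neighbours, continuity has to come from an explicit temporal model, such as the state sequences and FCRF described in this paper.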