Author affiliation: Southeast Univ, Minist Educ, Sch Automat, Lab Measurement & Control Complex Syst Engn, Nanjing 210096, Jiangsu, Peoples R China
Publication: IEEE Access
Year/Volume/Issue: 2019, Vol. 7
Pages: 142009-142021
Core indexing:
Funding: Natural Science Foundation of Jiangsu Province [BE2016805]; National Natural Science Foundation of China [61503081, 61473079]
Subjects: Emotion recognition; Speech recognition; Wavelet packets; Feature extraction; Robustness; Wavelet analysis; Acoustics; Robust noise speech emotion recognition; LW-WPCC feature; feature extraction algorithm; bimodal emotion recognition
Abstract: Noise cannot be neglected if emotion recognition is to be put into practice. To address noise in speech, we design a new acoustic feature, the Long time frame Analysis Weighted Wavelet Packet Cepstral Coefficient (LW-WPCC), for better robustness. To extract the LW-WPCC feature, the best wavelet packet basis is first constructed. On this basis, a robust wavelet packet cepstral coefficient is extracted by combining short-time frame analysis with long-time frame analysis. We then introduce a sub-band spectral centroid (center-of-mass) parameter that is robust to additive noise, and propose an extraction algorithm for LW-WPCC. Experiments on speech emotion recognition at different SNR levels show that the proposed method achieves better noise robustness and recognition performance. Moreover, because facial expressions are not affected by acoustic noise, we perform bimodal emotion recognition on audio-visual data, using decision-level fusion to improve robustness. Experiments on audio-visual data are conducted to evaluate the effectiveness of our method. The results show that bimodal emotion recognition on audio-visual data improves robustness and achieves better performance by drawing on different kinds of emotion data.
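
The decision-level fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which fusion rule the authors use, so the weighted-sum rule below is an assumption, and the names `fuse_decisions`, `predict_emotion`, and `audio_weight` are hypothetical. The sketch only shows the general idea: each modality is classified separately, and the per-class scores are combined afterwards, so a noise-degraded audio decision can be outvoted by the visual one.

```python
from typing import Dict


def fuse_decisions(audio_probs: Dict[str, float],
                   visual_probs: Dict[str, float],
                   audio_weight: float = 0.5) -> Dict[str, float]:
    """Combine two per-class probability distributions by weighted sum.

    Assumption: a weighted-sum rule; the paper's actual rule is not
    given in the abstract. audio_weight is a hypothetical parameter
    that would be lowered when the audio channel is noisy.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio_probs.keys() != visual_probs.keys():
        raise ValueError("both classifiers must score the same labels")

    fused = {
        label: audio_weight * audio_probs[label]
        + (1.0 - audio_weight) * visual_probs[label]
        for label in audio_probs
    }

    # Renormalize so the fused scores sum to 1 even if the inputs
    # were not perfectly normalized.
    total = sum(fused.values())
    if total <= 0.0:
        raise ValueError("fused scores must have a positive sum")
    return {label: score / total for label, score in fused.items()}


def predict_emotion(fused_probs: Dict[str, float]) -> str:
    """Return the label with the highest fused score."""
    return max(fused_probs, key=fused_probs.get)


# Illustrative (made-up) scores: noisy audio leans towards "angry",
# while the visual classifier is confident about "happy".
audio = {"angry": 0.40, "happy": 0.35, "neutral": 0.25}
visual = {"angry": 0.10, "happy": 0.70, "neutral": 0.20}
fused = fuse_decisions(audio, visual, audio_weight=0.3)
print(predict_emotion(fused))
```

In this made-up example the audio classifier alone would pick "angry", but with the audio weight set to 0.3 the fused decision follows the visual classifier. How such a weight would be chosen (fixed, or tied to an SNR estimate) is left open here.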