Objective quality assessment aims to evaluate the perceptual quality of a signal using a machine-based algorithm. Because subjective evaluation of speech quality poses several challenges, objective measures are needed. The goal of a non-intrusive quality assessment metric for noise-suppressed speech is to assess the quality of a noise-suppressed signal without any clean reference signal. Following ITU-T Recommendation P.835, the quality assessment of noise-suppressed speech involves predicting three quality scores, namely signal quality, background quality, and overall quality, and all three are therefore considered in this study. In recent literature, non-intrusive quality assessment is posed as a regression problem in which a perceptual model learns the mapping between a set of acoustic features and the corresponding quality scores. Recently, we proposed the use of deep autoencoder (DAE) features and subband autoencoder (SBAE) features for acoustic representation, with an Artificial Neural Network (ANN) as the regression model. DAE and SBAE are variants of the autoencoder architecture that have a bottleneck structure in the hidden layers. Such an architecture represents the class of generalized nonlinear Principal Component Analysis (PCA) that guarantees reconstruction of the input features with arbitrary accuracy. Both feature sets (DAE and SBAE) are extracted using unsupervised deep learning architectures, and they demonstrated better performance than the state-of-the-art spectral feature set, namely Mel Filterbank Energies (FBEs). In this paper, we present a more detailed analysis of the previously proposed DAE and SBAE features and analyze their usefulness in predicting the signal and background quality scores in addition to the overall quality score. We compare the performance of all three features with each other as well as with the current ITU-T P.563 metric for non-intrusive speec
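
The regression formulation summarized in the abstract can be written out as a short sketch. All notation here is an assumption introduced for illustration and does not come from the paper: $\mathbf{x} \in \mathbb{R}^{D}$ is an input spectral feature vector, $\mathbf{h} \in \mathbb{R}^{d}$ with $d < D$ is the bottleneck representation, $f$ and $g$ are activation functions, $\phi$ stands for whatever aggregation turns frame-level bottleneck features into an utterance-level input, and $r_{\theta}$ is the ANN regressor. The autoencoder is shown with a single hidden layer for brevity, whereas the abstract only says that the bottleneck sits in the hidden layers; the squared-error losses are likewise an assumed, common choice.

```latex
% Bottleneck autoencoder (single-hidden-layer simplification, assumed notation)
\begin{align}
  \mathbf{h} &= f\!\left(\mathbf{W}_{1}\mathbf{x} + \mathbf{b}_{1}\right),
    & \mathbf{h} &\in \mathbb{R}^{d},\ d < D, \\
  \hat{\mathbf{x}} &= g\!\left(\mathbf{W}_{2}\mathbf{h} + \mathbf{b}_{2}\right),
    & \hat{\mathbf{x}} &\in \mathbb{R}^{D}, \\
  \mathcal{L}_{\mathrm{AE}} &= \frac{1}{N}\sum_{n=1}^{N}
    \left\lVert \mathbf{x}_{n} - \hat{\mathbf{x}}_{n} \right\rVert_{2}^{2}.
\end{align}

% Non-intrusive quality prediction as regression on the learned features
\begin{align}
  \hat{\mathbf{y}} &= r_{\theta}\!\left(\phi(\mathbf{h}_{1}, \dots, \mathbf{h}_{T})\right)
    = \left(\hat{y}_{\mathrm{sig}},\ \hat{y}_{\mathrm{bak}},\ \hat{y}_{\mathrm{ovrl}}\right), \\
  \mathcal{L}_{\mathrm{reg}} &= \frac{1}{M}\sum_{m=1}^{M}
    \left\lVert \mathbf{y}_{m} - \hat{\mathbf{y}}_{m} \right\rVert_{2}^{2}.
\end{align}
```

In this reading, the autoencoder is trained first and without quality labels (the unsupervised step), and only the regressor sees the subjective signal, background, and overall scores $\mathbf{y}_{m}$. No clean reference signal appears anywhere in the mapping, which is what makes the approach non-intrusive.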