We apply a fast kernel method to mask-based single-channel speech enhancement. Specifically, our method solves a kernel regression problem associated with a non-smooth kernel function (the exponential power kernel) using a highly efficient iterative method (EigenPro). Owing to the simplicity of this method, its hyper-parameters, such as the kernel bandwidth, can be selected automatically and efficiently by line search on subsamples of the training data. We observe an empirical correlation between the regression loss (mean square error) and standard speech-enhancement metrics. This observation justifies our training target and motivates us to reduce the regression loss further by training separate kernel models for different frequency subbands. We compare our method with state-of-the-art deep neural networks on mask-based speech enhancement using the HINT and TIMIT corpora. Experimental results show that our kernel method consistently outperforms the deep neural networks while requiring less training time.
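The abstract leaves the kernel and the tuning loop implicit; the following minimal NumPy sketch shows what an exponential power kernel and a subsample-based bandwidth line search could look like. All function names, the candidate grid, and the direct linear solve are our assumptions: the paper uses EigenPro, a preconditioned iterative solver, rather than the explicit solve shown here for clarity.

```python
import numpy as np

def exp_power_kernel(X, Z, bandwidth=1.0, power=1.0):
    """Exponential power kernel k(x, z) = exp(-(||x - z|| / bandwidth)**power).

    power = 2 recovers the Gaussian kernel; power = 1 gives the non-smooth
    Laplacian-type kernel emphasized in the abstract.
    """
    # Pairwise Euclidean distances between rows of X and rows of Z.
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Z**2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    dist = np.sqrt(np.maximum(sq, 0.0))
    return np.exp(-((dist / bandwidth) ** power))

def fit_kernel_regression(X_train, y_train, bandwidth, power=1.0, reg=1e-8):
    """Solve (K + reg * I) alpha = y; stand-in for EigenPro's iterative solve."""
    K = exp_power_kernel(X_train, X_train, bandwidth, power)
    K[np.diag_indices_from(K)] += reg
    return np.linalg.solve(K, y_train)

def predict(X_train, alpha, X_test, bandwidth, power=1.0):
    return exp_power_kernel(X_test, X_train, bandwidth, power) @ alpha

def select_bandwidth(X, y, candidates, n_sub=2000, seed=0):
    """Pick the bandwidth with the lowest held-out MSE on a random subsample,
    mirroring the line-search-on-subsamples idea from the abstract."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))[:n_sub]
    split = len(idx) // 2
    tr, va = idx[:split], idx[split:]
    best, best_mse = None, np.inf
    for bw in candidates:
        alpha = fit_kernel_regression(X[tr], y[tr], bw)
        mse = np.mean((predict(X[tr], alpha, X[va], bw) - y[va]) ** 2)
        if mse < best_mse:
            best, best_mse = bw, mse
    return best
```

Training one such model per frequency subband, as the abstract proposes, would amount to calling `fit_kernel_regression` once per subband with that subband's mask targets as `y_train`.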
ISBN (print): 9781479999880
Recent evidence suggests that the performance of kernel methods may match that of deep neural networks (DNNs), which have been the state-of-the-art approach for speech recognition. In this work, we present an improvement of the kernel ridge regression studied in Huang et al., ICASSP 2014, and show that our proposal is computationally advantageous. Our approach performs classification using the one-vs-one scheme, which, under certain assumptions, reduces the costs of the one-vs-rest scheme by asymptotically a factor of c^2 in training time and a factor of c in memory consumption. Here, c is the number of classes, which is typically on the order of hundreds to thousands for speech recognition. We demonstrate empirical results on the benchmark corpus TIMIT. In particular, the classification accuracy is one to two percentage points higher (in absolute terms) than the best of the kernel methods and of the DNNs reported by Huang et al., and the speech recognition accuracy is highly comparable.
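To make the one-vs-one cost argument concrete, here is a hedged NumPy sketch of pairwise kernel ridge classification with majority voting. The helper names and the Gaussian kernel are illustrative assumptions, not the paper's implementation; the docstring spells out where the asymptotic c^2 factor comes from.

```python
import numpy as np
from itertools import combinations

def gaussian_kernel(X, Z, bandwidth=5.0):
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def ovo_kernel_ridge(X, y, n_classes, kernel=gaussian_kernel, reg=1e-3):
    """Train one kernel ridge model per class pair, on only that pair's data.

    With roughly n/c examples per class, each pairwise solve touches ~2n/c
    points; summing the ~c^2/2 small cubic-cost solves and comparing with
    c one-vs-rest solves over all n points gives the asymptotic c^2 saving.
    """
    models = {}
    for a, b in combinations(range(n_classes), 2):
        mask = (y == a) | (y == b)
        Xp = X[mask]
        yp = np.where(y[mask] == a, 1.0, -1.0)  # binary +/-1 targets
        K = kernel(Xp, Xp)
        K[np.diag_indices_from(K)] += reg
        models[(a, b)] = (Xp, np.linalg.solve(K, yp))
    return models

def ovo_predict(models, X_test, n_classes, kernel=gaussian_kernel):
    """Each pairwise model votes for one of its two classes; plurality wins."""
    votes = np.zeros((len(X_test), n_classes), dtype=int)
    for (a, b), (Xp, alpha) in models.items():
        scores = kernel(X_test, Xp) @ alpha
        votes[np.arange(len(X_test)), np.where(scores > 0, a, b)] += 1
    return votes.argmax(axis=1)
```

Note the memory side of the trade-off: each pairwise Gram matrix is only about (2n/c)^2 entries, and the models can be trained one at a time, which is consistent with the factor-of-c memory reduction the abstract claims under its stated assumptions.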
ISBN (print): 9781479928934
Despite their theoretical appeal and grounding in tractable convex optimization techniques, kernel methods are often not the first choice for large-scale speech applications due to their significant memory requirements and computational expense. In recent years, randomized approximate feature maps have emerged as an elegant mechanism to scale up kernel methods. Still, in practice, a large number of random features is required to obtain acceptable accuracy on predictive tasks. In this paper, we develop two algorithmic schemes to address this computational bottleneck in the context of kernel ridge regression. The first scheme is a specialized distributed block coordinate descent procedure that avoids explicit materialization of the feature-space data matrix, while the second scheme gains efficiency by combining multiple weak random-feature models in an ensemble learning framework. We demonstrate that these schemes enable kernel methods to match the performance of state-of-the-art deep neural networks on TIMIT for speech recognition and classification tasks. In particular, we obtain the best classification error rates reported on TIMIT using kernel methods.
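As a rough illustration of the second scheme, the sketch below builds an ensemble of weak random Fourier feature ridge models and averages their predictions. This is an assumed reconstruction in plain NumPy: the feature map follows Rahimi and Recht's construction for the Gaussian kernel, the combination rule is a simple average, and, unlike the paper's first scheme, this compact version does materialize each (small) feature matrix rather than using distributed block coordinate descent.

```python
import numpy as np

def random_fourier_features(X, n_features, bandwidth, seed=0):
    """Random Fourier features approximating a Gaussian kernel:
    z(x) = sqrt(2/D) * cos(W x + b), with W ~ N(0, 1/bandwidth^2)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0 / bandwidth, size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def rff_ridge_ensemble(X, y, n_models=5, n_features=512, bandwidth=5.0, reg=1e-3):
    """Train several weak ridge models, each on its own random feature draw.

    Each model is cheap (a D x D solve with D = n_features), and averaging
    across draws recovers accuracy that a single small-D model lacks.
    """
    models = []
    for m in range(n_models):
        Z = random_fourier_features(X, n_features, bandwidth, seed=m)
        A = Z.T @ Z
        A[np.diag_indices_from(A)] += reg
        models.append((m, np.linalg.solve(A, Z.T @ y)))
    return models

def rff_predict(models, X, n_features=512, bandwidth=5.0):
    """Average the ensemble members' predictions, regenerating each member's
    feature map deterministically from its stored seed."""
    preds = [random_fourier_features(X, n_features, bandwidth, seed=m) @ w
             for m, w in models]
    return np.mean(preds, axis=0)
```

The design point this sketch is meant to convey: rather than one model with a very large D (the bottleneck the abstract identifies), the ensemble spends the same feature budget across several independent draws, each of which stays cheap to solve and to store.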