The length of the test speech greatly influences the performance of GMM-UBM based text-independent speaker recognition systems. For example, when the valid speech is as short as 1 to 5 seconds, performance decreases significantly, because the GMM-UBM based method is a statistical one that depends on sufficient data. Considering that text information is helpful to speaker recognition, a multi-model method is proposed to improve short-utterance speaker recognition (SUSR) in Chinese. We build a few phoneme class models for each speaker to represent different parts of the characteristic space, and fuse the scores of the test data on these models in order to increase the degree of matching between the training models and the test utterance. Experimental results show that the proposed method achieves a relative EER reduction of about 26% compared with the traditional GMM-UBM method.
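The score fusion described in this abstract can be illustrated with a minimal sketch. The abstract does not give the fusion rule, so everything below is an assumption made for illustration: the function names `class_llr` and `fuse_scores`, the use of an average per-frame log-likelihood ratio against the UBM as the per-class score, and the default weighting by the number of test frames assigned to each phoneme class.

```python
from typing import Dict, List, Optional


def class_llr(target_ll: List[float], ubm_ll: List[float]) -> float:
    """Average per-frame log-likelihood ratio for one phoneme class.

    target_ll and ubm_ll hold per-frame log-likelihoods of the same test
    frames under the speaker's class model and under the UBM.
    """
    if len(target_ll) != len(ubm_ll) or not target_ll:
        raise ValueError("need two non-empty lists of equal length")
    return sum(t - u for t, u in zip(target_ll, ubm_ll)) / len(target_ll)


def fuse_scores(
    class_scores: Dict[str, float],
    frame_counts: Dict[str, int],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Fuse per-class scores into one utterance score by a weighted sum.

    Assumption: when no weights are given, each class is weighted by its
    share of the test frames. The paper may use a different rule.
    """
    if weights is None:
        total = sum(frame_counts[c] for c in class_scores)
        weights = {c: frame_counts[c] / total for c in class_scores}
    return sum(weights[c] * class_scores[c] for c in class_scores)


# Hypothetical example: two phoneme classes observed in a short utterance.
scores = {"vowel": 1.0, "nasal": -1.0}
counts = {"vowel": 3, "nasal": 1}
fused = fuse_scores(scores, counts)
```

With frame-count weights the fused score equals the plain average frame-level ratio over the utterance; passing explicit `weights` lets more reliable classes count for more, which is one possible reason to keep separate class models.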
In this paper, we formalize the task of finding a knowledge base entry that a given named entity mention refers to, namely entity linking, by identifying the most "important" node among the graph nodes repre...
Recent research usually models POS tagging as a sequential labeling problem, in which only local context features can be used. Due to the lack of morphological inflections, many tagging ambiguities in Chinese are diff...
Word Sense Disambiguation (WSD) is one of the fundamental natural language processing tasks. However, the lack of training corpora is a bottleneck in constructing a highly accurate all-words WSD system. Annotating a large-scal...
Annotating Named Entity Recognition (NER) training corpora is a costly process but necessary for supervised NER systems. This paper presents an approach to generate large-scale Chinese NER training data from an Englis...
We address the memory problem of maximum entropy language models (MELM) with very large feature sets. Randomized techniques are employed to remove all large, exact data structures in MELM implementations. To avoid the ...
We propose an efficient way to train maximum entropy language models (MELM) and neural network language models (NNLM). The advantage of the proposed method comes from a more robust and efficient subsampling technique....
We propose to improve accented speech recognition performance by using an asymmetric acoustic model. Our proposed model is generated based on reliable accent-specific units and acoustic model reconstruction. The reliable units are extracted with time alignment recognition to cover accent variations at both the acoustic and phonetic levels. The asymmetric acoustic model is obtained through selective decision tree merging together with dynamic Gaussian component selection in model reconstruction. The improved resolution of our proposed model allows it to handle accent variations of differing degrees. The effectiveness of our approach was evaluated on a typical Chinese accent. Our system outperforms traditional acoustic model reconstruction and MAP adaptation approaches, with relative Syllable Error Rate (SER) reductions of 8.28% and 7.14%, respectively, without sacrificing performance on standard Mandarin speech.
In recent work, Lyu and Simoncelli [1] introduced radial Gaussianization (RG) as a very efficient procedure for transforming n-dimensional random vectors into Gaussian vectors with independent and identically distributed (i.i.d.) components. This entails transforming the norms of the data so that they become chi-distributed with n degrees of freedom. A necessary requirement is that the original data are generated by an isotropic distribution, that is, their probability density function (pdf) is constant over surfaces of n-dimensional spheres (or, more generally, n-dimensional ellipsoids). The case of biases in the data, which is of great practical interest, is studied here; as we demonstrate with experiments, there are situations in which even very small amounts of bias can cause RG to fail. This becomes evident especially when the data form clusters in low-dimensional manifolds. To address this shortcoming, we propose a two-step approach which entails (i) first discovering clusters in the data and removing the bias from each, and (ii) performing RG on the bias-compensated data. In experiments with synthetic data, the proposed bias compensation procedure results in significantly better Gaussianization than the non-compensated RG method.
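The norm transformation at the heart of RG can be sketched for the special case n = 2, where the chi distribution with 2 degrees of freedom is the Rayleigh distribution and its quantile function has the closed form sqrt(-2 ln(1 - u)). This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the empirical CDF of the radii is estimated from ranks, and `remove_bias` handles only a single cluster by subtracting the sample mean, whereas the abstract describes discovering several clusters first.

```python
import math
import random


def sample_isotropic_2d(n, seed=0):
    """Draw n isotropic, non-Gaussian points: uniform direction, exponential radius."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        r = rng.expovariate(1.0)
        pts.append((r * math.cos(theta), r * math.sin(theta)))
    return pts


def remove_bias(points):
    """Subtract the sample mean (single-cluster stand-in for bias compensation)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    return [(x - mx, y - my) for x, y in points]


def radial_gaussianize_2d(points):
    """Rescale each point so that the radii follow a chi distribution with 2 dof.

    Directions are kept; only the norms change. The empirical CDF value of a
    radius is taken as (rank + 0.5) / n, which stays strictly inside (0, 1).
    """
    n = len(points)
    radii = [math.hypot(x, y) for x, y in points]
    order = sorted(range(n), key=lambda i: radii[i])
    out = [None] * n
    for rank, i in enumerate(order):
        u = (rank + 0.5) / n
        target_r = math.sqrt(-2.0 * math.log(1.0 - u))  # Rayleigh quantile
        scale = target_r / radii[i] if radii[i] > 0.0 else 0.0
        x, y = points[i]
        out[i] = (x * scale, y * scale)
    return out


data = sample_isotropic_2d(2000, seed=1)
gaussianized = radial_gaussianize_2d(remove_bias(data))
```

For higher dimensions the same idea applies with the chi quantile function for n degrees of freedom, which has no simple closed form and would need a numerical routine.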
In this paper, we present an in-car Chinese noise corpus that can be used to simulate complicated car environments for robust speech recognition research and experiments. The corpus was collected in mainland China in ...