Recent research usually models POS tagging as a sequential labeling problem, in which only local context features can be used. Due to the lack of morphological inflections, many tagging ambiguities in Chinese are diff...
详细信息
Word Sense Disambiguation (WSD) is one of the fundamental natural languageprocessing tasks. However, lack of training corpora is a bottleneck to construct a high accurate all-words WSD system. Annotating a large-scal...
详细信息
Annotating Named Entity Recognition (NER) training corpora is a costly process but necessary for supervised NER systems. This paper presents an approach to generate large-scale Chinese NER training data from an Englis...
详细信息
We address the memory problem of maximum entropy language models(MELM) with very large feature sets. Randomized techniques are employed to remove all large, exact data structures in MELM implementations. To avoid the ...
详细信息
We propose an efficient way to train maximum entropy language models (MELM) and neural network language models (NNLM). The advantage of the proposed method comes from a more robust and efficient subsampling technique....
详细信息
We propose to improve accented speech recognition performance by using asymmetric acoustic model. Our proposed model is generated based on reliable accent specific units and acoustic model reconstruction. The reliable...
详细信息
We propose to improve accented speech recognition performance by using asymmetric acoustic model. Our proposed model is generated based on reliable accent specific units and acoustic model reconstruction. The reliable units are extracted with time alignment recognition to cover accent variations at both acoustic and phonetic levels. The asymmetric acoustic model is obtained through selective decision tree merging together with dynamic Gaussian component selection in model reconstruction. The improved resolution of our proposed model is able to handle different levels of accented variations at different degrees. The effectiveness of our approach was evaluated on a typical Chinese accent. Our system outperforms traditional acoustic model reconstruction and MAP adaptation approaches by 8.28% and 7.14%, relatively on Syllable Error Rate (SER) reduction without sacrificing the performance on standard Mandarin speech.
In recent work, Lyu and Simoncelli [1] introduced radial Gaussianization (RG) as a very efficient procedure for transforming n-dimensional random vectors into Gaussian vectors with independent and identically distribu...
详细信息
In recent work, Lyu and Simoncelli [1] introduced radial Gaussianization (RG) as a very efficient procedure for transforming n-dimensional random vectors into Gaussian vectors with independent and identically distributed (i.i.d.) components. This entails transforming the norms of the data so that they become chi-distributed with n degrees of freedom. A necessary requirement is that the original data are generated by an isotropic distribution, that is, their probability density function (pdf) is constant over surfaces of n-dimensional spheres (or, more general, n-dimensional ellipsoids). The case of biases in the data, which is of great practical interest, is studied here; as we demonstrate with experiments, there are situations in which even very small amounts of bias can cause RG to fail. This becomes evident especially when the data form clusters in low-dimensional manifolds. To address this shortcoming, we propose a two-step approach which entails (i) first discovering clusters in the data and removing the bias from each, and (ii) performing RG on the bias-compensated data. In experiments with synthetic data, the proposed bias compensation procedure results in significantly better Gaus-sianization than the non-compensated RG method.
In this paper, we present an in-car Chinese noise corpus that can be used in simulating complicated car environment for robust speech recognition research and experiment. The corpus was collected in mainland China in ...
详细信息
This paper introduces the sparse multilayer perceptron (SMLP) which learns the transformation from the inputs to the targets as in multilayer perceptron (MLP) while the outputs of one of the internal hidden layers is ...
详细信息
This paper introduces the sparse multilayer perceptron (SMLP) which learns the transformation from the inputs to the targets as in multilayer perceptron (MLP) while the outputs of one of the internal hidden layers is forced to be sparse. This is achieved by adding a sparse regularization term to the cross-entropy cost and learning the parameters of the network to minimize the joint cost. On the TIMIT phoneme recognition task, the SMLP based system trained using perceptual linear prediction (PLP) features performs better than the conventional MLP based system. Furthermore, their combination yields a phoneme error rate of 21.2percent, a relative improvement of 6.2percent over the baseline.
Delimiting the most informative voice segments of an acoustic signal is often a crucial initial step for any speechprocessing system. In the current work, we propose a novel segmentation approach based on a perceptio...
详细信息
Delimiting the most informative voice segments of an acoustic signal is often a crucial initial step for any speechprocessing system. In the current work, we propose a novel segmentation approach based on a perception-based measure of speech intelligibility. Unlike segmentation approaches based on various forms of voice-activity detection (VAD), the proposed segmentation approach exploits higher-level perceptual information about the signal intelligibility levels. This classification based on intelligibility estimates is integrated into a novel multistream framework for automatic speaker recognition task. The multistream system processes the input acoustic signal along multiple independent streams reflecting various levels of intelligibility and then fusing the decision scores from the multiple steams according to their intelligibility contribution. Our results show that the proposed multistream system achieves significant improvements both in clean and noisy conditions when compared with a baseline and a state-of-the-art voice-activity detection algorithm.
暂无评论