An efficient speech recognition method based on segmental VQ (vector quantization) is introduced in this *** method takes advantage of the “initial consonant-final” structure of Chinese *** has less *** consumption but a high ARR (accurate recognition rate) compared with traditional HMM (hidden Markov model) or NN (neural network) ***-scale test on the task of recognizing the 11 Chinese digits shows that the WER (word error rate) is as low as 1.91% in the speaker-dependent test and 11.69% in the speaker-independent test, which shows it is more suitable for *** it has the potential to be widely applied in monosyllable recognition.
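The segmental idea can be illustrated with a minimal sketch of VQ-distortion classification. Several details are hypothetical assumptions, because the abstract does not say how the paper locates the initial/final boundary or trains its codebooks:

- each vocabulary word has one codebook for its initial segment and one for its final segment;
- the boundary frame is already known;
- the codebook values and the pinyin labels are made up.

```python
from math import inf


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def avg_distortion(frames, codebook):
    """Mean distance from each frame to its nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)


def recognize(frames, boundary, models):
    """Pick the word whose (initial, final) codebooks quantize the utterance best.

    frames: list of feature vectors.
    boundary: index of the first "final" frame (hypothetical, assumed known,
              with 0 < boundary < len(frames)).
    models: word -> (initial_codebook, final_codebook), a hypothetical format.
    """
    initial, final = frames[:boundary], frames[boundary:]
    best_word, best_score = None, inf
    for word, (cb_initial, cb_final) in models.items():
        # Each segment is scored only against the matching segment codebook.
        score = avg_distortion(initial, cb_initial) + avg_distortion(final, cb_final)
        if score < best_score:
            best_word, best_score = word, score
    return best_word


# Toy 2-D "features" and one-codeword codebooks for two digits (made-up values).
models = {"yi": ([[0.0, 0.0]], [[1.0, 1.0]]), "er": ([[5.0, 5.0]], [[6.0, 6.0]])}
frames = [[0.1, 0.0], [0.0, 0.2], [1.1, 0.9], [0.9, 1.0]]
print(recognize(frames, 2, models))  # yi
```

Scoring a word here needs only nearest-codeword lookups, with no state-sequence decoding as in an HMM.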
Sub-syllables are popular speech units in Mandarin speech recognition. Since Mandarin sub-syllables contain several confusion sets, it is hard to obtain high recognition accuracy in acoustic *** this paper we propose an utterance verification method to improve recognition performance. The basic idea is to calculate a normalized log-likelihood score for each speech unit and to determine a threshold value for each specific speech unit through a training *** the decision to accept or reject a detected speech unit depends on this *** Mandarin sub-syllable recognition, Mel-scale cepstral coefficients and the log energy calculated for every frame are used as the speech features. Experimental results demonstrate the effectiveness of the proposed method.
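The accept/reject decision can be sketched as follows. Two details are assumptions, since the abstract does not give them:

- the score is normalized by dividing the unit's log-likelihood by its number of frames;
- each unit's threshold is the candidate that minimizes false rejections plus false acceptances on training scores.

The unit label `"zh"` and all numbers are hypothetical.

```python
def normalized_score(log_likelihood, num_frames):
    """Frame-normalized log-likelihood (assumed normalization), so long and
    short units are comparable."""
    return log_likelihood / num_frames


def verify(unit, log_likelihood, num_frames, thresholds):
    """Accept the detected unit if its normalized score reaches its own threshold."""
    return normalized_score(log_likelihood, num_frames) >= thresholds[unit]


def tune_threshold(correct_scores, incorrect_scores):
    """Choose the threshold with the fewest training errors: correct units
    rejected (score below t) plus wrong units accepted (score at or above t)."""
    candidates = sorted(set(correct_scores) | set(incorrect_scores))

    def errors(t):
        false_reject = sum(s < t for s in correct_scores)
        false_accept = sum(s >= t for s in incorrect_scores)
        return false_reject + false_accept

    return min(candidates, key=errors)


# Hypothetical training scores for one sub-syllable unit.
thresholds = {"zh": tune_threshold([-3.0, -4.0, -4.5], [-6.0, -7.0, -5.5])}
print(thresholds["zh"])               # -4.5
print(verify("zh", -40.0, 10, thresholds))  # True  (score -4.0)
print(verify("zh", -60.0, 10, thresholds))  # False (score -6.0)
```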
In this paper, empirical mode decomposition (EMD) is applied to analyze speech signals. The local extrema of each intrinsic mode function (IMF) are first extracted as a feature, but using this feature to distinguish the speaker's sex gives poor results. The MFCCs of each IMF are then extracted as features for sex recognition, and numerical experiments indicate that this method distinguishes the speaker's sex well.
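One simple reading of the first feature is the number of local extrema in each IMF. The sketch below assumes the IMFs have already been produced by an EMD routine, which is not shown, and returns one extrema count per IMF. The paper may use the extrema values themselves rather than counts.

```python
def count_extrema(signal):
    """Number of strict local maxima plus strict local minima.
    Plateau peaks (equal neighbouring samples) are not counted."""
    count = 0
    for prev, cur, nxt in zip(signal, signal[1:], signal[2:]):
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            count += 1
    return count


def extrema_features(imfs):
    """One extrema count per IMF; the IMFs are assumed to be given."""
    return [count_extrema(imf) for imf in imfs]


print(extrema_features([[0, 1, 0, -1, 0, 1, 0], [1, 2, 3]]))  # [3, 0]
```

Higher-order IMFs oscillate more slowly and so tend to have fewer extrema, which is why a per-IMF count acts as a coarse frequency descriptor.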
Pronunciation variability is an important issue that must be addressed when developing practical automatic spontaneous speech recognition *** this paper, the factors that may affect recognition performance are analyzed, including those specific to the Chinese language. By studying the INITIAL/FINAL (IF) characteristics of Chinese and expanding the Bayesian equation, we propose the concepts of the generalized INITIAL/FINAL (GIF) and the generalized syllable (GS), GIF modeling and IF-GIF modeling, as well as context-dependent pronunciation weighting, based on a carefully phonetically transcribed seed database. Using these methods, the Chinese syllable error rate (SER) was reduced by 6.3% and 4.2% compared with GIF modeling and IF modeling, respectively, when no language model (such as a syllable or word N-gram) is used. The effectiveness of these methods is also confirmed when more data without phonetic transcription is used to refine the acoustic model with the proposed iterative forced-alignment based transcribing (IFABT) method, which achieves a 5.7% SER reduction.
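Context-dependent pronunciation weighting is often written as a mixture over surface pronunciations. The sketch assumes that generic form, which may differ from the paper's exact Bayesian expansion:

p(O | unit, context) = Σ_s P(s | unit, context) · p(O | s)

The computation is done in the log domain. The labels and probabilities below are made up.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def weighted_log_likelihood(acoustic_loglik, pron_probs):
    """log p(O | unit, context) = log sum_s P(s | unit, context) * p(O | s).

    acoustic_loglik: surface pronunciation s -> log p(O | s)
    pron_probs:      surface pronunciation s -> P(s | unit, context), summing to 1
    (an assumed, generic formulation).
    """
    terms = [math.log(p) + acoustic_loglik[s] for s, p in pron_probs.items() if p > 0]
    return log_sum_exp(terms)


# Hypothetical example: canonical Initial "zh" realized as "zh" or as "z",
# with made-up acoustic likelihoods and context-dependent weights.
score = weighted_log_likelihood({"zh": math.log(0.5), "z": math.log(0.1)},
                                {"zh": 0.8, "z": 0.2})
print(math.exp(score))  # approximately 0.42 = 0.8*0.5 + 0.2*0.1
```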
In speech recognition, it is very important that the training corpus for acoustic modeling cover as many co-articulation phenomena as possible using a small set of prompting texts. In this paper, an algorithm is proposed that automatically selects a small set of prompting texts from a large-scale text collection for Chinese speech recognition. With this algorithm, not only each Chinese Initial or Final (IF) but also each di-IF is well balanced in the final selected texts.
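A common way to approach this kind of selection is a greedy loop. The sketch below assumes such a heuristic; it is not necessarily the paper's algorithm. Its assumptions are:

- each candidate sentence has already been converted to a sequence of IF labels;
- "balanced" is approximated by repeatedly choosing the sentence that adds the most occurrences of IFs and di-IFs still below a hypothetical target count.

```python
from collections import Counter


def units(if_seq):
    """IF labels plus adjacent IF pairs (di-IFs), e.g. ('b', 'a') -> b, a, b+a."""
    return list(if_seq) + [f"{a}+{b}" for a, b in zip(if_seq, if_seq[1:])]


def gain(if_seq, counts, target):
    """How many of this sentence's units would still count toward the target."""
    local = Counter(counts)  # copy, so the caller's counts are untouched
    gained = 0
    for u in units(if_seq):
        if local[u] < target:
            gained += 1
            local[u] += 1
    return gained


def select_texts(corpus, max_sentences, target=2):
    """Greedily pick sentences until the budget is spent or nothing more helps."""
    counts = Counter()
    chosen = []
    remaining = list(corpus)
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=lambda s: gain(s, counts, target))
        if gain(best, counts, target) == 0:
            break  # every unit in the remaining texts already reached the target
        chosen.append(best)
        counts.update(units(best))
        remaining.remove(best)
    return chosen, counts


# Toy corpus of IF sequences (hypothetical); the duplicate sentence is skipped.
corpus = [("b", "a"), ("b", "a"), ("m", "a", "n")]
chosen, counts = select_texts(corpus, max_sentences=3, target=1)
print(chosen)  # [('m', 'a', 'n'), ('b', 'a')]
```

Counting di-IFs alongside single IFs is what pushes the selection toward co-articulation coverage rather than only phone frequency.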
This paper describes the design and implementation of a Mandarin phonetic Embedded Speech Recognition (ESR) system for handheld devices. By improving acoustic modeling and applying speech enhancement appropriately, ESR can be made compact, fast, and robust with only a small increase in Word Error Rate (WER). Another technique, named smart AGC (automatic gain control), is also introduced; it can significantly improve speech recognition performance in harsh noisy environments.
An algorithm to separate overlapping speech signals is proposed in this paper. The first step of the algorithm is to detect the pitch of the overlapping speech signals. The second step, which is the key to the method, is repeating pitch adding: the overlapping speech is added to its next pitch period. The last step is to determine which speaker each formant of the overlapping speech belongs to. With a small modification of this algorithm, a similar method called repeating pitch subtracting is also proposed. Finally, experiments are conducted to evaluate the performance of the algorithms, using single words, two-word utterances, and long sentences as input signals.
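The core of repeating pitch adding and subtracting can be sketched as a one-period comb applied to the mixture. Given the pitch period T of one talker (assumed already detected), the two operations are:

- adding: y[n] = x[n] + x[n + T], which reinforces components that repeat every T samples;
- subtracting: y[n] = x[n] − x[n + T], which cancels those components.

This is only an illustration of that step under idealized, exactly periodic signals. Pitch detection and the formant assignment step are not shown.

```python
def repeat_pitch(x, period, subtract=False):
    """y[n] = x[n] +/- x[n + period] for every n where x[n + period] exists.

    period: pitch period of one talker in samples (assumed already detected).
    """
    sign = -1.0 if subtract else 1.0
    return [x[n] + sign * x[n + period] for n in range(len(x) - period)]


# Toy mixture: talker A repeats every 3 samples, talker B every 2 samples.
a = [1.0, 2.0, 3.0] * 4
b = [0.5, -0.5] * 6
mix = [p + q for p, q in zip(a, b)]

added = repeat_pitch(mix, 3)                     # A doubled
residual = repeat_pitch(mix, 3, subtract=True)   # A cancelled, only B remains
print(added[:3], residual[:3])  # [2.0, 4.0, 6.0] [1.0, -1.0, 1.0]
```

In this toy example B vanishes entirely from `added`, because a 3-sample shift flips B's sign. In real speech the other talker is only partially attenuated, which is why a further step is needed to decide which formants belong to which speaker.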
This paper describes an environment compensation approach in which the environment model in the log domain is linearized with a first-order vector Taylor series (VTS) approximation. To obtain a better clean-speech Gaussian mixture model (GMM), the model is trained on clean speech using cepstral features and the log-energy feature. The GMM in the log domain is then obtained by taking the inverse discrete cosine transform (IDCT). By maximizing the joint likelihood of the clean speech sequence and the noise sequence, the noise statistics can be effectively estimated with the EM algorithm. Experimental results show considerable improvements in degraded environments.
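The first-order VTS step can be sketched for one diagonal Gaussian. The sketch makes three assumptions:

- additive noise only, with no channel term, so the log-domain model is y = x + log(1 + exp(n − x));
- the Gaussian's means and variances are already in the log domain, for example after the IDCT step;
- the noise mean and variance are given. In the paper these statistics are estimated with EM, which is not shown, and the log-energy feature is not modeled here.

```python
import math


def vts_compensate(mu_x, var_x, mu_n, var_n):
    """First-order VTS for one diagonal Gaussian in the log-spectral domain.

    Assumed environment model (additive noise, no channel):
        y = x + log(1 + exp(n - x)),
    linearized around the clean mean mu_x and the noise mean mu_n.
    Returns the mean and variance of the noisy-speech Gaussian.
    """
    mu_y, var_y = [], []
    for mx, vx, mn, vn in zip(mu_x, var_x, mu_n, var_n):
        g = 1.0 / (1.0 + math.exp(mn - mx))             # dy/dx at the expansion point
        mu_y.append(mx + math.log1p(math.exp(mn - mx)))  # model evaluated at the means
        var_y.append(g * g * vx + (1.0 - g) ** 2 * vn)   # dy/dn = 1 - g
    return mu_y, var_y


# Two channels: one at 0 dB local SNR, one where speech dominates (made-up values).
mu_y, var_y = vts_compensate([0.0, 10.0], [1.0, 1.0], [0.0, -10.0], [1.0, 1.0])
print(mu_y, var_y)
```

When the noise mean is far below the speech mean, g approaches 1 and the Gaussian is left almost unchanged. When the two means are equal, the mean shifts by log 2 and the variance is a half-and-half blend of speech and noise variances.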
In this paper, we perform speech enhancement based on an approximate Karhunen-Loeve transform. The signal is represented with a wavelet packet based on a basis *** eigenvectors are first evaluated from these bases, and then a linear estimator based on the eigenvectors is constructed and used to perform noise *** evaluate the performance of this method using the Aurora-2 database. The SNR improvement is calculated, and some waveforms and spectrograms of the enhanced speech are shown. Finally, the enhanced speech is tested for speech recognition. These experimental results show that the method achieves satisfactory speech enhancement.
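For intuition, the sketch below substitutes an exact KLT for the paper's wavelet-packet approximation. It computes the eigenvectors of the noisy frame covariance directly and applies a Wiener-type linear gain per eigenvector. It assumes white noise of known variance, non-overlapping frames, and a hypothetical frame length of 16 samples.

```python
import numpy as np


def klt_enhance(noisy, noise_var, frame_len=16):
    """Subspace (KLT) enhancement with a Wiener-type gain per eigenvector.

    Assumes white noise of known variance and non-overlapping frames;
    trailing samples that do not fill a whole frame are dropped.
    """
    n_frames = len(noisy) // frame_len
    frames = np.asarray(noisy[: n_frames * frame_len], dtype=float).reshape(n_frames, frame_len)
    cov = frames.T @ frames / n_frames                 # noisy covariance estimate
    eigvals, eigvecs = np.linalg.eigh(cov)             # exact KLT basis
    clean_vals = np.maximum(eigvals - noise_var, 0.0)  # clean-speech energy per direction
    gains = clean_vals / (clean_vals + noise_var)      # linear (Wiener) gain per direction
    estimator = eigvecs @ np.diag(gains) @ eigvecs.T   # symmetric linear estimator
    return (frames @ estimator).reshape(-1)


def snr_db(clean, estimate):
    """SNR of an estimate against the clean reference, over their common length."""
    n = min(len(clean), len(estimate))
    clean = np.asarray(clean[:n], dtype=float)
    error = clean - np.asarray(estimate[:n], dtype=float)
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(error ** 2))


# Synthetic test signal: a sinusoid in white noise (not Aurora-2 data).
rng = np.random.default_rng(0)
t = np.arange(4000)
clean = np.sin(2 * np.pi * 0.05 * t)
noisy = clean + rng.normal(0.0, np.sqrt(0.1), t.size)
enhanced = klt_enhance(noisy, noise_var=0.1)
print("SNR improvement (dB):", snr_db(clean, enhanced) - snr_db(clean, noisy))
```

Directions whose energy is close to the noise floor get gains near zero, so most of the noise is removed while the few speech-dominated directions pass almost unchanged.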
This paper presents an overview of the architecture and algorithms implemented in France Telecom's text-independent speaker verification system. We describe the individual components, including front-end processing, speaker modeling, testing, and score normalization steps. An effective combination of these components gives the system better performance in the operating region of interest to NIST, namely the low false-alarm region. Overall performance results of the system evaluated on the NIST 1998 and NIST 1999 speaker recognition corpora are presented.
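The abstract does not name the score normalization scheme. Z-norm and T-norm are widely used in NIST-style evaluations, so the sketch below assumes them. Both shift and scale a raw score by impostor score statistics; they differ only in where those reference scores come from. The threshold and scores are hypothetical.

```python
from statistics import mean, pstdev


def normalize(score, reference_scores):
    """Shift and scale a raw score by the mean/std of impostor reference scores.

    Z-norm: reference_scores = the claimed speaker model scored against impostor
            utterances (computed offline, once per speaker model).
    T-norm: reference_scores = the test utterance scored against a cohort of
            impostor models (computed at test time, once per test utterance).
    """
    return (score - mean(reference_scores)) / pstdev(reference_scores)


def accept(score, reference_scores, threshold):
    """Verification decision on the normalized score. A high threshold keeps the
    system in a low false-alarm operating region (the value is hypothetical)."""
    return normalize(score, reference_scores) >= threshold


print(normalize(3.0, [1.0, 3.0]))     # 1.0
print(accept(5.0, [1.0, 3.0], 2.0))   # True
```

Because normalized scores share a common scale across speakers and test conditions, a single global threshold can be tuned for the low false-alarm region.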