This article introduces the goals and recent results of the ongoing projects in the joint HLT program sponsored by NSF and ARPA. It also presents the current progress of, and future plans for, HLT in the European Union.
The homogeneous hidden Markov model (HMM) for automatic speech recognition is widely used today, but some significant defects of this method limit its performance and practical applications. One of these defects is that the stability of the duration distribution of speech states, which has been verified by experiments [1], is not correctly accounted for in the model. This paper introduces a duration-distribution-based inhomogeneous HMM (DDBHMM) recognition algorithm. A speaker-independent, isolated-word Chinese speech recognition experiment shows that DDBHMM reduces the error rate by about 20%.
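The abstract does not give the DDBHMM algorithm itself, so the following is only a minimal sketch of the general idea behind duration-aware scoring, under stated assumptions. A homogeneous HMM state with a fixed self-loop probability implies a geometric duration distribution; an explicit-duration model replaces that with a measured distribution. The function names and the dictionary-based duration table are hypothetical and chosen for illustration.

```python
import math

def geometric_duration_pmf(self_loop, d):
    """Probability that a homogeneous HMM state lasts exactly d frames.

    With a constant self-loop probability, the state is kept d - 1 times
    and then left once, which gives a geometric distribution.
    """
    return (self_loop ** (d - 1)) * (1.0 - self_loop)

def segment_log_score(frame_loglik, duration_pmf):
    """Log score of one state occupying len(frame_loglik) frames.

    duration_pmf is a hypothetical table {duration_in_frames: probability}
    standing in for a measured state-duration distribution.
    """
    d = len(frame_loglik)
    p = duration_pmf.get(d, 0.0)
    if p == 0.0:
        return float("-inf")  # duration never observed for this state
    return math.log(p) + sum(frame_loglik)
```

The geometric form always assigns its highest probability to a duration of one frame, which is one way to see why a fixed self-loop probability fits stable, peaked state durations poorly.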
We describe new methods for continuous Putonghua speech recognition. We have augmented the IBM HMM-based continuous speech recognition system [1-3] with the following features. First, we treat tones in Putonghua as attributes of certain phonemes, instead of syllables. We call those phonemes with tone "tonemes". Second, instantaneous pitch is treated as a variable in the acoustic feature vector, in the same way as cepstra or energy. Third, a set of word-segmentation rules converts continuous Chinese text into segmented text so that the trigram language model works effectively. By applying these new methods, a speaker-independent, very-large-vocabulary continuous Putonghua dictation system can be constructed.
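The word-segmentation rules used by the authors are not listed in the abstract. As a rough illustration of why segmentation is needed before a word trigram model can be applied, here is a sketch of forward maximum matching, a common dictionary-based baseline that is substituted here for the authors' rule set. The lexicon and the `max_len` parameter are assumptions made for the example.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Segment unspaced text greedily, longest dictionary word first.

    lexicon is a hypothetical set of known words; characters that match
    no entry are emitted as single-character words.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words

lexicon = {"语音", "识别", "语音识别", "系统"}
segmented = forward_max_match("语音识别系统", lexicon)
```

Once text is split into words this way, trigram counts can be collected over the resulting word sequence rather than over individual characters.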
A multi-speaker continuous Putonghua recognizer has been developed, composed of 20 speaker-dependent recognizers as sub-systems. Each sub-system is a network of hidden Markov models that uses triphones as the fundamental speech units. Over 3 GB of speech data have been collected for training from twenty native Putonghua speakers reading carefully designed texts that try to include all phone-to-phone transitions in Putonghua. For each unknown input utterance, an A* backward search over the HMM network yields the n-best syllable sequences, which are then passed to a language model for post-processing. The most suitable word sequence is determined by means of the bigram statistics of some 200 word classes covering a vocabulary of over 80,000 words. An enrolment process is required for each new user: the most suitable speaker-dependent system is selected from the 20 sub-systems according to their recognition performance on a small quantity of speech data collected from the user. The recognition accuracy is usually rather low at the beginning. By correcting the recognition errors and keeping the recognized utterances for periodic retraining of the system, a recognizer tailored to the user can be obtained.
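The post-processing step can be pictured as rescoring candidate hypotheses with class-bigram statistics. The sketch below is a simplified illustration, not the system described: it assumes the candidates are already word sequences with acoustic log scores, it omits the word-given-class term of a full class-based model, and the names `word2class`, `bigram`, `floor`, and `lm_weight` are hypothetical.

```python
import math

def class_bigram_logprob(words, word2class, bigram, floor=1e-6):
    """Sum of log P(class_i | class_{i-1}) over a word sequence.

    bigram maps (previous_class, class) pairs to probabilities; unseen
    pairs fall back to a small floor value (an assumption, not smoothing
    taken from the source).
    """
    prev = "<s>"
    total = 0.0
    for w in words:
        c = word2class.get(w, "<unk>")
        total += math.log(bigram.get((prev, c), floor))
        prev = c
    return total

def rescore_nbest(nbest, word2class, bigram, lm_weight=1.0):
    """Return the words of the hypothesis with the best combined score.

    nbest is a list of (words, acoustic_log_score) pairs.
    """
    best = max(
        nbest,
        key=lambda h: h[1] + lm_weight * class_bigram_logprob(h[0], word2class, bigram),
    )
    return best[0]
```

Grouping words into a few hundred classes keeps the bigram table small even for a very large vocabulary, which is presumably the motivation for using word classes here.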