An efficient speech recognition method based on segmental VQ is introduced in this *** method takes advantage of the “initial consonant-final” structure of Chinese *** has less *** consumption but a high ARR (accurate recognition rate) compared with traditional HMM (hidden Markov model) or NN (neural network) ***-scale test on the task of recognizing 11 Chinese digits shows that the WER (word error rate) can reach 1.91% in the speaker-dependent test and 11.69% in the speaker-independent test, which shows it is more suitable for *** it has the potential to be applied extensively in monosyllable recognition.
Sub-syllables are the popular speech units in Mandarin speech recognition. Since there are several confusion sets among Mandarin sub-syllables, it is hard to obtain high recognition accuracy in acoustic *** this paper we propose a method of utterance verification to improve recognition performance. The basic idea is to calculate the normalized log-likelihood score for each speech unit, and a threshold value for a specific speech unit is determined through a training *** the decision to accept or reject a detected speech unit depends on this *** Mandarin sub-syllable recognition, Mel-scale cepstral coefficients and log energy calculated for every frame are the speech features for this task. The experimental results demonstrate the effectiveness of our proposed method.
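The accept/reject decision described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the score is normalized or how the threshold is trained, so the sketch assumes normalization by frame count and a threshold chosen to minimize the total number of training errors. All function names are hypothetical.

```python
def normalized_log_likelihood(frame_log_likelihoods):
    """Average the per-frame log-likelihoods of one detected speech unit.

    Assumption: normalization means dividing by the number of frames,
    so that units of different durations are comparable.
    """
    if not frame_log_likelihoods:
        raise ValueError("a speech unit needs at least one frame")
    return sum(frame_log_likelihoods) / len(frame_log_likelihoods)


def train_threshold(correct_scores, incorrect_scores):
    """Pick a per-unit threshold from labeled training scores.

    Assumption: the threshold minimizes false rejections plus
    false acceptances; ties keep the lowest candidate.
    """
    candidates = sorted(set(correct_scores) | set(incorrect_scores))
    best_threshold, best_errors = candidates[0], float("inf")
    for t in candidates:
        false_rejects = sum(s < t for s in correct_scores)
        false_accepts = sum(s >= t for s in incorrect_scores)
        errors = false_rejects + false_accepts
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


def verify(unit, frame_log_likelihoods, thresholds):
    """Accept the detected unit if its normalized score reaches its threshold."""
    return normalized_log_likelihood(frame_log_likelihoods) >= thresholds[unit]
```

In use, one threshold would be trained per sub-syllable and stored in a dictionary keyed by unit label, which `verify` then consults for each recognizer hypothesis.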
Dealing with polyphones is an important part of Chinese text-to-speech *** the pronunciation of a Chinese character is directly related to its meaning, an algorithm based on semantic calculation using How-Net is introduced to determine the pronunciations of polyphones in new words, that is, words that have not appeared in the polyphone list, the polyphone knowledge base or *** experimental results show that it performs well.
The design of spoken dialogue systems is more of an art than of science or engineering, *** design of the control component, dialogue *** problems of usability and portability still *** is partly because there are gaps between dialogue modeling and dialogue *** analyzing present dialogue models and *** models, we propose a generic dialogue model for dialogue management in task-oriented (information-seeking) spoken dialogue systems, which combines both interaction patterns and task *** accounts for both statics and dynamics in information-seeking dialogues and promises to bridge the gaps.
In this paper, we perform speech enhancement based on an approximate Karhunen-Loeve transform. The signal is represented by using wavelet packets based on a basis *** eigenvectors are first evaluated from these bases, then a linear estimator based on the eigenvectors is constructed and used to perform noise *** evaluate the performance of this method by using the Aurora-2 database. The SNR improvement is calculated. Some waveforms and spectrograms of enhanced speech are also shown. Finally, the enhanced speech is tested for speech recognition. These experimental results show that this method achieves satisfactory enhancement of speech.
*** 1978, when Panasonic started speech technology R&*** Panasonic speech technology group has been working to realize a user-friendly man-machine interface with speech technologies including ASR, TTS and spoken dialogue *** then till today, we have focused our R&D activities on bringing the merits of speech technology to our customers through various kinds of Panasonic consumer electronics products for 23 *** this *** mainly highlight our R&D activities of ASR *** following section 2, we focus on the strategies and results of our past R&D *** 3, we mention our future vision of speech technology R&D for the new century.
Pronunciation variability is an important issue that must be faced when developing practical automatic spontaneous speech recognition *** this paper, the factors that may affect the recognition performance are analyzed, including those specific to the Chinese language. By studying the INITIAL/FINAL (IF) characteristics of the Chinese language and developing the Bayesian equation, we propose the concepts of generalized INITIAL/FINAL (GIF) and generalized syllable (GS), the GIF modeling and the IF-GIF modeling, as well as context-dependent pronunciation weighting, based on a well phonetically transcribed seed database. By using these methods, the Chinese syllable error rate (SER) was reduced by 6.3% and 4.2% compared with the GIF modeling and IF modeling respectively when a language model, such as a syllable or word N-gram, is not used. The effectiveness of these methods is also demonstrated when more data without phonetic transcription is used to refine the acoustic model using the proposed iterative forced-alignment based transcribing (IFABT) method, achieving a 5.7% SER reduction.
According to the Hong Kong Tourist Association, the number of tourists from the mainland has been increasing rapidly since 1997. Mandarin has become a very important language in Hong Kong. Therefore, there is a need to create a machine to translate Mandarin to Cantonese. In this paper, a speech-to-speech translation system from Mandarin to Cantonese for a domain-specific application, Tourist Information Inquiry, is introduced. A Mandarin Speech Recognizer and a Cantonese Speech Synthesizer have been implemented within the specific domain. The rules of Mandarin text to Cantonese text conversion have been developed. A Mandarin-to-Cantonese Dictionary is *** Model and Prosodic features are incorporated to improve recognition performance and speech quality.
As a powerful method for modeling stochastic processes, HMM is being used successfully in speech recognition tasks, on condition of an efficient strategy of parameters *** paradigm of the parameter-estimation methods is the Baum-Welch algorithm, based on its firm foundation in the E-M unsupervised parameter estimation procedure. Despite its success in many real systems, the Baum-Welch algorithm reveals its inherent limit: a problem of calculation overflow, sometimes serious, especially with the emergence of Large Vocabulary Continuous Speech Recognition (LVCSR) systems, or when data sparseness has to be *** paper analyses the problem of calculation overflow in certain new aspects, followed by two kinds of robust overflow-resistant strategies adopted in our gallina system, Subaumelch algorithm and Log-Swapped Forward-Backward algorithm, compared in respect of efficiency and accuracy, and evaluated on our test corpus, 863CSL.
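To make the numerical problem concrete, here is a minimal sketch of a forward pass computed entirely in the log domain with the standard log-sum-exp trick. It is a generic textbook remedy, not the two strategies named in the abstract, whose details are not given here. The function names and the discrete-observation HMM layout are assumptions made for illustration.

```python
import math


def logsumexp(values):
    """Compute log(sum(exp(v))) stably by factoring out the maximum."""
    m = max(values)
    if m == -math.inf:
        # Every term is log(0); the sum is 0 and its log is -inf.
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_forward(log_pi, log_A, log_B, observations):
    """Return log P(observations) for a discrete-observation HMM.

    log_pi[i]    : log initial probability of state i
    log_A[i][j]  : log transition probability from state i to state j
    log_B[j][o]  : log probability of emitting symbol o in state j
    Working with log values avoids the underflow that a product of
    many probabilities below 1 would otherwise cause.
    """
    n = len(log_pi)
    alpha = [log_pi[i] + log_B[i][observations[0]] for i in range(n)]
    for obs in observations[1:]:
        alpha = [
            logsumexp([alpha[i] + log_A[i][j] for i in range(n)]) + log_B[j][obs]
            for j in range(n)
        ]
    return logsumexp(alpha)
```

With plain probabilities, a sequence of a few thousand frames typically drives the forward variables to zero in double precision; the log-domain version stays finite at the cost of one `exp` and `log` per summation.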
Human speech mechanisms are twofold: speech production and speech *** paper introduces two studies carried out based on the production and the *** of them is a method to estimate articulatory targets from speech sounds via a physiological articulatory *** potentially clarifies certain problems with the current speech recognition *** estimation, acoustical parameters are considered as a function of the articulatory targets for model *** location is estimated based on a comparison of acoustical parameters between real speech sound and synthetic sound corresponding to the (?)*** proposed estimation method was evaluated using *** suggests that our physiological articulatory model can be a valuable tool for the inverse *** model included in this paper is built based on knowledge about human psychoacoustics and auditory physiology to enhance speech by detecting and then canceling *** attention is paid to reducing noise by using a spatial filtering *** technique adopts concepts of the cancellation *** results show that the spatial filtering is useful in enhancing *** *** filtering method can be used effectively at the front-end of automatic speech recognition systems.