ISBN: (print) 0780379802
Regional accents in Mandarin speech result mostly from partial phone changes due to the interlanguage system of non-native speakers. We propose partial change accent models based on accent-specific units with acoustic model reconstruction for accented Mandarin speech recognition. We use phonological rules of dialectal pronunciations together with a likelihood ratio test to model actual accented variants rather than inherent phonetic confusions, recognizer errors, or other data-specific variations. To avoid model confusion and lexical confusion with the increased unit inventory, we improve model resolution by reconstructing the pre-trained acoustic model using the Gaussian mixtures from accent-specific unit models, where the accent-specific units are treated as hidden models. The effectiveness of this approach is evaluated on Cantonese-accented Mandarin speech. Our proposed method yields a significant 4.4% absolute word error rate (WER) reduction without sacrificing performance on the native speech recognition task. The reconstructed model can be applied in a single system to handle both accented and native speech.
A framework is proposed for enterprise automated call routing system development and large-scale natural language call routing application deployment, based on IBM's speech recognition and NLU application engagement practices in recent years. To make it easy to integrate different call classification algorithms, the framework architecture provides a plug-and-play environment for evaluating promising call routing algorithms and a systematic approach to carrying out a large-scale enterprise application deployment. The paradigm in this paper illustrates the complementary effort needed to develop an automatic call routing application for enterprise call centers, and covers everything from call classification algorithm investigation to the application programming model. Experimental results on a live test set collected from an enterprise call center show that the call classification algorithm implemented in this framework performs very well.
Modeling pronunciation variations is a critical part of spontaneous Mandarin speech recognition. Such variations include both complete changes and partial changes. Complete changes can usually be modeled by using an alternate phone to replace the canonical phone. Partial changes, which cannot be modeled by conventional methods, are variations within the phoneme and include diacritics. In this paper, we propose using a partial change phone model (PCPM) together with an auxiliary decision tree to model partial changes. A detailed but robust model can be achieved by merging the canonical model with PCPMs through Gaussian distribution reconstruction. The effectiveness of this approach was evaluated on the Hub4NE Mandarin Broadcast News Corpus. The syllable error rate decreased by 2.39% absolute with respect to the baseline.
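To make the idea of merging a canonical model with a variant model concrete, the following is a minimal sketch, not the paper's procedure. It assumes one-dimensional Gaussian mixtures stored as `(weight, mean, variance)` tuples and a single hypothetical interpolation weight `p_variant`; the names `merge_gmms` and `gmm_pdf` are illustrative only. The abstract does not specify how mixtures are selected or reweighted.

```python
import math

def merge_gmms(canonical, variant, p_variant):
    """Pool the components of two 1-D GMMs into one mixture.

    Each GMM is a list of (weight, mean, variance) tuples whose weights sum to 1.
    p_variant is an assumed probability mass given to the variant model;
    the canonical components share the remaining 1 - p_variant.
    """
    if not 0.0 <= p_variant <= 1.0:
        raise ValueError("p_variant must lie in [0, 1]")
    merged = [(w * (1.0 - p_variant), mu, var) for w, mu, var in canonical]
    merged += [(w * p_variant, mu, var) for w, mu, var in variant]
    return merged

def gmm_pdf(gmm, x):
    """Evaluate the mixture density at scalar x."""
    return sum(
        w * math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
        for w, mu, var in gmm
    )

# Toy models: a canonical phone with two components, a variant with one.
canonical = [(0.6, 0.0, 1.0), (0.4, 2.0, 0.5)]
variant = [(1.0, 1.0, 0.8)]
merged = merge_gmms(canonical, variant, p_variant=0.25)
```

Under these assumptions the merged density is simply the interpolation of the two original densities, so the merged model keeps the state topology of the canonical model while covering the variant's acoustic region.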
The multiple-pronunciation lexicon (MPL) is very important for modeling pronunciation variations in spontaneous speech recognition. However, introducing an MPL raises two problems. First, the MPL increases confusion among lexicon entries and degrades the recognizer's performance. Second, the MPL needs more phonetically transcribed data in order to cover as many surface forms as possible. Accordingly, two solutions are proposed: a context-dependent weighting method and an iterative forced-alignment based transcription method. Together they compensate for the problems the MPL causes and improve overall performance. Experiments on a naturally spontaneous speech database show that the proposed methods are effective and better than other methods.
ISBN: (print) 8790834100
It is widely acknowledged that pronunciation modeling is an efficient way to improve recognition performance in spontaneous speech. In pronunciation modeling, almost all methods of generating variation probabilities are based on relative frequency counting from DP alignment. In this paper, we investigate the local model mismatch caused by pronunciation variations and propose estimating the variation probability from the acoustic likelihood score. Based on the estimated probability, we present a method of reconstructing pre-trained HMMs to include alternate pronunciations by sharing optimal mixture components instead of distributions. Experimental results show that using the reconstructed HMM set reduces the syllable error rate by 2.03% absolute compared to the baseline system, and that the accuracy improvement gained from the proposed method is almost double that obtained from the previous DP alignment approach.
In this paper, we describe our approach to the French English Bilingual Task in CLEF 2001. A simple dictionary-based method is used to translate the French query into a bag of weighted English words, the English query...
We describe an architecture for speech recognition based interactive toys and discuss the strategies we have adopted to deal with the requirements for the speech recognizer imposed by this application. In particular, ...
One common method for keyword spotting in unconstrained speech is based on a two-pass strategy consisting of Viterbi decoding to detect and segment possible keyword hits, followed by the computation of a confidence measure to verify those hits. In this paper, we propose a simple one-pass strategy in which the confidence measure is computed simultaneously with a Viterbi-like decoding stage. Backtracking is not required, which, coupled with the need for only a single pass through the utterance, significantly reduces the memory requirements of the algorithm. This feature makes it well suited for devices where processing power and memory are limited. Experimental results on a connected digits task show that decoding performance is comparable to that of a Viterbi search with backtracking. Experimental results on spotting days of the week in continuous speech indicate that the calculated confidence measure is effective in reducing the number of false alarms.
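The memory argument can be illustrated with a small sketch. It is not the paper's algorithm and does not compute a confidence measure; it only shows, under assumed toy inputs, that a Viterbi-style score recursion without backpointers needs storage proportional to the number of states rather than to the utterance length. The function name and the input layout (nested lists of log scores) are hypothetical.

```python
def viterbi_score_no_backtrack(log_init, log_trans, log_emit_frames):
    """Return the best-path log score without storing backpointers.

    log_init[i]          : log score of starting in state i
    log_trans[i][j]      : log score of moving from state i to state j
    log_emit_frames[t][j]: log score of frame t under state j
    Only the previous column of scores is kept, so memory is O(number of states).
    """
    n_states = len(log_init)
    # Initialise with the first frame.
    prev = [log_init[j] + log_emit_frames[0][j] for j in range(n_states)]
    # Process each remaining frame once, overwriting the score column.
    for frame in log_emit_frames[1:]:
        prev = [
            max(prev[i] + log_trans[i][j] for i in range(n_states)) + frame[j]
            for j in range(n_states)
        ]
    return max(prev)

# Toy two-state example with integer log scores.
best = viterbi_score_no_backtrack(
    log_init=[0, -1],
    log_trans=[[-1, -2], [-3, -1]],
    log_emit_frames=[[-1, -2], [-2, -1]],
)
```

The trade-off is that the best state sequence itself cannot be recovered, which is acceptable when only a score per keyword hypothesis is needed.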
ISBN: (print) 7801501144
Pronunciation in spontaneous Mandarin speech tends to be much more variable than in read speech. In current recognition systems, pronunciation dictionaries usually contain only one standard pronunciation for each word, so the amount of variability that can be modelled is very limited. Most recent research on modelling variations in spontaneous speech focuses on the lexicon level, which can only handle intra-word variations; inter-word variations cannot be modelled effectively. Chinese is monosyllabic and has a simple syllable structure, which gives rise to a large number of pronunciation variations. In this paper, we propose two methods to model pronunciation variations in spontaneous Mandarin speech. First, we generate a probability lexicon to model intra-syllable variations by applying a DP alignment algorithm between base-form and surface-form strings. Second, we integrate the variation probability into the decoder to model both intra-syllable and inter-syllable variations. Experimental results show that modelling intra-syllable variation with a probability lexicon reduces the syllable error rate by 0.85% (a phone error rate reduction of 1.4%), while additionally modelling inter-syllable variation reduces the syllable error rate by a significant 4.76% (a phone error rate reduction of 7.6%) compared to the baseline system.
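The first step described above, DP alignment between base-form and surface-form strings followed by relative frequency counting, can be sketched as follows. This is a minimal illustration assuming unit edit costs and toy phone labels; the function names `align` and `variation_probs` and the example data are hypothetical, and the abstract does not give the actual cost function or phone set.

```python
from collections import Counter, defaultdict

def align(base, surface):
    """Levenshtein-align two phone sequences with unit costs.

    Returns a list of (base_phone, surface_phone) pairs; None marks a
    deletion (surface side) or an insertion (base side).
    """
    n, m = len(base), len(surface)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (base[i - 1] != surface[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back from the bottom-right corner to recover the alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (base[i - 1] != surface[j - 1]):
            pairs.append((base[i - 1], surface[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((base[i - 1], None))
            i -= 1
        else:
            pairs.append((None, surface[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs

def variation_probs(corpus):
    """Estimate P(surface phone | base phone) by relative frequency."""
    counts = defaultdict(Counter)
    for base, surface in corpus:
        for b, s in align(base, surface):
            if b is not None:  # insertions have no base phone to condition on
                counts[b][s] += 1
    return {
        b: {s: c / sum(ctr.values()) for s, c in ctr.items()}
        for b, ctr in counts.items()
    }

# Toy corpus of (base form, observed surface form) phone sequences.
corpus = [
    (["zh", "i"], ["z", "i"]),
    (["zh", "i"], ["zh", "i"]),
    (["zh", "a"], ["zh", "a"]),
]
probs = variation_probs(corpus)
```

In this toy corpus the base phone `zh` surfaces as `z` in one of three occurrences, so the estimated variation probability is one third; a probability lexicon would attach such weights to each alternate pronunciation.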