ISBN: (print) 7801501144
Unit selection is increasingly important in concatenative speech synthesis because it determines the quality that can be achieved when a phonemic and prosodic representation of the input text is converted to speech by concatenating units and unit variants from the inventory. Extending the Dresden speech synthesiser DreSS [7] to a multilingual system (German, American English, Italian, Chinese) created the need for a new, universal unit selection method that can handle different unit types (allophone, diphone, syllable, and mixed inventories). Rule-based approaches are often language- and speaker-specific and therefore inflexible. Besides a consistent separation of the algorithm from the (speaker- and language-dependent) knowledge bases, it should be possible to adjust the performance of the algorithm to the information content of the database in use. The search for a universal formalism for unit selection that meets all these requirements led us to a graph-based algorithm that has already been applied successfully to other problems in speech processing.
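The abstract does not say which graph-based algorithm is used, so the following is only a generic illustration of how unit selection can be cast as a cheapest-path search over a graph of candidate units. The `Unit` fields, the two cost functions, and their weights are assumptions made for this sketch and are not taken from DreSS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One inventory entry; all fields are hypothetical illustration values."""
    name: str        # unit label, e.g. a diphone
    pitch: float     # mean F0 in Hz
    duration: float  # length in ms


def target_cost(unit, target_pitch, target_duration):
    # Mismatch between a candidate and the requested prosody (assumed form).
    return (abs(unit.pitch - target_pitch) / 10.0
            + abs(unit.duration - target_duration) / 20.0)


def join_cost(left, right):
    # Penalty for a pitch jump at the concatenation point (assumed form).
    return abs(left.pitch - right.pitch) / 10.0


def select_units(targets, inventory):
    """Return (total_cost, units) for the cheapest path through the candidates.

    targets: list of (name, pitch, duration) tuples, one per position.
    inventory: dict mapping a unit name to its list of Unit variants.
    """
    if not targets:
        return 0.0, []
    # best holds one (cost, path) pair per candidate of the current position:
    # the cheapest path that ends in that candidate.
    best = []
    for position, (name, pitch, duration) in enumerate(targets):
        new_best = []
        for cand in inventory[name]:
            local = target_cost(cand, pitch, duration)
            if position == 0:
                new_best.append((local, [cand]))
                continue
            # Cheapest way to reach this candidate from the previous position.
            cost, path = min(
                ((prev_cost + join_cost(prev_path[-1], cand), prev_path)
                 for prev_cost, prev_path in best),
                key=lambda item: item[0],
            )
            new_best.append((cost + local, path + [cand]))
        best = new_best
    return min(best, key=lambda item: item[0])
```

Because the search only sees candidates through the two cost functions, language- and speaker-specific knowledge can live in the inventory and the costs while the search itself stays unchanged, which is the kind of separation the abstract asks for.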
This paper concerns the handling of out-of-vocabulary (OOV) words in the JUPITER weather information system. Specifically, our objective is to deal with weather queries about unknown cities. We have implemented a system that can detect the presence of an unknown city name and immediately propose a plausible spelling for that city. Potentially, the city can then be incorporated dynamically into the recognizer lexicon. The three-stage system described in [1] was implemented in the JUPITER domain. This paper details the development of a system that uses an ANGIE-based framework to model spelling and pronunciation simultaneously and that uses automatically derived novel lexical units in the first stage. We report results on an independent test set containing unknown cities. Compared with a single-stage baseline, the three-stage configuration reduced the word error rate by 29.3% relative (from 24.6% to 17.4%) and the understanding error rate by 67.5% relative (from 67.0% to 21.8%).
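The reported reductions are relative to the baseline error rate, not differences in percentage points. A short check of that arithmetic, using only the four error rates quoted in the abstract (the function name is made up for this sketch):

```python
def relative_reduction(baseline, improved):
    """Relative error reduction in percent, given two error rates in percent."""
    return 100.0 * (baseline - improved) / baseline


# Word error rate: 24.6% -> 17.4%
print(round(relative_reduction(24.6, 17.4), 1))  # 29.3
# Understanding error rate: 67.0% -> 21.8%
print(round(relative_reduction(67.0, 21.8), 1))  # 67.5
```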
MUXING is a conversational system that allows users to access weather information in Mandarin Chinese over the telephone. Although MUXING uses the same architecture and most of the same human language technology components as its English predecessor, JUPITER, some modifications were necessary to account for differences between English and Mandarin Chinese. In addition, the weather database had to be modified to cover regions of greater interest to potential Chinese users. This paper describes our system development effort, paying particular attention to the Mandarin-specific changes to the original JUPITER system.
This research focuses on the relationship between turn-taking and the intensity of motions (head, hand, and upper-body motions). We examine five spontaneous dialogues conducted in Japanese, which were collected using video and audio recording equipment and an optical motion-sensing device. We first show that motions occur more often during speech than during pauses. Next, we present data showing that chest and hand motions are more relevant to turn hold, and that head motions do not serve as a signal for differentiating turn hold from turn change.
This paper addresses the problem of out-of-vocabulary (OOV) utterance detection in a small-vocabulary telephone keyword spotting system, and proposes a new approach to modeling OOV words in this setting. The keywords are modeled with semi-continuous hidden Markov models with multiple codebooks. We propose a two-pass procedure to spot true keyword occurrences. In the first pass, a standard Viterbi search is applied with appropriately defined and trained garbage and silence models; this stage outputs the N-best word hypotheses. The second pass, which can be seen as a verification procedure, takes the first-pass output as its focus. The approach mainly consists of constructing a "dynamic anti-model" based on the hypothesized keyword model and the current acoustic input.
This paper introduces ORION, a conversational system that performs off-line tasks and initiates later contact with a user at a pre-negotiated time. ORION has two major episodes of activity: the enrollment of new tasks and the execution of pending tasks. The task manager periodically checks the pending tasks and updates their status, sending requests to other servers for information and possibly launching a phone call when a particular task has reached its trigger time. A separate user interface engages in a dialogue with a user to enroll new tasks and/or update existing ones. ORION is still at an early stage of its development cycle, but it has already raised several interesting new research issues, such as continuous state maintenance and contact verification.
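The enroll-then-poll cycle described above can be sketched as a small priority queue keyed on trigger time. This is an illustration only: the `Task` and `TaskManager` names, their fields, and the use of a heap are assumptions for this sketch and do not describe ORION's actual implementation.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    trigger_time: float                      # when the user should be contacted
    description: str = field(compare=False)  # what to tell the user


class TaskManager:
    """Hypothetical manager holding pending tasks ordered by trigger time."""

    def __init__(self):
        self._pending = []  # min-heap: earliest trigger time first

    def enroll(self, task):
        # Called by the dialogue front end when a user registers a new task.
        heapq.heappush(self._pending, task)

    def check(self, now):
        """Pop and return every task whose trigger time has been reached."""
        due = []
        while self._pending and self._pending[0].trigger_time <= now:
            due.append(heapq.heappop(self._pending))
        return due

    def pending_count(self):
        return len(self._pending)
```

A periodic loop would call `check(now)` and, for each returned task, fetch fresh information and place the call; tasks that are not yet due stay on the heap untouched.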
This paper discusses our three-stage approach to a flexible-vocabulary speech understanding system that can detect out-of-vocabulary (OOV) words and hypothesize their phonetic and orthographic transcriptions. In the first stage, we introduce the column-bigram finite-state transducer (FST), which embeds ANGIE sublexical models while also supporting previously unseen data from unknown words. Second, the ANGIE models use grapheme information, providing tighter linguistic constraints as well as instantaneous sound-to-letter capability during recognition. Third, the syllable-level lexical units of the first stage are derived automatically via an iterative procedure to optimize performance. The second-stage recognizer employs ANGIE to output a word network, which is parsed in stage three by TINA, our natural language (NL) processor. Experiments with a JUPITER implementation of this system are described in [1].
In this paper, we describe the use of a powerful machine learning scheme, Support Vector Machines (SVM), within the framework of hidden Markov model (HMM) based speech recognition. The hybrid SVM/HMM system has been developed on the basis of our public-domain toolkit. Evaluated on the OGI Alphadigits corpus, the hybrid system achieves a word error rate (WER) of 11.6%, compared with 12.7% for a triphone mixture-Gaussian HMM system, while using only a fifth of the training data used by the triphone system. Several important issues that arise from the nature of SVM classifiers have been addressed. We are in the process of migrating this technology to large-vocabulary recognition tasks such as SWITCHBOARD.
The basic assumption in intonation models, and perhaps in prosody models generally, is that part-of-speech information is of paramount importance for predicting the actual values of the prosodic parameters, be they pitch, segmental duration, or loudness. We have studied whether morphological knowledge, in addition to part-of-speech and functional information, helps in predicting prosody in a morphologically rich language such as Finnish. Our research concerns Finnish prosody with respect to pitch and segmental duration. The methodology we employ is based on artificial neural networks, specifically standard multi-layer feed-forward networks trained with backpropagation. The work continues our earlier studies on prosody, in which we investigated the problem of generating values for prosodic parameters for text-to-speech synthesis. Our results show that there are certain advantages in adding morphological knowledge to the network input. Apart from part-of-speech information, there are certain cases where morphological features seem to affect the predicted values of both pitch and segmental duration. This behavior can be expected in a morphologically rich language.
Japanese backchannel utterances have recently been studied with the aim of incorporating their conversational and social functions into spoken dialogue interfaces. Most of these studies employ corpus-based methodologies to elucidate the conditions under which backchannel utterances occur. In this paper, we propose an experimental method that uses speech synthesis technologies to identify backchannel contexts through controlled manipulation of the local prosody of conversational utterances. We found that backchannel-facilitating contexts have to be characterized in terms of both lexical features and local prosody, whereas backchannel-suppressing contexts can be partially identified in terms of phrase-final prosody.