This paper describes the development and testing of a pilot spoken dialogue system for bus travel information in the city of Trondheim, Norway. The system-driven dialogue was designed on the basis of analyzed recordings from human-human operator dialogues, Wizard-of-Oz (WoZ) dialogues, and a text-based inquiry system for the web. The dialogue system employs a flexible speech recognizer and an utterance concatenation procedure for speech output. Even though the system is intended for research only, it has been accessible through a public phone number since October 1999. During this period all dialogues have been recorded. From these, approximately 350 dialogues were selected for annotation and comparison to 120 dialogues from the WoZ recordings. The experiments showed that the turn error rate was more than twice as large for the real dialogues as for the WoZ calls, i.e., 13.3% versus 5.7%. Thus, the WoZ results did not give a reliable estimate of the true performance. Our experiments indicate that the current flexible speech recognizer should be further optimized.
Whether human speech perception depends on a biologically based link between production and perception or whether it is best characterised as a series of acoustic, phonetic, and semantic transformations has remained an unresolved issue. We addressed this question using objective brain research methods combined with advanced stimulus production methodology. We removed the contribution of the periodic glottal excitation, produced by the vocal folds in the human larynx, from vowel stimuli and found that magnetic responses generated in the auditory cortex respond to this removal. The amplitude of the main deflection of the magnetic responses, N1m, decreased even though the formant settings, intensity, and duration of the stimuli were identical. Hence, because human brain activity attenuates if vowel stimuli are "distorted" by the removal of their naturally occurring periodic excitation, we conclude that speech production and perception mechanisms in the human cortex are fundamentally interrelated.
In order to detect misrecognitions that may result from a mismatch between training and testing data, we use a confidence measure (CM) that collects a set of features during recognition and from the N-best list that is output by the recognizer. A neural network (NN) then calculates the probability that the utterance was recognized correctly based on these features. Since the resulting phoneme alignments are often erroneous for misrecognized utterances, we introduced some new features that are based on phoneme durations. The durations found by the recognizer are compared to the durations present in the training database, and the results of these comparisons serve as input for the NN. A great advantage of the duration-related features is that they are independent of the recognizer, in contrast to, e.g., acoustic score-based features. We also use some score-related features that have proven to be useful in the past. Simultaneously with determining the confidence for a recognition result, we try to detect whether, in the case of a misrecognition, the utterance was an out-of-vocabulary (OOV) utterance. Using the complete set of 46 features we can achieve a correct classification rate of 90%. The word error rate can be reduced by 92% at a false rejection rate of 5.1% on a test task that consists of 35 speakers and includes more than 50% OOV utterances. OOV words were detected correctly in 91% of the cases. The presented CM is also used in a semi-supervised speaker adaptation scheme.
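The idea of comparing recognized phoneme durations against training statistics can be pictured with a minimal sketch. It assumes that a mean and standard deviation of duration were collected per phoneme from the training data; the function name, the statistics table, and the two features computed here (mean absolute z-score and fraction of outliers) are hypothetical illustrations, not the 46 features used in the paper.

```python
from statistics import mean

# Hypothetical training statistics: phoneme -> (mean duration, std dev), in frames.
TRAIN_STATS = {"a": (8.0, 2.0), "t": (5.0, 1.0), "s": (9.0, 3.0)}

def duration_features(alignment, stats=TRAIN_STATS, z_threshold=2.0):
    """Return (mean |z|, fraction of phonemes with |z| > z_threshold).

    alignment: list of (phoneme, duration_in_frames) pairs from the recognizer.
    Phonemes that are missing from stats are skipped.
    """
    zs = []
    for phoneme, duration in alignment:
        if phoneme not in stats:
            continue
        mu, sigma = stats[phoneme]
        # Distance of the observed duration from the training mean, in std devs.
        zs.append(abs(duration - mu) / sigma)
    if not zs:
        return 0.0, 0.0
    outliers = sum(1 for z in zs if z > z_threshold)
    return mean(zs), outliers / len(zs)
```

Values like these could be passed to a classifier alongside score-related features; large deviations suggest an implausible alignment and therefore a possible misrecognition.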
This paper presents a general framework for conversational agents for business applications that supports multi-channel, multi-modal interactions through the use of a channel- and modality-independent Dialog Move Markup Language. In particular, we describe a prototype system as an instantiation of a general dialog architecture that supports web-based interaction through a combination of modalities such as natural language dialog and user interface components. User studies have revealed that the prototype system enhanced the user experience in an online shopping environment by significantly reducing the length of interactions in terms of time and the number of clicks. Furthermore, the success in extending the general architecture to a prototype system demonstrates the applicability and potential of such a framework in business applications.
Distance measures [1][2][3] based on the covariance matrix of feature vectors were applied to text-independent speaker verification and identification. However, some of them do not satisfy the symmetry property, which is fundamental to a distance measure. In this paper, we propose several symmetric distance measures based on the covariance matrix of feature vectors, and then construct some advanced measures using the data fusion method [4]. These new distance measures have good mathematical properties and impose little overhead in calculation. We apply these distance measures to text-independent speaker identification and handset detection. We develop a new robust technique for cross-handset speaker identification and find that compensating the second-order statistics is important when dealing with the mismatch caused by different handsets. The experiment uses the cb1 and cb2 data in the LLHDB corpus [5] for same-handset and cross-handset speaker identification tests. We find that the use of delta cepstra decreases the speaker identification error rate by as much as 38%. The data fusion technique could further decrease the error rate by 11%. Applying these distance measures to the 2-handset detection problem, the error rate is 12%. Using our new robust technique, the cross-handset speaker identification error rate could be decreased by around 17%.
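To illustrate why symmetry matters, the sketch below contrasts an asymmetric covariance-based measure with a symmetrized one. The abstract does not state which measures are proposed, so both functions here are illustrative assumptions rather than the paper's definitions, and the helpers are restricted to 2x2 matrices to keep the example dependency-free.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return ((d / det, -b / det), (-c / det, a / det))

def trace_of_product(x, y):
    """trace(x @ y) for 2x2 matrices."""
    return sum(x[i][k] * y[k][i] for i in range(2) for k in range(2))

def asymmetric_measure(a, b):
    """tr(A B^-1) / D with D = 2: equals 1 when A == B, but in general
    asymmetric_measure(a, b) != asymmetric_measure(b, a)."""
    return trace_of_product(a, inv2(b)) / 2

def symmetric_measure(a, b):
    """Log of the product of both directions: symmetric by construction,
    and zero when the two covariance matrices are equal."""
    return math.log(asymmetric_measure(a, b) * asymmetric_measure(b, a))
```

With A = ((2, 0), (0, 1)) and B the identity, the asymmetric measure gives 1.5 in one direction and 0.75 in the other, while the symmetrized value is the same whichever speaker is treated as the reference.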
The design and implementation of the AT&T Communicator mixed-initiative spoken dialog system is described. The Communicator project, sponsored by DARPA and launched in 1999, is a multi-year, multi-site project on advanced spoken dialog systems research. The main focus of this paper is on issues related to the design of mixed-initiative systems. In addition to describing our architecture and implementation of the complex travel task, the paper reports on some preliminary evaluation results.
ISBN: 0769500846 (print)
HTML [11, 12] is a well-accepted and widely used language for creating platform-independent documents to be posted on the web, and HTML documents are semistructured in nature according to the HTML specification. We propose a tool, called webView, which constructs the semistructured data graph (SDG) of an HTML document H to capture the internal structure of data embedded in H and in its (in)directly linked documents. On top of the SDG, webView provides query processing capability for evaluating Set-like queries posted against the SDG, i.e., the source document(s), in order to extract information from it. Existing methods for extracting structured information from certain HTML documents with static internal structure, such as wrappers and integrators for data warehousing, can benefit from webView.
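As a rough picture of what a graph over an HTML document's internal structure might look like, the following sketch builds a labeled tree of elements and text and collects outgoing links. It is an assumption-laden illustration using Python's standard `html.parser`, not webView's actual SDG construction; the class name `GraphBuilder` and the node layout are hypothetical.

```python
from html.parser import HTMLParser

class GraphBuilder(HTMLParser):
    """Build a labeled tree (a simple rooted graph) from HTML tags and text."""

    VOID = {"br", "hr", "img", "input", "meta", "link"}  # tags with no end tag

    def __init__(self):
        super().__init__()
        self.nodes = [{"label": "root", "children": []}]  # node id = list index
        self.stack = [0]   # ids of the currently open elements
        self.links = []    # href targets, i.e. edges to other documents

    def _add(self, label):
        self.nodes.append({"label": label, "children": []})
        node_id = len(self.nodes) - 1
        self.nodes[self.stack[-1]]["children"].append(node_id)
        return node_id

    def handle_starttag(self, tag, attrs):
        node_id = self._add(tag)
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)
        if tag not in self.VOID:
            self.stack.append(node_id)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.nodes[self.stack[i]]["label"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self._add(text)  # text becomes a leaf node

def build_graph(html):
    builder = GraphBuilder()
    builder.feed(html)
    builder.close()
    return builder.nodes, builder.links
```

Following the collected `links` and repeating the construction on each target would extend the graph to linked documents; queries could then be evaluated by traversing `children` edges and matching labels.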
ISBN: 0769500846 (print)
The proceedings contain 40 papers. The topics discussed include: research and development of advanced database systems for integration of media and user environments; data mining and personalization technologies; supporting web-based database application development; adaptive and incremental query expansion for cluster-based browsing; mining exception instances to facilitate workflow exception handling; specifying complex process control aspects in workflows for exception handling; design and implementation of a structured information retrieval system for SGML documents; visualization of path expressions in a virtual object-oriented database query language; and early separation of filter and refinement steps in spatial query optimization.
With the enormous amount of data stored in the World Wide Web, it is increasingly important to design and develop powerful web warehousing tools. The key objective of our web warehousing project, called WHOWEDA (Wareh...
A simple recurrent neural network is trained on a one-step look-ahead prediction task for symbol sequences of the context-sensitive $a^n b^n c^n$ language. Using an evolutionary hill climbing strategy for incremental l...
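For readers unfamiliar with the task, a small sketch of the data involved may help: it generates strings of the form a^n b^n c^n and pairs each symbol with its successor, which is the kind of input/target pair a one-step look-ahead predictor would see. The function names are hypothetical, and nothing here reflects the network or the training strategy of the paper.

```python
def anbncn(n):
    """Return the string a^n b^n c^n for n >= 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return "a" * n + "b" * n + "c" * n

def next_symbol_pairs(sequence):
    """Pair each symbol with the symbol that follows it (one-step look-ahead)."""
    return list(zip(sequence, sequence[1:]))

def is_anbncn(s):
    """Check membership in the language by rebuilding the expected string."""
    n = len(s) // 3
    return n >= 1 and s == anbncn(n)
```

Predicting the next symbol is only deterministic once the first b has been seen, since the number of a's is not known in advance; counting the a's is what makes the language context-sensitive.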