In this paper we show how inexact multisubgraph matching can be solved using methods based on the projections of vertices (and their connections) into the eigenspaces of graphs, and associated clustering methods. Our...
Traditional indexing methods often record the physical positions of the specified words and thus fail to capture context information. We suggest that Chinese text indexes should work at the sentence level. This paper presents a sentence-based indexing method and demonstrates how to use it to compute the mutual information of word pairs in running text. The method is convenient for many natural language processing tasks.
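As a minimal sketch of the idea, assuming the index simply maps each word to the set of sentences containing it (the abstract does not specify the index layout), mutual information for a word pair can be estimated from sentence-level co-occurrence. The function names and the toy, pre-segmented sentences are hypothetical.

```python
import math
from collections import defaultdict


def build_sentence_index(sentences):
    """Map each word to the set of sentence ids in which it occurs."""
    index = defaultdict(set)
    for sid, words in enumerate(sentences):
        for w in words:
            index[w].add(sid)
    return index


def mutual_information(index, n_sentences, w1, w2):
    """Pointwise mutual information (base 2) of two words sharing a sentence.

    Returns None when the pair never co-occurs in any sentence.
    """
    s1 = index.get(w1, set())
    s2 = index.get(w2, set())
    joint = len(s1 & s2)
    if joint == 0:
        return None
    p1 = len(s1) / n_sentences
    p2 = len(s2) / n_sentences
    p12 = joint / n_sentences
    return math.log2(p12 / (p1 * p2))


# Toy sentences; word segmentation is assumed to have been done elsewhere.
sentences = [
    ["我", "喜欢", "自然", "语言"],
    ["自然", "语言", "处理"],
    ["我", "喜欢", "音乐"],
    ["语言", "处理"],
]
index = build_sentence_index(sentences)
pmi = mutual_information(index, len(sentences), "自然", "语言")
```

Because the index stores sentence ids rather than character offsets, co-occurrence reduces to a set intersection, which is the kind of convenience a sentence-level index might offer.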
We present speech experiments that were carried out to evaluate a topically focusing language model in large vocabulary speech recognition. An ordered topical clustering is first computed as a self-organized mapping of a large document collection. Language models are then trained for each text cluster or for several neighboring clusters. The obtained organized collection of language models is efficiently utilized in continuous speech recognition to concentrate on the model that corresponds closest to the current topic of discussion. The speech recognition experiments are carried out on a novel Finnish speech database. A property of Finnish that is particularly challenging for speech recognition is the extremely fast vocabulary growth that makes many of the standard word-based language modeling methods impractical for large vocabulary tasks.
As the demand for multilingual speech recognizers increases, the development of systems which combine automatic language identification, language-specific pronunciation modeling and language-independent acoustic models becomes increasingly important. When the recognition grammar is dynamic and obtained directly from written text, the language associated with each grammar item has to be identified using that text. Many methods proposed in the literature require fairly large amounts of text, which may not always be available. This paper describes a text-based language identification system developed for the identification of the language of short words, e.g., proper names. Two different approaches are compared. The n-gram method commonly used in the literature is first reviewed and further enhanced. We also propose a simple method for language identification that is based on decision trees. The methods are first evaluated in a text-based language identification task. Both methods are also tested as preprocessors for a multilingual speech recognition task, where the language of each text item has to be determined, in order to choose the correct text-to-pronunciation mapping. The experimental results show that the proposed methods perform very well, and merit further development.
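The n-gram approach for short words can be illustrated with a minimal sketch. It is a generic character-bigram classifier with add-one smoothing, not the enhanced method of the paper, and it does not cover the decision-tree approach. The function names and the tiny training lists are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(word, n=2):
    """Return the character n-grams of a word padded with boundary markers."""
    padded = "^" + word.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(words_by_language, n=2):
    """Count character n-grams per language from small word lists."""
    return {
        lang: Counter(g for w in words for g in char_ngrams(w, n))
        for lang, words in words_by_language.items()
    }


def identify(word, models, n=2):
    """Pick the language whose add-one-smoothed n-gram model scores the word highest."""
    vocab = set()
    for counts in models.values():
        vocab.update(counts)
    best_lang, best_score = None, -math.inf
    for lang, counts in models.items():
        total = sum(counts.values())
        # Sum of log probabilities; unseen n-grams get a small smoothed mass.
        score = sum(
            math.log((counts[g] + 1) / (total + len(vocab) + 1))
            for g in char_ngrams(word, n)
        )
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


# Illustrative training data only; a real system would use far larger lists.
models = train({
    "finnish": ["virtanen", "korhonen", "mäkinen", "nieminen", "heikkinen"],
    "english": ["smith", "johnson", "williams", "brown", "taylor"],
})
guess = identify("hämäläinen", models)
```

Boundary markers let the model pick up word-initial and word-final patterns, which carry much of the signal when the input is a single proper name.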
Many natural language processing applications require semantic knowledge about topics in order to be feasible or efficient. We therefore developed a system, SEGAPSITH, that acquires this knowledge automatically from text segments using an unsupervised, incremental clustering method. An important problem in such an approach is validating the learned classes. To do so, we applied another clustering method, which only needs to know the number of classes to build, to the same subset of text segments, and reformulated the evaluation problem as a comparison of the two classifications. We established different criteria to compare them, based either on the words as class descriptors or on the thematic units. Our first results show a strong correlation between the two classifications.
Author:
Bassi, A (Univ Chile, Fac Ciencias Fis & Matemat, Dept Ciencias Computac, Santiago, Chile)
ISBN: 0769508103 (print)
This paper presents a semantic model based on well-known psycholinguistic theories of human memory. It is centered on a spreading activation network, but it departs from classical models by representing associations between structured units instead of atomic nodes. Network units have an activity level that evolves according to their expected contextual relevance. Spreading activation explains the predictive top-down effect of knowledge. It supports a general heuristic that may be used as the first step of more elaborate methods. This model is suited to deal with the interaction between semantic and episodic memories, as well as many other practical issues regarding natural language processing, including the retroactive effect of semantics over perception and operation in open worlds.
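For orientation, the classical form of spreading activation that the paper departs from can be sketched as follows. This is the textbook version over atomic nodes, not the structured-unit model described in the abstract. The graph, link weights, and decay factor are hypothetical.

```python
def spread_activation(graph, activation, decay=0.5, steps=1):
    """Propagate activation along weighted links for a fixed number of steps.

    graph: dict mapping a node to a dict {neighbor: link weight}.
    activation: dict mapping a node to its initial activity level.
    """
    current = dict(activation)
    for _ in range(steps):
        incoming = {}
        for node, level in current.items():
            for neighbor, weight in graph.get(node, {}).items():
                incoming[neighbor] = incoming.get(neighbor, 0.0) + level * weight * decay
        # Each node keeps its own level and adds what it received this step.
        for node, extra in incoming.items():
            current[node] = current.get(node, 0.0) + extra
    return current


# Toy associative network with assumed link weights.
graph = {
    "doctor": {"nurse": 0.8, "hospital": 0.6},
    "nurse": {"hospital": 0.5},
}
result = spread_activation(graph, {"doctor": 1.0}, decay=0.5, steps=1)
```

Activating one node raises the level of its associates before they are encountered, which is the predictive top-down effect the abstract refers to.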
ISBN: 1558607048 (print)
We present three systems for surface natural language generation that are trainable from annotated corpora. The first two systems, called NLG1 and NLG2, require a corpus marked only with domain-specific semantic attributes, while the last system, called NLG3, requires a corpus marked with both semantic attributes and syntactic dependency information. All systems attempt to produce a grammatical natural language phrase from a domain-specific semantic representation. NLG1 serves as a baseline system and uses phrase frequencies to generate a whole phrase in one step, while NLG2 and NLG3 use maximum entropy probability models to individually generate each word in the phrase. The systems NLG2 and NLG3 learn to determine both the word choice and the word order of the phrase. We present experiments in which we generate phrases to describe flights in the air travel domain.
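A frequency baseline in the spirit of NLG1 might look like the sketch below: for each set of semantic attributes, emit the phrase template seen most often in training. The corpus format, the attribute names ($from, $to), and the function names are assumptions made for illustration. The abstract does not describe these details, and the maximum entropy models of NLG2 and NLG3 are not covered.

```python
from collections import Counter, defaultdict


def train_baseline(corpus):
    """Count how often each phrase template is used for each attribute set."""
    counts = defaultdict(Counter)
    for attributes, template in corpus:
        counts[frozenset(attributes)][template] += 1
    return counts


def generate(counts, values):
    """Emit the most frequent template for the given attributes, with values filled in."""
    key = frozenset(values)
    if key not in counts:
        return None  # attribute set never seen in training
    template, _ = counts[key].most_common(1)[0]
    for attribute, value in values.items():
        template = template.replace(attribute, value)
    return template


# Hypothetical annotated corpus: (attribute set, phrase template) pairs.
corpus = [
    ({"$from", "$to"}, "flights from $from to $to"),
    ({"$from", "$to"}, "flights from $from to $to"),
    ({"$from", "$to"}, "flights to $to from $from"),
    ({"$to"}, "flights to $to"),
]
counts = train_baseline(corpus)
phrase = generate(counts, {"$from": "Boston", "$to": "Dallas"})
```

A whole-phrase lookup of this kind cannot produce anything for an attribute set absent from training, which is one motivation for generating the phrase word by word instead.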
Resnik and Yarowsky (1997) made a set of observations about the state-of-the-art in automatic word sense disambiguation and, motivated by those observations, offered several specific proposals regarding improved evalu...
This paper compares different methods of generating intonation for an American English Text-to-Speech synthesis system. We look at a primarily rule-based approach and two data-driven approaches. For data-driven modeling...
We are interested in providing automated services via natural spoken dialog systems. By natural, we mean that the machine understands and acts upon what people actually say, in contrast to what one would like them to say. There are many issues that arise when such systems are targeted for large populations of non-expert users. In this paper, we focus on the task of automatically routing telephone calls based on a user's fluently spoken response to the open-ended prompt of "How may I help you?". We first describe a database generated from 10,000 spoken transactions between customers and human agents. We then describe methods for automatically acquiring language models for both recognition and understanding from such data. Experimental results evaluating call classification from speech are reported for that database. These methods have been embedded within a spoken dialog system, with subsequent processing for information retrieval and form filling. (C) 1997 Elsevier Science B.V.