ISBN (print): 0780379020
We present an integrated phrase segmentation/alignment algorithm (ISA) for Statistical Machine Translation. Unlike other methods, it does not need to build an initial word-to-word alignment or to first segment the monolingual text into phrases; instead, it segments the sentences into phrases and finds their alignments simultaneously. For each sentence pair, ISA builds a two-dimensional matrix in which the value of each cell is the point-wise mutual information (MI) between the corresponding source and target words. Aligned phrase pairs are identified from the similarity of MI values among cells. Once all the phrase pairs are found, we know both how to segment each sentence into phrases and how the source and target sentences align. We use monolingual bigram language models to estimate the joint probabilities of the identified phrase pairs. The joint probabilities are then normalized to conditional probabilities, which are used by the decoder. Despite its simplicity, this approach yields phrase-to-phrase translations with significantly higher precision than our baseline system, in which phrase translations are extracted from the HMM word alignment. When we combine the phrase-to-phrase translations generated by this algorithm with the baseline system, the improvement in translation quality is even larger.
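The two numeric steps the ISA abstract names, filling a source-by-target MI matrix and normalizing joint phrase-pair probabilities into conditional ones, can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes MI is estimated from sentence-level co-occurrence counts over a parallel corpus (the abstract does not say how), it omits the step that groups cells with similar MI values into phrase pairs, and the joint probabilities are hypothetical placeholders standing in for the bigram-language-model estimates. All function names are illustrative.

```python
import math
from collections import Counter
from itertools import product


def cooccurrence_stats(corpus):
    """Count, per sentence pair, which source words, target words and
    (source, target) word pairs occur at least once (assumed estimator)."""
    src_counts, tgt_counts, pair_counts = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        src_set, tgt_set = set(src), set(tgt)
        src_counts.update(src_set)
        tgt_counts.update(tgt_set)
        pair_counts.update(product(src_set, tgt_set))
    return src_counts, tgt_counts, pair_counts, len(corpus)


def pmi_matrix(src, tgt, stats):
    """Build the len(src) x len(tgt) matrix of point-wise mutual information,
    PMI(s, t) = log(p(s, t) / (p(s) * p(t))); unseen pairs get -inf."""
    src_counts, tgt_counts, pair_counts, n = stats
    matrix = []
    for s in src:
        row = []
        for t in tgt:
            joint = pair_counts[(s, t)]
            if joint == 0:
                row.append(float("-inf"))
            else:
                # p(s,t) / (p(s) p(t)) with p(x) = count(x) / n simplifies to this
                row.append(math.log(joint * n / (src_counts[s] * tgt_counts[t])))
        matrix.append(row)
    return matrix


def joint_to_conditional(joint):
    """Normalize joint scores P(src_phrase, tgt_phrase) into
    conditional probabilities P(tgt_phrase | src_phrase)."""
    totals = Counter()
    for (src_phrase, _), p in joint.items():
        totals[src_phrase] += p
    return {(s, t): p / totals[s] for (s, t), p in joint.items()}


# Hypothetical toy data for illustration only.
corpus = [
    (["la", "maison"], ["the", "house"]),
    (["la", "voiture"], ["the", "car"]),
    (["une", "maison"], ["a", "house"]),
]
stats = cooccurrence_stats(corpus)
matrix = pmi_matrix(["la", "maison"], ["the", "house"], stats)

joint = {("la maison", "the house"): 0.3, ("la maison", "house"): 0.1,
         ("voiture", "car"): 0.2}
cond = joint_to_conditional(joint)
```

In this toy matrix the diagonal cells (la/the, maison/house) carry higher PMI than the off-diagonal ones, which is the kind of contrast a cell-grouping step would exploit; the normalization simply divides each joint value by the total mass of its source phrase.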
In this study we present blind equalization techniques for the ETSI standard Distributed Speech Recognition (DSR) front-end that compensate for acoustic mismatch caused by input devices. The DSR front-end employs vector ...
ISBN (print): 3540403809
Controlled and restricted dialogue systems are reliable enough to be deployed in various real-world applications. The more conversational a dialogue system becomes, the more difficult and unreliable recognition and processing become. Numerous research projects are struggling to overcome the problems that arise with more conversational, or truly conversational, dialogue systems. We introduce a set of contextual coherence measurements that can improve the reliability of spoken dialogue systems by including contextual knowledge at various stages in the natural language processing pipeline. We show that situational knowledge can be successfully employed to resolve pragmatic ambiguities, and that it can be coupled with ontological knowledge to resolve semantic ambiguities and to choose among competing automatic speech recognition hypotheses.
ISBN (print): 1932415092
In this paper, we propose TeLQAS, an ontology-based natural language question answering system for the domain of telecommunication technologies. In an online process, the system accepts users' questions in English, retrieves related text documents from either its local database or the web, and summarizes the retrieved documents with the highest relevance. The advantage of the system is that it can automatically extract related text documents from collections on the web in an offline process, using a text categorization mechanism based on the concepts in the system ontology, and then store them in its local database. We have developed a prototype of the proposed system in the domain of optical telecommunication. Experimental results for a collection of 100 different questions in the optical telecommunication domain show that TeLQAS achieves good accuracy compared with other similar question answering systems.
ISBN (print): 1932415092
Ontologies are becoming extremely useful tools for sophisticated software engineering. Designing applications, databases, and knowledge bases with reference to a common ontology can mean shorter development cycles, easier and faster integration with other software and content, and a more scalable product. Although ontologies are a very promising solution to some of the most pressing problems that confront software engineering, they also raise issues and difficulties of their own. Consider, for example, the following questions: How can a formal ontology be used effectively by those who lack extensive training in logic and mathematics? How can an ontology be used automatically by applications that process free text (e.g., information retrieval and natural language processing applications)? How can we know when an ontology is complete? In this paper we begin by describing the upper-level ontology SUMO (Suggested Upper Merged Ontology), which has been proposed as the initial version of an eventual Standard Upper Ontology (SUO). We then describe the popular, free, and structured WordNet lexical database. After this preliminary discussion, we describe the methodology we are using to align WordNet with SUMO. We close by discussing how this alignment of WordNet with SUMO will provide answers to the questions posed above.
ISBN (print): 1581135831
The proceedings contain 23 papers. The topics discussed include: design of customized web applications with OntoWeaver; how robot baby learns meaningful representations; panel: large knowledge capture projects; learner: a system for acquiring commonsense knowledge by analogy; learning programs from traces using version space algebra; OntoShare: a knowledge management environment for virtual communities of practice; enabling domain experts to convey questions to a machine: a modified, template-based approach; aiding knowledge capture by searching for extensions of knowledge models; towards topic-based summarization for interactive document viewing; capturing interest through inference and visualization: ontological user profiling in recommender systems; combining ontology engineering subprocesses to build a time ontology; capturing task knowledge for geospatial imagery; LitLinker: capturing connections across the biomedical literature; sentiment analysis: capturing favorability using natural language processing; a generic library of problem solving methods for scheduling applications; and evaluating expert-authored rules for military reasoning.
ISBN (print): 3540005323
We can still create computer programs displaying only the most rudimentary natural language processing capabilities. One of the greatest barriers to advanced natural language processing is our inability to overcome the linguistic knowledge acquisition bottleneck. In this paper, we describe recent work in a number of areas, including grammar checker development, automatic question answering, and language modeling, where state-of-the-art accuracy is achieved using very simple methods whose power comes entirely from the plethora of text currently available to these systems, as opposed to deep linguistic analysis or the application of state-of-the-art machine learning techniques. This suggests that the field of NLP might benefit from concentrating less on technology development and more on data acquisition.
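To make the "simple method, lots of text" argument concrete, here is a hypothetical illustration in the grammar-checker setting the abstract mentions: choosing among easily confused words (to/too/two) purely from bigram counts collected from raw text. The abstract does not describe its methods at this level of detail, so this is not the authors' system; the scoring rule, function names and toy corpus are all assumptions made for the sketch.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count adjacent word pairs in a list of tokenized sentences."""
    counts = Counter()
    for words in sentences:
        counts.update(zip(words, words[1:]))
    return counts


def choose_word(left, right, confusion_set, counts):
    """Pick the candidate whose bigrams with both neighbours were seen most
    often; ties go to the first candidate in confusion_set."""
    def score(candidate):
        # Counter returns 0 for unseen bigrams
        return counts[(left, candidate)] + counts[(candidate, right)]
    return max(confusion_set, key=score)


# Hypothetical toy corpus; the approach only pays off with very large text collections.
corpus = [
    ["i", "went", "to", "the", "store"],
    ["two", "cats", "sat"],
    ["went", "to", "bed"],
    ["i", "have", "two", "dogs"],
]
counts = bigram_counts(corpus)
first = choose_word("went", "the", ["to", "too", "two"], counts)
second = choose_word("have", "dogs", ["to", "too", "two"], counts)
```

No linguistic analysis is involved: the decision quality depends entirely on how much text feeds `bigram_counts`, which is the trade-off the abstract argues for.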
ISBN (print): 1880843471
This paper presents the first step toward integrating computer graphics technology with a natural language interface so that non-professionals can generate 3D scenes from story descriptions. We first review story visualization techniques and research on applying natural language interfaces to graphic presentation. We then present an approach that uses story-based natural language as the primary input source for generating a 3D virtual environment. The methodology integrates natural language processing (NLP) and 3D graphic presentation techniques to construct and manipulate a VRML-based scene graph in real time. In particular, we describe how natural-language descriptions of the objects' visual features are interpreted into an interactive, real-time 3D virtual environment.
ISBN (print): 3540408037
The paper outlines a semantic ontology as a minimal set of top-level conceptual distinctions underlying natural language communication. A semantic ontology can serve as the basis for specifying the meaning, i.e., the logical form, of agent messages couched in natural language. It represents a general and reusable module in the architecture of multi-agent systems involving human as well as software agents. As a practical example, we sketch a basic multi-agent system that relies on natural language communication.
ISBN (print): 1932415076
This paper proposes a new document clustering approach based on partitioning weighted directed graphs (digraphs). First, natural language processing and feature selection techniques are used to remove words that are useless for document clustering. Then only useful keywords are extracted and the association strengths between them are computed, which can greatly reduce the time and space complexity of the clustering algorithm. Next, the extracted keywords are treated as nodes, and the association strengths are used as the weights on arcs from keywords to their associated keywords, yielding a weighted digraph. The strongly connected components of this keyword digraph are explored heuristically; these components represent the keyword clusters of the document collection. Each document is then clustered according to the similarity between its keywords and each keyword cluster. Experiments show that using keyword clusters in automatic document clustering can achieve a high clustering precision rate.
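The pipeline in this abstract (keyword digraph, strongly connected components as keyword clusters, documents assigned by keyword similarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the paper explores components heuristically, whereas this sketch swaps in Tarjan's exact SCC algorithm; it also assumes an arc is kept only when its association strength reaches a hypothetical `threshold`, treats singleton components as non-clusters, and measures document/cluster similarity as a plain keyword-overlap count. All names and data are illustrative.

```python
def keyword_digraph(strengths, threshold):
    """strengths maps (a, b) -> association strength of keyword a toward b.
    Keep an arc a -> b only if its strength reaches the (assumed) threshold."""
    graph = {}
    for (a, b), weight in strengths.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if weight >= threshold:
            graph[a].add(b)
    return graph


def strongly_connected_components(graph):
    """Tarjan's algorithm; graph maps every node to its set of successors."""
    index, low, on_stack, stack, result = {}, {}, set(), [], []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            result.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return result


def assign_documents(docs, clusters):
    """Assign each document (a set of keywords) to the index of the cluster
    sharing the most keywords, or None if it shares none."""
    assignment = {}
    for name, keywords in docs.items():
        overlaps = [len(keywords & c) for c in clusters]
        best = max(range(len(clusters)), key=overlaps.__getitem__, default=None)
        assignment[name] = best if best is not None and overlaps[best] > 0 else None
    return assignment


# Hypothetical association strengths and documents.
strengths = {("gene", "protein"): 0.8, ("protein", "gene"): 0.7,
             ("protein", "cell"): 0.2, ("goal", "match"): 0.9,
             ("match", "goal"): 0.6, ("match", "team"): 0.5,
             ("team", "match"): 0.4}
graph = keyword_digraph(strengths, threshold=0.3)
clusters = [c for c in strongly_connected_components(graph) if len(c) > 1]
# clusters: {gene, protein} and {goal, match, team}
docs = {"d1": {"gene", "cell", "protein"}, "d2": {"team", "goal"}, "d3": {"weather"}}
assignment = assign_documents(docs, clusters)
```

Requiring mutual reachability means a keyword joins a cluster only if associations lead both into and out of it, so the weak one-way arc from "protein" to "cell" (pruned here anyway by the threshold) cannot pull "cell" into the biology cluster.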