Information Extraction (IE) is the task of extracting knowledge from unstructured text. We present a novel unsupervised approach for information extraction based on graph mutual reinforcement. The proposed approach do...
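The abstract above is cut off, so its exact formulation cannot be recovered here. As a rough illustration only, suppose "graph mutual reinforcement" means a HITS-style iteration over a bipartite graph that links extraction patterns to the entity tuples they match: good patterns extract good tuples, and good tuples are extracted by good patterns. The function name `mutual_reinforcement`, the pattern strings and the tuples are hypothetical. This is not presented as the authors' algorithm.

```python
import math


def mutual_reinforcement(edges, iterations=50):
    """HITS-style scoring on a bipartite pattern/tuple graph (hypothetical sketch).

    edges: iterable of (pattern, tuple) pairs meaning "pattern extracted tuple".
    Returns (pattern_scores, tuple_scores), each L2-normalised.
    """
    edges = list(edges)
    patterns = {p for p, _ in edges}
    tuples_ = {t for _, t in edges}
    p_score = {p: 1.0 for p in patterns}
    t_score = {t: 1.0 for t in tuples_}
    for _ in range(iterations):
        # A tuple scores highly if strong patterns extract it.
        new_t = {t: 0.0 for t in tuples_}
        for p, t in edges:
            new_t[t] += p_score[p]
        # A pattern scores highly if it extracts strong tuples.
        new_p = {p: 0.0 for p in patterns}
        for p, t in edges:
            new_p[p] += new_t[t]
        # Normalise so the scores stay bounded across iterations.
        t_norm = math.sqrt(sum(v * v for v in new_t.values())) or 1.0
        p_norm = math.sqrt(sum(v * v for v in new_p.values())) or 1.0
        t_score = {t: v / t_norm for t, v in new_t.items()}
        p_score = {p: v / p_norm for p, v in new_p.items()}
    return p_score, t_score
```

With two patterns that share the tuple ("Alice", "Acme") and a third pattern supported only by an unrelated tuple, the two mutually supporting patterns rise to the top while the isolated one decays toward zero.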
A parallel corpus is a valuable resource used in various fields of multilingual natural language processing. One of the most significant problems in using parallel corpora is their lack of availability. Researchers ...
Resolving anaphora is an important step in the identification of named entities such as genes and proteins in biomedical scientific articles. The goal of this work is to resolve associative and coreferential anaphoric...
The proceedings contain 15 papers. The topics discussed include: a flexible approach to natural language generation for disabled children; unsupervised part-of-speech tagging employing efficient graph clustering; sub-sentential alignment using substring co-occurrence counts; annotation schemes and their influence on parsing results; modeling human sentence processing data with a statistical parts-of-speech tagger; semantic discourse segmentation and labeling for route instructions; investigations on event-based summarization; discursive usage of six Chinese punctuation marks; integrated morphological and syntactic disambiguation for modern Hebrew; and a hybrid relational approach for WSD – first results.
We introduce Chinese Whispers, a randomized graph-clustering algorithm that is time-linear in the number of edges. After a detailed definition of the algorithm and a discussion of its strengths and weaknesses, the performance of Chinese Whispers is measured on natural language processing (NLP) problems as diverse as language separation, acquisition of syntactic word classes and word sense disambiguation. In doing so, we exploit the fact that the small-world property holds for many graphs in NLP.
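Chinese Whispers, as it is usually described, starts with every node in its own class and then repeatedly visits the nodes in random order, letting each node adopt the class with the highest total edge weight among its neighbours. The sketch below implements that update rule; the adjacency-dict input, the fixed iteration cap, the early stop when nothing changes and random tie-breaking are choices made for this illustration rather than details taken from the abstract. Each pass visits every edge twice (once from each endpoint), which is where the per-iteration running time linear in the number of edges comes from.

```python
import random
from collections import defaultdict


def chinese_whispers(graph, iterations=20, seed=None):
    """Cluster an undirected weighted graph with the Chinese Whispers update rule.

    graph: dict mapping node -> dict of neighbour -> edge weight (symmetric).
    Returns a dict mapping node -> class label.
    """
    rng = random.Random(seed)
    labels = {node: node for node in graph}  # every node starts in its own class
    nodes = list(graph)
    for _ in range(iterations):
        changed = False
        rng.shuffle(nodes)  # randomized processing order for each pass
        for node in nodes:
            # Sum the edge weights to each class present in the neighbourhood.
            totals = defaultdict(float)
            for neighbour, weight in graph[node].items():
                totals[labels[neighbour]] += weight
            if not totals:
                continue  # an isolated node keeps its own class
            best = max(totals.values())
            candidates = [c for c, w in totals.items() if w == best]
            new_label = rng.choice(candidates)  # break ties at random
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # a full pass changed nothing, so the labelling is stable
    return labels
```

On two weighted stars joined by a weak bridge between two leaves, each star ends up with a single label and the bridge does not merge them, because each bridged leaf is always pulled more strongly by its own centre.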
To determine how close two language models (e.g., n-gram models) are, we can use several distance measures. If we can represent the models as distributions, then the similarity of the models reduces to the similarity of distribu...
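Once each model is reduced to a probability distribution over a shared vocabulary, standard divergences apply directly. The truncated abstract does not say which measures the authors compare, so the following is only a minimal sketch. It assumes unigram distributions with add-one smoothing, which keeps the KL divergence finite, and it uses the function names `unigram_distribution`, `kl_divergence` and `js_divergence`, which are illustrative.

```python
import math
from collections import Counter


def unigram_distribution(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) in bits; assumes q[w] > 0 wherever p[w] > 0."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p if p[w] > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded by 1."""
    m = {w: 0.5 * (p[w] + q[w]) for w in p}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

KL divergence is asymmetric and becomes infinite when `q` gives zero probability to a word that `p` uses, hence the smoothing. Jensen-Shannon divergence is symmetric and bounded, and its square root is a proper metric, which makes it a common choice when a true distance is wanted.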
The ability to accurately model the content structure of text is important for many natural language processing applications. This paper describes experiments with generative models for analyzing the discourse structu...
Document indexing and representation of term-document relations are very important for document clustering and retrieval. In this paper, we combine a graph-based dimensionality reduction method with a corpus-based association measure within the Generalized Latent Semantic Analysis framework. We evaluate the graph-based GLSA on the document clustering task.
In this paper we present a graph-based approach to question answering. The method assumes a graph representation of question sentences and text sentences. Question answering rules are automatically learnt from a training corpus of questions and answer sentences with the answer annotated. The method is independent of the graph representation formalism chosen. A particular example is presented that uses a specific graph representation of the logical contents of sentences.
We discuss several feature sets for novelty detection at the sentence level, using the data and procedure established in task 2 of the TREC 2004 novelty track. In particular, we investigate feature sets derived from graph representations of sentences and sets of sentences. We show that a highly connected graph produced by using sentence-level term distances and pointwise mutual information can serve as a source to extract features for novelty detection. We compare several feature sets based on such a graph representation. These feature sets allow us to increase the accuracy of an initial novelty classifier which is based on a bag-of-words representation and KL divergence. The final result ties with the best system at TREC 2004.
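The abstract does not spell out how term distance and pointwise mutual information are combined. One plausible reading can be sketched under the following assumptions: two terms are linked when they occur within a fixed token window inside the same sentence, and each link is weighted by PMI estimated from sentence-level counts, keeping only positive-PMI edges. The helpers `pmi_graph` and `unseen_edge_ratio` and the default `window=3` are hypothetical. The second function is one illustrative novelty feature, not one of the paper's feature sets.

```python
import math
from collections import Counter
from itertools import combinations


def _window_pairs(tokens, window):
    """Unordered pairs of distinct terms at most `window` tokens apart."""
    return {
        frozenset((tokens[i], tokens[j]))
        for i, j in combinations(range(len(tokens)), 2)
        if j - i <= window and tokens[i] != tokens[j]
    }


def pmi_graph(sentences, window=3):
    """Hypothetical term graph: edge weight = sentence-level PMI (bits), kept if > 0."""
    term_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        term_counts.update(set(tokens))  # number of sentences containing each term
        pair_counts.update(_window_pairs(tokens, window))
    n = len(sentences)
    graph = {}
    for pair, c in pair_counts.items():
        a, b = tuple(pair)
        # PMI = log P(a,b) / (P(a) P(b)), all probabilities estimated per sentence.
        pmi = math.log2(c * n / (term_counts[a] * term_counts[b]))
        if pmi > 0:
            graph[pair] = pmi
    return graph


def unseen_edge_ratio(tokens, graph, window=3):
    """Hypothetical feature: share of a sentence's term pairs with no edge in `graph`."""
    pairs = _window_pairs(tokens, window)
    if not pairs:
        return 0.0
    return sum(1 for p in pairs if p not in graph) / len(pairs)
```

A sentence whose term pairs are already well connected in the graph built from earlier sentences scores near 0, while one made of unseen combinations scores near 1. Such a score could be added to a bag-of-words classifier as an extra feature.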