Current graph-based approaches to automatic text summarization, such as LexRank and TextRank, assume a static graph that does not model how the input texts emerge. A suitable evolutionary text graph model may yield a better understanding of the texts and improve the summarization process. We propose a timestamped graph (TSG) model that is motivated by human writing and reading processes, and show how text units in this model emerge over time. In our model, the graphs used by LexRank and TextRank are specific instances of our timestamped graph with particular parameter settings. We apply timestamped graphs to the standard DUC multi-document text summarization task and achieve results comparable to the state of the art.
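The abstract states that the graphs used by LexRank and TextRank are special cases of the timestamped graph. As background only, here is a minimal Python sketch of the static-graph setting that the TSG generalizes. Sentences are linked when their bag-of-words cosine similarity exceeds a threshold, and a PageRank-style power iteration then scores their centrality. The whitespace tokenization, the threshold of 0.1 and the damping factor of 0.85 are illustrative assumptions. This is not the TSG model itself, because the abstract does not specify how that model is constructed.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, threshold=0.1, damping=0.85, max_iter=100, tol=1e-10):
    """PageRank-style centrality on a static, unweighted sentence-similarity graph (assumed parameters)."""
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    # Static graph: link two distinct sentences whose cosine similarity exceeds the threshold.
    adj = [[i != j and cosine(bags[i], bags[j]) > threshold for j in range(n)] for i in range(n)]
    out_deg = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Mass held by isolated sentences is spread uniformly so the scores keep summing to 1.
        dangling = sum(scores[i] for i in range(n) if out_deg[i] == 0)
        new = [
            (1 - damping) / n
            + damping * (dangling / n + sum(scores[i] / out_deg[i] for i in range(n) if adj[i][j]))
            for j in range(n)
        ]
        delta = sum(abs(x - y) for x, y in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores
```

For example, among `["the cat sat on the mat", "the cat ate the fish", "a dog barked loudly", "the cat and the dog"]` the last sentence shares words with all of the others, so it receives the highest score. A timestamped variant would instead add sentences and edges step by step rather than scoring one fixed graph.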
This paper introduces multi-level association graphs (MLAGs), a new graph-based framework for information retrieval (IR). The framework has two goals. First, it is meant to be a meta model of IR, i.e. it subsumes various IR models under one common representation. Second, it allows different forms of search, such as feedback, associative retrieval and browsing, to be modeled at the same time. It is shown how the new integrated model gives insights into IR algorithms and stimulates new ideas for them. One of these new ideas is presented and evaluated, yielding promising experimental results.
We propose to use graph-based diffusion techniques with data-dependent kernels to build unigram language models. Our approach entails building graphs in which each vertex corresponds uniquely to a word from a closed vocabulary, and the existence of an edge (with an appropriate weight) between two words indicates some form of similarity between them. In one of our constructions, we place an edge between two words if the number of times these words were seen in a training set differs by at most one count. This graph construction results in a similarity matrix with small intrinsic dimension, since words with the same counts have the same neighbors. Experimental results on a benchmark language-modeling task show that our method is competitive with the Good-Turing estimator.
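The count-based construction described above is concrete enough to sketch. The following minimal Python illustration links two words of a closed vocabulary when their training counts differ by at most one, then spreads probability mass along those edges. The abstract does not specify the data-dependent kernel, so the neighbor-averaging step in `diffuse` (with its hypothetical `alpha` and `steps` parameters) is a simple stand-in, not the authors' method.

```python
def count_graph(counts):
    """Link two distinct vocabulary words whose training counts differ by at most one.

    Quadratic in vocabulary size; fine for illustration, not for large vocabularies.
    """
    return {
        w: {v for v in counts if v != w and abs(counts[w] - counts[v]) <= 1}
        for w in counts
    }


def diffuse(counts, alpha=0.5, steps=3):
    """Hypothetical diffusion: mix each word's probability with its neighbors' average."""
    graph = count_graph(counts)
    total = sum(counts.values())
    p = {w: c / total for w, c in counts.items()}  # start from relative frequencies
    for _ in range(steps):
        nxt = {}
        for w, nbrs in graph.items():
            # Words without neighbors keep their own mass in this step.
            avg = sum(p[v] for v in nbrs) / len(nbrs) if nbrs else p[w]
            nxt[w] = (1 - alpha) * p[w] + alpha * avg
        z = sum(nxt.values())
        p = {w: x / z for w, x in nxt.items()}  # renormalize to a probability distribution
    return p
```

With `counts = {"the": 5, "cat": 2, "mat": 1, "sat": 1, "zebra": 0}`, the unseen word `zebra` is linked to the two singletons `mat` and `sat` and receives nonzero probability after diffusion. Assigning mass to unseen events is also what the Good-Turing estimator is designed to do. Words with equal counts (here `mat` and `sat`) have the same neighbors apart from each other, which is the structure behind the low intrinsic dimension mentioned above.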
Degree distributions of word-form co-occurrences are obtained for large Russian text collections. Two power laws fit the distributions well, supporting the Dorogovtsev-Mendes model for Russian. Several different Russian text collections were studied, and statistical errors are shown to be negligible. The model exponents for Russian are found to differ from those for English, probably because of differences in the structure of the collections. In contrast, the estimated size of the supposed kernel lexicon is almost the same for both languages, supporting the idea that word forms are important for the human perceptual lexicon.
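A minimal Python sketch of the measurement described above, under stated assumptions. Two word forms are linked when they occur next to each other in a sentence, because the abstract does not give the co-occurrence window. Each regime's exponent is then estimated by a least-squares fit of the log-log degree distribution on either side of a hypothetical crossover degree `k_cross`.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_degrees(sentences):
    """Degree of each word form in an undirected graph linking adjacent word forms (assumed window)."""
    neighbors = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.split()
        for a, b in zip(tokens, tokens[1:]):
            if a != b:  # ignore self-loops from repeated tokens
                neighbors[a].add(b)
                neighbors[b].add(a)
    # Word forms that never co-occur with another form are not in the graph.
    return {word: len(nbrs) for word, nbrs in neighbors.items()}


def degree_distribution(degrees):
    """Fraction P(k) of word forms having degree k."""
    n = len(degrees)
    return {k: c / n for k, c in sorted(Counter(degrees.values()).items())}


def loglog_slope(dist, k_min, k_max):
    """Least-squares slope of log P(k) against log k for k_min <= k <= k_max."""
    pts = [(math.log(k), math.log(p)) for k, p in dist.items() if k_min <= k <= k_max and p > 0]
    if len(pts) < 2:
        raise ValueError("need at least two distinct degrees in the fitting range")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    return sum((x - mx) * (y - my) for x, y in pts) / sum((x - mx) ** 2 for x, _ in pts)


def two_regime_exponents(dist, k_cross):
    """Exponents (gamma_low, gamma_high) for P(k) ~ k^-gamma below and above k_cross."""
    return -loglog_slope(dist, min(dist), k_cross), -loglog_slope(dist, k_cross, max(dist))
```

Least-squares fitting on a log-log histogram is the simplest estimator, but it is known to be biased. Maximum-likelihood fitting is generally preferred when exponents from different languages are to be compared, as in the study above.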
In the past, NLP has always been based on the explicit or implicit use of linguistic knowledge. In classical computational linguistics applications, explicit rule-based approaches prevail, while machine learning algorithms u...
ISBN (print): 1424408733
The proceedings contain 62 papers. The topics discussed include: using POMDPs for dialog management; speech technology opportunities and challenges; information extraction from speech; graph-based methods for language processing and information retrieval; voice-activated question answering; applications of spoken language technology and systems; designing voice user interfaces: evidence from the field; understanding and modeling communication scenes; multilingual language processing; widening the NLP pipeline for spoken language processing; spoken language generation; recent advances in automatic speech summarization; a novel DTW-based distance measure for speaker segmentation; domain-independent topic segmentation using a string kernel on recognized sub-word sequences; and summarization of spoken lectures based on linguistic surface and prosodic information.
Information Extraction (IE) is the task of extracting knowledge from unstructured text. We present a novel unsupervised approach for information extraction based on graph mutual reinforcement. The proposed approach do...
A parallel corpus is a valuable resource used in various fields of multilingual natural language processing. One of the most significant problems in using parallel corpora is their limited availability. Researchers ...
Resolving anaphora is an important step in the identification of named entities such as genes and proteins in biomedical scientific articles. The goal of this work is to resolve associative and coreferential anaphoric...
The proceedings contain 15 papers. The topics discussed include: a flexible approach to natural language generation for disabled children; unsupervised part-of-speech tagging employing efficient graph clustering; sub-sentential alignment using substring co-occurrence counts; annotation schemes and their influence on parsing results; modeling human sentence processing data with a statistical parts-of-speech tagger; semantic discourse segmentation and labeling for route instructions; investigations on event-based summarization; discursive usage of six Chinese punctuation marks; integrated morphological and syntactic disambiguation for modern Hebrew; and a hybrid relational approach for WSD – first results.