Many organizations possess large collections of textual reports that document how a problem is solved or analysed, e.g. medical patient records, industrial accident reports, lawsuit records and investigation reports. ...
In this paper, we introduce DegExt, a graph-based, language-independent keyphrase extractor, which extends the keyword extraction method described in Litvak and Last (graph-based keyword extraction for single-document summarization. In: proceedings of the workshop on multi-source multilingual information extraction and summarization, pp 17-24, 2008). We compare DegExt with two state-of-the-art approaches to keyphrase extraction: GenEx (Turney in Inf Retr 2: 303-336, 2000) and TextRank (Mihalcea and Tarau in Textrank-bringing order into texts. In: proceedings of the conference on empirical methods in natural language processing. Barcelona, Spain, 2004). We evaluated DegExt on collections of benchmark summaries in two different languages: English and Hebrew. Our experiments on the English corpus show that DegExt significantly outperforms TextRank and GenEx in terms of precision and area under curve for summaries of 15 keyphrases or more, at the expense of a mostly non-significant decrease in recall and F-measure, when the extracted phrases are matched against the gold standard collection. Because DegExt tends to extract longer phrases than GenEx and TextRank, it outperforms them both in terms of recall and F-measure when the single extracted words are considered. In the Hebrew corpus, DegExt performs the same as TextRank regardless of the number of keyphrases. An additional experiment shows that DegExt applied to the TextRank representation graphs outperforms the other systems in the text classification task. For documents in both languages, DegExt surpasses both GenEx and TextRank in terms of implementation simplicity and computational complexity.
Entity Resolution is the task of identifying which records in a database refer to the same entity. A standard machine learning pipeline for the entity resolution problem consists of three major components: blocking, p...
Bootstrapping has recently become the focus of much attention in natural language processing to reduce labeling cost. In bootstrapping, unlabeled instances can be harvested from the initial labeled "seed" se...
The proceedings contain 8 papers. The topics discussed include: a new parametric estimation method for graph-based clustering; extracting signed social networks from text; using link analysis to discover interesting messages spread across Twitter; graph-based similarity measures for synonym extraction from parsed text; semantic relatedness for biomedical word sense disambiguation; identifying untyped relation mentions in a corpus given an ontology; cause-effect relation learning; and bringing the associative ability to social tag recommendation.
ISBN: 9781937284374 (print)
Semantic graph representation of text is an important part of natural language processing applications such as text summarisation. We have studied two ways of constructing the semantic graph of a document from depende...
This article presents two methods for correcting complaints that contain spelling errors, using a graph of spelling and contextual neighbors. This graph is made of forms or words found in a learning corpus. ...
Extractive summarization typically uses sentences as summarization units. In contrast, joint compression and summarization can use smaller units such as words and phrases, resulting in summaries containing more inform...
The development of a multi-document summarizer using automatic key-phrase extraction is described. This summarizer has two main parts; the first part is automatic extraction of key-phrases from the documents and seco...