In this paper, we introduce DegExt, a graph-based, language-independent keyphrase extractor, which extends the keyword extraction method described in Litvak and Last (Graph-based keyword extraction for single-document summarization. In: Proceedings of the workshop on multi-source multilingual information extraction and summarization, pp 17-24, 2008). We compare DegExt with two state-of-the-art approaches to keyphrase extraction: GenEx (Turney in Inf Retr 2: 303-336, 2000) and TextRank (Mihalcea and Tarau in TextRank: bringing order into texts. In: Proceedings of the conference on empirical methods in natural language processing. Barcelona, Spain, 2004). We evaluated DegExt on collections of benchmark summaries in two languages: English and Hebrew. Our experiments on the English corpus show that DegExt significantly outperforms TextRank and GenEx in terms of precision and area under the curve for summaries of 15 keyphrases or more, at the expense of a mostly non-significant decrease in recall and F-measure, when the extracted phrases are matched against the gold standard collection. Because DegExt tends to extract longer phrases than GenEx and TextRank, it outperforms both in terms of recall and F-measure when single extracted words are considered. On the Hebrew corpus, DegExt performs the same as TextRank regardless of the number of keyphrases. An additional experiment shows that DegExt applied to the TextRank representation graphs outperforms the other systems in the text classification task. For documents in both languages, DegExt surpasses both GenEx and TextRank in terms of implementation simplicity and computational complexity.
The proceedings contain 19 papers. The topics discussed include: effect of language and error models on efficiency of finite-state spell-checking and correction; practical finite state optimality theory; handling unknown words in Arabic FST morphology; Urdu – Roman transliteration via finite state transducers; integrating aspectually relevant properties of verbs into a morphological analyzer for English; finite-state technology in a verse-making tool; DAGGER: a toolkit for automata on directed acyclic graphs; WFST-based grapheme-to-phoneme conversion: open source tools for alignment, model-building and decoding; and finite-state acoustic and translation model composition in statistical speech translation: empirical assessment.
Bootstrapping has recently become the focus of much attention in natural language processing to reduce labeling cost. In bootstrapping, unlabeled instances can be harvested from the initial labeled "seed" se...
ISBN: 9781937284008 (print)
The proceedings contain 9 papers. The topics discussed include: a combination of topic models with max-margin learning for relation detection; nonparametric Bayesian word sense induction; invariants and variability of synonymy networks: self mediated agreement by confluence; word sense induction by community detection; using a Wikipedia-based semantic relatedness measure for document clustering; GrawlTCQ: terminology and corpora building by ranking simultaneously terms, queries and documents using graph random walks; simultaneous similarity learning and feature-weight learning for document clustering; unrestricted quantifier scope disambiguation; and from ranked words to dependency trees: two-stage unsupervised non-projective dependency parsing.
ISBN: 1932432779 (print)
The proceedings contain 17 papers. The topics discussed include: graph-based clustering for computational linguistics: a survey; towards the automatic creation of a wordnet from a term-based lexical network; an investigation on the influence of frequency on the lexical organization of verbs; robust and efficient page rank for word sense disambiguation; hierarchical spectral partitioning of bipartite graphs to cluster dialects and identify distinguishing features; a character-based intersection graph approach to linguistic phylogeny; spectral approaches to learning in the graph domain; cross-lingual comparison between distributionally determined word similarity networks; co-occurrence cluster features for lexical substitutions in context; contextually-mediated semantic similarity graphs for topic segmentation; and experiments with CST-based multidocument summarization.
ISBN: 1932432779 (print)
The proceedings contain 16 papers. The topics discussed include: graph-based clustering for computational linguistics: a survey; towards the automatic creation of a wordnet from a term-based lexical network; an investigation on the influence of frequency on the lexical organization of verbs; robust and efficient page rank for word sense disambiguation; hierarchical spectral partitioning of bipartite graphs to cluster dialects and identify distinguishing features; a character-based intersection graph approach to linguistic phylogeny; spectral approaches to learning in the graph domain; and cross-lingual comparison between distributionally determined word similarity networks.
The proceedings contain 12 papers. The special focus in this conference is on natural language processing. The topics include: preparing VerbaLex printed edition; web application for semantic network editing; portable lexical analysis for parsing of morphologically-rich languages; acquiring data for textual entailment recognition; semi-automatic theme-rheme identification; intrinsic methods for comparison of corpora; typos in Czech corpora; expanding translation memories; methods for detection of word usage over time; and towards the realistic natural language representations and type-based search of idiomatic expression.
Semantic graph representation of text is an important part of natural language processing applications such as text summarisation. We have studied two ways of constructing the semantic graph of a document from depende...
The proceedings contain 22 papers. The topics discussed include: a dataset for Arabic textual entailment; answering questions from multiple documents – the role of multi-document summarization; multi-document summarization using automatic key-phrase extraction; automatic evaluation of summary using textual entailment; towards a discourse model for knowledge elicitation; detecting negated and uncertain information in biomedical and review texts; cross-language plagiarism detection methods; rule-based named entity extraction for ontology population; towards definition extraction using conditional random fields; and event-centered simplification of news stories.
The growing need for Chinese natural language processing (NLP) arises largely from a range of research and commercial applications. However, most current Chinese NLP tools or components still have a wide range of i...