Existing unsupervised keyphrase extraction methods typically emphasize the importance of the candidate keyphrase itself, ignoring other important factors such as the influence of uninformative sentences. We hypothesize that the salient sentences of a document are particularly important as they are most likely to contain keyphrases, especially for long documents. To our knowledge, our work is the first attempt to exploit sentence salience for unsupervised keyphrase extraction by modeling hierarchical multi-granularity features. Specifically, we propose a novel position-aware graph-based unsupervised keyphrase extraction model, which includes two model variants. The pipeline model first extracts salient sentences from the document and then extracts keyphrases from those salient sentences. In contrast to the pipeline model, which models multi-granularity features in a two-stage paradigm, the joint model accounts for both sentence and phrase representations of the source document simultaneously via hierarchical graphs. Concretely, the sentence nodes are introduced as an inductive bias, injecting sentence-level information for determining the importance of candidate keyphrases. We compare our model against strong baselines on three benchmark datasets: Inspec, DUC 2001, and SemEval 2010. Experimental results show that the simple pipeline-based approach achieves promising results, indicating that the keyphrase extraction task benefits from the salient sentence extraction task. The joint model, which mitigates the potential accumulated error of the pipeline model, gives the best performance and achieves new state-of-the-art results while generalizing better on data from different domains and with different lengths. In particular, for the SemEval 2010 dataset consisting of long documents, our joint model outperforms the strongest baseline UKERank by 3.48%, 3.69% and 4.84% in terms of F1@5, F1@10 and F1@15, respectively. We also conduct qualitative experimen
ISBN: 9783319572611 (print)
Query-focused multi-document summarization is the process of automatic, query-biased text compression of a document set. Lately, graph-based ranking methods have attracted intensive attention from researchers in the extractive document summarization domain. Work to date has mainly relied on uniform sentence connectedness or non-uniform document-sentence connectedness, such as sentence similarity weighted by document importance. In contrast, in this paper we present a novel five-layered heterogeneous graph model. It emphasizes not only sentence- and document-level relations but also the influence of lower-level relations (e.g. similarity between parts of sentences) and higher-level relations (i.e. query-to-sentence similarity). Based on this model, we developed an iterative sentence ranking algorithm built on the well-known PageRank algorithm. Moreover, for text similarity calculations we used universal paraphrase embeddings that outperform various strong baselines on many text similarity tasks and many domains. Experiments are conducted on the DUC 2005 data sets, and the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) evaluation results demonstrate the advantages of the proposed approach.
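As a rough illustration of PageRank-style sentence ranking with a query bias, the sketch below may help. It is not the five-layered model from the abstract: it assumes a single sentence layer, substitutes bag-of-words cosine similarity for paraphrase embeddings, and uses hypothetical names (`cosine`, `query_biased_rank`) and a conventional damping factor of 0.85.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (assumed stand-in for embedding similarity)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def query_biased_rank(sentences, query, damping=0.85, iterations=50):
    """Power iteration for PageRank with a query-dependent teleport vector."""
    n = len(sentences)
    if n == 0:
        return []
    # Edge weights: pairwise sentence similarity, no self-loops.
    w = [[cosine(s, t) if i != j else 0.0 for j, t in enumerate(sentences)]
         for i, s in enumerate(sentences)]
    out = [sum(row) for row in w]
    # Teleport distribution: similarity to the query, uniform if nothing matches.
    bias = [cosine(s, query) for s in sentences]
    total = sum(bias)
    bias = [b / total for b in bias] if total else [1.0 / n] * n
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = []
        for j in range(n):
            incoming = sum(scores[i] * w[i][j] / out[i]
                           for i in range(n) if out[i])
            new.append((1 - damping) * bias[j] + damping * incoming)
        # Renormalize, since isolated sentences pass no score onward.
        z = sum(new)
        scores = [x / z for x in new]
    return scores
```

Sentences that are both similar to the query and well connected to other sentences end up with the highest scores; a summarizer would then select the top-ranked ones.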
ISBN: 9783319320557 (digital)
ISBN: 9783319320557; 9783319320540 (print)
Graph-based ranking algorithms such as TextRank show a remarkable effect on keyword extraction. However, these algorithms build graphs considering only the lexical sequence of the documents. Hence, the graphs generated by these algorithms cannot reflect the semantic relationships between documents. In this paper, we demonstrate that information is lost in the graph-building process from textual documents to graphs. This loss leads to misjudgments by the algorithm. To solve this problem, we propose a new approach called Topic-based TextRank. Unlike the traditional algorithm, our approach takes the lexical meaning of the text units (i.e. words and phrases) into account. Our experimental results show that the proposed algorithm can outperform state-of-the-art algorithms.
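To make the limitation concrete, here is a minimal sketch of the baseline the abstract criticizes: a TextRank-style keyword ranker whose graph is built purely from word order. It is not the proposed Topic-based TextRank; the function names, the window size, and the damping factor of 0.85 are assumptions chosen for illustration.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Undirected graph linking each token to the next `window` tokens."""
    graph = defaultdict(set)
    for i, tok in enumerate(tokens):
        for other in tokens[i + 1:i + 1 + window]:
            if other != tok:
                graph[tok].add(other)
                graph[other].add(tok)
    return graph


def textrank(graph, damping=0.85, iterations=30):
    """Iterate the unweighted TextRank update over the co-occurrence graph."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        scores = {
            node: (1 - damping) + damping * sum(
                scores[nb] / len(graph[nb]) for nb in graph[node])
            for node in graph
        }
    return scores
```

Because edges depend only on which tokens sit near each other, two words with the same meaning but different surface forms never share score, which is the kind of information loss the abstract describes.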
Extractive multi-document summarization systems usually rank sentences in a document set with some ranking strategy and then select a few highly ranked sentences for the summary. One of the most popular ranking algorithms is the graph-based ranking algorithm. In this paper, we investigate making use of semantic role information to enhance the graph-based ranking algorithm for multi-document summarization. We first parse the sentences and obtain the semantic roles, and then propose a novel SRRank algorithm and two extensions to make better use of the semantic role information. Our proposed algorithms can simultaneously rank the sentences, semantic roles and words in a heterogeneous ranking process. Experimental results on two DUC datasets demonstrate that our proposed algorithms significantly outperform a few baselines, and the semantic role information is validated to be very helpful for multi-document summarization.
In this study, we propose a model for generating single-document abstractive summaries based on the conceptual representation of the text. Although there are studies that take into account a partial syntactic or semantic representation of the text, a complete semantic representation of texts has so far not been used for generating summaries. Our model uses a complete semantic representation of text by means of conceptual graph structures. In this context, the task of generating the summary is reduced to summarizing the set of corresponding conceptual graphs. To do this, a set of operations on graphs is applied: generalization, join or association, ranking, and pruning. Furthermore, a hierarchy of concepts (WordNet) and heuristic rules based on the semantic patterns from VerbNet are used to support these operations. The resulting set of graphs depicts the text summary at the conceptual level. The method was evaluated on the DUC 2003 data collection. The results show that the method is effective for summarizing short texts.
In recent years, graph-based models and ranking algorithms have drawn considerable attention from the extractive document summarization community. Most existing approaches take into account sentence-level relations (e.g. sentence similarity) but neglect the differences among documents and the influence of documents on sentences. In this paper, we present a novel document-sensitive graph model that emphasizes the influence of global document set information on local sentence evaluation. By exploiting document-document and document-sentence relations, we distinguish intra-document sentence relations from inter-document sentence relations. In this way, we move towards the goal of truly summarizing multiple documents rather than a single combined document. Based on this model, we develop an iterative sentence ranking algorithm, namely DsR (Document-Sensitive Ranking). Automatic ROUGE evaluations on the DUC data sets show that DsR outperforms previous graph-based models in both generic and query-oriented summarization tasks.
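The idea of treating intra-document and inter-document sentence relations differently can be sketched as follows. The abstract does not give DsR's formulas, so the weighting here is purely an illustrative assumption: sentence similarity is kept as-is within a document and scaled by a document-document similarity across documents. All names (`jaccard`, `build_edges`, `doc_sim`, `intra_weight`) are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two sentences (assumed measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_edges(sentences, sent_sim, doc_sim, intra_weight=1.0):
    """Build weighted sentence edges that depend on the host documents.

    sentences: list of (doc_id, text) pairs.
    sent_sim:  function returning the similarity of two texts.
    doc_sim:   dict mapping a sorted (doc_id, doc_id) pair to a similarity.
    """
    edges = {}
    for i, (doc_i, text_i) in enumerate(sentences):
        for j in range(i + 1, len(sentences)):
            doc_j, text_j = sentences[j]
            sim = sent_sim(text_i, text_j)
            if doc_i == doc_j:
                # Intra-document relation: keep the sentence similarity.
                weight = intra_weight * sim
            else:
                # Inter-document relation: scale by document similarity.
                key = tuple(sorted((doc_i, doc_j)))
                weight = doc_sim.get(key, 0.0) * sim
            if weight > 0:
                edges[(i, j)] = weight
    return edges
```

The resulting edge weights could then feed any iterative ranking procedure; the point of the sketch is only that the same pair of sentences receives a different weight depending on whether they share a document.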