Automatic text summarization is a topic of great interest in many fields of knowledge. In particular, query-oriented extractive multi-document text summarization methods have recently grown in importance, since they can automatically generate a summary according to a query given by the user. One way to address this problem is through multi-objective optimization approaches. In this paper, a memetic algorithm, specifically a Multi-Objective Shuffled Frog-Leaping Algorithm (MOSFLA), has been developed, implemented, and applied to solve the query-oriented extractive multi-document text summarization problem. Experiments have been conducted with datasets from the Text Analysis Conference (TAC), and the obtained results have been evaluated with Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. The results show that the proposed approach achieves important improvements over previous works in the scientific literature. Specifically, percentage improvements of 25.41%, 7.13%, and 30.22% in ROUGE-1, ROUGE-2, and ROUGE-SU4 scores, respectively, have been reached. In addition, MOSFLA has been applied to medical texts from the Topically Diverse Query Focus Summarization (TD-QFS) dataset as a case study.
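Editor's illustration only (not the paper's exact objective functions): the following minimal Python sketch shows the kind of bi-objective fitness, query coverage to be maximized against redundancy to be minimized, that a population-based optimizer such as MOSFLA could evaluate for a candidate extractive summary. All function and parameter names here are hypothetical.

from collections import Counter


def _tokens(text):
    return [w.lower() for w in text.split()]


def coverage(selected, query):
    # Fraction of query terms that appear somewhere in the selected sentences.
    summary_terms = set()
    for s in selected:
        summary_terms.update(_tokens(s))
    query_terms = set(_tokens(query))
    return len(query_terms & summary_terms) / max(len(query_terms), 1)


def redundancy(selected):
    # Average pairwise unigram overlap among the selected sentences.
    if len(selected) < 2:
        return 0.0
    total, pairs = 0.0, 0
    for i in range(len(selected)):
        for j in range(i + 1, len(selected)):
            a = Counter(_tokens(selected[i]))
            b = Counter(_tokens(selected[j]))
            overlap = sum((a & b).values())
            total += overlap / max(min(sum(a.values()), sum(b.values())), 1)
            pairs += 1
    return total / pairs


def fitness(selected, query):
    # A multi-objective optimizer keeps both values separate and ranks
    # candidate summaries by Pareto dominance instead of a single score.
    return coverage(selected, query), redundancy(selected)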
ISBN: (Print) 9781538653647
Today there is a huge amount of information from many different resources such as the World Wide Web, news articles, e-books, and emails. On the one hand, human beings face a shortage of time, and on the other hand, due to social and occupational needs, they need to obtain the most important information from various resources. Automatic text summarization enables us to access the most important content in the shortest possible time. In this paper, a query-oriented text summarization technique is proposed that extracts the most informative sentences. To this end, a number of features are extracted from the sentences, each of which evaluates the importance of a sentence from a different aspect. In this paper, 11 of the best features are extracted from each sentence. This paper shows that using more suitable features leads to improved generated summaries. In order to evaluate the automatically generated summaries, the ROUGE criterion has been used.
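A minimal sketch of this kind of feature-based sentence scoring, assuming three illustrative features (query overlap, position, length) and hand-set weights rather than the paper's eleven features:

def f_query_overlap(sentence, query):
    s, q = set(sentence.lower().split()), set(query.lower().split())
    return len(s & q) / max(len(q), 1)


def f_position(index, n_sentences):
    # Earlier sentences often carry more important information.
    return 1.0 - index / max(n_sentences - 1, 1)


def f_length(sentence, max_len=30):
    return min(len(sentence.split()), max_len) / max_len


def score_sentence(sentence, index, n_sentences, query, weights=(0.5, 0.3, 0.2)):
    feats = (f_query_overlap(sentence, query),
             f_position(index, n_sentences),
             f_length(sentence))
    return sum(w * f for w, f in zip(weights, feats))


def summarize(sentences, query, k=3):
    order = sorted(range(len(sentences)),
                   key=lambda i: score_sentence(sentences[i], i, len(sentences), query),
                   reverse=True)
    # Keep the top-k sentences in their original document order.
    return [sentences[i] for i in sorted(order[:k])]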
Traditional graph-based sentence ranking algorithms such as LexRank and HITS model the documents to be summarized as a text graph where nodes represent sentences and edges represent pairwise relations. Such modeling cannot capture complex group relationships shared among multiple sentences, which can be useful for sentence ranking. In this paper, we propose to take advantage of hypergraphs to remedy this defect. In a text hypergraph, nodes still represent sentences, yet hyperedges are allowed to connect more than two sentences. With a text hypergraph, we are thus able to integrate both group relationships and pairwise relationships into a unified framework. Then, a hypergraph-based semi-supervised sentence ranking algorithm is developed for query-oriented extractive summarization, where the influence of the query is propagated to sentences through the structure of the constructed text hypergraph. When evaluated on DUC datasets, the performance of our proposed approach shows improvements compared to a number of baseline systems.
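A rough sketch of query-influence propagation over a sentence hypergraph, assuming a simple query-biased random walk through an incidence matrix; this is an illustration, not the paper's semi-supervised algorithm.

import numpy as np


def hypergraph_rank(H, query_sim, alpha=0.85, iters=50):
    # H: (n_sentences, n_hyperedges) incidence matrix, H[i, e] = 1 when
    # sentence i belongs to hyperedge e (e.g. a document or a topic cluster).
    # query_sim: length-n vector of sentence-to-query similarities.
    edge_deg = np.maximum(H.sum(axis=0), 1.0)        # vertices per hyperedge
    A = (H / edge_deg) @ H.T                         # sentence-to-sentence walk via shared hyperedges
    np.fill_diagonal(A, 0.0)
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)   # row-stochastic

    q = query_sim / max(query_sim.sum(), 1e-12)      # query prior
    r = np.full(len(q), 1.0 / len(q))
    for _ in range(iters):
        r = alpha * P.T @ r + (1 - alpha) * q        # query-biased random walk
    return r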
We present methods of extractive query-oriented single-document summarization using a deep auto-encoder (AE) to compute a feature space from the term-frequency (tf) input. Our experiments explore both local and global vocabularies. We investigate the effect of adding small random noise to the local tf as the input representation of the AE, and propose an ensemble of such noisy AEs which we call the Ensemble Noisy Auto-Encoder (ENAE). The ENAE is a stochastic version of an AE that adds noise to the input text and selects the top sentences from an ensemble of noisy runs. In each individual experiment of the ensemble, different randomly generated noise is added to the input representation. This architecture changes the application of the AE from a deterministic feed-forward network to a stochastic runtime model. Experiments show that the AE using local vocabularies clearly provides a more discriminative feature space and improves recall by 11.2% on average. The ENAE can make further improvements, particularly in selecting informative sentences. To cover a wide range of topics and structures, we perform experiments on two different publicly available email corpora that are specifically designed for text summarization. We used ROUGE as a fully automatic metric for text summarization and report the average ROUGE-2 recall for all experiments.
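To illustrate the ensemble-of-noisy-runs idea in a self-contained way, the sketch below perturbs the term-frequency input and aggregates top-sentence selections across runs; TruncatedSVD is used purely as a stand-in encoder for the paper's deep auto-encoder, which is an assumption of this example, not the paper's model.

from collections import Counter

import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def ensemble_noisy_rank(sentences, query, n_runs=10, noise_scale=0.05,
                        n_components=2, top_k=2, seed=0):
    # n_components must stay smaller than the vocabulary size.
    rng = np.random.default_rng(seed)
    tf = CountVectorizer().fit_transform(sentences + [query]).toarray().astype(float)

    votes = Counter()
    for _ in range(n_runs):
        noisy = tf + rng.normal(0.0, noise_scale, size=tf.shape)   # perturbed input
        latent = TruncatedSVD(n_components=n_components).fit_transform(noisy)
        sims = cosine_similarity(latent[:-1], latent[-1:]).ravel()
        votes.update(int(i) for i in np.argsort(sims)[::-1][:top_k])
    # Sentences selected most often across the noisy runs form the summary.
    return [sentences[i] for i, _ in votes.most_common(top_k)]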
ISBN: (Print) 9783319635798; 9783319635781
Query-oriented summarization addresses the problem of information overload and helps people grasp the main ideas within a short time. Summaries are composed of sentences, so the basic idea of composing a salient summary is to construct quality sentences with respect to both the user's query and the multiple documents. Sentence embeddings have been shown to be effective in summarization tasks. However, these methods lack the latent topic structure of the content; hence, a summary that lies only in vector space can hardly capture multi-topical content. In this paper, our proposed model incorporates topical aspects and continuous vector representations, jointly learning semantically rich representations encoded as vectors. Then, leveraging topic filtering and an embedding ranking model, the summarizer can select desirable salient sentences. Experiments demonstrate the outstanding performance of our proposed model from the perspectives of prominent topics and semantic coherence.
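A hedged sketch of the two-stage topic-filtering-then-embedding-ranking idea, using LDA and TF-IDF vectors as stand-ins for the paper's learned topic and embedding models:

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def topic_filtered_rank(sentences, query, n_topics=3, keep=5):
    texts = sentences + [query]
    n = len(sentences)

    # Stage 1: topic filtering -- keep sentences whose dominant topic
    # matches the query's dominant topic (LDA as a stand-in topic model).
    counts = CountVectorizer().fit_transform(texts)
    topics = LatentDirichletAllocation(n_components=n_topics,
                                       random_state=0).fit_transform(counts)
    query_topic = topics[-1].argmax()
    kept = [i for i in range(n) if topics[i].argmax() == query_topic]
    kept = kept or list(range(n))            # fall back if the filter empties

    # Stage 2: embedding ranking -- order the survivors by vector similarity
    # to the query (TF-IDF as a stand-in for learned sentence embeddings).
    vecs = TfidfVectorizer().fit_transform(texts)
    sims = cosine_similarity(vecs[:n], vecs[n:]).ravel()
    ranked = sorted(kept, key=lambda i: sims[i], reverse=True)
    return [sentences[i] for i in ranked[:keep]]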
Capturing the compositional process from words to documents is a key challenge in natural language processing and information retrieval. Extractive query-oriented multi-document summarization generates a summary by extracting a proper set of sentences from multiple documents based on a pre-given query. This paper proposes a novel document summarization framework based on a deep learning model, which has shown outstanding extraction ability in many real-world applications. The framework consists of three parts: concept extraction, summary generation, and reconstruction validation. A new query-oriented extraction technique is proposed to extract information distributed across multiple documents. Then, the whole deep architecture is fine-tuned by minimizing the information loss in reconstruction validation. According to the concepts extracted from the deep architecture layer by layer, dynamic programming is used to seek the most informative set of sentences for the summary. Experiments on three benchmark datasets (DUC 2005, 2006, and 2007) assess and confirm the effectiveness of the proposed framework and algorithms. Experimental results show that the proposed method outperforms state-of-the-art extractive summarization approaches. Moreover, we also provide a statistical analysis of query words based on Amazon's Mechanical Turk (MTurk) crowdsourcing platform. There exist underlying relationships between topic words and the content that can contribute to the summarization task.
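The dynamic-programming selection step can be illustrated as a 0/1 knapsack over a word budget; the scores below are hypothetical placeholders for the informativeness values that would come from the deep model.

def dp_select(scores, lengths, budget):
    # 0/1 knapsack over sentences: maximize total informativeness score
    # while keeping the summary within the word budget.
    best = [(0.0, [])] * (budget + 1)        # best[w] = (score, chosen indices)
    for i in range(len(scores)):
        for w in range(budget, lengths[i] - 1, -1):
            cand = best[w - lengths[i]][0] + scores[i]
            if cand > best[w][0]:
                best[w] = (cand, best[w - lengths[i]][1] + [i])
    return sorted(best[budget][1])


# Hypothetical usage: three sentences with model-derived scores and word counts.
chosen = dp_select(scores=[0.9, 0.4, 0.7], lengths=[120, 80, 100], budget=250)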
ISBN: (Print) 9783642008306
In this paper, we investigate how to combine link-aware and link-free information in sentence ranking for query-oriented summarization. Although the link structure has been emphasized in existing graph-based summarization models, there is a lack of pertinent analysis on how to use the links. By contrasting the text graph with the web graph, we propose to evaluate the significance of sentences based on a neighborhood graph model. Taking advantage of the link information provided by the graph, each sentence is evaluated according to its own value as well as the cumulative impacts from its neighbors. For a task like query-oriented summarization, it is critical to explore how to reflect the influence of the query. To better incorporate query information into the model, we further design a query-sensitive similarity measure to estimate the association between a pair of sentences. When evaluated on the DUC 2005 dataset, the results of the proposed approach are promising.
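One possible form of a query-sensitive similarity measure, shown only as an illustration; the boost factor and weighting scheme are assumptions, not the paper's exact measure.

import math
from collections import Counter


def query_sensitive_sim(sent_a, sent_b, query, boost=2.0):
    # Cosine-style similarity in which terms shared with the query count more.
    a = Counter(sent_a.lower().split())
    b = Counter(sent_b.lower().split())
    q = set(query.lower().split())
    num = 0.0
    for term in a.keys() & b.keys():
        weight = boost if term in q else 1.0
        num += weight * a[term] * b[term]
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return num / (norm_a * norm_b) if norm_a and norm_b else 0.0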
ISBN: (Print) 9783540891963
In this paper, we exploit the role of named entities in measuring document/query sentence relevance in query-oriented extractive summarization. Named-entity-driven associations are defined as informative, semantics-sensitive text bi-grams consisting of at least one named entity or the semantic class of a named entity. They are extracted automatically according to seven pre-defined templates. Question types are also taken into consideration if they are available when dealing with query questions. To alleviate problems with low coverage, the named-entity-based association and uni-gram models are integrated to compensate for each other in similarity calculation. Automatic ROUGE evaluations indicate that the proposed idea can produce a very good system that is among the best-performing systems at DUC 2005.
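An illustrative blend of a named-entity bi-gram model with a uni-gram model in sentence-query similarity; capitalized tokens serve as a crude stand-in for real named-entity recognition, and the interpolation weight is an assumption.

def _unigrams(text):
    return set(text.lower().split())


def _ne_bigrams(text):
    # Crude proxy: treat capitalized tokens as named entities and keep
    # bi-grams that contain at least one of them.
    toks = text.split()
    out = set()
    for a, b in zip(toks, toks[1:]):
        if a[:1].isupper() or b[:1].isupper():
            out.add((a.lower(), b.lower()))
    return out


def combined_sim(sentence, query, lam=0.6):
    u_s, u_q = _unigrams(sentence), _unigrams(query)
    b_s, b_q = _ne_bigrams(sentence), _ne_bigrams(query)
    uni = len(u_s & u_q) / max(len(u_q), 1)
    bi = len(b_s & b_q) / max(len(b_q), 1) if b_q else 0.0
    # Interpolate so the uni-gram model compensates for low bi-gram coverage.
    return lam * bi + (1 - lam) * uni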
In recent years, graph-based models and ranking algorithms have drawn considerable attention from the extractive document summarization community. Most existing approaches take into account sentence-level relations (e.g. sentence similarity) but neglect the difference among documents and the influence of documents on sentences. In this paper, we present a novel document-sensitive graph model that emphasizes the influence of global document set information on local sentence evaluation. By exploiting document-document and document-sentence relations, we distinguish intra-document sentence relations from inter-document sentence relations. In such a way, we move towards the goal of truly summarizing multiple documents rather than a single combined document. Based on this model, we develop an iterative sentence ranking algorithm, namely DsR (Document-Sensitive Ranking). Automatic ROUGE evaluations on the DUC data sets show that DsR outperforms previous graph-based models in both generic and query-oriented summarization tasks.
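A rough sketch (not the DsR formulation) of letting document-level relevance modulate an iterative sentence ranking, where cross-document edges are scaled by the relevance of the other sentence's document:

import numpy as np


def doc_sensitive_rank(sim, doc_of, doc_rel, damping=0.85, iters=50):
    # sim: (n, n) sentence-sentence similarity matrix.
    # doc_of: document index of each sentence.
    # doc_rel: relevance weight of each document (e.g. similarity to the query).
    n = sim.shape[0]
    W = sim.astype(float).copy()
    for i in range(n):
        for j in range(n):
            if doc_of[i] != doc_of[j]:
                # Inter-document influence is scaled by the source document.
                W[i, j] *= doc_rel[doc_of[j]]
    np.fill_diagonal(W, 0.0)
    W = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = damping * W.T @ r + (1 - damping) / n   # PageRank-style iteration
    return r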
ISBN: (Print) 9783540786450
In this paper, we develop a novel cluster-sensitive graph model for query-oriented multi-document summarization. Upon it, an iterative algorithm, namely QoCsR, is built. As natural clusters exist in the graph when a document comprises a collection of sentences, we suggest distinguishing intra- and inter-document sentence relations in order to take into consideration the influence of cluster (i.e., document) global information on local sentence evaluation. In our model, five kinds of relations are involved among the three objects, i.e., document, sentence, and query. Three of them are new and normally ignored in previous graph-based models. All these relations are then appropriately formulated in the QoCsR algorithm, albeit in different ways. ROUGE evaluations show that QoCsR can outperform the best DUC 2005 participating systems.