In this paper, we have presented an optimization approach to document summarization. The potential of optimization-based document summarization models has not been well explored to date, partly because of the difficulty of formulating the criteria used for objective assessment. We modeled document summarization as linear and nonlinear optimization problems. These models attempt to balance coverage and diversity in the summary simultaneously. To solve the optimization problem, we developed a novel particle swarm optimization (PSO) algorithm. Experiments showed that our linear and nonlinear models produce very competitive results, which significantly outperform the NIST baselines in both years (DUC2005 and DUC2006). More importantly, while the linear and nonlinear models are comparable to the top three systems (S24, S15, and S12) in DUC2006, they are superior to the best participating system in DUC2005.
With the recent rapid growth of information on the Internet and in electronic government, automatic multi-document summarization has become an important task. Multi-document summarization is an optimization problem requiring simultaneous optimization of more than one objective function. In this study, when building summaries from multiple documents, we attempt to balance two objectives: content coverage and redundancy. Our goal is to investigate three fundamental aspects of the problem, i.e., designing an optimization model, solving the optimization problem, and finding the solution that corresponds to the best summary. We model multi-document summarization as a Quadratic Boolean Programming (QBP) problem in which the objective function is a weighted combination of the content coverage and redundancy objectives. The objective function scores candidate summaries based on the identified salient sentences and the overlap between selected sentences. An innovative aspect of our model lies in its ability to remove redundancy while selecting representative sentences. The QBP problem is solved using a binary differential evolution algorithm. The model was evaluated on the DUC2002, DUC2004 and DUC2006 data sets. We evaluated our model automatically using the ROUGE toolkit and reported the significance of our results through 95% confidence intervals. The experimental results show that the optimization-based approach to document summarization is a promising research direction. (C) 2012 Elsevier Ltd. All rights reserved.
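As a rough illustration of the kind of objective described above, the sketch below scores a binary sentence-selection vector as a weighted sum of coverage minus pairwise redundancy, under a length budget. The exact objective, the `relevance` and `similarity` inputs, the `weight` parameter, and all function names are assumptions made for this example; they are not taken from the paper. For brevity, exhaustive enumeration stands in for the binary differential evolution solver, which is only feasible for a handful of sentences.

```python
from itertools import product


def qbp_objective(x, relevance, similarity, weight):
    """Score a 0/1 selection vector x (hypothetical objective).

    coverage   = sum_i relevance[i] * x[i]
    redundancy = sum_{i<j} similarity[i][j] * x[i] * x[j]   (the quadratic term)
    """
    n = len(x)
    coverage = sum(relevance[i] * x[i] for i in range(n))
    redundancy = sum(
        similarity[i][j] * x[i] * x[j]
        for i in range(n)
        for j in range(i + 1, n)
    )
    return weight * coverage - (1.0 - weight) * redundancy


def best_summary(relevance, similarity, lengths, max_length, weight=0.5):
    """Enumerate every selection that fits the length budget; keep the best.

    Brute force is used here only as a stand-in for a metaheuristic solver.
    """
    best_x, best_score = None, float("-inf")
    for x in product((0, 1), repeat=len(relevance)):
        total_length = sum(l * xi for l, xi in zip(lengths, x))
        if total_length > max_length:
            continue  # violates the summary length constraint
        score = qbp_objective(x, relevance, similarity, weight)
        if score > best_score:
            best_x, best_score = x, score
    return best_x, best_score


# Toy data: sentences 0 and 1 are both salient but nearly identical.
relevance = [0.9, 0.85, 0.4]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
lengths = [10, 10, 10]

selection, score = best_summary(relevance, similarity, lengths, max_length=20)
print(selection, round(score, 3))
```

In this toy setup the redundancy penalty makes the pair of sentences 0 and 2 score higher than the two most salient but overlapping sentences 0 and 1, which is the balancing behaviour the abstract describes.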
We propose a novel methodology for extractive, generic summarization of text documents. The Maximum Independent Set, which has not previously been used in any summarization study, is utilized in this study. In addition, a text processing tool, which we named KUSH, is proposed in order to preserve the semantic cohesion between sentences in the representation stage of introductory texts. Our expectation was that the set of sentences corresponding to the nodes in the independent set should be excluded from the summary. Based on this expectation, the nodes forming the Independent Set on the graphs are identified and removed from the graph. Thus, before the effect of the nodes on the global graph is quantified, a limitation is applied to the documents to be summarized. This limitation prevents repeated word groups from being included in the summary. The performance of the proposed approach on the Document Understanding Conference (DUC-2002 and DUC-2004) datasets was calculated using ROUGE evaluation metrics. The developed model achieved a ROUGE performance value of 0.38072 for 100-word summaries, 0.51954 for 200-word summaries, and 0.59208 for 400-word summaries. The values reported throughout the experimental processes of the study reveal the contribution of this innovative method. (C) 2019 Production and hosting by Elsevier B.V. on behalf of Faculty of Computers and Artificial Intelligence, Cairo University.
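The pruning step described above can be sketched as follows, assuming a sentence graph given as an adjacency mapping. The abstract does not say how the graph is built or how the independent set is computed, so the graph, the function name, and the minimum-degree greedy heuristic below are all assumptions for illustration. Note that the heuristic returns a maximal independent set, which is not guaranteed to be a maximum one; finding a true maximum independent set is NP-hard in general.

```python
def greedy_independent_set(adjacency):
    """Return a maximal independent set using a minimum-degree greedy heuristic.

    adjacency maps each node to the set of its neighbours (undirected graph).
    This is a stand-in heuristic, not the method used in the paper.
    """
    # Work on a copy so the caller's graph is left untouched.
    remaining = {node: set(neigh) for node, neigh in adjacency.items()}
    independent = set()
    while remaining:
        # Pick the lowest-degree node; ties are broken by node id.
        node = min(remaining, key=lambda n: (len(remaining[n]), n))
        independent.add(node)
        # The chosen node and all its neighbours leave the graph.
        removed = remaining[node] | {node}
        for r in removed:
            remaining.pop(r, None)
        for neigh in remaining.values():
            neigh -= removed
    return independent


# Hypothetical sentence graph: a path 0 - 1 - 2 - 3.
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}

independent = greedy_independent_set(adjacency)
# Sentences in the independent set are excluded; the rest remain candidates.
candidates = sorted(set(adjacency) - independent)
print(sorted(independent), candidates)
```

Only the remaining candidate nodes would then be passed on to whatever scoring stage ranks sentences for the final summary.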