版权所有:内蒙古大学图书馆 技术提供:维普资讯• 智图
内蒙古自治区呼和浩特市赛罕区大学西街235号 邮编: 010021
作者机构:Univ Tabriz Dept Elect & Comp Engn Tabriz Iran
出 版 物:《COMPUTATIONAL INTELLIGENCE》 (计算智能)
年 卷 期:2020年第36卷第2期
页 面:783-812页
核心收录:
学科分类:08[工学] 0812[工学-计算机科学与技术(可授工学、理学学位)]
主 题:multidocument summarization term association term weighting weighted pattern
摘 要:Nowadays, it is necessary that users have access to information in a concise form without losing any critical information. Document summarization is an automatic process of generating a short form from a document. In itemset-based document summarization, the weights of all terms are considered the same. In this paper, a new approach is proposed for multidocument summarization based on weighted patterns and term association measures. In the present study, the weights of the terms are not equal in the context and are computed based on weighted frequent itemset mining. Indeed, the proposed method enriches frequent itemset mining by weighting the terms in the corpus. In addition, the relationships among the terms in the corpus have been considered using term association measures. Also, the statistical features such as sentence length and sentence position have been modified and matched to generate a summary based on the greedy method. Based on the results of the DUC 2002 and DUC 2004 datasets obtained by the ROUGE toolkit, the proposed approach can outperform the state-of-the-art approaches significantly.