The multidocument summarization problem deals with extracting the main information and ideas from a set of related documents. A solution to this problem is an extraction strategy that finds a small subset of sentences covering the most important information in the whole document set. Although a large number of machine-learning-based methods have shown great promise, the lack of high-quality training data poses an inherent obstacle for them. Furthermore, because of the proliferation of low-quality documents on the Internet, existing summarization strategies that rely merely on statistical features perform poorly. In this article, we propose a new two-phase multidocument summarization strategy using content attention-based subtopic detection. First, inspired by the distance dynamics-based community detection mechanism, we extract subtopics from the document set by examining both their content attention and their underlying semantic relations. Instead of complicated neural attention mechanisms, we propose a simple iteration-based content attention method to complete the subtopic detection task. Second, we formulate summarization across the different subtopics as a combinatorial optimization problem of minimizing sentence distance and maximizing topic diversity. We prove the submodularity of this optimization problem, which allows us to propose a new multidocument summarization algorithm based on a greedy mechanism. Finally, we experimentally validate our new algorithms on the BBC news summary and wikiHow data. The results show that our new algorithms outperform the state-of-the-art methods.
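A minimal sketch of greedy selection for a submodular summarization objective of the kind described above (coverage of the document set plus a reward for touching many subtopics). The objective form and the names `coverage_weight` / `diversity_weight` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def greedy_summarize(sim, topics, budget, coverage_weight=1.0, diversity_weight=0.5):
    """sim: (n, n) pairwise sentence-similarity matrix.
    topics: list of integer subtopic labels, one per sentence.
    budget: maximum number of sentences in the summary."""
    n = sim.shape[0]
    selected = []

    def objective(subset):
        if not subset:
            return 0.0
        # Coverage: how well the chosen subset represents every sentence.
        cover = sim[:, subset].max(axis=1).sum()
        # Diversity: reward summaries that span many distinct subtopics.
        diversity = len({topics[i] for i in subset})
        return coverage_weight * cover + diversity_weight * diversity

    while len(selected) < budget:
        current = objective(selected)
        best_gain, best_i = 0.0, None
        for i in range(n):
            if i in selected:
                continue
            gain = objective(selected + [i]) - current
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:  # no candidate gives a positive marginal gain
            break
        selected.append(best_i)
    return selected
```

Because the objective is monotone submodular, this greedy procedure carries the usual (1 - 1/e) approximation guarantee relative to the optimal subset of the same size.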
Nowadays, it is necessary that users have access to information in a concise form without losing any critical information. Document summarization is an automatic process of generating a short form of a document. In itemset-based document summarization, the weights of all terms are considered the same. In this paper, a new approach is proposed for multidocument summarization based on weighted patterns and term association measures. In the present study, the weights of the terms are not equal across the context and are computed based on weighted frequent itemset mining. Indeed, the proposed method enriches frequent itemset mining by weighting the terms in the corpus. In addition, the relationships among the terms in the corpus are taken into account using term association measures. Statistical features such as sentence length and sentence position have also been modified and combined to generate a summary using a greedy method. Based on the results obtained with the ROUGE toolkit on the DUC 2002 and DUC 2004 datasets, the proposed approach significantly outperforms the state-of-the-art approaches.
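A hypothetical sketch of sentence scoring with unequal term weights plus length and position features, in the spirit of the weighted-term approach above. The inverse-sentence-frequency weighting and the exact feature combination are assumptions; the paper's actual weighted frequent-itemset miner and association measures are more involved.

```python
from collections import Counter
import math

def score_sentences(docs):
    """docs: list of documents, each a list of tokenized sentences (lists of terms)."""
    sentences = [(d_idx, s_idx, s) for d_idx, doc in enumerate(docs)
                 for s_idx, s in enumerate(doc)]
    # Term weight: smoothed inverse sentence frequency (illustrative choice).
    df = Counter(t for _, _, s in sentences for t in set(s))
    n = len(sentences)
    weight = {t: math.log(1 + n / df[t]) for t in df}

    scores = []
    for d_idx, s_idx, s in sentences:
        if not s:
            continue
        term_score = sum(weight[t] for t in set(s))
        position_bonus = 1.0 / (1 + s_idx)          # earlier sentences score higher
        length_penalty = 1.0 / math.sqrt(len(s))    # normalize by sentence length
        scores.append((term_score * length_penalty + position_bonus, d_idx, s_idx))
    return sorted(scores, reverse=True)
```

A greedy summarizer would then walk this ranked list and add sentences until the length budget is reached, optionally skipping sentences too similar to ones already selected.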
Multidocument aspect-based summarization (AspSumm) aims to generate focused summaries based on target aspects from a cluster of relevant documents. Generating such summaries can better satisfy readers' specific points of interest, as readers may have different concerns about the same articles. However, previous methods usually generate aspect-based summaries from the given aspects without using the relationships among aspects to assist in summarization. In this work, we propose a two-stage general framework for multidocument AspSumm. The model first discovers the latent relationships among aspects and then uses relevant sentences selected by aspect discovery to generate abstractive summaries. We exploit latent dependencies among aspects using a tag mask training (TMT) strategy, which increases the interpretability of the model. In addition to improvements in summarization over strong aspect-based baselines, experimental results show that our proposed model can accurately discover multidomain aspects on the WikiAsp dataset.
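An illustrative two-stage outline: (1) select sentences relevant to a target aspect, (2) hand them to any abstractive summarizer. The keyword-overlap relevance score and the `abstractive_summarize` stub are assumptions chosen for brevity; the paper's aspect discovery relies on learned latent aspect relationships via TMT, which is not reproduced here.

```python
def select_aspect_sentences(sentences, aspect_keywords, top_k=10):
    """sentences: list of token lists; aspect_keywords: set of terms describing the aspect."""
    scored = []
    for sent in sentences:
        overlap = len(aspect_keywords & set(sent))  # crude relevance proxy
        if overlap:
            scored.append((overlap, sent))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [s for _, s in scored[:top_k]]

def abstractive_summarize(selected_sentences):
    # Placeholder: plug in any sequence-to-sequence summarizer here.
    return " ".join(" ".join(s) for s in selected_sentences)
```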
Ontology-based information extraction and summarization of news content retrieves the news based on the user query. The user query can be of any context about the news content, so that users need not be aw...
Contextual text feature extraction and classification play a vital role in the multi-document summarization process. Natural language processing (NLP) is one of the essential text mining tools used to preprocess and analyze large document sets. Most conventional single-document feature extraction measures are independent of the contextual relationships among the different contextual feature sets used in the document categorization process. Moreover, conventional word embedding models such as TF-IDF, ITF-IDF and GloVe are difficult to integrate into the multi-domain feature extraction and classification process due to a high misclassification rate and large candidate sets. To address these concerns, an advanced multi-document summarization framework was developed and tested on a number of large training datasets. In this work, a hybrid multi-domain GloVe word embedding model together with multi-document clustering and classification models was implemented to improve the multi-document summarization process for multi-domain document sets. Experimental results show that the proposed multi-document summarization approach has improved efficiency in terms of accuracy, precision, recall, F-score and run time (ms) compared with the existing models.
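A minimal sketch of embedding-based document clustering of the general kind mentioned above, assuming a pre-loaded `embeddings` dict mapping words to vectors (e.g., GloVe). The mean-pooling step and the plain KMeans choice are illustrative simplifications, not the hybrid multi-domain model the abstract describes.

```python
import numpy as np
from sklearn.cluster import KMeans

def doc_vector(tokens, embeddings, dim=100):
    # Average the embeddings of all in-vocabulary tokens.
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cluster_documents(tokenized_docs, embeddings, n_clusters=5, dim=100):
    X = np.vstack([doc_vector(d, embeddings, dim) for d in tokenized_docs])
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
```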
This paper focuses on automatic summarization of multiple engineering papers. A summarization approach based on documents' macro- and microstructure is proposed. The macrostructure consists of a list of ranked topics from the engineering papers. Topics are discovered by extracting frequently appearing word sequences and grouping them into equivalence classes. Hence, the macrostructure symbolically represents the topical links across different papers. Meanwhile, the microstructure is defined as the rhetorical structure within a single paper. Identification of the microstructure is approached as a classification problem: each sentence in a paper is automatically labeled with one of the predefined rhetorical categories. Unlike existing summarization methods that first separate documents into nonoverlapping clusters and then summarize each cluster individually, our approach summarizes multiple documents according to the characteristics suggested at the macro- and microstructure levels. The experimental study showed that our proposed approach outperformed peer systems in terms of Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores and readers' responsiveness. In an independent manual categorization task using the summaries generated by our approach and the peer systems, our approach also performed better in terms of precision and recall. [DOI: 10.1115/1.3563048]
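A hedged sketch of the macrostructure step described above: extract frequently appearing word sequences (here, bigrams and trigrams) across papers and rank them as candidate topics. The frequency threshold, n-gram lengths, and the omission of the equivalence-class grouping are simplifying assumptions.

```python
from collections import Counter

def ranked_topics(tokenized_docs, min_count=3, ngram_sizes=(2, 3)):
    """tokenized_docs: list of documents, each a flat list of tokens."""
    counts = Counter()
    for doc in tokenized_docs:
        for n in ngram_sizes:
            for i in range(len(doc) - n + 1):
                counts[tuple(doc[i:i + n])] += 1
    # Keep only sequences that recur across the collection and rank by frequency.
    topics = [(seq, c) for seq, c in counts.items() if c >= min_count]
    return sorted(topics, key=lambda x: x[1], reverse=True)
```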
ISBN (print): 9783319572611
Query-focused multi-document summarization is the process of automatic, query-biased text compression of a document set. Lately, graph-based and ranking-based methods have intensively attracted researchers in the extractive document summarization domain. Uniform sentence connectedness or non-uniform document-sentence connectedness, such as sentence similarity weighted by document importance, have been the main features used by work to date. In contrast, in this paper we present a novel five-layered heterogeneous graph model. It emphasizes not only sentence- and document-level relations but also the influence of lower-level relations (e.g., similarity between parts of sentences) and higher-level relations (i.e., query-to-sentence similarity). Based on this model, we developed an iterative sentence ranking algorithm built on the well-known PageRank algorithm. Moreover, for text similarity calculations we used universal paraphrase embeddings, which outperform various strong baselines on many text similarity tasks and domains. Experiments are conducted on the DUC 2005 data sets, and the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) evaluation results demonstrate the advantages of the proposed approach.
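A simplified single-layer sketch of PageRank-style sentence ranking over a similarity graph, shown only to illustrate the iterative ranking idea; the paper's actual model is a five-layered heterogeneous graph with paraphrase embeddings, which this sketch does not reproduce.

```python
import numpy as np

def pagerank_scores(sim, damping=0.85, iters=100, tol=1e-6):
    """sim: (n, n) nonnegative sentence-similarity matrix."""
    n = sim.shape[0]
    # Row-normalize to obtain a transition matrix over sentences.
    row_sums = sim.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    M = sim / row_sums
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r_new = (1 - damping) / n + damping * (M.T @ r)
        if np.abs(r_new - r).sum() < tol:  # converged
            break
        r = r_new
    return r
```

The top-ranked sentences (subject to redundancy and length constraints) would then form the extractive summary.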
ISBN (print): 9781450350228
We present a novel unsupervised query-focused multi-document summarization approach. To this end, we generate a summary by extracting a subset of sentences using the Cross-Entropy (CE) method. The proposed approach is generic and requires no domain knowledge. Using an evaluation over the DUC 2005-2007 datasets against several other state-of-the-art baseline methods, we demonstrate that our approach is both effective and efficient.
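A hedged sketch of the Cross-Entropy method for selecting a sentence subset: sample candidate subsets from per-sentence Bernoulli inclusion probabilities, keep the top ("elite") samples under a quality function, and move the probabilities toward them. The `quality` scorer is user-supplied (and should penalize subsets exceeding the length budget); the smoothing constant and elite fraction are illustrative defaults, not the paper's tuned values.

```python
import numpy as np

def ce_select(n_sentences, quality, budget, iters=50, samples=200,
              elite_frac=0.1, smooth=0.7, seed=0):
    """quality: callable taking an array of selected sentence indices, returning a score."""
    rng = np.random.default_rng(seed)
    p = np.full(n_sentences, budget / n_sentences)  # initial inclusion probabilities
    for _ in range(iters):
        pop = rng.random((samples, n_sentences)) < p            # sampled subsets (boolean)
        scores = np.array([quality(np.flatnonzero(s)) for s in pop])
        elite = pop[np.argsort(scores)[-int(samples * elite_frac):]]
        p = smooth * elite.mean(axis=0) + (1 - smooth) * p      # smoothed CE update
    return np.flatnonzero(p > 0.5)                              # final summary indices
```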
Document understanding techniques such as document clustering and multidocument summarization have been receiving much attention recently. Current document clustering methods usually represent the given collection of documents as a document-term matrix and then conduct the clustering process. Although many of these clustering methods can group the documents effectively, it is still hard for people to grasp the meaning of the documents because there is no satisfactory interpretation for each document cluster. A straightforward solution is to first cluster the documents and then summarize each document cluster using summarization methods. However, most current summarization methods are based solely on the sentence-term matrix and ignore the context dependence of the sentences. As a result, the generated summaries lack guidance from the document clusters. In this article, we propose a new language model that simultaneously clusters and summarizes documents by making use of both the document-term and sentence-term matrices. By utilizing the mutual influence of document clustering and summarization, our method yields (1) a better document clustering method with more meaningful interpretation and (2) an effective document summarization method guided by document clustering. Experimental results on various document datasets show the effectiveness of our proposed method and the high interpretability of the generated summaries.
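For contrast, here is a sketch of the "straightforward solution" the abstract mentions: cluster documents first, then summarize each cluster by picking the sentences closest to its centroid. The TF-IDF representation and KMeans choice are assumptions; the joint language model the paper actually proposes is not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cluster_then_summarize(docs, sentences_per_doc, n_clusters=3, per_cluster=3):
    """docs: list of raw document strings; sentences_per_doc: list of sentence-string lists."""
    vec = TfidfVectorizer(stop_words="english")
    D = vec.fit_transform(docs)                                   # document-term matrix
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(D)
    summaries = {}
    for c in range(n_clusters):
        sents = [s for doc_sents, label in zip(sentences_per_doc, labels)
                 if label == c for s in doc_sents]
        if not sents:
            continue
        S = vec.transform(sents)                                  # sentence-term matrix
        centroid = np.asarray(S.mean(axis=0))
        sims = cosine_similarity(S, centroid).ravel()
        top = np.argsort(sims)[::-1][:per_cluster]
        summaries[c] = [sents[i] for i in top]
    return summaries
```

The abstract's point is that this pipeline gives the summarizer no feedback channel back into clustering, which the proposed joint model addresses.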
This paper suggests an approach for creating a summary for a set of documents by revealing the topics and extracting informative sentences. The topics are determined through clustering of sentences, and the informative sentences are extracted using a ranking algorithm. The summarization result is shown to depend on the clustering method, the ranking algorithm, and the similarity measure. Experiments on the open benchmark datasets DUC2001 and DUC2002 showed that the suggested clustering methods and ranking algorithm achieve better results than the well-known k-means method and the PageRank and HITS ranking algorithms.
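A tiny sketch of the final extraction step described above: given sentence cluster labels (the topics) and per-sentence rank scores, take the highest-ranked sentence from each cluster. The clustering and ranking themselves (for example, the similarity-graph PageRank sketch shown earlier in this listing) are assumed to be computed separately.

```python
def top_sentence_per_cluster(sentences, labels, scores):
    """sentences, labels, scores: parallel lists over all sentences."""
    best = {}
    for sent, label, score in zip(sentences, labels, scores):
        if label not in best or score > best[label][0]:
            best[label] = (score, sent)
    return [sent for _, sent in best.values()]
```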