ISBN: (Print) 9789811031748; 9789811031731
This paper presents a method for generating a multi-document text summary by building on single-document text summaries and combining them using cosine similarity. To generate the single-document summaries, features such as the document feature, sentence position, normalized sentence length, numerical data, and proper nouns are used. The single-document summaries are combined after calculating the cosine similarity between them, and from each combination, sentences with a high total sentence weight are extracted to form the multi-document summary. An average F-measure of 0.30493 was observed on the DUC 2002 dataset, which is comparable to two of the five top-performing multi-document text summarization systems reported on that dataset.
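As a rough illustration of the combination step this abstract describes, the sketch below computes term-frequency cosine similarity between single-document summaries, merges similar pairs, and keeps the highest-weighted sentences. The similarity threshold, output length, and the assumption of precomputed per-sentence weights are illustrative, not the paper's exact procedure.

```python
# A minimal sketch of cosine-similarity-based summary combination.
# Assumes per-sentence weights were already computed from the listed features.
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between two texts using term-frequency vectors."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def combine_summaries(summaries, weights, similarity_threshold=0.3, top_k=5):
    """Merge single-document summaries (lists of sentences) whose pairwise
    similarity exceeds a threshold, then keep the sentences with the highest
    total weight. `weights` maps each sentence to its total sentence weight."""
    merged = []
    for i in range(len(summaries)):
        for j in range(i + 1, len(summaries)):
            sim = cosine_similarity(" ".join(summaries[i]), " ".join(summaries[j]))
            if sim >= similarity_threshold:
                merged.extend(summaries[i] + summaries[j])
    ranked = sorted(set(merged), key=lambda s: weights.get(s, 0.0), reverse=True)
    return ranked[:top_k]
```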
ISBN: (Print) 9783031628351; 9783031628368
For multi-document text summarization, text features are fundamental because they determine the importance of each sentence in the source documents; the selected sentences then form a summary that represents the most essential information. In the state of the art, several techniques and methods have been proposed that use different text features to select sentences. However, some features may be more important than others, and differentiating important from unimportant features is a difficult task. This work proposes a method to generate extractive multi-document text summaries based on statistical and linguistic text features. We calculate the relevance coefficient of each of 19 text features, using human-written reference summaries, to determine its degree of importance. After this calculation, we employ a Genetic Algorithm (GA) that selects sentences to generate summaries. In general terms, the proposed method consists of three steps: feature weighting, concatenation and pre-processing of the source documents, and feature extraction with sentence selection. In our experiments, we used the DUC01 dataset at two different summary lengths to evaluate the performance of the proposed method. The results show an improvement over state-of-the-art methods.
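The sentence-selection step lends itself to a small sketch. The code below is a generic binary-chromosome GA, under the assumption that each sentence already carries a single score from the weighted combination of the 19 features; the population size, operators, and length-penalty fitness are illustrative choices, not the authors' configuration.

```python
# A hedged sketch of GA-based sentence selection for extractive summarization.
import random

def fitness(chromosome, sentence_scores, max_sentences):
    """Sum of precomputed weighted-feature scores for the selected sentences,
    zeroing out candidates that exceed the length budget."""
    chosen = [s for bit, s in zip(chromosome, sentence_scores) if bit]
    return 0.0 if len(chosen) > max_sentences else sum(chosen)

def genetic_select(sentence_scores, max_sentences=10, pop_size=50,
                   generations=100, mutation_rate=0.02):
    """Return indices of selected sentences (assumes >= 2 candidate sentences)."""
    n = len(sentence_scores)
    population = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, sentence_scores, max_sentences),
                        reverse=True)
        survivors = population[: pop_size // 2]        # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, n)               # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - bit if random.random() < mutation_rate else bit
                     for bit in child]                 # bit-flip mutation
            children.append(child)
        population = survivors + children
    best = max(population, key=lambda c: fitness(c, sentence_scores, max_sentences))
    return [i for i, bit in enumerate(best) if bit]
```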
Text summarization is the process of generating a brief version of a text that preserves its salient information. For information retrieval, it is a good dimension-reduction solution, and it also reduces the required reading time. This study focused on extracting informative summaries from multiple documents using hand-crafted features commonly used in the literature. The first investigation focused on the generation of a feature vector. The features were the number of sentences, term frequency, similarity with the title, term frequency-inverse sentence frequency, sentence position, sentence length, sentence-to-sentence similarity, bushy-path results, phrases of the sentence, proper nouns, n-gram co-occurrence, and document length. Secondly, several combinations of these features were examined, and a shallow multi-layer perceptron and two differently modeled fuzzy inference systems were used to extract salient sentences from texts in the Document Understanding Conference (DUC) dataset. The summarization performance of these models was evaluated using standard classification performance metrics and Recall-Oriented Understudy for Gisting Evaluation (ROUGE)-n. This study recommends the use of fuzzy systems based on a feature vector and a fuzzy rule set for extractive text summarization. The extraction methods were evaluated against a varying compression ratio. The experimental results showed that the implemented neural model tended to incorrectly infer sentences that human annotators did not consider salient. However, for distinguishing summary-worthy from summary-unworthy sentences, the fuzzy inference systems performed better than the neural network, as well as better than existing fuzzy-inference-based text summarization approaches in the literature. (C) 2019 Elsevier B.V. All rights reserved.
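To make the fuzzy-inference idea concrete, the following sketch scores a sentence from two of the listed features (term frequency and sentence position) using triangular membership functions, a two-rule base, and weighted-average defuzzification. The membership shapes and rules are invented for illustration and are far simpler than the paper's tuned systems.

```python
# A minimal fuzzy-inference sketch for sentence scoring.
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_sentence_score(tf: float, position: float) -> float:
    """Two-rule fuzzy inference; both inputs are assumed normalized to [0, 1],
    with position = 0 meaning the first sentence of the document."""
    tf_high = tri(tf, 0.4, 1.0, 1.6)            # "term frequency is high"
    pos_early = tri(position, -0.6, 0.0, 0.6)   # "sentence appears early"
    tf_low = tri(tf, -0.6, 0.0, 0.6)            # "term frequency is low"
    # Rule 1: high tf AND early position -> important (AND = min).
    important = min(tf_high, pos_early)
    # Rule 2: low tf -> unimportant.
    unimportant = tf_low
    # Weighted-average defuzzification over output singletons 1.0 and 0.0.
    total = important + unimportant
    return important / total if total else 0.5
```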
In this research, we investigated the performance of a combination of the fuzzy c-means and latent Dirichlet allocation algorithms for Arabic multi-document summarization. The summary should include the most essential sentences from multiple documents on the same topic. The TAC-2011 corpus is used for the experiments: first, the documents in the corpus are clustered using the fuzzy c-means algorithm. The aim of this clustering step is to group the documents by topic, e.g., economics, politics, or sports. The results are compared against recent Arabic summarization approaches that use ant colony and discriminant analysis algorithms. The proposed approach obtains competitive results compared to those approaches.
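The clustering stage can be sketched with a textbook fuzzy c-means implementation over document vectors (e.g., TF-IDF rows). The cluster count, fuzzifier m, and iteration budget below are illustrative assumptions; the LDA stage and Arabic-specific preprocessing are omitted.

```python
# A compact fuzzy c-means sketch over a (documents x features) matrix X.
import numpy as np

def fuzzy_c_means(X, n_clusters=3, m=2.0, n_iter=100, seed=0):
    """Return a (documents x clusters) membership matrix U and the centroids."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)      # memberships sum to 1 per document
    for _ in range(n_iter):
        Um = U ** m
        # Centroid update: weighted mean of documents by fuzzified memberships.
        centroids = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Euclidean distance of every document to every centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-10
        # Standard FCM membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)).
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))).sum(axis=2)
    return U, centroids
```

A document's topic cluster would then be read off as the argmax of its row in U, or the soft memberships could be passed on to the topic-modeling stage.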
Background: This article provides an overview of the first BioASQ challenge, a competition on large-scale biomedical semantic indexing and question answering (QA), which took place between March and September 2013. BioASQ assesses the ability of systems to semantically index very large numbers of biomedical scientific articles and to return concise, user-understandable answers to natural language questions by combining information from biomedical articles and ontologies. Results: The 2013 BioASQ competition comprised two tasks, Task 1a and Task 1b. In Task 1a, participants were asked to automatically annotate new PubMed documents with MeSH headings. Twelve teams participated in Task 1a, with a total of 46 system runs submitted, and one of the teams performed consistently better than the MTI indexer used by the NLM to suggest MeSH headings to curators. Task 1b used benchmark datasets containing 29 development and 282 test English questions, along with gold-standard (reference) answers prepared by a team of biomedical experts from around Europe, and participants had to produce answers automatically. Three teams participated in Task 1b, with 11 system runs. The BioASQ infrastructure, including benchmark datasets, evaluation mechanisms, and the results of the participants and baseline methods, is publicly available. Conclusions: A publicly available evaluation infrastructure for biomedical semantic indexing and QA has been developed. It includes benchmark datasets and can be used to evaluate systems that assign MeSH headings to published articles or to English questions; retrieve relevant RDF triples from ontologies and relevant articles and snippets from PubMed Central; and produce "exact" and paragraph-sized "ideal" answers (summaries). The results of the systems that participated in the 2013 BioASQ competition are promising. In Task 1a, one of the systems performed consistently better than the NLM's MTI indexer. In Task 1b the systems received high scores in the man...
ISBN: (Print) 9781607504771; 9781607500896
Currently, many news articles are published on the Web, and it is becoming easier for us to read them. However, the number of articles is too large for us to read all of them. Although some Web sites cluster or classify news articles into topics (categories), this is not enough, since each topic still contains a large number of articles. Detecting differences between articles on one topic is one way to comprehend the whole topic. In this paper, we propose a method for detecting differences between news articles on the same topic. Articles are compared sequentially using three different comparison units: paragraphs, sentences, and simple sentences. Our method is evaluated by applying it to Japanese news articles.
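A minimal version of the comparison the paper describes, at one of its three granularities (sentences), might look like the sketch below. The period-based splitting, Jaccard word overlap, and novelty threshold are stand-in assumptions, and the Japanese-specific processing (including simple-sentence extraction) is omitted.

```python
# A hedged sketch of cross-article difference detection at sentence granularity.
def _overlap(a: str, b: str) -> float:
    """Jaccard word overlap as a simple similarity measure between two units."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def novel_units(article_a: str, article_b: str, threshold=0.5):
    """Return units of article_a (here: sentences) that have no close match in
    article_b; the same routine could run on paragraphs or simple sentences."""
    units_a = [s.strip() for s in article_a.split(".") if s.strip()]
    units_b = [s.strip() for s in article_b.split(".") if s.strip()]
    return [u for u in units_a
            if all(_overlap(u, v) < threshold for v in units_b)]
```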