ISBN: (Print) 9789811031748; 9789811031731
This paper presents a method for generating a multi-document text summary by building on single-document text summaries and combining them using cosine similarity. To generate the single-document summaries, features such as the document feature, sentence position, normalized sentence length, numerical data, and proper nouns are used. The single-document summaries are combined after calculating the cosine similarity between them, and from each combination, sentences with a high total sentence weight are extracted to form the multi-document summary. An average F-measure of 0.30493 was observed on the DUC 2002 dataset, which is comparable to two of the five top-performing multi-document text summarization systems reported on that dataset.
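As a rough illustration of the combination step this abstract describes, the sketch below computes term-frequency cosine similarity between single-document summaries, merges similar pairs, and keeps the highest-weighted sentences. The similarity threshold, output length, and the assumption of precomputed per-sentence weights are illustrative, not the paper's exact procedure.

```python
# A minimal sketch of cosine-similarity-based summary combination.
# Assumes per-sentence weights were already computed from the listed features.
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between two texts using term-frequency vectors."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def combine_summaries(summaries, weights, similarity_threshold=0.3, top_k=5):
    """Merge single-document summaries (lists of sentences) whose pairwise
    similarity exceeds a threshold, then keep the sentences with the highest
    total weight. `weights` maps each sentence to its total sentence weight."""
    merged = []
    for i in range(len(summaries)):
        for j in range(i + 1, len(summaries)):
            sim = cosine_similarity(" ".join(summaries[i]), " ".join(summaries[j]))
            if sim >= similarity_threshold:
                merged.extend(summaries[i] + summaries[j])
    ranked = sorted(set(merged), key=lambda s: weights.get(s, 0.0), reverse=True)
    return ranked[:top_k]
```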
ISBN: (Print) 9783031628351; 9783031628368
For multi-document text summarization, text features are fundamental because they determine the importance of each sentence in the source documents; the selected sentences then form a summary that represents the most essential information. In the state of the art, several techniques and methods have been proposed that use different text features to select sentences. However, some features may be more important than others, and differentiating important from unimportant features is a difficult task. This work proposes a method to generate extractive multi-document text summaries based on statistical and linguistic text features. We calculate the relevance coefficient of each of 19 text features, using human-written reference summaries, to determine its degree of importance. After this calculation, we employ a Genetic Algorithm (GA) that selects sentences to generate summaries. In general terms, the proposed method consists of three steps: feature weighting, concatenation and pre-processing of the source documents, and feature extraction with sentence selection. In our experiments, we used the DUC01 dataset at two different summary lengths to evaluate the performance of the proposed method. The results show an improvement over state-of-the-art methods.
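The sentence-selection step lends itself to a small sketch. The code below is a generic binary-chromosome GA, under the assumption that each sentence already carries a single score from the weighted combination of the 19 features; the population size, operators, and length-penalty fitness are illustrative choices, not the authors' configuration.

```python
# A hedged sketch of GA-based sentence selection for extractive summarization.
import random

def fitness(chromosome, sentence_scores, max_sentences):
    """Sum of precomputed weighted-feature scores for the selected sentences,
    zeroing out candidates that exceed the length budget."""
    chosen = [s for bit, s in zip(chromosome, sentence_scores) if bit]
    return 0.0 if len(chosen) > max_sentences else sum(chosen)

def genetic_select(sentence_scores, max_sentences=10, pop_size=50,
                   generations=100, mutation_rate=0.02):
    """Return indices of selected sentences (assumes >= 2 candidate sentences)."""
    n = len(sentence_scores)
    population = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, sentence_scores, max_sentences),
                        reverse=True)
        survivors = population[: pop_size // 2]        # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, n)               # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - bit if random.random() < mutation_rate else bit
                     for bit in child]                 # bit-flip mutation
            children.append(child)
        population = survivors + children
    best = max(population, key=lambda c: fitness(c, sentence_scores, max_sentences))
    return [i for i, bit in enumerate(best) if bit]
```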
Text summarization is the process of generating a brief version of a text that preserves its salient information. For information retrieval, it is a good dimension-reduction solution, and it also reduces the required reading time. This study focused on extracting informative summaries from multiple documents using hand-crafted features commonly used in the literature. The first investigation focused on the generation of a feature vector. The features were the number of sentences, term frequency, similarity with the title, term frequency-inverse sentence frequency, sentence position, sentence length, sentence-to-sentence similarity, bushy-path results, phrases of the sentence, proper nouns, n-gram co-occurrence, and document length. Secondly, several combinations of these features were examined, and a shallow multi-layer perceptron and two differently modeled fuzzy inference systems were used to extract salient sentences from texts in the Document Understanding Conference (DUC) dataset. The summarization performance of these models was evaluated using standard classification performance metrics and Recall-Oriented Understudy for Gisting Evaluation (ROUGE)-n. This study recommends the use of fuzzy systems based on a feature vector and a fuzzy rule set for extractive text summarization. The extraction methods were evaluated against a varying compression ratio. The experimental results showed that the implemented neural model tended to incorrectly infer sentences that human annotators did not consider salient. However, for distinguishing summary-worthy from summary-unworthy sentences, the fuzzy inference systems performed better than the neural network, as well as better than existing fuzzy-inference-based text summarization approaches in the literature. (C) 2019 Elsevier B.V. All rights reserved.
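To make the fuzzy-inference idea concrete, the following sketch scores a sentence from two of the listed features (term frequency and sentence position) using triangular membership functions, a two-rule base, and weighted-average defuzzification. The membership shapes and rules are invented for illustration and are far simpler than the paper's tuned systems.

```python
# A minimal fuzzy-inference sketch for sentence scoring.
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_sentence_score(tf: float, position: float) -> float:
    """Two-rule fuzzy inference; both inputs are assumed normalized to [0, 1],
    with position = 0 meaning the first sentence of the document."""
    tf_high = tri(tf, 0.4, 1.0, 1.6)            # "term frequency is high"
    pos_early = tri(position, -0.6, 0.0, 0.6)   # "sentence appears early"
    tf_low = tri(tf, -0.6, 0.0, 0.6)            # "term frequency is low"
    # Rule 1: high tf AND early position -> important (AND = min).
    important = min(tf_high, pos_early)
    # Rule 2: low tf -> unimportant.
    unimportant = tf_low
    # Weighted-average defuzzification over output singletons 1.0 and 0.0.
    total = important + unimportant
    return important / total if total else 0.5
```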
In this research, we investigated the performance of a combination of the fuzzy c-means and latent Dirichlet allocation algorithms for Arabic multi-document summarization. The summary should include the most essential sentences from multiple documents on the same topic. The TAC-2011 corpus is used for the experiments: first, the documents in the corpus are clustered using the fuzzy c-means algorithm. The aim of this clustering step is to group the documents by topic, e.g., economics, politics, or sports. The results are compared against recent Arabic summarization approaches that use ant colony and discriminant analysis algorithms. The proposed approach obtains competitive results compared to those approaches.
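The clustering stage can be sketched with a textbook fuzzy c-means implementation over document vectors (e.g., TF-IDF rows). The cluster count, fuzzifier m, and iteration budget below are illustrative assumptions; the LDA stage and Arabic-specific preprocessing are omitted.

```python
# A compact fuzzy c-means sketch over a (documents x features) matrix X.
import numpy as np

def fuzzy_c_means(X, n_clusters=3, m=2.0, n_iter=100, seed=0):
    """Return a (documents x clusters) membership matrix U and the centroids."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)      # memberships sum to 1 per document
    for _ in range(n_iter):
        Um = U ** m
        # Centroid update: weighted mean of documents by fuzzified memberships.
        centroids = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Euclidean distance of every document to every centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-10
        # Standard FCM membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)).
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))).sum(axis=2)
    return U, centroids
```

A document's topic cluster would then be read off as the argmax of its row in U, or the soft memberships could be passed on to the topic-modeling stage.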
Background: This article provides an overview of the first BioASQ challenge, a competition on large-scale biomedical semantic indexing and question answering (QA), which took place between March and September 2013. BioASQ assesses the ability of systems to semantically index very large numbers of biomedical scientific articles and to return concise, user-understandable answers to natural language questions by combining information from biomedical articles and ontologies. Results: The 2013 BioASQ competition comprised two tasks, Task 1a and Task 1b. In Task 1a, participants were asked to automatically annotate new PubMed documents with MeSH headings. Twelve teams participated in Task 1a, with a total of 46 system runs submitted, and one of the teams performed consistently better than the MTI indexer used by the NLM to suggest MeSH headings to curators. Task 1b used benchmark datasets containing 29 development and 282 test English questions, along with gold-standard (reference) answers prepared by a team of biomedical experts from around Europe, and participants had to produce answers automatically. Three teams participated in Task 1b, with 11 system runs. The BioASQ infrastructure, including benchmark datasets, evaluation mechanisms, and the results of the participants and baseline methods, is publicly available. Conclusions: A publicly available evaluation infrastructure for biomedical semantic indexing and QA has been developed. It includes benchmark datasets and can be used to evaluate systems that assign MeSH headings to published articles or to English questions; retrieve relevant RDF triples from ontologies and relevant articles and snippets from PubMed Central; and produce "exact" and paragraph-sized "ideal" answers (summaries). The results of the systems that participated in the 2013 BioASQ competition are promising. In Task 1a, one of the systems performed consistently better than the NLM's MTI indexer. In Task 1b the systems received high scores in the man...
ISBN: (Print) 9781607504771; 9781607500896
Currently, many news articles are published on the Web, and it is becoming easier for us to read them. However, the number of articles is too large for us to read all of them. Although some Web sites cluster or classify news articles into topics (categories), this is not enough, since each topic still contains a large number of articles. Detecting differences between articles on one topic is one way to comprehend the whole topic. In this paper, we propose a method for detecting differences between news articles on the same topic. Articles are compared sequentially using three different comparison units: paragraphs, sentences, and simple sentences. Our method is evaluated by applying it to Japanese news articles.
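A minimal version of the comparison the paper describes, at one of its three granularities (sentences), might look like the sketch below. The period-based splitting, Jaccard word overlap, and novelty threshold are stand-in assumptions, and the Japanese-specific processing (including simple-sentence extraction) is omitted.

```python
# A hedged sketch of cross-article difference detection at sentence granularity.
def _overlap(a: str, b: str) -> float:
    """Jaccard word overlap as a simple similarity measure between two units."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def novel_units(article_a: str, article_b: str, threshold=0.5):
    """Return units of article_a (here: sentences) that have no close match in
    article_b; the same routine could run on paragraphs or simple sentences."""
    units_a = [s.strip() for s in article_a.split(".") if s.strip()]
    units_b = [s.strip() for s in article_b.split(".") if s.strip()]
    return [u for u in units_a
            if all(_overlap(u, v) < threshold for v in units_b)]
```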