Because information on the Internet evolves quickly, update summarization has received much attention in recent years. The task is to summarize an evolving document collection at the current time, assuming that users have already read some related earlier documents. In this paper, we propose a graph-ranking-based method. It performs constrained reinforcements on a sentence graph, which unifies previous and current documents, to determine the salience of the sentences. The constraints ensure that the most salient sentences in the current documents are updates to the previous documents. Since the resulting problem is NP-hard, we then propose an approximate method that is solvable in polynomial time. Experiments on the TAC 2008 and 2009 benchmark data sets show the effectiveness and efficiency of our method.
In this paper, a novel summarization method that uses nonnegative matrix factorization (NMF) and a clustering method is introduced to extract meaningful sentences relevant to a given query. The proposed method decomposes a sentence into a linear combination of sparse nonnegative semantic features, so that a sentence is represented as the sum of a few intuitively comprehensible semantic features. Because it uses the similarity between the query and the semantic features, it can avoid extracting sentences that are highly similar to the query but meaningless, which improves the quality of document summaries. In addition, the proposed approach uses the clustering method to remove noise and to keep the biased inherent semantics of the documents from being reflected in the summaries. The method can ensure the coherence of summaries by using the rank score of sentences with respect to the semantic features. The experimental results demonstrate that the proposed method performs better than other methods that use a thesaurus, latent semantic analysis (LSA), K-means, and NMF.
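To make the idea of query-to-feature similarity in the abstract above more concrete, here is a minimal sketch, not the authors' implementation. It assumes a small term-by-sentence count matrix, factorizes it with the standard multiplicative-update rules for NMF, and scores each sentence by its feature loadings weighted by the cosine similarity between the query and each semantic feature. The function names, the scoring formula, and the toy data are all assumptions made for illustration; the clustering step is omitted.

```python
import math
import random


def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def transpose(x):
    return [list(col) for col in zip(*x)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def frobenius_error(a, w, h):
    """Squared reconstruction error ||A - WH||^2."""
    wh = matmul(w, h)
    return sum((a[i][j] - wh[i][j]) ** 2 for i in range(len(a)) for j in range(len(a[0])))


def nmf(a, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize a (terms x sentences) into nonnegative W (terms x rank), H (rank x sentences)."""
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T A) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n)] for k in range(rank)]
        # W <- W * (A H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)] for i in range(m)]
    return w, h


def score_sentences(w, h, query_vec):
    """Assumed scoring rule: sum over features of sim(query, feature) * sentence loading."""
    sims = [cosine(feature, query_vec) for feature in transpose(w)]
    return [sum(sims[k] * h[k][j] for k in range(len(h))) for j in range(len(h[0]))]


# Toy term-by-sentence matrix: 4 terms (rows) x 3 sentences (columns).
A = [[2, 0, 1],
     [1, 0, 0],
     [0, 3, 1],
     [0, 1, 2]]
W, H = nmf(A, rank=2)
scores = score_sentences(W, H, query_vec=[1, 1, 0, 0])
```

Scoring against the semantic features rather than against raw sentences is the design point the abstract emphasizes; the exact weighting used in the paper may differ from the one assumed here.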
ISBN: 9783642013492 (print)
The power of a geographic information system lies in helping managers make the critical decisions they face daily. The ability to make sound decisions relies upon the availability of relevant information. Typically, spatial databases do not contain enough information to support the decision-making process in all situations. To extend the available dataset, we propose an approach for the enrichment of geographical databases (GDB), and especially their semantic component. This enrichment is performed by providing knowledge extracted from web documents to supplement the aspatial data of the GDB. The knowledge extraction process is achieved through the generation of a condensed representation of the relevant information derived from a web corpus. This process is carried out in a distributed fashion that complies with the multi-agent paradigm.
Sentence extraction is a widely adopted text summarization technique in which the most important sentences are extracted from one or more documents and presented as a summary. The first step towards sentence extraction is to rank sentences in order of their importance to the summary. This paper proposes a novel graph-based ranking method, iSpreadRank, to perform this task. iSpreadRank models a set of topic-related documents as a sentence similarity network. Based on such a network model, iSpreadRank exploits spreading activation theory to formulate a general concept from social network analysis: the importance of a node in a network (i.e., a sentence in this paper) is determined not only by the number of nodes to which it connects, but also by the importance of its connected nodes. The algorithm recursively re-weights the importance of sentences by spreading their sentence-specific feature scores throughout the network to adjust the importance of other sentences. Consequently, a ranking that indicates the relative importance of the sentences is inferred. This paper also develops an approach to produce a generic extractive summary according to the inferred sentence ranking. The proposed summarization method is evaluated using the DUC 2004 data set and found to perform well. Experimental results show that the proposed method obtains a ROUGE-1 score of 0.38068, which differs by only 0.00156 from that of the best participant in the DUC 2004 evaluation. (C) 2007 Elsevier Ltd. All rights reserved.
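As a rough illustration of the recursive re-weighting idea in the iSpreadRank abstract above, the sketch below spreads sentence scores over a sentence similarity network. It is not the published algorithm: the bag-of-words cosine similarity, the `decay` factor, the fixed iteration count, and all function names are assumptions made for the example.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased word counts for one sentence."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_sentences(sentences, initial_scores, decay=0.5, iterations=50):
    """Return (indices sorted by final score, final scores).

    Each sentence keeps its own feature score and repeatedly receives a
    decayed share of its neighbours' current scores, so a sentence linked
    to important sentences becomes more important itself.
    """
    vectors = [bag_of_words(s) for s in sentences]
    n = len(sentences)
    # Edge weights of the similarity network (no self-loops).
    weights = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    # Row-normalize so each sentence spreads at most its full score.
    for row in weights:
        total = sum(row)
        if total:
            row[:] = [w / total for w in row]
    scores = list(initial_scores)
    for _ in range(iterations):
        scores = [
            initial_scores[j] + decay * sum(scores[i] * weights[i][j] for i in range(n))
            for j in range(n)
        ]
    order = sorted(range(n), key=lambda k: scores[k], reverse=True)
    return order, scores


sentences = [
    "The storm hit the coast",
    "The storm damaged homes on the coast",
    "Stock prices rose",
]
order, scores = rank_sentences(sentences, initial_scores=[1.0, 1.0, 1.0])
```

In this toy input the third sentence shares no words with the others, so it receives no activation and keeps only its initial score, while the two storm sentences reinforce each other.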
An increasingly common problem in effective information access is the presence, in the same corpus, of multiple documents that contain similar information. Generally, users may be interested in locating one or several particular aspects of a topic addressed by a group of similar documents. This kind of task, called instance or aspectual retrieval, has been explored in several TREC Interactive Tracks. In this article, we propose, in addition to the classification capacity of clustering techniques, the possibility of offering an indicative extract of the contents of several sources by means of multidocument summarization techniques. Two kinds of summaries are provided. The first one covers the similarities within each cluster of retrieved documents. The second one shows the particularities of each document with respect to the common topic of the cluster. The multitopic structure of the documents is used to determine the similarities and differences of topics in the cluster of documents. The system is independent of document domain and genre. An evaluation of the proposed system with users shows significant improvements in effectiveness. The results of previous experiments that compared clustering algorithms are also reported.
ISBN: 9781424409822 (print)
Fuzzy co-clustering (FCC) is a technique that performs simultaneous fuzzy clustering of objects and features. Recently, several FCC algorithms have been proposed to handle clustering of high-dimensional datasets. The success of these FCC efforts is evident, as they produce both document and word clusters with fuzzy memberships. This paper reports our efforts on multi-document summarization (MDS) using a fuzzy co-clustering approach. The word memberships are utilized in the MDS and appear to be a good alternative interpretation of a document cluster compared with the conventional frequency-based approaches. We explain the key differences between a summarizer based on the membership approach and one based on the conventional approach, and closely investigate why, in principle, the fuzzy co-clustering approach has high potential to outperform the frequency-based approaches for MDS. An experimental study on the DUC 2004 benchmark dataset shows very promising results, which encourages further research in the area.
ISBN: 0780377249 (print)
Capturing relevant information is important in supporting decision making. In this paper, we propose a new summarization method based on cluster analysis, concept space, and a statistical approach to extract the essence of a collection of documents. A prototype system has been developed that condenses a set of documents into a list of key issues and then expands the key issues to form a summary. Cluster analysis and concept space are used as a bridge to connect the convergent and divergent processes. This approach reduces the information loss due to vocabulary switching in the summarization process. In the divergent process, the system selects the anchored sentences from the original documents to form a summary based on the concept terms generated previously. A user evaluation has been conducted to assess its usefulness and other performance indices. The results indicate that the approach is promising.
Recently, there have been significant advances in several areas of language technology, including clustering, text categorization, and summarization. However, efforts to combine technology from these areas in a practical system for information access have been limited. In this paper, we present Columbia's Newsblaster system for online news summarization. Many of the tools developed at Columbia over the years are combined to produce a system that crawls the web for news articles, clusters them on specific topics, and produces multidocument summaries for each cluster.
This paper describes a system for the summarization of multiple text-only, news-like documents. We address two main issues: clustering the documents to find the main topics that should be mentioned in the multidocument summary, and organizing the information to create a summary that presents it in a logical way and is easy to read. The system is based on a single-document summarizer that uses text extraction.