ISBN: 9781954085541 (print)
Unsupervised extractive summarization aims to extract salient sentences from documents without a labeled corpus. Existing methods are mostly graph-based and rank sentences by computing sentence centrality. However, these methods tend to select sentences within the same facet, which often leads to a facet bias problem, especially when the document has multiple facets (i.e., long documents and multi-document inputs). To address this problem, we propose a novel facet-aware centrality-based ranking model. We let the model pay more attention to different facets by introducing a sentence-document weight, which is added to the sentence centrality score. We evaluate our method on a wide range of summarization tasks that include 8 representative benchmark datasets. Experimental results show that our method consistently outperforms strong baselines, especially in long and multi-document scenarios, and even performs comparably to some supervised models. Extensive analyses confirm that the performance gains come from alleviating the facet bias problem.
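The idea of adding a sentence-document weight to a centrality score can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' model: sentences are encoded as bag-of-words counts instead of a learned encoder, centrality is the sum of cosine similarities to the other sentences, the sentence-document weight is the cosine similarity between a sentence and the whole document, and `lam` is a hypothetical mixing parameter.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words counts (a stand-in for a learned sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, lam=1.0):
    """Return sentence indices sorted by centrality + lam * sentence-document weight."""
    vecs = [bow(s) for s in sentences]

    # Document vector: the sum of all sentence vectors (assumption).
    doc = Counter()
    for v in vecs:
        doc.update(v)

    scores = []
    for i, v in enumerate(vecs):
        # Centrality: total similarity to every other sentence.
        centrality = sum(cosine(v, u) for j, u in enumerate(vecs) if j != i)
        # Sentence-document weight: similarity to the document as a whole.
        doc_weight = cosine(v, doc)
        scores.append(centrality + lam * doc_weight)

    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
```

Setting `lam=0` in this sketch reduces the ranking to plain centrality, which makes it easy to compare the two orderings on the same input.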
ISBN: 9781941643884 (print)
The proceedings contain 6 papers. The topics discussed include: embedding senses for efficient graph-based word sense disambiguation; context tailoring for text normalization; cross-lingual question answering using common semantic space; network motifs may improve quality assessment of text documents; better together: combining language and social interactions into a shared representation; and visualization of dynamic reference graphs.
Sentiment lexicons are widely used as an intuitive and inexpensive way of tackling sentiment classification, often within a simple lexicon word-counting approach or as part of a supervised model. However, it is an ope...
Word Sense Induction (WSI) is an unsupervised approach for learning the multiple senses of a word. Graph-based approaches to WSI frequently represent word co-occurrence as a graph and use the statistical properties of...
ISBN: 9781937284008 (print)
The proceedings contain 9 papers. The topics discussed include: a combination of topic models with max-margin learning for relation detection; nonparametric Bayesian word sense induction; invariants and variability of synonymy networks: self mediated agreement by confluence; word sense induction by community detection; using a Wikipedia-based semantic relatedness measure for document clustering; GrawlTCQ: terminology and corpora building by ranking simultaneously terms, queries and documents using graph random walks; simultaneous similarity learning and feature-weight learning for document clustering; unrestricted quantifier scope disambiguation; and from ranked words to dependency trees: two-stage unsupervised non-projective dependency parsing.
We propose the use of a nonparametric Bayesian model, the Hierarchical Dirichlet Process (HDP), for the task of Word Sense Induction. Results are shown through comparison against Latent Dirichlet Allocation (LDA), a p...
Usually unsupervised dependency parsing tries to optimize the probability of a corpus by modifying the dependency model that was presumably used to generate the corpus. In this article we explore a different view in w...
ISBN: 9781937284008 (print)
In this paper, we present GrawlTCQ, a new bootstrapping algorithm for building specialized terminology, corpora and queries, based on a graph model. We model links between documents, terms and queries, and use a random walk with restart algorithm to compute relevance propagation. We have evaluated GrawlTCQ on an AFP English corpus of 57,441 news articles in 10 categories. For corpora building, GrawlTCQ outperforms the BootCaT tool, which is widely used in the domain. For 1,000 documents retrieved, we improve mean precision by 25%. GrawlTCQ has also been shown to be faster and more robust than BootCaT over iterations.
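A random walk with restart over a graph of documents, terms and queries can be sketched as a short power iteration. This is a generic illustration and not the GrawlTCQ implementation: the adjacency-dictionary format, the uniform edge weights, the `restart` probability of 0.15, the fixed iteration count and the handling of nodes without outgoing links are all assumptions.

```python
def random_walk_with_restart(adj, seeds, restart=0.15, iters=100):
    """Relevance of every node to the seed set.

    adj   : dict mapping each node to a list of neighbour nodes
            (every neighbour must also be a key of adj)
    seeds : set of nodes the walker jumps back to on restart
    """
    nodes = list(adj)
    # Restart distribution: uniform over the seed nodes.
    restart_vec = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(restart_vec)

    for _ in range(iters):
        # With probability `restart`, jump back to a seed.
        nxt = {n: restart * restart_vec[n] for n in nodes}
        for n in nodes:
            walk_mass = (1.0 - restart) * p[n]
            nbrs = adj[n]
            if not nbrs:
                # Node without outgoing links: return its mass to the seeds.
                for s in nodes:
                    nxt[s] += walk_mass * restart_vec[s]
                continue
            # Otherwise, spread the mass evenly over the neighbours.
            share = walk_mass / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        p = nxt
    return p


# Hypothetical toy graph: one query, three documents, one term.
graph = {
    "q1": ["d1"],
    "d1": ["q1", "t1"],
    "t1": ["d1", "d2"],
    "d2": ["t1"],
    "d3": [],
}
relevance = random_walk_with_restart(graph, seeds={"q1"})
```

In this toy graph, nodes closer to the seed query accumulate more probability mass, and the isolated document `d3` receives none.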
This work extends the study of Germann et al. (2010) in investigating the lexical organization of verbs. Particularly, we look at the influence of frequency on the process of lexical acquisition and use. We examine d...