Search Results - Inner Mongolia University Library

8th workshop on graph-based methods for natural language processing, Textgraphs 2013, at the Conference on Empirical methods in natural language processing, EMNLP 2013

Authors: Sizov, Gleb; Öztürk, Pinar. Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway

ISBN: (print) 9781937284978

Many organizations possess large collections of textual reports that document how a problem is solved or analysed, e.g. medical patient records, industrial accident reports, lawsuit records and investigation reports. Effective use of expert knowledge contained in these reports may greatly increase productivity of the organization. In this article, we propose a method for automatic extraction of reasoning chains that contain information used by the author of a report to analyse the problem at hand. For this purpose, we developed a graph-based text representation that makes the relations between textual units explicit. This representation is acquired automatically from a report using natural language processing tools including syntactic and discourse parsers. When applied to aviation investigation reports, our method generates reasoning chains that reveal the connection between initial information about the aircraft incident and its causes. © 2013 Association for Computational Linguistics

Keywords: graphic methods


DegExt: a language-independent keyphrase extractor


Journal of Ambient Intelligence and Humanized Computing, 2013, Vol. 4, No. 3, pp. 377-387

Authors: Litvak, Marina; Last, Mark; Kandel, Abraham. Sami Shamoon Acad Coll Engn, Dept Software Engn, IL-84100 Beer Sheva, Israel; Ben Gurion Univ Negev, Dept Informat Syst Engn, IL-84105 Beer Sheva, Israel; Univ S Florida, Dept Comp Sci & Engn, Tampa, FL 33620, USA

In this paper, we introduce DegExt, a graph-based language-independent keyphrase extractor, which extends the keyword extraction method described in Litvak and Last (graph-based keyword extraction for single-document summarization. In: proceedings of the workshop on multi-source multilingual information extraction and summarization, pp 17-24, 2008). We compare DegExt with two state-of-the-art approaches to keyphrase extraction: GenEx (Turney in Inf Retr 2: 303-336, 2000) and TextRank (Mihalcea and Tarau in Textrank-bringing order into texts. In: proceedings of the conference on empirical methods in natural language processing. Barcelona, Spain, 2004). We evaluated DegExt on collections of benchmark summaries in two different languages: English and Hebrew. Our experiments on the English corpus show that DegExt significantly outperforms TextRank and GenEx in terms of precision and area under curve for summaries of 15 keyphrases or more at the expense of a mostly non-significant decrease in recall and F-measure, when the extracted phrases are matched against gold standard collection. Due to DegExt's tendency to extract bigger phrases than GenEx and TextRank, when the single extracted words are considered, DegExt outperforms them both in terms of recall and F-measure. In the Hebrew corpus, DegExt performs the same as TextRank disregarding the number of keyphrases. An additional experiment shows that DegExt applied to the TextRank representation graphs outperforms the other systems in the text classification task. For documents in both languages, DegExt surpasses both GenEx and TextRank in terms of implementation simplicity and computational complexity.

Keywords: Keyphrase extraction; Summarization; Text mining; Graph-based document representation; Node centrality


Graph-based approaches for organization entity resolution in MapReduce

8th workshop on graph-based methods for natural language processing, Textgraphs 2013, at the Conference on Empirical methods in natural language processing, EMNLP 2013

Authors: Kardes, Hakan; Konidena, Deepak; Agrawal, Siddharth; Huff, Micah; Sun, Ang. Inome Inc., Bellevue, WA, United States

ISBN: (print) 9781937284978

Entity resolution is the task of identifying which records in a database refer to the same entity. A standard machine learning pipeline for the entity resolution problem consists of three major components: blocking, pairwise linkage, and clustering. The blocking step groups records by shared properties to determine which pairs of records should be examined by the pairwise linker as potential duplicates. Next, the linkage step assigns a probability score to pairs of records inside each block. If a pair scores above a user-defined threshold, the records are presumed to represent the same entity. Finally, the clustering step turns the input records into clusters of records (or profiles), where each cluster is uniquely associated with a single real-world entity. This paper describes the blocking and clustering strategies used to deploy a massive database of organization entities to power a major commercial People Search Engine. We demonstrate the viability of these algorithms for large data sets on a 50-node Hadoop cluster. © 2013 Association for Computational Linguistics
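The three-stage pipeline named in the abstract above (blocking, pairwise linkage, clustering) can be illustrated with a small single-machine sketch. This is not the paper's MapReduce implementation: the blocking key (first name token), the use of `difflib.SequenceMatcher` as a stand-in for a learned probability score, the 0.7 threshold, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_records(records):
    """Blocking: group record ids by a shared property (assumed here: first name token)."""
    blocks = defaultdict(list)
    for rid, name in records.items():
        blocks[name.lower().split()[0]].append(rid)
    return blocks


def link_pairs(records, blocks, threshold=0.7):
    """Pairwise linkage: score pairs inside each block, keep those above the threshold."""
    matches = []
    for ids in blocks.values():
        for a, b in combinations(ids, 2):
            # String similarity stands in for the linker's probability score.
            score = SequenceMatcher(None, records[a].lower(), records[b].lower()).ratio()
            if score > threshold:
                matches.append((a, b))
    return matches


def cluster(records, matches):
    """Clustering: merge matched pairs transitively with a union-find structure."""
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in matches:
        parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for rid in records:
        clusters[find(rid)].add(rid)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical organization records.
records = {
    1: "Acme Corporation",
    2: "Acme Corporation Inc",
    3: "Globex LLC",
    4: "Acme Corp",
}
profiles = cluster(records, link_pairs(records, block_records(records)))
```

With these sample records, records 1, 2 and 4 end up in one profile and record 3 in its own. Records 2 and 4 do not match each other directly; they are joined only through record 1, which is the transitive merging that the clustering step contributes.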

Keywords: MapReduce


Understanding seed selection in bootstrapping

8th workshop on graph-based methods for natural language processing, Textgraphs 2013, at the Conference on Empirical methods in natural language processing, EMNLP 2013

Authors: Ehara, Yo; Sato, Issei; Oiwa, Hidekazu; Nakagawa, Hiroshi. Graduate School of Information Science and Technology United States Information Technology Center University of Tokyo / 7-3-1 Hongo Bunkyo-ku Tokyo Japan JSPS Research Fellow Kojimachi Business Center Building 5-3-1 Kojimachi Chiyoda-ku Tokyo Japan

ISBN: (print) 9781937284978

Bootstrapping has recently become the focus of much attention in natural language processing to reduce labeling cost. In bootstrapping, unlabeled instances can be harvested from the initial labeled "seed" set. The selected seed set affects accuracy, but how to select a good seed set is not yet clear. Thus, an "iterative seeding" framework is proposed for bootstrapping to reduce its labeling cost. Our framework iteratively selects the unlabeled instance that has the best "goodness of seed" and labels the unlabeled instance in the seed set. Our framework deepens understanding of this seeding process in bootstrapping by deriving the dual problem. We propose a method called expected model rotation (EMR) that works well on not well-separated data which frequently occur as realistic data. Experimental results show that EMR can select seed sets that provide significantly higher mean reciprocal rank on realistic data than existing naive selection methods or random seed sets. © 2013 Association for Computational Linguistics

Keywords: Iterative methods


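A Domain Dictionary Construction Method Based on Clustering of Wiki Link Structure Graphs

Journal of Chinese Computer Systems (小型微型计算机系统), 2014, Vol. 35, No. 6, pp. 1286-1292

Authors: Yin, Wenke; Zhu, Ming; Chen, Tianhao. Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027; Key Laboratory of Information System Engineering, The 28th Research Institute of China Electronics Technology Group Corporation, Nanjing 210007; Department of Automation, University of Science and Technology of China, Hefei 230021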


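Domain dictionaries have important applications in information retrieval, natural language processing, and question answering systems. Because of the complexity of natural language, NLP-based methods for building domain dictionaries struggle to achieve satisfactory results. In recent years, Wiki encyclopedias have come into wide use. A Wiki contains not only a massive number of articles but also a rich link structure. Based on the anchor descriptiveness and topic locality of hyperlinks, this paper proposes a method for automatically constructing a domain dictionary by clustering a weighted undirected link structure graph. The method first uses the Wiki to build an undirected link structure graph for a specific domain, then uses the LSI algorithm and cosine similarity to compute the weight of each link, and finally clusters the weighted undirected link structure graph with the CPMw algorithm to obtain the final domain dictionary. Experiments show that the proposed method achieves better domain dictionary construction results.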
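The link-weighting step in the abstract above (cosine similarity over LSI representations of the linked articles) can be sketched as follows. This is only an illustration under stated assumptions: the two-dimensional vectors are hypothetical stand-ins for LSI output, the article titles are invented, and neither the LSI decomposition itself nor the CPMw clustering step is implemented here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def weight_links(links, vectors):
    """Attach a cosine-similarity weight to each undirected link."""
    weighted = {}
    for a, b in links:
        key = tuple(sorted((a, b)))  # undirected: (a, b) and (b, a) are the same link
        weighted[key] = cosine(vectors[a], vectors[b])
    return weighted


# Hypothetical low-dimensional article vectors, standing in for LSI output.
vectors = {
    "Neural network": [0.9, 0.1],
    "Deep learning": [0.8, 0.2],
    "Opera": [0.0, 1.0],
}
links = [("Neural network", "Deep learning"), ("Opera", "Deep learning")]
weights = weight_links(links, vectors)
```

In this toy graph the link between the two topically related articles receives a much higher weight than the link to the unrelated one, which is the property a weighted clustering step would then exploit.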

Keywords: domain dictionary construction; Wiki; CPMw; LSI


ACL 2012 - TextGraphs 2012: Workshop on Graph-Based Methods for Natural Language Processing, Workshop Proceedings

7th workshop on graph-based methods for natural language processing, Textgraphs 2012

ISBN: (print) 9781937284374

The proceedings contain 8 papers. The topics discussed include: a new parametric estimation method for graph-based clustering; extracting signed social networks from text; using link analysis to discover interesting messages spread across twitter; graph based similarity measures for synonym extraction from parsed text; semantic relatedness for biomedical word sense disambiguation; identifying untyped relation mentions in a corpus given an ontology; cause-effect relation learning; and bringing the associative ability to social tag recommendation.


Dense semantic graph and its application in single document summarisation

7th International workshop on Information Filtering and Retrieval, DART 2013 - workshop of the 13th AI*IA Conference

Authors: Joshi, Monika; Wang, Hui; McClean, Sally. University of Ulster, Co. Antrim BT37 0QB, United Kingdom; University of Ulster, Co. Londonderry BT52 1SA, United Kingdom

Semantic graph representation of text is an important part of natural language processing applications such as text summarisation. We have studied two ways of constructing the semantic graph of a document from dependency parsing of its sentences. The first graph is derived from the subject-object-verb representation of a sentence, and the second graph is derived by considering more dependency relations in the sentence through a shortest distance dependency path calculation, resulting in a dense semantic graph. We have shown through experiments that dense semantic graphs give better performance in semantic graph based unsupervised extractive text summarisation. Copyright © 2013 for the individual papers by the papers' authors.

Keywords: graphic methods


Using contexts for automatic or semi-automatic correction of customer complaints

20e Conference Traitement Automatique des Langues Naturelles et la 15e Rencontres des Etudiants Chercheurs en Informatique pour le Traitement Automatique des Langues, TALN-RECITAL 2013 - 20th Conference on Automatic natural language processing and the 15th Meeting of Computer Science Student Researchers for natural language processing, TALN-RECITAL 2013

Authors: Suignard, Philippe; Kerrcua, Soriane. Electricité de France R&D, 1 avenue du Général de Gaulle, Clamart 92141, France; A.I.D, 4 rue Henri Le Sidaner, Versailles 78000, France

This article presents two methods for correcting complaints that contain spelling errors, using a graph of spelling and contextual neighbors. This graph is made of forms or words found in a learning corpus. A link between two forms conveys the fact that the two forms "look alike" and share similar contexts. The first method is semi-automatic and consists in producing a substitutional dictionary from this graph. The second method, more ambitious, is fully automatic. It relies on contexts to determine which word an abbreviated or erroneous form corresponds to. The results thus obtained allow us to improve the existing process for creating a substitutional dictionary at EDF. © 2013 proceedings of TALN. All rights reserved.

Keywords: Context; Distributional analysis; Graph; Spelling correction


Fast joint compression and summarization via graph cuts


2013 Conference on Empirical methods in natural language processing, EMNLP 2013

Authors: Qian, Xian; Liu, Yang. University of Texas at Dallas, 800 W. Campbell Rd., Richardson, TX, United States

ISBN: (print) 9781937284978

Extractive summarization typically uses sentences as summarization units. In contrast, joint compression and summarization can use smaller units such as words and phrases, resulting in summaries containing more information. The goal of compressive summarization is to find a subset of words that maximizes the total score of concepts and cutting dependency arcs under the grammar constraints and summary length constraint. We propose an efficient decoding algorithm for fast compressive summarization using graph cuts. Our approach first relaxes the length constraint using Lagrangian relaxation. Then we propose to bound the relaxed objective function by the supermodular binary quadratic programming problem, which can be solved efficiently using graph max-flow/min-cut. Since finding the tightest lower bound suffers from local optimality, we use convex relaxation for initialization. Experimental results on the TAC 2008 dataset demonstrate that our method achieves a competitive ROUGE score and has good readability, while being much faster than the integer linear programming (ILP) method. © 2013 Association for Computational Linguistics.

Keywords: Integer programming


Multi-document summarization using automatic key-phrase extraction


2013 Recent Advances in natural language processing, RANLP 2013

Authors: Bhaskar, Pinaki. Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India

The development of a multi-document summarizer using automatic key-phrase extraction is described. The summarizer has two main parts: the first is automatic extraction of key-phrases from the documents, and the second is automatic generation of a multi-document summary based on the extracted key-phrases. A CRF-based automatic keyphrase extraction system is used. A document graph-based, topic/query-focused automatic multi-document summarizer is used for summarization, with the extracted keyphrases serving as the topic. The summarizer has been tested on the standard TAC 2008 test data sets of the Update Summarization Track. Evaluation using the ROUGE-1.5.5 tool resulted in ROUGE-2 and ROUGE-SU-4 scores of 0.10548 and 0.13582 respectively. © 2013 Incoma Ltd. All rights reserved.

Keywords: graphic methods
