Search Results - Inner Mongolia University Library

2013 Cross Language Evaluation Forum Conference, CLEF 2013

Authors: Flekova, Lucie; Gurevych, Iryna. Affiliations: Ubiquitous Knowledge Processing Lab, Department of Computer Science, Technische Universität Darmstadt, Germany; Ubiquitous Knowledge Processing Lab, German Institute for Educational Research and Educational Information, Germany

Would you target your audience differently, knowing the real age and gender of the text authors on your website forum? This paper examines hundreds of thousands of online documents, e.g. chat lines or blog posts, showing that computers are capable of addressing this task better than humans, without relying on content stereotypes. Pointing out that age and gender profiling are not independent problems, we approach the task as a multiclass classification problem, combining the age and gender information to define six classes. Utilizing a wide range of stylistic and content features and a large number of readability measures, we demonstrate the high predictive abilities of the parts of speech, the punctuation and the amount of emotions and slang used in the text, independently of the topic discussed.
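The combination of age and gender into a single multiclass label can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract states only that there are six classes, so the three age-group names and two gender names below are assumptions chosen to give 3 x 2 = 6 combinations.

```python
from itertools import product

# Hypothetical label names; the abstract only states that six classes are used.
AGE_GROUPS = ("10s", "20s", "30s")
GENDERS = ("female", "male")

# One joint class per (age group, gender) pair: 3 * 2 = 6 classes.
JOINT_CLASSES = [f"{age}_{gender}" for age, gender in product(AGE_GROUPS, GENDERS)]


def to_joint_label(age: str, gender: str) -> str:
    """Map separate age and gender labels to a single class label."""
    label = f"{age}_{gender}"
    if label not in JOINT_CLASSES:
        raise ValueError(f"unknown age/gender combination: {age!r}, {gender!r}")
    return label


def from_joint_label(label: str) -> tuple[str, str]:
    """Recover the (age, gender) pair from a joint class label."""
    age, gender = label.split("_", 1)
    return age, gender
```

A single classifier trained on the joint labels can then exploit dependencies between age and gender that two independent classifiers would not see.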

Keywords: Social networking (online)

2nd Joint Conference on Lexical and Computational Semantics, *SEM 2013

Authors: Zesch, Torsten; Levy, Omer; Gurevych, Iryna; Dagan, Ido. Affiliations: Ubiquitous Knowledge Processing Lab, Computer Science Department, Technische Universität Darmstadt, Germany; Natural Language Processing Lab, Computer Science Department, Bar-Ilan University, Israel

ISBN: (print) 9781937284497

Our system combines text similarity measures with a textual entailment system. In the main task, we focused on the influence of lexicalized versus unlexicalized features, and how they affect performance on unseen questions and domains. We also participated in the pilot partial entailment task, where our system significantly outperforms a strong baseline. © 2013 Association for Computational Linguistics.

Recognizing partial textual entailment

51st Annual Meeting of the Association for Computational Linguistics, ACL 2013

Authors: Levy, Omer; Zesch, Torsten; Dagan, Ido; Gurevych, Iryna. Affiliations: Natural Language Processing Lab, Computer Science Department, Bar-Ilan University, Israel; Ubiquitous Knowledge Processing Lab, Computer Science Department, Technische Universität Darmstadt, Germany

ISBN: (print) 9781937284510

Textual entailment is an asymmetric relation between two text fragments that describes whether one fragment can be inferred from the other. It thus cannot capture the notion that the target fragment is "almost entailed" by the given text. The recently suggested idea of partial textual entailment may remedy this problem. We investigate partial entailment under the faceted entailment model and the possibility of adapting existing textual entailment methods to this setting. Indeed, our results show that these methods are useful for recognizing partial entailment. We also provide a preliminary assessment of how partial entailment may be used for recognizing (complete) textual entailment. © 2013 Association for Computational Linguistics.

Keywords: Text processing

The impact of topic bias on quality flaw prediction in Wikipedia

51st Annual Meeting of the Association for Computational Linguistics, ACL 2013

Authors: Ferschke, Oliver; Gurevych, Iryna; Rittberger, Marc. Affiliations: Ubiquitous Knowledge Processing Lab, Department of Computer Science, Technische Universität Darmstadt, Germany; Information Center for Education, German Institute for Educational Research and Educational Information, Germany

ISBN: (print) 9781937284503

With the increasing amount of user-generated reference texts on the web, automatic quality assessment has become a key challenge. However, only a small amount of annotated data is available for training quality assessment systems. Wikipedia contains a large number of texts annotated with cleanup templates which identify quality flaws. We show that the distribution of these labels is topically biased, since they cannot be applied freely to any arbitrary article. We argue that it is necessary to consider the topical restrictions of each label in order to avoid a sampling bias that results in a skewed classifier and overly optimistic evaluation results. We factor out the topic bias by extracting reliable training instances from the revision history which have a topic distribution similar to the labeled articles. This approach better reflects the situation a classifier would face in a real-life application. © 2013 Association for Computational Linguistics.

Keywords: Data mining

Automatically classifying edit categories in Wikipedia revisions

2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013

Authors: Daxenberger, Johannes; Gurevych, Iryna. Affiliations: Ubiquitous Knowledge Processing Lab, Department of Computer Science, Technische Universität Darmstadt, Germany; Information Center for Education, German Institute for Educational Research and Educational Information, Germany

ISBN: (print) 9781937284978

In this paper, we analyze a novel set of features for the task of automatic edit category classification. Edit category classification assigns categories such as spelling error correction, paraphrase or vandalism to edits in a document. Our features are based on differences between two versions of a document including meta data, textual and language properties and markup. In a supervised machine learning experiment, we achieve a micro-averaged F1 score of 62 on a corpus of edits from the English Wikipedia. In this corpus, each edit has been multi-labeled according to a 21-category taxonomy. A model trained on the same data achieves state-of-the-art performance on the related task of fluency edit classification. We apply pattern mining to automatically labeled edits in the revision histories of different Wikipedia articles. Our results suggest that high-quality articles show a higher degree of homogeneity with respect to their collaboration patterns as compared to random articles. © 2013 Association for Computational Linguistics.
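For readers unfamiliar with the metric reported above, micro-averaged F1 for multi-labeled items pools true positives, false positives and false negatives over all labels before computing a single score. The sketch below is a generic illustration in plain Python, not the authors' evaluation code; the label names in the example are illustrative and are not the paper's 21-category taxonomy.

```python
def micro_f1(gold: list[set[str]], predicted: list[set[str]]) -> float:
    """Micro-averaged F1 for multi-label data: pool counts over all labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # labels correctly predicted for this item
        fp += len(p - g)  # predicted labels that are not in the gold set
        fn += len(g - p)  # gold labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Illustrative label names only (assumption, not the paper's taxonomy).
gold = [{"spelling"}, {"paraphrase", "markup"}, {"vandalism"}]
pred = [{"spelling"}, {"paraphrase"}, {"spelling"}]
score = micro_f1(gold, pred)  # 2 TP, 1 FP, 2 FN -> 4 / 7
```

Because counts are pooled, frequent categories influence the micro-average more than rare ones, which matters for a skewed label distribution.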

Keywords: Error correction

A semi-informative aware approach using topic model for medical search

IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Authors: Qinmin Vivian Hu; Liang He; Mingyao Li; Jimmy Xiangji Huang; E. Mark Haacke. Affiliations: Shanghai Key Laboratory of Multidimensional Information Processing; MR Research Facility, Wayne State University, Detroit, MI, USA; Department of Computer Science & Technology, East China Normal University, Shanghai, China; Information Retrieval and Knowledge Management Research Lab, York University, Toronto, Canada

We propose a semi-informative aware approach that applies a topic model to the query expansion problem in the biomedical domain. Demographic and disease information is used as the "known" label to semi-structure the topic model, in contrast to the latent topics of traditional topic modelling. We then suggest selecting three terms from the top-ranked documents to expand the query, based on the pseudo relevance feedback assumption that the top-ranked results of the first retrieval round are relevant. We conduct experiments on the TREC medical records data sets with extensive analysis and discussion. Numerically, we achieve improvements of 7.41% on MAP, 9.29% on Bpref and 5.60% on P@10 over the strong baselines.
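The pseudo relevance feedback step described above can be sketched as follows. This is a simplified stand-in, not the paper's method: it ranks candidate expansion terms by raw frequency in the top-ranked documents, whereas the abstract describes a semi-structured topic model. The function name `expand_query`, its parameters, and the pre-tokenized document format are assumptions made for illustration.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, top_k_docs=10, n_terms=3):
    """Append the n_terms most frequent new terms from the top-ranked documents.

    ranked_docs is a list of token lists, ordered by first-round retrieval score.
    The top_k_docs documents are assumed to be relevant (the PRF assumption).
    """
    seen = set(query_terms)
    counts = Counter(
        term
        for doc in ranked_docs[:top_k_docs]
        for term in doc
        if term not in seen
    )
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return list(query_terms) + expansion


# Toy first-round ranking; the third document falls outside top_k_docs=2.
ranked_docs = [
    ["diabetes", "insulin", "patient", "insulin"],
    ["diabetes", "insulin", "glucose", "patient"],
    ["glucose", "diet", "glucose"],
]
expanded = expand_query(["diabetes"], ranked_docs, top_k_docs=2)
```

The expanded query would then be submitted for a second retrieval round.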

Keywords: Indexes; Diseases; Information retrieval; Numerical models; Biological system modeling; Educational institutions; Mathematical model

Hierarchy identification for automatically generating table-of-contents

9th International Conference on Recent Advances in Natural Language Processing, RANLP 2013

Authors: Erbs, Nicolai; Gurevych, Iryna; Zesch, Torsten. Affiliations: Ubiquitous Knowledge Processing Lab, Department of Computer Science, Technische Universität Darmstadt, Germany; Information Center for Education, German Institute for Educational Research and Educational Information, Germany; Language Technology, University of Duisburg-Essen, Germany

A table-of-contents (TOC) provides a quick reference to a document's content and structure. We present the first study on identifying the hierarchical structure for automatically generating a TOC using only textual features instead of structural hints, e.g. from HTML tags. We create two new datasets to evaluate our approaches for hierarchy identification. We find that our algorithm performs on a level that is sufficient for a fully automated system. For documents without given segment titles, we extend our work by automatically generating segment titles. We make the datasets and our experimental framework publicly available in order to foster future research in TOC generation.

Keywords: Automation

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2013, vol. 8105 LNAI, pp. VII-VIII

Authors: Gurevych, Iryna; Biemann, Chris; Zesch, Torsten. Affiliations: Ubiquitous Knowledge Processing Lab, Department of Computer Science, Technische Universität Darmstadt, Darmstadt, Germany; Frankfurt am Main, Germany; FG Language Technology, Department of Computer Science, Technische Universität Darmstadt, Darmstadt, Germany

How text segmentation algorithms gain from topic models

2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2012

Authors: Riedl, Martin; Biemann, Chris. Affiliation: Ubiquitous Knowledge Processing Lab, Computer Science Department, Technische Universität Darmstadt, Hochschulstrasse 10, D-64289 Darmstadt, Germany

ISBN: (print) 1937284204

This paper introduces a general method to incorporate the LDA Topic Model into text segmentation algorithms. We show that semantic information added by Topic Models significantly improves the performance of two word-based algorithms, namely TextTiling and C99. Additionally, we introduce the new TopicTiling algorithm that is designed to take better advantage of topic information. We show consistent improvements over word-based methods and achieve state-of-the-art performance on a standard dataset. © 2012 Association for Computational Linguistics.
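The general idea of segmenting on topic information rather than words can be sketched as below. This is a deliberately simplified illustration and not the published TopicTiling algorithm: it assumes each word has already been assigned a topic ID by a trained LDA model, compares adjacent sentences by the cosine similarity of their topic-count vectors, and places a boundary wherever similarity drops below a fixed threshold. The function names and the threshold value are assumptions.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse topic-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_similarities(sentence_topics: list[list[int]]) -> list[float]:
    """Similarity across each gap between sentence i and sentence i + 1."""
    vectors = [Counter(topics) for topics in sentence_topics]
    return [cosine(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]


def boundaries(sentence_topics: list[list[int]], threshold: float = 0.5) -> list[int]:
    """Return each index i where a segment boundary is placed after sentence i."""
    sims = gap_similarities(sentence_topics)
    return [i for i, sim in enumerate(sims) if sim < threshold]


# Topic IDs per word for four sentences (made-up values for illustration).
doc = [[1, 1, 2], [1, 2, 2], [5, 5, 6], [5, 6, 6]]
cuts = boundaries(doc)  # topics shift between the second and third sentence
```

Replacing word tokens with topic IDs makes the vectors far less sparse, which is one plausible reason topic information helps word-based segmenters.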

Keywords: Semantics

Behind the Article: Recognizing Dialog Acts in Wikipedia Talk Pages

Conference of The European Chapter of The Association for Computational Linguistics

Authors: Oliver Ferschke; Iryna Gurevych; Yevgen Chebotar. Affiliations: Ubiquitous Knowledge Processing Lab (UKP-TUDA), Department of Computer Science, Technische Universitaet Darmstadt; Ubiquitous Knowledge Processing Lab (UKP-DIPF), German Institute for Educational Research and Educational Information; Ubiquitous Knowledge Processing Lab (UKP-TUDA), Department of Computer Science, Technische Universitaet Darmstadt

ISBN: (print) 9781622760428

In this paper, we propose an annotation schema for the discourse analysis of Wikipedia Talk pages aimed at the coordination efforts for article improvement. We apply the annotation schema to a corpus of 100 Talk pages from the Simple English Wikipedia and make the resulting dataset freely available for download. Furthermore, we perform automatic dialog act classification on Wikipedia discussions and achieve an average F1-score of 0.82 with our classification pipeline.

Keywords: Wikipedia Speaking Conversation Discourse Analysis schema ANNOTATIONS Taxonomy
