Search Results - Inner Mongolia University Library

Dropped personal pronoun recovery in Chinese SMS

Natural Language Engineering, 2017, vol. 23, no. 6, pp. 905-927

Authors: Giannella, Chris; Winder, Ransom; Petersen, Stacy

Affiliations: Mitre Corp, Dept Human Language Technol, 7515 Colshire Dr, Mclean, VA 22102, USA; Georgetown Univ, Dept Linguist, 3700 O St NW, Washington, DC, USA

In written Chinese, personal pronouns are commonly dropped when they can be inferred from context. This practice is particularly common in informal genres like Short Message Service messages sent via cell phones. Restoring dropped personal pronouns can be a useful preprocessing step for information extraction. Dropped personal pronoun recovery can be divided into two subtasks: (1) detecting dropped personal pronoun slots and (2) determining the identity of the pronoun for each slot. We address a simpler version of restoring dropped personal pronouns wherein only the person numbers are identified. After applying a word segmenter, we used a linear-chain conditional random field to predict which words were at the start of an independent clause. Then, using the independent clause start information, as well as lexical and syntactic information, we applied a conditional random field or a maximum-entropy classifier to predict whether a dropped personal pronoun immediately preceded each word and, if so, the person number of the dropped pronoun. We conducted a series of experiments using a manually annotated corpus of Chinese Short Message Service. Our approaches substantially outperformed a rule-based approach based partially on rules developed by Chung and Gildea (2010, Effects of Empty Categories on Machine Translation. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics. pp. 636-45). Our approaches also outperformed (though by a considerably smaller margin) a machine-learning approach based closely on work by Yang, Liu, and Xue (2015, Recovering Dropped Pronouns from Chinese Text Messages. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL). Association for Computational Linguistics. pp. 309-13). Features derived from parsing largely did not help our approaches. We conclude that, given independent clause start information, the parse information we

Keywords: Word; Mobile phones; Information Extraction; Short message service; Chinese philosophy; Parsing; Markov random fields; SLOT; COMPUTATIONAL LINGUISTICS; Chinese art

Language-independent gender prediction on Twitter

2nd Workshop on Natural Language Processing and Computational Social Science, NLP+CSS 2017 at the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017

Authors: Ljubešić, Nikola; Fišer, Darja; Erjavec, Tomaž

Affiliations: Dept. of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, Ljubljana 1000, Slovenia; Faculty of Arts, University of Ljubljana, Aškerčeva cesta 2, Ljubljana 1000, Slovenia

ISBN: (print) 9781945626654

In this paper we present a set of experiments and analyses on predicting the gender of Twitter users based on language-independent features extracted either from the text or the metadata of users' tweets. We perform our experiments on the TwiSty dataset containing manual gender annotations for users speaking six different languages. Our classification results show that, while the prediction model based on language-independent features performs worse than the bag-of-words model when training and testing on the same language, it regularly outperforms the bag-of-words model when applied to different languages, showing very stable results across various languages. Finally we perform a comparative analysis of feature effect sizes across the six languages and show that differences in our features correspond to cultural distances. © 2017 Association for Computational Linguistics.

Keywords: Information retrieval

Debunking sentiment lexicons: A case of domain-specific sentiment classification for Croatian

6th Workshop on Balto-Slavic Natural Language Processing, BSNLP 2017 at the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017

Authors: Gombar, Paula; Medić, Zoran; Alagić, Domagoj; Šnajder, Jan

Affiliations: Text Analysis and Knowledge Engineering Lab, Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, Zagreb 10000, Croatia

ISBN: (print) 9781945626456

Sentiment lexicons are widely used as an intuitive and inexpensive way of tackling sentiment classification, often within a simple lexicon word-counting approach or as part of a supervised model. However, it is an open question whether these approaches can compete with supervised models that use only word-representation features. We address this question in the context of domain-specific sentiment classification for Croatian. We experiment with the graph-based acquisition of sentiment lexicons, analyze their quality, and investigate how effectively they can be used in sentiment classification. Our results indicate that, even with as few as 500 labeled instances, a supervised model substantially outperforms a word-counting model. We also observe that adding lexicon-based features does not significantly improve supervised sentiment classification. © 2017 Association for Computational Linguistics.

Keywords: graphic methods

A report on the 2017 native language identification shared task

12th Workshop on Innovative Use of NLP for Building Educational Applications, BEA 2017, held in conjunction with EMNLP 2017

Authors: Malmasi, Shervin; Evanini, Keelan; Cahill, Aoife; Tetreault, Joel; Pugh, Robert; Hamill, Christopher; Napolitano, Diane; Qian, Yao

Affiliations: Harvard Medical School, Boston, MA, United States; Macquarie University, Sydney, Australia; Educational Testing Service, Princeton, NJ, United States; Grammarly, New York, NY, United States; Educational Testing Service, San Francisco, CA, United States

ISBN: (print) 9781945626852

Native Language Identification (NLI) is the task of automatically identifying the native language (L1) of an individual based on their language production in a learned language. It is typically framed as a classification task where the set of L1s is known a priori. Two previous shared tasks on NLI have been organized where the aim was to identify the L1 of learners of English based on essays (2013) and spoken responses (2016) they provided during a standardized assessment of academic English proficiency. The 2017 shared task combines the inputs from the two prior tasks for the first time. There are three tracks: NLI on the essay only, NLI on the spoken response only (based on a transcription of the response and i-vector acoustic features), and NLI using both responses. We believe this makes for a more interesting shared task while building on the methods and results from the previous two shared tasks. In this paper, we report the results of the shared task. A total of 19 teams competed across the three different sub-tasks. The fusion track showed that combining the written and spoken responses provides a large boost in prediction accuracy. Multiple classifier systems (e.g. ensembles and meta-classifiers) were the most effective in all tasks, with most based on traditional classifiers (e.g. SVMs) with lexical/syntactic features. © EMNLP 2017 - 12th Workshop on Innovative Use of NLP for Building Educational Applications, BEA 2017 - Proceedings of the Workshop. All rights reserved.

Keywords: natural language processing systems

Proactive Learning for Named Entity Recognition

16th SIGBioMed Workshop on Biomedical Natural Language Processing, BioNLP 2017

Authors: Li, Maolin; Nguyen, Nhung T.H.; Ananiadou, Sophia

Affiliations: National Centre for Text Mining, School of Computer Science, University of Manchester, United Kingdom

ISBN: (print) 9781945626593

The goal of active learning is to minimise the cost of producing an annotated dataset, in which annotators are assumed to be perfect, i.e., they always choose the correct labels. However, in practice, annotators are not infallible, and they are likely to assign incorrect labels to some instances. Proactive learning is a generalisation of active learning that can model different kinds of annotators. Although proactive learning has been applied to certain labelling tasks, such as text classification, there is little work on its application to named entity (NE) tagging. In this paper, we propose a proactive learning method for producing NE annotated corpora, using two annotators with different levels of expertise, and who charge different amounts based on their levels of experience. To optimise both cost and annotation quality, we also propose a mechanism to present multiple sentences to annotators at each iteration. Experimental results for several corpora show that our method facilitates the construction of high-quality NE labelled datasets at minimal cost. © 2017 Association for Computational Linguistics

Keywords: Iterative methods

Towards an integrated pipeline for aspect-based sentiment analysis in various domains

8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, WASSA 2017, in conjunction with the Conference on Empirical Methods in Natural Language Processing, EMNLP 2017

Authors: de Clercq, Orphée; Lefever, Els; Jacobs, Gilles; Carpels, Tijl; Hoste, Véronique

Affiliations: LT3 Language and Translation Technology Team, Ghent University, Belgium; Hello Customer, Belgium

ISBN: (print) 9781945626951

This paper presents an integrated ABSA pipeline for Dutch that has been developed and tested on qualitative user feedback coming from three domains: retail, banking and human resources. The two latter domains provide service-oriented data, which has not been investigated before in ABSA. By performing in-domain and cross-domain experiments the validity of our approach was investigated. We show promising results for the three ABSA subtasks, aspect term extraction, aspect category classification and aspect polarity classification. © 2017 Association for Computational Linguistics.

Reader-aware multi-document summarization: An enhanced model and the first dataset

EMNLP 2017 Workshop on New Frontiers in Summarization, NFiS 2017

Authors: Li, Piji; Bing, Lidong; Lam, Wai

Affiliations: Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong; AI Lab, Tencent Inc., Shenzhen, China

ISBN: (print) 9781945626890

We investigate the problem of reader-aware multi-document summarization (RA-MDS) and introduce a new dataset for this problem. To tackle RA-MDS, we extend a variational auto-encoder (VAE) based MDS framework by jointly considering news documents and reader comments. To evaluate summarization performance, we prepare a new dataset. We describe the methods for data collection, aspect annotation, and summary writing, as well as scrutiny by experts. Experimental results show that reader comments can improve the summarization performance, which also demonstrates the usefulness of the proposed dataset. The annotated dataset for RA-MDS is available online. © EMNLP *** right reserved.

Keywords: natural language processing systems

10th Workshop on Statistical Machine Translation, WMT 2015 at the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015 - Proceedings

10th Workshop on Statistical Machine Translation, WMT 2015 at the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015

ISBN: (print) 9781941643327

The proceedings contain 60 papers. The topics discussed include: findings of the 2015 Workshop on Statistical Machine Translation; statistical machine translation with automatic identification of translationese; data selection with fewer words; DFKI’s experimental hybrid MT system for WMT 2015; ParFDA for fast deployment of accurate statistical machine translation systems, benchmarks, and statistics; CUNI in WMT15: chimera strikes again; CimS - the CIS and IMS joint submission to WMT 2015 addressing morphological and syntactic differences in English to German SMT; the Karlsruhe Institute of Technology translation systems for the WMT 2015; new language pairs in TectoMT; tuning phrase-based segmented translation for a morphologically complex target language; and the AFRL-MITLL WMT15 system: there’s more than one way to decode it!

Semantic query processing: Estimating relational purity

Lernen, Wissen, Daten, Analyse - 2017 Learning. Knowledge. Data. Analytics, LWDA 2017

Authors: Kalo, Jan-Christoph; Lofi, Christoph; Maseli, René Pascal; Balke, Wolf-Tilo

Affiliations: Institut für Informationssysteme, TU Braunschweig, Germany; Web Information Systems Group, Delft University of Technology, Netherlands

The use of semantic information found in structured knowledge bases has become an integral part of the processing pipeline of modern intelligent information systems. However, such semantic information is frequently insufficient to capture the rich semantics demanded by the applications, and thus corpus-based methods employing natural language processing techniques are often used conjointly to provide additional information. However, the semantic expressiveness and interaction of these data sources with respect to query processing result quality is often not clear. Therefore, in this paper, we introduce the notion of relational purity which represents how well the explicitly modelled relationships between two entities in a structured knowledge base capture the implicit (and usually more diverse) semantics found in corpus-based word embeddings. The purity score gives valuable insights into the completeness of a knowledge base, but also into the expected quality of complex semantic queries relying on reasoning over relationships, as for example analogy queries. © 2017 by The Paper's Authors.

Keywords: Semantics

7th International Workshop on Spoken Dialogue Systems, IWSDS 2016

ISBN: (print) 9789811025846

The proceedings contain 40 papers. The special focus in this conference is on The Northernmost Spoken Dialogue Workshop, Methods, Techniques for Spoken Dialogue Systems, Socio-Cognitive Language Processing, Towards Multilingual, Multimodal, Open Domain Spoken Dialogue Systems, Evaluation of Human-Robot Dialogue in Social Robotics, Dialogue Quality Assessment and Dialogue State Tracking Challenge 4. The topics include: DigiSami and digital natives; interaction technology for the north Sami language; a comparative study of text preprocessing techniques for natural language call routing; compact and interpretable dialogue state representation with genetic sparse distributed memory; incremental human-machine dialogue simulation; active learning for example-based dialog systems; question selection based on expected utility to acquire information through dialogue; a simple deep reinforcement learning dialogue system; breakdown detector for chat-oriented dialogue; user involvement in collaborative decision-making dialog systems; entropy-driven dialog for topic classification; detecting and tackling uncertainty; fisher kernels on phase-based features for speech emotion recognition; internationalisation and localisation of spoken dialogue systems; a multi-lingual evaluation of the vAssist spoken dialog system. Comparing disco and RavenClaw; an open-source modular web-based multimodal dialog framework; a framework to break the barrier across domains in spoken dialog systems; towards an open-domain social dialog system; extrinsic versus intrinsic evaluation of natural language generation for spoken dialogue systems and social robotics; engagement in dialogue with social robots; the negotiation dialogue game; convolutional neural networks for multi-topic dialog state tracking and dialogues with social robots.
