In written Chinese, personal pronouns are commonly dropped when they can be inferred from context. This practice is particularly common in informal genres such as Short Message Service (SMS) messages sent via cell phones. Restoring dropped personal pronouns can be a useful preprocessing step for information extraction. Dropped personal pronoun recovery can be divided into two subtasks: (1) detecting dropped personal pronoun slots and (2) determining the identity of the pronoun for each slot. We address a simpler version of restoring dropped personal pronouns wherein only the person numbers are identified. After applying a word segmenter, we used a linear-chain conditional random field to predict which words were at the start of an independent clause. Then, using the independent clause start information, as well as lexical and syntactic information, we applied a conditional random field or a maximum-entropy classifier to predict whether a dropped personal pronoun immediately preceded each word and, if so, the person number of the dropped pronoun. We conducted a series of experiments using a manually annotated corpus of Chinese Short Message Service messages. Our approaches substantially outperformed a rule-based approach based partially on rules developed by Chung and Gildea (2010, Effects of Empty Categories on Machine Translation. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics. pp. 636-45). Our approaches also outperformed (though by a considerably smaller margin) a machine-learning approach based closely on work by Yang, Liu, and Xue (2015, Recovering Dropped Pronouns from Chinese Text Messages. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL). Association for Computational Linguistics. pp. 309-13). Features derived from parsing largely did not help our approaches. We conclude that, given independent clause start information, the parse information we
In this paper we present a set of experiments and analyses on predicting the gender of Twitter users based on language-independent features extracted either from the text or the metadata of users' tweets. We perfo...
Sentiment lexicons are widely used as an intuitive and inexpensive way of tackling sentiment classification, often within a simple lexicon word-counting approach or as part of a supervised model. However, it is an ope...
Native Language Identification (NLI) is the task of automatically identifying the native language (L1) of an individual based on their language production in a learned language. It is typically framed as a classificat...
The goal of active learning is to minimise the cost of producing an annotated dataset, in which annotators are assumed to be perfect, i.e., they always choose the correct labels. However, in practice, annotators are n...
This paper presents an integrated ABSA pipeline for Dutch that has been developed and tested on qualitative user feedback coming from three domains: retail, banking and human resources. The two latter domains provide ...
We investigate the problem of reader-aware multi-document summarization (RA-MDS) and introduce a new dataset for this problem. To tackle RA-MDS, we extend a variational auto-encoder (VAE) based MDS framework by jointl...
ISBN: 9781941643327 (print)
The proceedings contain 60 papers. The topics discussed include: findings of the 2015 Workshop on Statistical Machine Translation; statistical machine translation with automatic identification of translationese; data selection with fewer words; DFKI’s experimental hybrid MT system for WMT 2015; ParFDA for fast deployment of accurate statistical machine translation systems, benchmarks, and statistics; CUNI in WMT15: chimera strikes again; CimS - the CIS and IMS joint submission to WMT 2015 addressing morphological and syntactic differences in English to German SMT; the Karlsruhe Institute of Technology translation systems for the WMT 2015; new language pairs in TectoMT; tuning phrase-based segmented translation for a morphologically complex target language; and the AFRL-MITLL WMT15 system: there’s more than one way to decode it!
The use of semantic information found in structured knowledge bases has become an integral part of the processing pipeline of modern intelligent information systems. However, such semantic information is frequently i...
ISBN: 9789811025846 (print)
The proceedings contain 40 papers. The special focus in this conference is on The Northernmost Spoken Dialogue Workshop, Methods, Techniques for Spoken Dialogue Systems, Socio-Cognitive Language Processing, Towards Multilingual, Multimodal, Open Domain Spoken Dialogue Systems, Evaluation of Human-Robot Dialogue in Social Robotics, Dialogue Quality Assessment and Dialogue State Tracking Challenge 4. The topics include: DigiSami and digital natives; interaction technology for the North Sami language; a comparative study of text preprocessing techniques for natural language call routing; compact and interpretable dialogue state representation with genetic sparse distributed memory; incremental human-machine dialogue simulation; active learning for example-based dialog systems; question selection based on expected utility to acquire information through dialogue; a simple deep reinforcement learning dialogue system; breakdown detector for chat-oriented dialogue; user involvement in collaborative decision-making dialog systems; entropy-driven dialog for topic classification; detecting and tackling uncertainty; Fisher kernels on phase-based features for speech emotion recognition; internationalisation and localisation of spoken dialogue systems; a multi-lingual evaluation of the vAssist spoken dialog system. Comparing disco and RavenClaw; an open-source modular web-based multimodal dialog framework; a framework to break the barrier across domains in spoken dialog systems; towards an open-domain social dialog system; extrinsic versus intrinsic evaluation of natural language generation for spoken dialogue systems and social robotics; engagement in dialogue with social robots; the negotiation dialogue game; convolutional neural networks for multi-topic dialog state tracking and dialogues with social robots.