ISBN: 9781948087841 (print)
Decipherment of homophonic substitution ciphers using language models (LMs) is a well-studied task in NLP. Previous work on this topic scores short local spans of possible plaintext decipherments using n-gram LMs. The most widely used technique is beam search with n-gram LMs, proposed by Nuhn et al. (2013). We propose a beam search algorithm that scores the entire candidate plaintext at each step of the decipherment using a neural LM. We augment beam search with a novel rest-cost estimation that exploits the predictive power of a neural LM. We compare against state-of-the-art n-gram-based methods on many different decipherment tasks. On challenging ciphers such as the Beale cipher we achieve significantly lower error rates with much smaller beam sizes.
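To make the search concrete, here is a minimal sketch of beam-search decipherment. The simple 1:1 substitution setting and the toy score_plaintext function are illustrative stand-ins; the paper's method scores with a neural LM and adds rest-cost estimation, neither of which is reproduced here.

```python
# A minimal sketch of beam-search decipherment, assuming a simple 1:1
# substitution setting; score_plaintext is a placeholder for a neural LM.
import heapq
import string

def score_plaintext(text: str) -> float:
    """Placeholder scorer: higher is more English-like.
    A real system would return a neural LM log-probability here."""
    freq = {c: f for c, f in zip("etaoinshrdlucmfwypvbgkjqxz",
                                 range(26, 0, -1))}  # toy unigram ranks
    return sum(freq.get(c, 0) for c in text)

def beam_search_decipher(ciphertext: str, beam_size: int = 16):
    """Extend a symbol->letter mapping one cipher symbol at a time,
    rescoring the full partial plaintext at every step."""
    symbols = sorted(set(ciphertext))          # cipher alphabet
    beam = [({}, 0.0)]                         # (mapping, score)
    for sym in symbols:
        candidates = []
        for mapping, _ in beam:
            used = set(mapping.values())
            for letter in string.ascii_lowercase:
                if letter in used:             # keep the key 1:1 here
                    continue
                new_map = {**mapping, sym: letter}
                # Decode only the symbols mapped so far.
                partial = "".join(new_map.get(c, "?") for c in ciphertext)
                candidates.append((new_map,
                                   score_plaintext(partial.replace("?", ""))))
        beam = heapq.nlargest(beam_size, candidates, key=lambda x: x[1])
    return beam[0]

mapping, score = beam_search_decipher("wtaawt")  # toy cipher
print(mapping, score)
```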
ISBN: 9781955917094 (print)
Model robustness to bias is often determined by generalization to carefully designed out-of-distribution datasets. Recent debiasing methods in natural language understanding (NLU) improve performance on such datasets by pressuring models into making unbiased predictions. An underlying assumption behind such methods is that this also leads to the discovery of more robust features in the model's inner representations. We propose a general probing-based framework that allows for post-hoc interpretation of biases in language models, and use an information-theoretic approach to measure the extractability of certain biases from the model's representations. We experiment with several NLU datasets and known biases, and show that, counter-intuitively, the more a language model is pushed towards a debiased regime, the more bias is actually encoded in its inner representations.
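As a rough illustration of extractability probing, the sketch below trains a linear probe on frozen representations to predict a bias label; probe accuracy over a majority baseline is a crude stand-in for the paper's information-theoretic (e.g., description-length) measure. All data here is synthetic.

```python
# A minimal probing sketch: X stands in for frozen LM representations and
# y for a binary bias label (e.g., high lexical overlap); both synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 768))                 # stand-in representations
y = (X[:, 0] + 0.5 * rng.normal(size=2000) > 0).astype(int)  # synthetic bias

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = probe.score(X_te, y_te)

# Higher probe accuracy over the majority-class baseline suggests the bias
# is more linearly extractable from the representations.
baseline = max(np.mean(y_te), 1 - np.mean(y_te))
print(f"probe accuracy: {acc:.3f}  (baseline: {baseline:.3f})")
```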
Distant supervision is a scheme to generate noisy training data for relation extraction by aligning entities of a knowledge base with text. In this work we combine the output of a discriminative at-least-one learner w...
ISBN: 9781948087841 (print)
This paper presents a set of dimensions to characterize the association between two people. We distinguish between interactions (when somebody refers to somebody in a conversation) and relationships (a sequence of interactions). We work with dialogue scripts from the TV show Friends, and do not impose any restrictions on the interactions and relationships. We introduce and analyze a new corpus, and present experimental results showing that the task can be automated.
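The interaction/relationship distinction maps naturally onto simple data structures; the sketch below is illustrative only, and the field names are not the corpus schema.

```python
# Hypothetical data structures for the interaction/relationship distinction.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Interaction:
    """One person referring to another in a single conversation."""
    speaker: str
    referred_to: str
    episode: str
    utterance: str

@dataclass
class Relationship:
    """A sequence of interactions between the same pair of people."""
    person_a: str
    person_b: str
    interactions: List[Interaction] = field(default_factory=list)

rel = Relationship("Ross", "Rachel")
rel.interactions.append(
    Interaction("Ross", "Rachel", "S01E01", "example utterance"))
print(len(rel.interactions))
```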
In this paper we report an empirical study on semi-supervised Chinese word segmentation using co-training. We utilize two segmenters: 1) a word-based segmenter leveraging a word-level language model, and 2) a characte...
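The abstract implies a standard co-training loop between the two views. Below is a hedged structural sketch; Segmenter is an abstract placeholder to be filled in with the word-based and character-based models, not the authors' implementation.

```python
# A structural sketch of co-training between two segmenter views.
# Segmenter is a placeholder interface; supply concrete implementations.
class Segmenter:
    def train(self, labeled):            # labeled: list of segmented sentences
        ...
    def predict(self, sentence):         # -> (segmentation, confidence)
        ...

def co_train(seg_a, seg_b, labeled, unlabeled, rounds=5, top_k=100):
    pool_a, pool_b = list(labeled), list(labeled)
    for _ in range(rounds):
        seg_a.train(pool_a)
        seg_b.train(pool_b)
        scored_a = sorted((seg_a.predict(s) for s in unlabeled),
                          key=lambda p: p[1], reverse=True)
        scored_b = sorted((seg_b.predict(s) for s in unlabeled),
                          key=lambda p: p[1], reverse=True)
        # Each view teaches the other with its most confident labels.
        # (For simplicity, the unlabeled pool is not shrunk here.)
        pool_b.extend(seg for seg, _ in scored_a[:top_k])
        pool_a.extend(seg for seg, _ in scored_b[:top_k])
    return seg_a, seg_b
```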
ISBN: 9781950737901 (print)
Generating paraphrases from given sentences involves decoding words step by step from a large vocabulary. To learn a decoder, supervised learning that maximizes the likelihood of tokens suffers from exposure bias. Although both reinforcement learning (RL) and imitation learning (IL) have been widely used to alleviate this bias, the lack of a direct comparison leaves only an incomplete picture of their benefits. In this work, we present an empirical study of how RL and IL can help boost the performance of paraphrase generation, with the pointer-generator as a base model. Experiments on benchmark datasets show that (1) imitation learning consistently outperforms reinforcement learning, and (2) pointer-generator models with imitation learning outperform state-of-the-art methods by a large margin.
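For intuition, the PyTorch sketch below contrasts the teacher-forced MLE objective (the source of exposure bias) with a REINFORCE-style RL loss computed on sequences sampled from the model; reward_fn is a placeholder (e.g., sentence-level BLEU against the reference), and IL variants such as scheduled sampling are not shown.

```python
# Contrasting MLE (teacher forcing) with a REINFORCE-style sequence loss.
# reward_fn is a placeholder; this is not the paper's exact training setup.
import torch
import torch.nn.functional as F

def mle_loss(logits, targets):
    """Teacher forcing: maximize likelihood of gold next tokens.
    logits: (batch, seq, vocab); targets: (batch, seq)"""
    return F.cross_entropy(logits.transpose(1, 2), targets)

def reinforce_loss(logits, reward_fn, references):
    """Sample from the model's own distribution and weight the
    log-likelihood of each sampled sequence by its reward."""
    dist = torch.distributions.Categorical(logits=logits)
    samples = dist.sample()                        # (batch, seq)
    log_probs = dist.log_prob(samples).sum(dim=1)  # (batch,)
    rewards = torch.tensor(
        [reward_fn(s.tolist(), r) for s, r in zip(samples, references)],
        dtype=log_probs.dtype)
    # Negative sign: gradient ascent on expected reward.
    return -(rewards * log_probs).mean()
```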
ISBN: 9781950737901 (print)
Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive. We release SCIBERT, a pretrained language model based on BERT (Devlin et al., 2019), to address the lack of high-quality, large-scale labeled scientific data. SCIBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks. We evaluate on a suite of tasks including sequence tagging, sentence classification, and dependency parsing, with datasets from a variety of scientific domains. We demonstrate statistically significant improvements over BERT and achieve new state-of-the-art results on several of these tasks.
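SCIBERT is available through the HuggingFace transformers library under the AllenAI model ID used below; this minimal feature-extraction example is mine, not the paper's evaluation code.

```python
# Extracting contextual embeddings with SCIBERT via transformers.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")

inputs = tokenizer("The TP53 gene regulates the cell cycle.",
                   return_tensors="pt")
outputs = model(**inputs)
# (batch, seq_len, hidden) contextual embeddings for downstream tasks
# such as sequence tagging or sentence classification.
print(outputs.last_hidden_state.shape)
```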
A large number of Open Relation Extraction approaches have been proposed recently, covering a wide range of NLP machinery, from "shallow" (e.g., part-of-speech tagging) to "deep" (e.g., semantic ro...
ISBN: 9781950737901 (print)
In transductive learning, an unlabeled test set is used for model training. While this setting deviates from the common assumption of a completely unseen test set, it is applicable in many real-world scenarios where the texts to be processed are known in advance. However, despite its practical advantages, transductive learning is underexplored in natural language processing. Here, we conduct an empirical study of transductive learning for neural models and demonstrate its utility in syntactic and semantic tasks. Specifically, we fine-tune language models (LMs) on an unlabeled test set to obtain test-set-specific word representations. Through extensive experiments, we demonstrate that despite its simplicity, transductive LM fine-tuning consistently improves state-of-the-art neural models in both in-domain and out-of-domain settings.
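A minimal version of transductive LM fine-tuning can be expressed with the HuggingFace transformers Trainer: continue masked-LM training on the unlabeled test sentences, then reuse the encoder. The base model and hyperparameters below are illustrative, not the paper's configuration.

```python
# A hedged sketch: continue masked-LM training on the unlabeled test set
# to obtain test-set-specific representations. Settings are illustrative.
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
import datasets

test_sentences = ["..."]  # the unlabeled test set, known in advance

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

ds = datasets.Dataset.from_dict({"text": test_sentences})
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tmp", num_train_epochs=3),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer),  # random masking
)
trainer.train()
# The fine-tuned encoder now yields test-set-specific word representations.
```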
ISBN: 9781950737901 (print)
We introduce a new dataset consisting of natural language interactions annotated with medical family histories, obtained during interactions with a genetic counselor and through crowdsourcing, following a questionnaire created by experts in the domain. We describe the data collection process and the annotations performed by medical professionals, including illnesses and personal attributes (name, age, gender, family relationships) for the patient and their family members. An initial system that performs argument identification and relation extraction shows promising results: an average F-score of 0.87 on the targeted relations in complex sentences.
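As an illustration only, the annotations described above could be represented with a schema like the following; the field names are hypothetical, not the released dataset's format.

```python
# Hypothetical schema for family-history annotations (illustrative only).
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Person:
    name: Optional[str] = None
    age: Optional[int] = None
    gender: Optional[str] = None
    relationship_to_patient: Optional[str] = None  # e.g., "mother"
    illnesses: List[str] = field(default_factory=list)

@dataclass
class FamilyHistory:
    patient: Person
    family_members: List[Person] = field(default_factory=list)

record = FamilyHistory(patient=Person(age=52, gender="F"))
record.family_members.append(
    Person(relationship_to_patient="mother", illnesses=["diabetes"]))
print(len(record.family_members))
```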