Authors:
Qu, Anlin; Niu, Jianwei; Mo, Shasha
Beihang Univ, Sch Comp Sci & Engn, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
Beihang Univ, Sch Comp Sci & Engn, Beijing Adv Innovat Ctr Big Data & Brain Comp, Beijing 100191, Peoples R China
Beihang Univ, Hangzhou Innovat Res Inst, Hangzhou 310051, Peoples R China
Zhengzhou Univ, Res Inst Ind Technol, Zhengzhou 450001, Peoples R China
Beihang Univ, Sch Cyber Sci & Technol, Beijing 100191, Peoples R China
ISBN:
(Print) 9781728171227
Language modeling is an important problem in Natural Language Processing (NLP), and the multi-layer Transformer network is currently the most advanced and effective model for this task. However, there exist two inherent defects in its multi-head self-attention structure: (1) attention information loss: lower-level attention weights cannot be explicitly passed to upper layers, which may cause the network to lose pivotal attention information captured by lower-level layers; (2) multi-head bottleneck: the dimension of each head in the vanilla Transformer is relatively small and each head is processed independently, which introduces an expressive bottleneck and makes subspace learning fundamentally inadequate. To overcome these two weaknesses, a novel neural architecture named Guide-Transformer is proposed in this paper. Guide-Transformer utilizes horizontal and vertical attention information to guide the original multi-head self-attention sublayer without introducing excessive complexity. Experimental results on three authoritative language modeling benchmarks demonstrate the effectiveness of Guide-Transformer. On the popular perplexity (ppl) and bits-per-character (bpc) evaluation metrics, Guide-Transformer achieves moderate improvements over a powerful baseline model.
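The abstract does not detail the guiding mechanism, so the following PyTorch sketch is only an illustration of the general idea under stated assumptions: a self-attention sublayer that returns its attention map and blends in the previous layer's map through a learned gate, so that lower-level attention weights can be explicitly passed upward. The gating scheme, class name, and dimensions are assumptions, not the paper's design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GuidedSelfAttention(nn.Module):
        """Hypothetical sketch: mixes the previous layer's attention map into this one."""
        def __init__(self, d_model, n_heads):
            super().__init__()
            self.n_heads, self.d_head = n_heads, d_model // n_heads
            self.qkv = nn.Linear(d_model, 3 * d_model)
            self.out = nn.Linear(d_model, d_model)
            self.gate = nn.Parameter(torch.zeros(1))  # learned mixing weight (assumption)

        def forward(self, x, prev_attn=None):
            B, T, _ = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            split = lambda t: t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
            q, k, v = split(q), split(k), split(v)
            attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
            if prev_attn is not None:
                g = torch.sigmoid(self.gate)
                attn = (1 - g) * attn + g * prev_attn  # reuse lower-level attention
            y = (attn @ v).transpose(1, 2).reshape(B, T, -1)
            return self.out(y), attn  # returned attn can guide the next layer

Stacking such layers and feeding each returned attention map into the next layer's prev_attn argument is one way to realize the "vertical" guidance mentioned above.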
ISBN:
(Print) 9781450391153
Our goal is to generate coherent text that is accurate in both its semantic information and its syntactic structure. Embedding methods and neural language models are indispensable for generating coherent text, as they learn semantic information and syntactic structure, respectively. We focus here on parts of speech (POS) (e.g., noun, verb, preposition) to enhance these models and to generate truly coherent text more efficiently than is possible by using either of them in isolation. This leads us to derive Words and Topics and POS 2 Vec (WTP2Vec) as an embedding method, and the Structure Aware Unified Language Model (SAUL) as a neural language model. Experiments show that our approach enhances previous models and generates coherent, semantically valid text with natural syntactic structure.
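As a rough illustration of how POS information can be folded into an embedding layer (the actual WTP2Vec training objective and SAUL architecture are not described in the abstract and are not reproduced here), the sketch below concatenates word and POS-tag embeddings for each token; the class name, dimensions, and use of NLTK for tagging are assumptions.

    import torch
    import torch.nn as nn
    import nltk  # requires nltk.download("punkt") and nltk.download("averaged_perceptron_tagger")

    class WordPosEmbedding(nn.Module):
        """Illustrative only: concatenates lexical and syntactic views of each token."""
        def __init__(self, word_vocab, pos_vocab, d_word=100, d_pos=20):
            super().__init__()
            self.word_vocab, self.pos_vocab = word_vocab, pos_vocab  # dicts: symbol -> index
            self.word_emb = nn.Embedding(len(word_vocab), d_word)
            self.pos_emb = nn.Embedding(len(pos_vocab), d_pos)

        def forward(self, sentence):
            tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
            w = torch.tensor([self.word_vocab.get(tok.lower(), 0) for tok, _ in tagged])
            p = torch.tensor([self.pos_vocab.get(tag, 0) for _, tag in tagged])
            # each token is represented by its word embedding joined with its POS embedding
            return torch.cat([self.word_emb(w), self.pos_emb(p)], dim=-1)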
ISBN:
(Print) 9781450393034
Compiler fuzzing tools such as Csmith have uncovered many bugs in compilers by randomly sampling programs from a generative model. The success of these tools is often attributed to their ability to generate unexpected corner-case inputs that developers tend to overlook during manual testing. At the same time, their chaotic nature makes fuzzer-generated test cases notoriously hard to interpret, which has led to the creation of input simplification tools such as C-Reduce (for C compiler bugs). In previously unrelated work, researchers have also shown that human-written software tends to be rather repetitive and predictable to language models. Studies show that developers deliberately write more predictable code, whereas code with bugs is relatively unpredictable. In this study, we ask the natural question of whether this high predictability property of code also, perhaps counter-intuitively, applies to fuzzer-generated code. That is, we investigate whether fuzzer-generated compiler inputs are deemed unpredictable by a language model built on human-written code, and surprisingly conclude that they are not. To the contrary, Csmith-generated programs are more predictable on a per-token basis than human-written C programs. Furthermore, bug-triggering inputs tended to be even more predictable than random inputs, and the C-Reduce minimization tool did not substantially increase this predictability. Rather, we find that bug-triggering inputs are unpredictable relative to Csmith's own generative model. This is encouraging; our results suggest promising research directions on incorporating predictability metrics into fuzzing and reduction tools themselves.
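The measurement described here, per-token predictability under a language model trained on human-written code, can be sketched with a much simpler stand-in than the study's actual model. The snippet below trains an add-alpha-smoothed token bigram model on one C file and reports bits per token on another; the tokenizer, smoothing, and file paths are illustrative assumptions.

    import math
    import re
    from collections import Counter

    def tokenize(src):
        # crude C tokenizer: identifiers, numbers, or single non-space characters
        return re.findall(r"[A-Za-z_]\w*|\d+|\S", src)

    def train_bigram(tokens, alpha=1.0):
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))
        vocab = len(unigrams) + 1
        def prob(prev, tok):  # add-alpha smoothed P(tok | prev)
            return (bigrams[(prev, tok)] + alpha) / (unigrams[prev] + alpha * vocab)
        return prob

    def bits_per_token(prob, tokens):
        pairs = list(zip(tokens, tokens[1:]))
        return -sum(math.log2(prob(p, t)) for p, t in pairs) / max(len(pairs), 1)

    # Usage (paths are placeholders):
    # model = train_bigram(tokenize(open("human_written.c").read()))
    # print(bits_per_token(model, tokenize(open("csmith_generated.c").read())))  # lower = more predictable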
Background: The prognosis, diagnosis, and treatment of many genetic disorders and familial diseases significantly improve if the family history (FH) of a patient is known. Such information is often written in the free text of clinical notes. Objective: The aim of this study is to develop automated methods that enable access to FH data through natural language processing. Methods: We performed information extraction by using transformers to extract disease mentions from notes. We also experimented with rule-based methods for extracting family member (FM) information from text and coreference resolution techniques. We evaluated different transfer learning strategies to improve the annotation of diseases. We provided a thorough error analysis of the contributing factors that affect such information extraction systems. Results: Our experiments showed that the combination of domain-adaptive pretraining and intermediate-task pretraining achieved an F1 score of 81.63% for the extraction of diseases and FMs from notes when it was tested on a public shared task data set from the National Natural Language Processing Clinical Challenges (N2C2), providing a statistically significant improvement over the baseline (P<.001). In comparison, in the 2019 N2C2/Open Health Natural Language Processing Shared Task, the median F1 score of all 17 participating teams was 76.59%. Conclusions: Our approach, which leverages a state-of-the-art named entity recognition model for disease mention detection coupled with a hybrid method for FM mention detection, achieved an effectiveness that was close to that of the top 3 systems participating in the 2019 N2C2 FH extraction challenge, with only the top system convincingly outperforming our approach in terms of precision.
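For readers unfamiliar with the extraction step, the following is a minimal, hypothetical sketch of transformer-based mention detection using the Hugging Face token-classification pipeline; the model path is a placeholder for a fine-tuned clinical NER checkpoint and is not the authors' system.

    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model="path/to/clinical-ner-model",   # placeholder for a fine-tuned checkpoint
        aggregation_strategy="simple",        # merge word-piece predictions into entity spans
    )

    note = "Patient's mother was diagnosed with breast cancer at age 52."
    for entity in ner(note):
        print(entity["entity_group"], entity["word"], round(entity["score"], 3))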
ISBN:
(Print) 9781538658895
Language modeling (LM) is a subtask of Natural Language Processing (NLP), and the goal of LM is to build a statistical language model that can learn and estimate a probability distribution of natural language over sentences of terms. Recently, many recurrent neural network based LMs, a type of deep neural network for dealing with sequential data, have been proposed and have achieved remarkable results. However, they rely solely on analysis of the words occurring in sentences, even though every sentence contains useful morphological information, such as Part-of-Speech (POS) tags, which is necessary for constituting a sentence and can be used as a feature for analysis. Although morphological information can be useful for LM, using that information as input to a neural network based LM is not straightforward, because adding features between words as a one-dimensional array increases the number of time steps of the recurrent neural network and can cause the vanishing gradient problem. To solve this problem, we propose in this paper a CNN-LSTM based language model that treats textual data as multi-dimensional input to the network. To train this multi-dimensional input with a Long Short-Term Memory (LSTM) network, we use a convolutional neural network (CNN) with a 1x1 filter for dimensionality reduction of the input data, avoiding the vanishing gradient problem by decreasing the number of time steps between input words. In addition, our approach, which uses multi-dimensional data reduced by the CNN, can be used as a plugin with many customized LSTM based LMs. On the Penn Treebank corpus, our model improves perplexity not only with vanilla LSTM but also with customized LSTM models.
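A minimal PyTorch sketch of the described idea, assuming each time step carries a stacked word-plus-feature vector: a Conv1d with kernel size 1 compresses the feature dimension before the LSTM, and a linear decoder produces next-word logits. The class name, dimensions, and hyperparameters are illustrative, not the paper's settings.

    import torch
    import torch.nn as nn

    class Conv1x1LstmLM(nn.Module):
        def __init__(self, in_dim=400, reduced_dim=200, hidden=650, vocab=10000):
            super().__init__()
            self.reduce = nn.Conv1d(in_dim, reduced_dim, kernel_size=1)  # 1x1 filter over features
            self.lstm = nn.LSTM(reduced_dim, hidden, batch_first=True)
            self.decoder = nn.Linear(hidden, vocab)

        def forward(self, x):                  # x: (batch, time, in_dim) word+feature vectors
            z = self.reduce(x.transpose(1, 2)).transpose(1, 2)  # compress the feature dimension
            out, _ = self.lstm(z)
            return self.decoder(out)           # next-word logits at every time step

    # logits = Conv1x1LstmLM()(torch.randn(8, 35, 400))  # -> shape (8, 35, 10000)

Because the reduction happens before the recurrent layer, the same module could in principle be placed in front of other LSTM-based language models, which is the "plugin" usage the abstract refers to.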