Recent years have witnessed the rising popularity of naturallanguageprocessing (NLP) and related fields such as Artificial Intelligence (AI) and Machine Learning (ML). Many online courses and resources are available...
详细信息
ISBN:
(纸本)9781577358091
Recent years have witnessed the rising popularity of naturallanguageprocessing (NLP) and related fields such as Artificial Intelligence (AI) and Machine Learning (ML). Many online courses and resources are available even for those without a strong background in the field. Often the student is curious about a specific topic but does not quite know where to begin studying. To answer the question of "what should one learn first,"we apply an embedding-based method to learn prerequisite relations for course concepts in the domain of NLP. We introduce Lecture Bank, a dataset containing 1,352 English lecture files collected from university courses which are each classified according to an existing taxonomy as well as 208 manually-labeled prerequisite relation topics, which is publicly available(1). The dataset will be useful for educational purposes such as lecture preparation and organization as well as applications such as reading list generation. Additionally, we experiment with neural graph-based networks and non-neural classifiers to learn these prerequisite relations from our dataset.
Objective: In this era of digitized health records, there has been a marked interest in using de-identified patient records for conducting various health related surveys. To assist in this research effort, we develope...
详细信息
Objective: In this era of digitized health records, there has been a marked interest in using de-identified patient records for conducting various health related surveys. To assist in this research effort, we developed a novel clinical data representation model entitled medical knowledge-infused convolutional neural network (MKCNN), which is used for learning the clinical trial criteria eligibility status of patients to participate in cohort studies. Materials and methods: In this study, we propose a clinical text representation infused with medical knowledge (MK). first, we isolate the noise from the relevant data using a medically relevant description extractor;then we utilize log-likelihood ratio based weights from selected sentences to highlight "met" and "not-met" knowledge-infused representations in bichannel setting for each instance. The combined medical knowledge-infused representation (MK) from these modules helps identify significant clinical criteria semantics, which in turn renders effective learning when used with a convolutional neural network architecture. Results: MKCNN outperforms other Medical Knowledge (MK) relevant learning architectures by approximately 3%;notably SVM and XGBoost implementations developed in this study. MKCNN scored 86.1% on F1metric, a gain of 6% above the average performance assessed from the submissions for n2c2 task. Although pattern/rule-basedmethods show a higher average performance for the n2c2 clinical data set, MKCNN significantly improves performance of machine learning implementations for clinical datasets. Conclusion: MKCNN scored 86.1% on the F1 score metric. In contrast to many of the rule-based systems introduced during the n2c2 challenge workshop, our system presents a model that heavily draws on machine-based learning. In addition, the MK representations add more value to clinical comprehension and interpretation of natural texts.
In the era of ’big data’, advanced storage and computing technologies allow people to build and process large-scale datasets, which promote the development of many fields such as speech recognition, naturallanguage...
详细信息
In the era of ’big data’, advanced storage and computing technologies allow people to build and process large-scale datasets, which promote the development of many fields such as speech recognition, naturallanguageprocessing and computer vision. Traditional approaches can not handle the heterogeneity and complexity of some novel data structures. In this dissertation, we want to explore how to combine different tools to develop new methodologies in analyzing certain kinds of structured data, motivated by real-world problems. Multi-group design, such as the Alzheimer’s Disease Neuroimaging Initiative (ADNI), has been undertaken by recruiting subjects based on their multi-class primary disease status, while some extensive secondary outcomes are also collected. Analysis by standard approaches is usually distorted because of the unequal sampling rates of different classes. In the first part of the dissertation, we develop a general regression framework for the analysis of secondary phenotypes collected in multi-group association studies. Our regression framework is built on a conditional model for the secondary outcome given the multi-group status and covariates and its relationship with the population regression of interest of the secondary outcome given the covariates. Then, we develop generalized estimation equations to estimate the parameters of interest. We use simulations and a large-scale imaging genetic data analysis of the ADNI data to evaluate the effect of the multi-group sampling scheme on standard genome-wide association analyses based on linear regression methods, while comparing it with our statistical methods that appropriately adjust for the multi-group sampling scheme. In the past few decades, network data has been increasingly collected and studied in diverse areas, including neuroimaging, social networks and knowledge graphs. In the second part of the dissertation, we investigate the graph-based semi-supervised learning problem with nonignorable non
We introduce a machine learning approach for the identification of "white spaces" in scientific knowledge. Our approach addresses this task as link prediction over a graph that contains over 2M influence sta...
详细信息
ROUGE is one of the first and most widely used evaluation metrics for text summarization. However, its assessment merely relies on surface similarities between peer and model summaries. Consequently, ROUGE is unable t...
详细信息
ISBN:
(纸本)9781948087841
ROUGE is one of the first and most widely used evaluation metrics for text summarization. However, its assessment merely relies on surface similarities between peer and model summaries. Consequently, ROUGE is unable to fairly evaluate summaries including lexical variations and paraphrasing. We propose a graph-based approach adopted into ROUGE to evaluate summaries based on both lexical and semantic similarities. Experiment results over TAC AESOP datasets show that exploiting the lexico-semantic similarity of the words used in summaries would significantly help ROUGE correlate better with human judgments.
A novel graph-to-tree conversion mechanism called the deep-tree generation (DTG) algorithm is first proposed to predict text data represented by graphs. The DTG method can generate a richer and more accurate represent...
详细信息
Traditional community detection algorithms are mainly based on network structure, while ignoring a large number of node attributes. In this paper, we propose a two-stage overlapping community detection method which co...
详细信息
Word sense induction (WSI), which addresses polysemy by unsupervised discovery of multiple word senses, resolves ambiguities for downstream NLP tasks and also makes word representations more interpretable. This paper ...
详细信息
We propose a novel approach to semantic dependency parsing (SDP) by casting the task as an instance of multi-lingual machine translation, where each semantic representation is a different foreign dialect. To that end,...
详细信息
ISBN:
(纸本)9781948087841
We propose a novel approach to semantic dependency parsing (SDP) by casting the task as an instance of multi-lingual machine translation, where each semantic representation is a different foreign dialect. To that end, we first generalize syntactic linearization techniques to account for the richer semantic dependency graph structure. Following, we design a neural sequence-to-sequence framework which can effectively recover our graph linearizations, performing almost on-par with previous SDP state-of-the-art while requiring less parallel training annotations. Beyond SDP, our linearization technique opens the door to integration of graph-based semantic representations as features in neural models for downstream applications.
暂无评论