The recently increased focus on misinformation has stimulated research in fact checking, the task of assessing the truthfulness of a claim. Research in automating this task has been conducted in a variety of disciplin...
ISBN: 9781450360142 (print)
Graphs have been widely used as modeling tools in natural language processing (NLP), Text Mining (TM) and Information Retrieval (IR). Traditionally, the unigram bag-of-words representation is applied; that way, a document is represented as a multiset of its terms, disregarding dependencies between the terms. Although several variants and extensions of this modeling approach have been proposed, the main weakness comes from the underlying term independence assumption: the order of the terms within a document is completely disregarded, and relationships between terms are not taken into account in the final task. To deal with this problem, the research community has explored various representations, and in this direction graphs constitute a well-developed model for text representation. The goal of this tutorial is to offer a comprehensive presentation of recent methods that rely on graph-based text representations to deal with various tasks in Text Mining, NLP and IR.
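The contrast between the two representations can be illustrated with a minimal sketch. It assumes an undirected co-occurrence graph built with a sliding window, which is one common graph-of-words construction and not necessarily the one covered in the tutorial; the function names `bag_of_words` and `graph_of_words` and the `window` parameter are hypothetical.

```python
from collections import Counter


def bag_of_words(tokens):
    """Multiset of terms: word order and term dependencies are discarded."""
    return Counter(tokens)


def graph_of_words(tokens, window=2):
    """Undirected co-occurrence graph (an assumed construction).

    Nodes are terms; an edge links two distinct terms that appear within
    `window` consecutive tokens, weighted by how often that happens.
    """
    edges = Counter()
    for i, left in enumerate(tokens):
        # Look ahead at the next (window - 1) tokens.
        for right in tokens[i + 1 : i + window]:
            if left != right:
                edges[frozenset((left, right))] += 1
    return edges


doc_a = "the dog bit the man".split()
doc_b = "the man bit the dog".split()

# Same multiset of terms, so the bag-of-words model cannot tell them apart.
same_bag = bag_of_words(doc_a) == bag_of_words(doc_b)

# The graphs differ: only doc_a links "dog" and "bit".
same_graph = graph_of_words(doc_a) == graph_of_words(doc_b)
```

Here `same_bag` is true while `same_graph` is false, which is the term-dependence information the bag-of-words representation loses.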
ISBN: 9781450357944 (print)
The application of natural language processing (NLP) methods and resources to clinical and biomedical text has received growing attention over the past years, but progress has been limited by difficulties in accessing shared tools and resources, partially caused by patient privacy and data confidentiality constraints. Efforts to increase sharing and interoperability of the few existing resources are needed to facilitate the progress observed in the general NLP domain. Leveraging our research in corpus analysis and de-identification, we have created multiple synthetic data sets for a couple of NLP tasks based on real clinical sentences. We are organizing a challenge workshop to promote community efforts towards the advancement of clinical NLP. The challenge workshop will have two tasks: 1) Family History Information Extraction; and 2) Clinical Semantic Textual Similarity.
ISBN: 9791095546009 (print)
In this paper we present a software tool for elicitation and management of process metadata. It follows our previously published design idea of an assistant for researchers that aims at minimizing the additional effort required for producing sustainable workflow documentation. With the ever-growing number of linguistic resources available, it also becomes increasingly important to provide proper documentation to make them comparable and to allow meaningful evaluations for specific use cases. The often prevailing practice of post hoc documentation of resource generation or research processes bears the risk of information loss. Not only does detailed documentation of a process aid in achieving reproducibility, it also increases the usefulness of the documented work for others as a cornerstone of good scientific practice. Time pressure, together with the lack of simple documentation methods, makes workflow documentation in practice an arduous and often neglected task. Our tool ensures clean documentation for common workflows in natural language processing and digital humanities. Additionally, it can easily be integrated into existing institutional infrastructures.
Recently, artificial intelligence technology has replaced traditional manual methods in many fields. In particular, the application of artificial intelligence in the legal field liberates legal professionals from tedious work. For ex...
Question-Answer (QA) matching is a fundamental task in the natural language processing community. In this paper, we first build a novel QA matching corpus with informal text which is collected from a product reviewing...
MultiMT is a European Research Council Starting Grant whose aim is to devise data, methods and algorithms to exploit multi-modal information (images, audio, metadata) for context modelling in machine translation and ...
ISBN: 9781538646588 (print)
A novel learnable dictionary encoding layer is proposed in this paper for end-to-end language identification. It is in line with the conventional GMM i-vector approach both theoretically and practically. We imitate the mechanism of traditional GMM training and the supervector encoding procedure on top of a CNN. The proposed layer can accumulate high-order statistics from a variable-length input sequence and generate an utterance-level fixed-dimensional vector representation. Unlike the conventional methods, our new approach provides an end-to-end learning framework, where the inherent dictionaries are learned directly from the loss function. The dictionaries and the encoding representation for the classifier are learned jointly. The representation is orderless and therefore appropriate for language identification. We conducted a preliminary experiment on the NIST LRE07 closed-set task, and the results reveal that our proposed dictionary encoding layer achieves a significant error reduction compared with simple average pooling.
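The abstract does not give the layer's formulas, so the following is only a rough sketch of one way a dictionary encoding step could pool a variable-length sequence into a fixed-size, orderless vector. It assumes softmax soft assignment of each frame to dictionary components and averaging of the weighted residuals, which captures only first-order statistics. The names `dictionary_encode`, `centers`, and `scales` are hypothetical, and in a real layer the dictionary would be trained through the loss rather than fixed as it is here.

```python
import math


def dictionary_encode(frames, centers, scales):
    """Pool T frames (each D-dim) into a flat vector of length C * D.

    Assumed scheme, not the paper's stated method: each frame is softly
    assigned to the C dictionary components by a softmax over negative
    scaled squared distances, and the weighted residuals (frame - center)
    are averaged over all frames.
    """
    dim = len(centers[0])
    encoded = [[0.0] * dim for _ in centers]
    for x in frames:
        logits = [
            -s * sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
            for mu, s in zip(centers, scales)
        ]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        for c, mu in enumerate(centers):
            weight = exps[c] / total
            for d in range(dim):
                encoded[c][d] += weight * (x[d] - mu[d]) / len(frames)
    # Concatenate the per-component vectors into one representation.
    return [value for row in encoded for value in row]


centers = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]  # C = 3 components, D = 2
scales = [1.0, 1.0, 1.0]
short_utt = [[0.0, 1.0], [1.0, 0.0]]
long_utt = short_utt + [[0.5, 0.5], [2.0, 2.0]]

short_vec = dictionary_encode(short_utt, centers, scales)
long_vec = dictionary_encode(long_utt, centers, scales)
shuffled_vec = dictionary_encode(long_utt[::-1], centers, scales)
```

Both `short_vec` and `long_vec` have length C × D = 6 regardless of the number of frames, and `shuffled_vec` matches `long_vec` up to floating-point rounding, which illustrates the fixed-dimensional and orderless properties the abstract describes.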
The proceedings contain 12 papers. The topics discussed include: the automatic processing of the texts in natural language: some bibliometric indicators of the current state of this research area; comparison of traditional machine learning methods and Google services in identifying tonality on Russian texts; the algorithms for complex analysis of the corpuses of poetic texts in the Kazakh language; fast and accurate patent classification in search engines; reservoir computing echo state network classifier training; the comparison of autoencoder architectures in improving of prediction models; metadata handling for big data projects; and supervised and unsupervised learning in processing myographic patterns.
Physician Review Websites allow users to evaluate their experiences with health services. As these evaluations are regularly contextualized with facts from users' private lives, they often accidentally disclose pe...