ISBN (print): 9781945626616
The proceedings contain 12 papers. The topics discussed include: users and data: the two neglected children of bilingual natural language processing research; deep investigation of cross-language plagiarism detection methods; sentence alignment using unfolding recursive autoencoders; acquisition of translation lexicons for historically unwritten languages via bridging loanwords; toward a comparable corpus of Latvian, Russian and English tweets; automatic extraction of parallel speech corpora from dubbed movies; a parallel collection of clinical trials in Portuguese and English; weighted set-theoretic alignment of comparable sentences; and BUCC 2017 shared task: a first attempt toward a deep learning framework for identifying parallel sentences in comparable corpora.
Native Language Identification (NLI) is the task of automatically identifying the native language (L1) of an individual based on their language production in a learned language. It is typically framed as a classificat...
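For readers unfamiliar with this framing, a minimal sketch of NLI as plain text classification is given below; the character n-gram features, the scikit-learn pipeline and the toy L1 labels are illustrative assumptions, not the system described in the abstract.

```python
# Minimal sketch (assumptions only): NLI framed as text classification
# with character n-gram TF-IDF features and a linear classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy learner-English texts with made-up L1 labels, purely for illustration.
texts = ["I am agree with this opinion", "He explain me the problem",
         "She is knowing the answer", "They discussed about the plan"]
labels = ["ES", "FR", "DE", "HI"]

clf = make_pipeline(TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
                    LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict(["He explain me the rules"]))  # predicted L1 label
```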
Clinical named entity recognition (CNER), which identifies the boundaries and types of medical entities, is a fundamental and crucial task in clinical natural language processing. Recent years have witnessed considerable progress in deep learning-based algorithms, such as RNNs, CNNs and their integrated variants, which have proven effective for CNER. In this work, we propose a deep learning model for CNER that adopts a bidirectional RNN-CRF architecture with a concatenated n-gram character representation to capture rich context information. We further incorporate word segmentation results, part-of-speech (POS) tags and a medical vocabulary as features in the model, and the final output is obtained by comparing the separate models with the overall model. The proposed framework has been evaluated on the CCKS 2017 Task 2 dataset, achieving a 90.10 F1-score for CNER.
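A minimal sketch of one ingredient of such a model, a bidirectional RNN tagger over a concatenated n-gram character representation, is shown below; it is an assumption-laden illustration in PyTorch, not the authors' implementation, and it omits the CRF layer as well as the segmentation, POS and medical-vocabulary features.

```python
# Sketch only: bidirectional RNN over concatenated unigram + bigram
# character embeddings, emitting per-character tag scores (a CRF layer
# would normally decode these into a consistent BIO sequence).
import torch
import torch.nn as nn

class CharNgramBiRNNTagger(nn.Module):
    def __init__(self, n_unigrams, n_bigrams, n_tags, emb_dim=64, hidden=128):
        super().__init__()
        self.uni_emb = nn.Embedding(n_unigrams, emb_dim)
        self.bi_emb = nn.Embedding(n_bigrams, emb_dim)
        # BiLSTM over the concatenated n-gram character representation.
        self.rnn = nn.LSTM(2 * emb_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, uni_ids, bi_ids):
        x = torch.cat([self.uni_emb(uni_ids), self.bi_emb(bi_ids)], dim=-1)
        h, _ = self.rnn(x)
        return self.out(h)  # (batch, seq_len, n_tags)

# Toy usage with made-up vocabulary sizes and a length-6 character sequence.
model = CharNgramBiRNNTagger(n_unigrams=5000, n_bigrams=20000, n_tags=7)
uni = torch.randint(0, 5000, (1, 6))
bi = torch.randint(0, 20000, (1, 6))
print(model(uni, bi).shape)  # torch.Size([1, 6, 7])
```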
We investigate the problem of reader-aware multi-document summarization (RA-MDS) and introduce a new dataset for this problem. To tackle RA-MDS, we extend a variational auto-encoders (VAEs) based MDS framework by jointl...
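Since the abstract builds on a VAE-based framework, a minimal sketch of the reparameterization step at the core of a variational auto-encoder is given below; the dimensions and the PyTorch encoder are illustrative assumptions, not the paper's model.

```python
# Sketch only: a tiny VAE encoder showing the reparameterization trick
# and the KL term used in the VAE objective.
import torch
import torch.nn as nn

class TinyVAEEncoder(nn.Module):
    def __init__(self, in_dim=300, latent_dim=32):
        super().__init__()
        self.mu = nn.Linear(in_dim, latent_dim)
        self.logvar = nn.Linear(in_dim, latent_dim)

    def forward(self, x):
        mu, logvar = self.mu(x), self.logvar(x)
        # Reparameterization: sample z = mu + sigma * eps with eps ~ N(0, I).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # KL divergence of q(z|x) from the standard normal prior.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return z, kl

z, kl = TinyVAEEncoder()(torch.randn(2, 300))
print(z.shape, kl.shape)  # torch.Size([2, 32]) torch.Size([2])
```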
Sentiment lexicons are widely used as an intuitive and inexpensive way of tackling sentiment classification, often within a simple lexicon word-counting approach or as part of a supervised model. However, it is an ope...
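A minimal sketch of the simple lexicon word-counting approach mentioned here is given below; the tiny hand-written lexicon is an illustrative assumption, not a real sentiment resource.

```python
# Sketch only: classify by counting positive vs. negative lexicon hits.
POSITIVE = {"good", "great", "excellent", "love", "nice"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "poor"}

def lexicon_sentiment(text: str) -> str:
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(lexicon_sentiment("the food was great but the service was terrible"))  # neutral
```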
ISBN (digital): 9781627052955
Neural networks are a family of powerful machine learning models. This book focuses on the application of neural network models to natural language data. The first half of the book (Parts I and II) covers the basics of supervised machine learning and feed-forward neural networks, the basics of working with machine learning over language data, and the use of vector-based rather than symbolic representations for words. It also covers the computation-graph abstraction, which makes it easy to define and train arbitrary neural networks and is the basis behind the design of contemporary neural network software libraries. The second part of the book (Parts III and IV) introduces more specialized neural network architectures, including 1D convolutional neural networks, recurrent neural networks, conditioned-generation models, and attention-based models. These architectures and techniques are the driving force behind state-of-the-art algorithms for machine translation, syntactic parsing, and many other applications. Finally, we also discuss tree-shaped networks, structured prediction, and the prospects of multi-task learning.
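A minimal sketch of the computation-graph abstraction described above is given below, using PyTorch's autograd as one concrete instance; the book itself is library-agnostic, so the specific calls are an assumption.

```python
# Sketch only: operations on tensors build a graph during the forward pass,
# and gradients are obtained by traversing it backwards.
import torch

W = torch.randn(3, 4, requires_grad=True)   # parameters are graph leaves
b = torch.zeros(3, requires_grad=True)
x = torch.randn(4)                          # an input vector

y = torch.tanh(W @ x + b)                   # forward pass builds the graph
loss = (y ** 2).sum()                       # a toy scalar loss
loss.backward()                             # backward pass fills .grad

print(W.grad.shape, b.grad.shape)           # torch.Size([3, 4]) torch.Size([3])
```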
Text summarization has been one of the key research areas in natural language processing (NLP) for a while. The various methods to summarize one or more documents can be broadly classified into extractive and abstract...
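A minimal sketch of the extractive side of this distinction is given below: sentences are scored by the document frequency of their words and the top-scoring ones are kept. It is an illustration only, not any of the surveyed methods.

```python
# Sketch only: frequency-based extractive summarization.
from collections import Counter
import re

def extractive_summary(text: str, k: int = 2) -> str:
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w.lower() for w in re.findall(r"\w+", text))
    # Score each sentence by the corpus frequency of its words, keep the top k.
    scored = sorted(sentences,
                    key=lambda s: -sum(freq[w.lower()] for w in re.findall(r"\w+", s)))
    keep = set(scored[:k])
    return " ".join(s for s in sentences if s in keep)  # preserve original order

doc = ("NLP studies language. Summarization shortens documents. "
       "Extractive methods copy sentences. Abstractive methods generate new text.")
print(extractive_summary(doc, k=2))
```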
Since automatic language generation is a task that can enrich applications across most language-related areas, from machine translation to interactive dialogue, it seems worthwhile to undertake a strategy foc...