ISBN (print): 9781538663066
We present an exploration of the encoder-decoder Long Short-Term Memory (LSTM) network as a detector of anomalous missing observations in streaming medical data, using the difference between the LSTM-reconstructed and observed values as the anomaly score. We experiment with time-series data from bedside monitoring devices in the publicly available Medical Information Mart for Intensive Care (MIMIC) database. Our results show that the encoder-decoder LSTM approach not only works well for distinguishing anomalous from normal missing observations in streaming medical data, but also has potential for imputing the missing data.
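The detection criterion described above, flagging a timestep when the gap between the reconstruction and the observation is too large, can be sketched in a few lines. Function names and the threshold are illustrative, not the paper's implementation:

```python
import numpy as np

def reconstruction_anomaly_scores(observed, reconstructed):
    # Per-timestep anomaly score: absolute difference between the
    # LSTM-reconstructed signal and the observed signal.
    return np.abs(np.asarray(observed) - np.asarray(reconstructed))

def flag_anomalies(observed, reconstructed, threshold):
    # A timestep is anomalous when its reconstruction error
    # exceeds the chosen threshold.
    return reconstruction_anomaly_scores(observed, reconstructed) > threshold
```

The reconstruction itself would come from the trained encoder-decoder LSTM; any model producing per-timestep reconstructions can plug into this scoring step.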
Table-to-text generation involves using natural language to describe a table, which has formal structure and valuable information. This paper introduces a two-level encoder-decoder neural model for table-to-text generation. To make the most of the table structure, which is ordinarily expressed as a set of field-value records, and to handle rare words appearing in a table, this study adopts an improved encoder-decoder approach and uses field information to post-process the words produced by the decoder. In the encoder, two LSTM-RNNs combine fields and values: one LSTM-RNN gives priority to fields and the other to values. In the decoder, a two-level attention mechanism is applied over the encoded states to capture the relation between words in the text and fields in the table, and between words in the text and values in the table. Finally, the decoding result is transformed into real words. The model is evaluated on WIKIBIO and WEATHERGOV, and improves the current state-of-the-art BLEU-4 score from 44.89 to 45.77 and from 61.01 to 62.89, respectively.
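The two-level attention over fields and values can be sketched as two independent attention passes whose contexts are concatenated. This is a minimal NumPy sketch under assumed dot-product scoring, not the paper's exact parameterization:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def two_level_attention(dec_state, field_states, value_states):
    # Level 1: attend over the field-oriented encoder states.
    field_weights = softmax(field_states @ dec_state)
    # Level 2: attend over the value-oriented encoder states.
    value_weights = softmax(value_states @ dec_state)
    # The context concatenates the two weighted sums, relating the
    # current decoder state to both fields and values in the table.
    return np.concatenate([field_weights @ field_states,
                           value_weights @ value_states])
```

In the paper the two encoder state sequences come from the field-priority and value-priority LSTM-RNNs; here they are just arbitrary state matrices of shape (records, hidden).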
ISBN (print): 9781538646588
Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models are well suited for multilingual ASR because they encapsulate an acoustic, pronunciation and language model jointly in a single network. In this work we present a single sequence-to-sequence ASR model trained on 9 different Indian languages, which have very little overlap in their scripts. Specifically, we take a union of language-specific grapheme sets and train a grapheme-based sequence-to-sequence model jointly on data from all languages. We find that this model, which is not explicitly given any information about language identity, improves recognition performance by 21% relative compared to analogous sequence-to-sequence models trained on each language individually. By modifying the model to accept a language identifier as an additional input feature, we further improve performance by an additional 7% relative and eliminate confusion between different languages.
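Accepting a language identifier "as an additional input feature" commonly means appending a one-hot language vector to every acoustic frame. A hedged sketch — the language codes, their ordering, and the frame dimensionality are hypothetical, not taken from the paper:

```python
import numpy as np

# Hypothetical codes for 9 Indian languages; the paper does not
# specify an ordering, this list is only for illustration.
LANGS = ["bn", "gu", "hi", "kn", "ml", "mr", "ta", "te", "ur"]

def append_language_id(frames, lang):
    # Append a one-hot language-identifier vector to every acoustic
    # feature frame, so the sequence-to-sequence model receives
    # language identity alongside the acoustics.
    one_hot = np.zeros(len(LANGS))
    one_hot[LANGS.index(lang)] = 1.0
    return np.hstack([frames, np.tile(one_hot, (frames.shape[0], 1))])
```

The grapheme-set union on the output side is analogous: the target vocabulary is simply the union of each language's graphemes, with no per-language output layer.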
ISBN (print): 9781538694039
Time series forecasting has been regarded as a key research problem in various fields, such as financial forecasting, traffic flow forecasting, medical monitoring, intrusion detection, anomaly detection, and air quality forecasting. In this paper, we propose a sequence-to-sequence deep learning framework for multivariate time series forecasting, which addresses the dynamic, spatiotemporal, and nonlinear characteristics of multivariate time series data with an LSTM-based encoder-decoder architecture. Through air quality multivariate time series forecasting experiments, we show that the proposed model has better forecasting performance than classic shallow learning and baseline deep learning models, and that the predicted PM2.5 values match the ground truth well under both single-timestep and multi-timestep forward forecasting conditions. The experimental results show that our model is capable of handling multivariate time series forecasting with satisfactory accuracy.
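Multi-timestep forward forecasting is typically done by feeding each prediction back as input for the next step. A minimal sketch with a stand-in one-step model — in the paper the one-step predictor would be the LSTM encoder-decoder, not the toy callable used here:

```python
def multi_step_forecast(one_step_model, history, steps):
    # Iteratively predict one step ahead and slide the input window
    # forward by appending the prediction (recursive forecasting).
    window = list(history)
    preds = []
    for _ in range(steps):
        y = one_step_model(window)
        preds.append(y)
        window = window[1:] + [y]
    return preds

# Toy one-step model for illustration: next value = last value + 1.
print(multi_step_forecast(lambda w: w[-1] + 1, [1, 2, 3], 3))
```

A sequence-to-sequence decoder can instead emit all future steps in one pass, which avoids the error accumulation of this recursive loop; the sketch only shows the simpler recursive baseline.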
ISBN (print): 9781510848764
We present a state-of-the-art end-to-end Automatic Speech Recognition (ASR) model. We learn to listen and write characters with a joint Connectionist Temporal Classification (CTC) and attention-based encoder-decoder network. The encoder is a deep Convolutional Neural Network (CNN) based on the VGG network. The CTC network sits on top of the encoder and is jointly trained with the attention-based decoder. During the beam search process, we combine the CTC predictions, the attention-based decoder predictions, and a separately trained LSTM language model. We achieve a 5-10% error reduction compared to prior systems on spontaneous Japanese and Chinese speech, and our end-to-end model outperforms traditional hybrid ASR systems.
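Combining the three score streams during beam search amounts to a weighted sum of log-probabilities per hypothesis. A sketch with hypothetical interpolation weights (the paper's actual weights are not given here):

```python
def joint_score(ctc_logp, att_logp, lm_logp, ctc_weight=0.3, lm_weight=0.1):
    # Rank a beam hypothesis by interpolating the CTC and attention
    # decoder log-probabilities, plus a shallow-fusion LM term.
    return (ctc_weight * ctc_logp
            + (1.0 - ctc_weight) * att_logp
            + lm_weight * lm_logp)
```

Hypotheses in the beam are then sorted by this joint score; the CTC term penalizes alignments the attention decoder might hallucinate, while the LM term rewards fluent character sequences.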
Image captioning is a task in artificial intelligence that combines computer vision and natural language processing, and it has attracted great attention from the research community. This paper conducts a comprehensive study of current mainstream image captioning algorithms. Building on the CNN-LSTM caption-generation approach, which has achieved strong results on image captioning, we introduce Faster R-CNN, an object-detection framework that has made great progress in computer vision, as the encoder in place of the CNN, feeding image region features to the decoder. An attention mechanism is applied in the decoder's recurrent neural network to further strengthen the contribution of regional image features to the natural-language descriptions it generates, forming a structured image captioning framework from region features to global description. The algorithm is trained and tested on the MSCOCO dataset (on the training and test splits, respectively), and our proposed model outperforms the baseline model.
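Feeding detector output to the decoder usually means keeping a fixed budget of confident region proposals. A hedged sketch — the confidence threshold and region cap are illustrative defaults, not values from the paper:

```python
def select_region_features(proposals, min_confidence=0.5, max_regions=36):
    # proposals: (confidence, feature) pairs from a detector such as
    # Faster R-CNN; keep the most confident regions, highest first,
    # as the feature set the attention decoder will attend over.
    kept = sorted((p for p in proposals if p[0] >= min_confidence),
                  key=lambda p: p[0], reverse=True)
    return [feat for _, feat in kept[:max_regions]]
```

The decoder's attention then weights these per-region features at each generation step, exactly as in the two-level attention sketches above but with image regions in place of table records.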
Multiple chemical information processing (i.e. encoding and decoding) was achieved by a Q-band absorption pattern-generating miniaturized chromophore, "temoporfin", using a set of metal inputs in various combinations, akin to biological and digital information processing systems. The distinct Q-band absorption intensities generated as outputs at different wavelengths, using a single instrumental method, enabled the system to perform as several complex logic 4-to-2 encoders and 2-to-3 decoders in an expeditious manner. (C) 2017 Elsevier B.V. All rights reserved.
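For readers unfamiliar with the logic terminology, a 4-to-2 encoder compresses four input lines into a two-bit code. A software analogue of that truth table (a priority encoder, shown purely to illustrate the terminology — the paper's system realizes this chemically, not in code):

```python
def encoder_4_to_2(inputs):
    # Priority encoder: emit the 2-bit binary code of the
    # highest-numbered active input line among four lines.
    for i in (3, 2, 1, 0):
        if inputs[i]:
            return (i >> 1, i & 1)
    return (0, 0)
```

A 2-to-3 decoder runs the opposite direction, expanding a compact input code into a larger set of output lines; in the paper both directions are read out as absorption intensities at different wavelengths.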
ISBN (print): 9781509047888
This paper presents our approach to improve video captioning by integrating audio and video features. Video captioning is the task of generating a textual description to describe the content of a video. State-of-the-art approaches to video captioning are based on sequence-to-sequence models, in which a single neural network accepts sequential images and audio data, and outputs a sequence of words that best describe the input data in natural language. The network thus learns to encode the video input into an intermediate semantic representation, which can be useful in applications such as multimedia indexing, automatic narration, and audio-visual question answering. In our prior work, we proposed an attention-based multi-modal fusion mechanism to integrate image, motion, and audio features, where the multiple features are integrated in the network. Here, we apply hypothesis-level integration based on minimum Bayes-risk (MBR) decoding to further improve the caption quality, focusing on well-known evaluation metrics (BLEU and METEOR scores). Experiments with the YouTube2Text and MSR-VTT datasets demonstrate that combinations of early and late integration of multimodal features significantly improve the audio-visual semantic representation, as measured by the resulting caption quality. In addition, we compared the performance of our method using two different types of audio features: MFCC features, and the audio features extracted using SoundNet, which was trained to recognize objects and scenes from videos using only the audio signals.
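Hypothesis-level integration with minimum Bayes-risk decoding can be sketched as picking the caption with the highest expected gain against the other hypotheses, weighted by the model posterior. The similarity function below is a stand-in for a BLEU-like gain, not the metric used in the paper:

```python
def mbr_select(hypotheses, probs, similarity):
    # Minimum Bayes-risk selection: choose the hypothesis with the
    # highest expected similarity to the candidate set under the
    # posterior (equivalently, the minimum expected risk).
    best, best_gain = None, float("-inf")
    for h in hypotheses:
        gain = sum(p * similarity(h, r) for r, p in zip(hypotheses, probs))
        if gain > best_gain:
            best, best_gain = h, gain
    return best
```

Because the selected caption must agree with many probable hypotheses rather than merely score highest itself, MBR tends to favor consensus outputs, which is why it can lift corpus-level metrics such as BLEU and METEOR.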