ISBN (print): 9781538663066
We present an exploration of the encoder-decoder Long Short-Term Memory (LSTM) network as a detector of anomalous missing observations in streaming medical data, using the difference between the LSTM-reconstructed and observed values as the anomaly score. We experiment with time-series data from bedside monitoring devices in the publicly available Medical Information Mart for Intensive Care (MIMIC) database. Our results show that the encoder-decoder LSTM approach not only works well for distinguishing anomalous from normal missing observations in streaming medical data, but also has potential for imputing the missing data.
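The detection criterion described above, flagging a timestep when the gap between the reconstruction and the observation is too large, can be sketched in a few lines. Function names and the threshold are illustrative, not the paper's implementation:

```python
import numpy as np

def reconstruction_anomaly_scores(observed, reconstructed):
    # Per-timestep anomaly score: absolute difference between the
    # LSTM-reconstructed signal and the observed signal.
    return np.abs(np.asarray(observed) - np.asarray(reconstructed))

def flag_anomalies(observed, reconstructed, threshold):
    # A timestep is anomalous when its reconstruction error
    # exceeds the chosen threshold.
    return reconstruction_anomaly_scores(observed, reconstructed) > threshold
```

The reconstruction itself would come from the trained encoder-decoder LSTM; any model producing per-timestep reconstructions can plug into this scoring step.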
Table-to-text generation involves using natural language to describe a table, which has formal structure and valuable information. This paper introduces a two-level encoder-decoder neural model for table-to-text generation. To make the most of the table structure, which is ordinarily expressed as a set of field-value records, and to handle rare words appearing in a table, this study adopts an improved encoder-decoder approach and uses field information to post-process the words produced by the decoder. In the encoder, two LSTM-RNNs combine fields and values: one LSTM-RNN gives priority to fields and the other to values. In the decoder, a two-level attention mechanism is applied over the encoded states to capture the relation between words in the text and fields in the table, and between words in the text and values in the table. Finally, the decoding result is transformed into real words. The model is evaluated on WIKIBIO and WEATHERGOV, and improves the current state-of-the-art BLEU-4 score from 44.89 to 45.77 and from 61.01 to 62.89, respectively.
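The two-level attention over fields and values can be sketched as two independent attention passes whose contexts are concatenated. This is a minimal NumPy sketch under assumed dot-product scoring, not the paper's exact parameterization:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def two_level_attention(dec_state, field_states, value_states):
    # Level 1: attend over the field-oriented encoder states.
    field_weights = softmax(field_states @ dec_state)
    # Level 2: attend over the value-oriented encoder states.
    value_weights = softmax(value_states @ dec_state)
    # The context concatenates the two weighted sums, relating the
    # current decoder state to both fields and values in the table.
    return np.concatenate([field_weights @ field_states,
                           value_weights @ value_states])
```

In the paper the two encoder state sequences come from the field-priority and value-priority LSTM-RNNs; here they are just arbitrary state matrices of shape (records, hidden).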
ISBN (print): 9781538646588
Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models are well suited for multilingual ASR because they encapsulate an acoustic, pronunciation and language model jointly in a single network. In this work we present a single sequence-to-sequence ASR model trained on 9 different Indian languages, which have very little overlap in their scripts. Specifically, we take a union of language-specific grapheme sets and train a grapheme-based sequence-to-sequence model jointly on data from all languages. We find that this model, which is not explicitly given any information about language identity, improves recognition performance by 21% relative compared to analogous sequence-to-sequence models trained on each language individually. By modifying the model to accept a language identifier as an additional input feature, we further improve performance by an additional 7% relative and eliminate confusion between different languages.
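Accepting a language identifier "as an additional input feature" commonly means appending a one-hot language vector to every acoustic frame. A hedged sketch — the language codes, their ordering, and the frame dimensionality are hypothetical, not taken from the paper:

```python
import numpy as np

# Hypothetical codes for 9 Indian languages; the paper does not
# specify an ordering, this list is only for illustration.
LANGS = ["bn", "gu", "hi", "kn", "ml", "mr", "ta", "te", "ur"]

def append_language_id(frames, lang):
    # Append a one-hot language-identifier vector to every acoustic
    # feature frame, so the sequence-to-sequence model receives
    # language identity alongside the acoustics.
    one_hot = np.zeros(len(LANGS))
    one_hot[LANGS.index(lang)] = 1.0
    return np.hstack([frames, np.tile(one_hot, (frames.shape[0], 1))])
```

The grapheme-set union on the output side is analogous: the target vocabulary is simply the union of each language's graphemes, with no per-language output layer.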
ISBN (print): 9781538694039
Time series forecasting has been regarded as a key research problem in various fields, such as financial forecasting, traffic flow forecasting, medical monitoring, intrusion detection, anomaly detection, and air quality forecasting. In this paper, we propose a sequence-to-sequence deep learning framework for multivariate time series forecasting, which addresses the dynamic, spatiotemporal, and nonlinear characteristics of multivariate time series data with an LSTM-based encoder-decoder architecture. Through air quality multivariate time series forecasting experiments, we show that the proposed model has better forecasting performance than classic shallow learning and baseline deep learning models, and that the predicted PM2.5 values match the ground truth well under both single-timestep and multi-timestep forward forecasting conditions. The experimental results show that our model is capable of handling multivariate time series forecasting with satisfactory accuracy.
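Multi-timestep forward forecasting is typically done by feeding each prediction back as input for the next step. A minimal sketch with a stand-in one-step model — in the paper the one-step predictor would be the LSTM encoder-decoder, not the toy callable used here:

```python
def multi_step_forecast(one_step_model, history, steps):
    # Iteratively predict one step ahead and slide the input window
    # forward by appending the prediction (recursive forecasting).
    window = list(history)
    preds = []
    for _ in range(steps):
        y = one_step_model(window)
        preds.append(y)
        window = window[1:] + [y]
    return preds

# Toy one-step model for illustration: next value = last value + 1.
print(multi_step_forecast(lambda w: w[-1] + 1, [1, 2, 3], 3))
```

A sequence-to-sequence decoder can instead emit all future steps in one pass, which avoids the error accumulation of this recursive loop; the sketch only shows the simpler recursive baseline.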
ISBN (print): 9781510848764
We present a state-of-the-art end-to-end Automatic Speech Recognition (ASR) model. We learn to listen and write characters with a joint Connectionist Temporal Classification (CTC) and attention-based encoder-decoder network. The encoder is a deep Convolutional Neural Network (CNN) based on the VGG network. The CTC network sits on top of the encoder and is jointly trained with the attention-based decoder. During the beam search process, we combine the CTC predictions, the attention-based decoder predictions, and a separately trained LSTM language model. We achieve a 5-10% error reduction compared to prior systems on spontaneous Japanese and Chinese speech, and our end-to-end model outperforms traditional hybrid ASR systems.
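Combining the three score streams during beam search amounts to a weighted sum of log-probabilities per hypothesis. A sketch with hypothetical interpolation weights (the paper's actual weights are not given here):

```python
def joint_score(ctc_logp, att_logp, lm_logp, ctc_weight=0.3, lm_weight=0.1):
    # Rank a beam hypothesis by interpolating the CTC and attention
    # decoder log-probabilities, plus a shallow-fusion LM term.
    return (ctc_weight * ctc_logp
            + (1.0 - ctc_weight) * att_logp
            + lm_weight * lm_logp)
```

Hypotheses in the beam are then sorted by this joint score; the CTC term penalizes alignments the attention decoder might hallucinate, while the LM term rewards fluent character sequences.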
Image captioning is a task in artificial intelligence that combines computer vision and natural language processing, and it has attracted great attention from the research community. This paper conducts a comprehensive study of current mainstream image captioning algorithms. Building on the CNN-LSTM caption-generation approach, which has achieved strong results on image captioning, we introduce Faster R-CNN, an object-detection framework that has made great progress in computer vision, as the encoder in place of the CNN, feeding image region features to the decoder. An attention mechanism is applied in the decoder's recurrent neural network to further strengthen the contribution of regional image features to the natural-language descriptions it generates, forming a structured image captioning framework from region features to global description. The algorithm is trained and tested on the MSCOCO dataset (on the training and test splits, respectively), and our proposed model outperforms the baseline model.
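Feeding detector output to the decoder usually means keeping a fixed budget of confident region proposals. A hedged sketch — the confidence threshold and region cap are illustrative defaults, not values from the paper:

```python
def select_region_features(proposals, min_confidence=0.5, max_regions=36):
    # proposals: (confidence, feature) pairs from a detector such as
    # Faster R-CNN; keep the most confident regions, highest first,
    # as the feature set the attention decoder will attend over.
    kept = sorted((p for p in proposals if p[0] >= min_confidence),
                  key=lambda p: p[0], reverse=True)
    return [feat for _, feat in kept[:max_regions]]
```

The decoder's attention then weights these per-region features at each generation step, exactly as in the two-level attention sketches above but with image regions in place of table records.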
Multiple chemical information processing (i.e. encoding and decoding) was achieved by a Q-band absorption pattern-generating miniaturized chromophore, "temoporfin", using a set of metal inputs in various combinations, akin to biological and digital information processing systems. The distinct Q-band absorption intensities generated as outputs at different wavelengths, using a single instrumental method, enabled the system to perform as several complex logic 4-to-2 encoders and 2-to-3 decoders in an expeditious manner. (C) 2017 Elsevier B.V. All rights reserved.
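For readers unfamiliar with the logic terminology, a 4-to-2 encoder compresses four input lines into a two-bit code. A software analogue of that truth table (a priority encoder, shown purely to illustrate the terminology — the paper's system realizes this chemically, not in code):

```python
def encoder_4_to_2(inputs):
    # Priority encoder: emit the 2-bit binary code of the
    # highest-numbered active input line among four lines.
    for i in (3, 2, 1, 0):
        if inputs[i]:
            return (i >> 1, i & 1)
    return (0, 0)
```

A 2-to-3 decoder runs the opposite direction, expanding a compact input code into a larger set of output lines; in the paper both directions are read out as absorption intensities at different wavelengths.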
ISBN (print): 9781509047888
This paper presents our approach to improve video captioning by integrating audio and video features. Video captioning is the task of generating a textual description to describe the content of a video. State-of-the-art approaches to video captioning are based on sequence-to-sequence models, in which a single neural network accepts sequential images and audio data, and outputs a sequence of words that best describe the input data in natural language. The network thus learns to encode the video input into an intermediate semantic representation, which can be useful in applications such as multimedia indexing, automatic narration, and audio-visual question answering. In our prior work, we proposed an attention-based multi-modal fusion mechanism to integrate image, motion, and audio features, where the multiple features are integrated in the network. Here, we apply hypothesis-level integration based on minimum Bayes-risk (MBR) decoding to further improve the caption quality, focusing on well-known evaluation metrics (BLEU and METEOR scores). Experiments with the YouTube2Text and MSR-VTT datasets demonstrate that combinations of early and late integration of multimodal features significantly improve the audio-visual semantic representation, as measured by the resulting caption quality. In addition, we compared the performance of our method using two different types of audio features: MFCC features, and the audio features extracted using SoundNet, which was trained to recognize objects and scenes from videos using only the audio signals.
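Hypothesis-level integration with minimum Bayes-risk decoding can be sketched as picking the caption with the highest expected gain against the other hypotheses, weighted by the model posterior. The similarity function below is a stand-in for a BLEU-like gain, not the metric used in the paper:

```python
def mbr_select(hypotheses, probs, similarity):
    # Minimum Bayes-risk selection: choose the hypothesis with the
    # highest expected similarity to the candidate set under the
    # posterior (equivalently, the minimum expected risk).
    best, best_gain = None, float("-inf")
    for h in hypotheses:
        gain = sum(p * similarity(h, r) for r, p in zip(hypotheses, probs))
        if gain > best_gain:
            best, best_gain = h, gain
    return best
```

Because the selected caption must agree with many probable hypotheses rather than merely score highest itself, MBR tends to favor consensus outputs, which is why it can lift corpus-level metrics such as BLEU and METEOR.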