Image captioning means automatically generating a caption for an image. As a recently emerged research area, it is attracting more and more attention. To achieve the goal of image captioning, the semantic information of images needs to be captured and expressed in natural language. Connecting the research communities of computer vision and natural language processing, image captioning is a quite challenging task. Various approaches have been proposed to solve this problem. In this paper, we present a survey on advances in image captioning research. Based on the technique adopted, we classify image captioning approaches into different categories. Representative methods in each category are summarized, and their strengths and limitations are discussed. We first discuss the methods used in early work, which are mainly retrieval based and template based. Then, we focus our main attention on neural network based methods, which give state-of-the-art results. Neural network based methods are further divided into subcategories based on the specific framework they use, and each subcategory is discussed in detail. After that, state-of-the-art methods are compared on benchmark datasets, followed by a discussion of future research directions. (C) 2018 Elsevier B.V. All rights reserved.
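A minimal sketch of the neural encoder-decoder captioning pipeline that this survey concentrates on, assuming a PyTorch setup: the backbone (ResNet-18), vocabulary size, and hidden dimensions below are illustrative assumptions, not details from any surveyed paper.

# Minimal CNN encoder + RNN decoder captioning sketch (illustrative only).
import torch
import torch.nn as nn
import torchvision.models as models

class CaptionModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # CNN encoder: ResNet-18 with the classifier head removed.
        backbone = models.resnet18(weights=None)  # load pretrained weights in practice
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])
        self.img_proj = nn.Linear(512, embed_dim)
        # RNN decoder: word embeddings + LSTM + output projection over the vocabulary.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # Encode the image into a single feature vector.
        feats = self.encoder(images).flatten(1)          # (B, 512)
        img_token = self.img_proj(feats).unsqueeze(1)    # (B, 1, E)
        # Prepend the image feature as the first "token" of the caption sequence.
        words = self.embed(captions)                     # (B, T, E)
        inputs = torch.cat([img_token, words], dim=1)    # (B, T+1, E)
        hidden, _ = self.lstm(inputs)
        return self.out(hidden)                          # logits over the vocabulary

# Usage: logits = CaptionModel(vocab_size=10000)(images, caption_token_ids)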
Recently, the stereo matching task has been dramatically advanced by deep learning methods. Specifically, the encoder-decoder framework with skip connections achieves outstanding performance over others. The skip connection scheme brings detailed, or in other words residual, information to the final prediction and thus improves performance; it has been successfully applied in many other pixel-wise prediction tasks, such as semantic segmentation and depth estimation. In contrast to other tasks, the authors can explicitly obtain the residual information for stereo matching by back-warping the right image and calculating the reconstruction error. The reconstruction error has been successfully used as an unsupervised loss, but it has not been explored for skip connections. In this Letter, the authors show that the reconstruction error in the feature space is very helpful for bringing residual information to the final prediction. They validate the effectiveness of using the reconstruction error for skip connections by conducting experiments on the KITTI 2015 and Scene Flow datasets. Experiments show that the proposed scheme improves performance by a notable margin and achieves state-of-the-art results with very fast processing time.
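A generic PyTorch sketch of the back-warping and reconstruction-error idea described above (not the authors' code): the right image is warped to the left view with a predicted left-view disparity map, and the per-pixel residual can then feed a skip connection or a loss. Shapes and function names are assumptions for illustration.

# Sketch of back-warping the right image and computing the reconstruction error.
import torch
import torch.nn.functional as F

def warp_right_to_left(right, disparity):
    """right: (B, C, H, W) right image or feature map;
    disparity: (B, 1, H, W) disparity predicted for the left view, in pixels."""
    b, _, h, w = right.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h, device=right.device),
                            torch.arange(w, device=right.device),
                            indexing="ij")
    xs = xs.unsqueeze(0).float() - disparity.squeeze(1)   # shift columns by disparity
    ys = ys.unsqueeze(0).float().expand(b, -1, -1)
    # Normalize coordinates to [-1, 1] for grid_sample.
    grid_x = 2.0 * xs / (w - 1) - 1.0
    grid_y = 2.0 * ys / (h - 1) - 1.0
    grid = torch.stack([grid_x, grid_y], dim=-1)          # (B, H, W, 2)
    return F.grid_sample(right, grid, align_corners=True)

def reconstruction_error(left, right, disparity):
    # Per-pixel residual between the left view and the warped right view;
    # this tensor is the "residual information" mentioned in the abstract.
    return (left - warp_right_to_left(right, disparity)).abs()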
ISBN (Print): 9781450365628
With the increasing availability of medical images from different modalities (X-ray, CT, PET, MRI, ultrasound, etc.) and the huge advances in fast, accurate, and enhanced computing power offered by current graphics processing units, automatic caption generation from medical images has become a new way to improve healthcare and a key method for getting better results at lower cost. In this paper, we give a comprehensive overview of the task of image captioning in the medical domain, covering existing models, the benchmark medical image caption datasets, and the evaluation metrics that have been used to measure the quality of the generated captions.
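As a small illustration of the caption-quality metrics surveyed in such overviews, the following sketch scores a generated caption against a single reference with BLEU using NLTK; the example captions, reference, and weights are made up for illustration and do not come from any medical dataset.

# Illustrative BLEU scoring of a generated caption against one reference.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["chest", "x-ray", "shows", "no", "acute", "abnormality"]
candidate = ["chest", "x-ray", "shows", "no", "abnormality"]

# BLEU-1 to BLEU-4 with smoothing, as commonly reported for caption benchmarks.
smooth = SmoothingFunction().method1
for n in range(1, 5):
    weights = tuple(1.0 / n for _ in range(n))
    score = sentence_bleu([reference], candidate, weights=weights,
                          smoothing_function=smooth)
    print(f"BLEU-{n}: {score:.3f}")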
ISBN (Print): 9783030012168; 9783030012151
Recently, much progress has been made in image captioning, and an encoder-decoder framework has been adopted by all the state-of-the-art models. Under this framework, an input image is encoded by a convolutional neural network (CNN) and then translated into natural language with a recurrent neural network (RNN). The existing models relying on this framework employ only one kind of CNN, e.g., ResNet or Inception-X, which describes the image contents from only one specific viewpoint. Thus, the semantic meaning of the input image cannot be comprehensively understood, which limits further improvement in performance. In this paper, to exploit the complementary information from multiple encoders, we propose a novel recurrent fusion network (RFNet) for the image captioning task. The fusion process in our model can exploit the interactions among the outputs of the image encoders and generate new compact and informative representations for the decoder. Experiments on the MSCOCO dataset demonstrate the effectiveness of the proposed RFNet, which sets a new state of the art for image captioning.
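A simplified sketch of the general idea of fusing features from two different CNN encoders into one representation for the caption decoder; this is a plain gated fusion with assumed feature sizes, not the actual recurrent fusion used in RFNet.

# Gated fusion of image features from two CNN encoders (illustrative only).
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    def __init__(self, dim_a, dim_b, fused_dim=512):
        super().__init__()
        # Project both encoders' outputs into a common space.
        self.proj_a = nn.Linear(dim_a, fused_dim)
        self.proj_b = nn.Linear(dim_b, fused_dim)
        # Learn a gate deciding how much each view contributes.
        self.gate = nn.Linear(2 * fused_dim, fused_dim)

    def forward(self, feats_a, feats_b):
        a = torch.tanh(self.proj_a(feats_a))            # (B, fused_dim)
        b = torch.tanh(self.proj_b(feats_b))            # (B, fused_dim)
        g = torch.sigmoid(self.gate(torch.cat([a, b], dim=-1)))
        return g * a + (1.0 - g) * b                    # fused representation

# Usage with hypothetical feature sizes (e.g., ResNet 2048-d, Inception 2048-d):
# fused = FeatureFusion(2048, 2048)(resnet_feats, inception_feats)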
ISBN (Print): 9781538669877
This paper introduces an alternative approach to embedding emotional information at the encoder stage of sequence-to-sequence based emotional response generation. It explores different positions and styles of the embedding, which represent associations of emotion with specific words or with the whole sentence. The experiment was set up with a standard dataset as well as a dataset annotated with emotional classifiers. Preliminary results showed that this new approach better represents sentence-level emotion and works well with a standard recurrent neural network (RNN) with long short-term memory (LSTM) architecture.
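A hedged sketch of the two embedding styles mentioned above, attaching an emotion embedding either to every word or once to the whole sentence at the encoder; the number of emotion classes, dimensions, and the exact combination scheme are assumptions, not the paper's architecture.

# Encoder sketch: word-level vs. sentence-level emotion embedding (illustrative).
import torch
import torch.nn as nn

class EmotionalEncoder(nn.Module):
    def __init__(self, vocab_size, num_emotions=6, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.emotion_embed = nn.Embedding(num_emotions, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, tokens, emotion_id, word_level=True):
        words = self.word_embed(tokens)                    # (B, T, E)
        emo = self.emotion_embed(emotion_id).unsqueeze(1)  # (B, 1, E)
        if word_level:
            # Word-level style: associate the emotion with every input word.
            inputs = words + emo
        else:
            # Sentence-level style: prepend the emotion once as an extra token.
            inputs = torch.cat([emo, words], dim=1)
        return self.lstm(inputs)                           # encoder states + final state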
ISBN (Print): 9783319736181; 9783319736174
Due to the difficulty of abstractive summarization, the great majority of past work on document summarization has been extractive. The recent success of the sequence-to-sequence framework has made abstractive summarization viable, and recurrent neural network models based on the attention encoder-decoder have achieved promising performance on short-text summarization tasks. Unfortunately, these attention encoder-decoder models often suffer from the undesirable shortcomings of generating repeated words or phrases and an inability to deal with out-of-vocabulary words appropriately. To address these issues, in this work we propose to add an attention mechanism over the output sequence to avoid repetitive content and to use the subword method to deal with rare and unknown words. We applied our model to the public dataset provided by the NLPCC 2017 shared task 3. The evaluation results show that our system achieved the best ROUGE performance among all the participating teams and is also competitive with some state-of-the-art methods.
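A generic sketch of one way to realize attention over the output sequence, by letting the decoder attend to its own previously generated states so that repeated content is discouraged; this is a standard intra-decoder attention formulation with assumed shapes, not necessarily the exact model in the paper.

# Attention over previously generated decoder states (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class OutputAttention(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.query = nn.Linear(hidden_dim, hidden_dim)
        self.key = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, current_state, past_states):
        """current_state: (B, H) decoder state at this step;
        past_states: (B, T_prev, H) states of already generated tokens."""
        q = self.query(current_state).unsqueeze(1)       # (B, 1, H)
        k = self.key(past_states)                        # (B, T_prev, H)
        scores = torch.bmm(q, k.transpose(1, 2))         # (B, 1, T_prev)
        weights = F.softmax(scores, dim=-1)
        context = torch.bmm(weights, past_states)        # (B, 1, H)
        # The decoder conditions on both the current state and this summary of
        # what it has already produced, which helps it avoid repeating itself.
        return torch.cat([current_state, context.squeeze(1)], dim=-1)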
Generating a natural language description of an image is a challenging but meaningful task. This task combines two significant artificial intelligence fields: computer vision and natural language processing. It is valuable for many applications, such as searching images and assisting visually impaired people to view the world. Most approaches adopt an encoder-decoder framework, and many later methods are improved on the basis of this framework. In these methods, image features are extracted by a VGG net or other networks, but the feature map loses important information during extraction. In this paper, we fuse different kinds of image features extracted by two networks, VGG19 and ResNet50, and feed them into the neural network for training. We also add an attention mechanism to a basic neural encoder-decoder model for generating natural sentence descriptions: at each time step, our model attends to the image features and picks out the most meaningful parts to generate words. We test our model on the benchmark dataset IAPR TC-12 and, comparing with other methods, validate that our model achieves state-of-the-art performance.
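A hedged sketch of the per-time-step visual attention described above: given the decoder's hidden state, weight the spatial positions of a (possibly fused) CNN feature map and return a context vector. The shapes, dimensions, and the additive scoring function are illustrative assumptions.

# Spatial attention over a CNN feature map at each decoding step (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualAttention(nn.Module):
    def __init__(self, feat_dim, hidden_dim, attn_dim=256):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.state_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feature_map, hidden_state):
        """feature_map: (B, N, feat_dim) spatial features, e.g. fused
        VGG19/ResNet50 maps flattened to N positions; hidden_state: (B, H)."""
        e = torch.tanh(self.feat_proj(feature_map) +
                       self.state_proj(hidden_state).unsqueeze(1))  # (B, N, A)
        alpha = F.softmax(self.score(e).squeeze(-1), dim=-1)        # (B, N)
        # Weighted sum over spatial positions gives the attended image context.
        return (alpha.unsqueeze(-1) * feature_map).sum(dim=1)       # (B, feat_dim)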