Recently, the stereo matching task has been dramatically advanced by deep learning methods. In particular, the encoder-decoder framework with skip connections achieves outstanding performance. The skip connection scheme carries detailed, or in other words residual, information into the final prediction and thus improves performance; it has been successfully applied in many other pixel-wise prediction tasks, such as semantic segmentation and depth estimation. In contrast to those tasks, the residual information for stereo matching can be obtained explicitly, by back-warping the right image and computing the reconstruction error. The reconstruction error has been successfully used as an unsupervised loss, but has not been explored as a skip connection. In this Letter, the authors show that the reconstruction error in the feature space is very helpful for bringing residual information into the final prediction. They validate the effectiveness of using the reconstruction error for skip connections through experiments on the KITTI 2015 and Scene Flow datasets. The experiments show that the proposed scheme improves performance by a notable margin and achieves state-of-the-art performance with a very fast processing time.
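Since the back-warping step is the core of the described skip connection, a minimal PyTorch sketch may help make it concrete: it warps right-view features to the left view using a disparity map and takes the difference as the reconstruction error. The function names and the use of grid_sample are our own assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def warp_right_to_left(right_feat, disparity):
    """Back-warp right-view features to the left view using disparity.

    right_feat: (B, C, H, W) features from the right image.
    disparity:  (B, 1, H, W) left-view disparity estimate, in pixels.
    """
    b, _, h, w = right_feat.shape
    # Base sampling grid of pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    xs = xs.float().to(right_feat.device).expand(b, h, w)
    ys = ys.float().to(right_feat.device).expand(b, h, w)
    # A left-image pixel at x corresponds to the right-image pixel at x - d.
    xs = xs - disparity.squeeze(1)
    # Normalize coordinates to [-1, 1] as grid_sample expects.
    grid = torch.stack((2 * xs / (w - 1) - 1, 2 * ys / (h - 1) - 1), dim=-1)
    return F.grid_sample(right_feat, grid, align_corners=True)

def reconstruction_error_skip(left_feat, right_feat, disparity):
    # The difference between left features and the warped right features
    # carries the residual information used as the skip connection.
    return left_feat - warp_right_to_left(right_feat, disparity)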
Image captioning means automatically generating a caption for an image. As a recently emerged research area, it is attracting more and more attention. To achieve the goal of image captioning, the semantic information of images needs to be captured and expressed in natural language. Connecting the research communities of computer vision and natural language processing, image captioning is quite a challenging task. Various approaches have been proposed to solve this problem. In this paper, we present a survey of advances in image captioning research. Based on the technique adopted, we classify image captioning approaches into different categories. Representative methods in each category are summarized, and their strengths and limitations are discussed. We first discuss methods used in early work, which are mainly retrieval- and template-based. Then, we focus our main attention on neural network based methods, which give state-of-the-art results. Neural network based methods are further divided into subcategories based on the specific framework they use, and each subcategory is discussed in detail. After that, state-of-the-art methods are compared on benchmark datasets, followed by a discussion of future research directions. (C) 2018 Elsevier B.V. All rights reserved.
This paper studies a couplet generation model that automatically generates the second line of a couplet given the first line. Unlike other sequence generation problems, couplet generation must consider not only the sequential context within a line but also the relationships between the corresponding words of the first and second lines. Therefore, we first develop a trapezoidal context character embedding vector model, which considers the 'sequence context' and the 'corresponding word context' simultaneously. We then adopt the typical encoder-decoder framework for this sequence-to-sequence problem, using a bi-directional GRU as the encoder and a GRU as the decoder. To further increase the semantic consistency of the first and second lines of couplets, the pre-trained sentence vector of the first line is added to the attention mechanism of the model. To verify the effectiveness of the method, we apply it to a real data set. Experimental results show that our proposed model is competitive with up-to-date methods, and that both adding sentence vectors to the attention and using trapezoidal context character vectors improve the effectiveness of the algorithm.
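As an illustration of how a pre-trained sentence vector can enter the attention step, here is a minimal PyTorch sketch; the module name, dimensions, and additive scoring form are assumptions rather than the paper's code.

import torch
import torch.nn as nn

class SentenceAwareAttention(nn.Module):
    def __init__(self, enc_dim, dec_dim, sent_dim, attn_dim):
        super().__init__()
        self.proj = nn.Linear(enc_dim + dec_dim + sent_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, enc_outs, dec_state, sent_vec):
        """enc_outs: (B, T, enc_dim) bi-GRU encoder outputs;
        dec_state: (B, dec_dim) current GRU decoder state;
        sent_vec: (B, sent_dim) pre-trained vector of the first line."""
        t = enc_outs.size(1)
        # Broadcast the decoder state and sentence vector over encoder steps,
        # so alignment scores also reflect whole-sentence semantics.
        dec = dec_state.unsqueeze(1).expand(-1, t, -1)
        sent = sent_vec.unsqueeze(1).expand(-1, t, -1)
        e = self.score(torch.tanh(self.proj(torch.cat([enc_outs, dec, sent], -1))))
        alpha = torch.softmax(e.squeeze(-1), dim=1)            # (B, T)
        context = (alpha.unsqueeze(-1) * enc_outs).sum(dim=1)  # (B, enc_dim)
        return context, alpha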
Efficient RGB-D semantic segmentation has received considerable attention in mobile robotics, where it plays a vital role in analyzing and recognizing environmental information. According to previous studies, depth information can provide the geometric relationships of objects and scenes, but real depth data are usually noisy. To avoid unfavorable effects on segmentation accuracy and computation, it is necessary to design an efficient framework that leverages cross-modal correlations and complementary cues. In this article, we propose an efficient lightweight encoder-decoder network that reduces the computational cost and guarantees the robustness of the algorithm. Equipped with channel and spatial fusion attention modules, our network effectively captures multi-level RGB-D features. A globally guided local affinity context module is proposed to obtain sufficient high-level context information. The decoder uses a lightweight residual unit (LRU) that combines short- and long-distance information with few redundant computations. Experimental results on the NYUv2, SUN RGB-D, and Cityscapes datasets show that our method achieves a better tradeoff among segmentation accuracy, inference time, and parameter count than state-of-the-art (SOTA) methods.
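To make the channel and spatial fusion attention idea concrete, a small PyTorch sketch follows: RGB and depth features are merged and then re-weighted per channel and per location. The module layout, reduction ratio, and kernel size are our assumptions, not the authors' released code.

import torch
import torch.nn as nn

class FusionAttention(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        # Channel attention: squeeze spatial dims, excite per channel.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        # Spatial attention: a single-channel gate over locations.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, rgb, depth):
        fused = rgb + depth                  # merge the two modalities
        fused = fused * self.channel(fused)  # re-weight per channel
        fused = fused * self.spatial(fused)  # re-weight per spatial location
        return fused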
Automated image caption generation with attention mechanisms focuses on visual features of the image, including objects, attributes, actions, and scenes, to understand and produce more detailed captions, and has attracted great attention in the multimedia field. However, deciding which aspects of an image to highlight for better captioning remains a challenge. Most advanced captioning models use only one attention module to assign attention weights to visual vectors, which may not be enough to create an informative caption. To tackle this issue, we propose an innovative and well-designed Guided Visual Attention (GVA) approach that incorporates an additional attention mechanism to re-adjust the attention weights on the visual feature vectors and feeds the resulting context vector to the language LSTM. Using the first-level attention module as guidance for the GVA module and re-weighting the attention weights significantly enhances caption quality. Deep neural networks have enabled the encoder-decoder architecture to make use of visual attention mechanisms: Faster R-CNN is used for feature extraction in the encoder, and a visual attention-based LSTM is applied in the decoder. Extensive experiments have been conducted on both the MS-COCO and Flickr30k benchmark datasets. Compared with state-of-the-art methods, our approach achieved an average improvement of 2.4% on BLEU@1 and 13.24% on CIDEr for the MS-COCO dataset, as well as 4.6% on BLEU@1 and 12.48% on CIDEr for the Flickr30k dataset, based on cross-entropy optimization. These results demonstrate the clear superiority of our proposed approach over existing methods on standard evaluation metrics. The implementation code can be found here: (https://***/mdbipu/GVA).
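A minimal PyTorch sketch of the two-stage idea, assuming a second attention module that takes the first module's weights as extra input before the context vector goes to the language LSTM; all names and scoring forms are illustrative, not the released GVA code.

import torch
import torch.nn as nn

class GuidedAttention(nn.Module):
    def __init__(self, feat_dim, hid_dim, attn_dim):
        super().__init__()
        self.first = nn.Linear(feat_dim + hid_dim, attn_dim)
        self.first_score = nn.Linear(attn_dim, 1)
        # Second stage also sees the first-stage weight for each region.
        self.second = nn.Linear(feat_dim + hid_dim + 1, attn_dim)
        self.second_score = nn.Linear(attn_dim, 1)

    def forward(self, regions, hidden):
        """regions: (B, K, feat_dim) region features (e.g. from Faster R-CNN);
        hidden: (B, hid_dim) current LSTM hidden state."""
        k = regions.size(1)
        h = hidden.unsqueeze(1).expand(-1, k, -1)
        # First-level attention over region features.
        a1 = torch.softmax(self.first_score(torch.tanh(
            self.first(torch.cat([regions, h], -1)))).squeeze(-1), dim=1)
        # Second-level attention, guided by the first-level weights.
        a2 = torch.softmax(self.second_score(torch.tanh(
            self.second(torch.cat([regions, h, a1.unsqueeze(-1)], -1)))).squeeze(-1), dim=1)
        context = (a2.unsqueeze(-1) * regions).sum(dim=1)
        return context  # fed to the language LSTM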
ISBN:
(Print) 9781450365628
With the increasing availability of medical images from different modalities (X-ray, CT, PET, MRI, ultrasound, etc.), and with the huge advances in fast, accurate, and enhanced computing power from current graphics processing units, automatic caption generation from medical images has become a new way to improve healthcare and a key method for getting better results at lower cost. In this paper, we give a comprehensive overview of the task of image captioning in the medical domain, covering existing models, the benchmark medical image caption datasets, and the evaluation metrics that have been used to measure the quality of the generated captions.
Significant improvement has been achieved in automated audio captioning (AAC) with recent models. However, these models have become increasingly large as their performance is enhanced. In this work, we propose a knowledge distillation (KD) framework for AAC. Our analysis shows that in encoder-decoder based AAC models, it is more effective to distill knowledge into the encoder than into the decoder. To this end, we incorporate an encoder-level KD loss into training, in addition to the standard supervised loss and a sequence-level KD loss. We investigate two encoder-level KD methods, based on mean squared error (MSE) loss and contrastive loss, respectively. Experimental results demonstrate that contrastive KD is more robust than MSE KD, exhibiting superior performance in data-scarce situations. By leveraging audio-only data during training in the KD framework, our student model achieves competitive performance, with an inference speed that is 19 times faster.
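A contrastive encoder-level KD loss of the kind described can be sketched in a few lines of PyTorch; the InfoNCE form, projection-free setup, and temperature value here are assumptions, not the paper's exact recipe.

import torch
import torch.nn.functional as F

def contrastive_kd_loss(student_emb, teacher_emb, temperature=0.07):
    """student_emb, teacher_emb: (B, D) pooled encoder outputs for the same
    audio clips. Matching (student, teacher) pairs are positives; all other
    pairs in the batch serve as negatives."""
    s = F.normalize(student_emb, dim=-1)
    t = F.normalize(teacher_emb, dim=-1)
    logits = s @ t.T / temperature  # (B, B) similarity matrix
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, targets)

The MSE variant would simply replace this with F.mse_loss(student_emb, teacher_emb); the contrastive form only requires student and teacher embeddings to agree relative to the rest of the batch, which plausibly explains its robustness when data are scarce.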
ISBN:
(Print) 9783031636455; 9783031636462
Numerical reasoning over hybrid data aims to extract critical facts from long-form documents and tables and to generate arithmetic expressions based on these facts to answer a question. Most existing methods are based on the retriever-generator model. However, the inferential power of the retriever-generator model is poor, resulting in insufficient attention to critical facts. To solve these problems, we combine Large Language Models (LLMs) and Case-Based Reasoning (CBR) and propose a Case-Based Reasoning driven Retriever-generator model (CBR-Ren) to enhance the retriever-generator model's ability to retrieve and distinguish critical facts. In the retrieval stage, the model introduces a golden explanation via LLM prompting, which helps the retriever construct explicit templates for inferring critical facts and reduces the impact of non-critical facts on the generator. In the generation stage, the CBR-driven retrieval algorithm enhances the representation learning ability of the encoder and retrieves relevant knowledge from the decoder history. In addition, the model introduces fact weighting, which enhances the ability to locate critical facts and helps generate correct numerical expressions. Experimental results on the FinQA and Conv-FinQA datasets demonstrate the effectiveness of CBR-Ren, which outperforms all baselines.
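One plausible reading of the fact-weighting component is a learned relevance score that scales each retrieved fact's representation before generation; the PyTorch sketch below follows that reading, with the scoring head and dimensions entirely our assumptions.

import torch
import torch.nn as nn

class FactWeighting(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, fact_reprs):
        """fact_reprs: (B, N, dim) encoder representations of N retrieved facts."""
        w = torch.softmax(self.score(fact_reprs).squeeze(-1), dim=1)  # (B, N)
        # Up-weight facts the model deems critical, down-weight the rest,
        # before the generator produces the arithmetic expression.
        return fact_reprs * w.unsqueeze(-1)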
ISBN:
(Print) 9781450366007
In recent years, the encoder-decoder framework has been widely used in image captioning. During prediction, many methods feed the word generated at the previous time step back as the input at the current step, so an early error can cause the following generated words to get worse. This paper proposes to use the correct rate of the preceding words to constrain the loss weight of the following words, so that the loss weight of the following words increases as the preceding word error rate decreases; we call this Automatic Constraint Loss (ACL), and it reduces the discrepancy between the training and test phases. Experimental results on the MSCOCO dataset show that adding the proposed method to the original model greatly improves the BLEU-1 and BLEU-2 scores, and that the attention mechanism can more accurately select image regions.
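A minimal PyTorch sketch of this weighting idea follows: each word's cross-entropy is scaled by the running accuracy of the words before it. The specific weighting function (1 + prefix accuracy) is our assumption, not the paper's exact formula.

import torch
import torch.nn.functional as F

def automatic_constraint_loss(logits, targets, pad_id=0):
    """logits: (B, T, V) decoder outputs; targets: (B, T) caption tokens."""
    ce = F.cross_entropy(logits.transpose(1, 2), targets,
                         ignore_index=pad_id, reduction="none")  # (B, T)
    correct = (logits.argmax(-1) == targets).float()
    # Accuracy of the preceding words at each step (defined as 1.0 at step 0).
    cum_correct = torch.cumsum(correct, dim=1) - correct
    steps = torch.arange(1, targets.size(1) + 1,
                         device=targets.device).float()
    prefix_acc = torch.where(steps > 1, cum_correct / (steps - 1),
                             torch.ones_like(cum_correct))
    # Higher prefix accuracy -> larger loss weight on the current word.
    weights = 1.0 + prefix_acc
    mask = (targets != pad_id).float()
    return (weights * ce * mask).sum() / mask.sum()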
ISBN:
(Print) 9783319736181; 9783319736174
Due to the difficulty of abstractive summarization, the great majority of past work on document summarization has been extractive, while the recent success of the sequence-to-sequence framework has made abstractive summarization viable: a set of recurrent neural network models based on the attention encoder-decoder has achieved promising performance on short-text summarization tasks. Unfortunately, these attention encoder-decoder models often suffer from the undesirable shortcomings of generating repeated words or phrases and of failing to deal with out-of-vocabulary words appropriately. To address these issues, in this work we propose to add an attention mechanism on the output sequence to avoid repetitive content and to use the subword method to deal with rare and unknown words. We applied our model to the public dataset provided by NLPCC 2017 Shared Task 3. The evaluation results show that our system achieved the best ROUGE performance among all participating teams and is also competitive with some state-of-the-art methods.
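The attention over the output sequence can be sketched as an intra-decoder attention that summarizes the already-emitted words, so the output layer can be discouraged from repeating them; the module name and bilinear scoring here are illustrative assumptions, not the system's actual implementation.

import torch
import torch.nn as nn

class OutputAttention(nn.Module):
    def __init__(self, dec_dim, attn_dim):
        super().__init__()
        self.q = nn.Linear(dec_dim, attn_dim)
        self.k = nn.Linear(dec_dim, attn_dim)

    def forward(self, dec_state, prev_states):
        """dec_state: (B, dec_dim) current decoder state;
        prev_states: (B, T, dec_dim) states at already-emitted words."""
        scores = torch.einsum("bd,btd->bt", self.q(dec_state), self.k(prev_states))
        alpha = torch.softmax(scores, dim=1)
        # A summary of what has already been said, which the output layer can
        # use to penalize generating the same content again.
        return (alpha.unsqueeze(-1) * prev_states).sum(dim=1)

The out-of-vocabulary side of the proposal needs no new module: subword segmentation (e.g. byte-pair encoding) decomposes rare words into frequent units, so the decoder's vocabulary can stay small while still covering unseen words.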