Semantic segmentation, as a pixel-level recognition task, has been widely used in a variety of practical scenes. Most existing methods try to improve network performance by fusing information from high and low layers. Simple concatenation or element-wise addition, however, leads to unbalanced fusion and low utilization of inter-level features. To solve this problem, we propose the Inter-Level Feature Balanced Fusion Network (IFBFNet) to guide inter-level feature fusion towards a more balanced and effective direction. Our overall network follows an encoder-decoder architecture. In the encoder, we use a relatively deep convolutional network to extract rich semantic information. In the decoder, skip connections fuse in low-level spatial features to gradually restore clearer boundaries, and we add an inter-level feature balanced fusion module to each skip connection. Additionally, to better capture boundary information, we add a shallower spatial information stream that supplements finer spatial details. Experiments demonstrate the effectiveness of our module. IFBFNet achieves competitive performance on the Cityscapes dataset using only finely annotated data for training and improves substantially over the baseline network.
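The abstract does not specify the internals of the balanced fusion module, so the following is only a minimal sketch of one plausible reading: low- and high-level skip features are projected to a common width and combined through a learned gate so that neither level dominates. The class and parameter names (BalancedFusion, low_ch, high_ch, out_ch) are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BalancedFusion(nn.Module):
    """Hypothetical inter-level fusion: re-weight low- and high-level
    features with a learned gate so neither branch dominates the sum."""
    def __init__(self, low_ch, high_ch, out_ch):
        super().__init__()
        self.low_proj = nn.Conv2d(low_ch, out_ch, kernel_size=1)
        self.high_proj = nn.Conv2d(high_ch, out_ch, kernel_size=1)
        # Gate predicts a per-channel balance weight from the concatenation.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * out_ch, out_ch, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, low, high):
        # Upsample the semantically rich (but coarse) high-level map
        # to the spatial size of the low-level map.
        high = F.interpolate(self.high_proj(high), size=low.shape[2:],
                             mode="bilinear", align_corners=False)
        low = self.low_proj(low)
        w = self.gate(torch.cat([low, high], dim=1))   # balance weight in (0, 1)
        return w * low + (1 - w) * high                # convex, "balanced" fusion

# Usage on dummy skip-connection tensors.
fuse = BalancedFusion(low_ch=64, high_ch=256, out_ch=128)
out = fuse(torch.randn(1, 64, 128, 128), torch.randn(1, 256, 32, 32))
print(out.shape)  # torch.Size([1, 128, 128, 128])
```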
Generating natural questions from an image is a semantic task that requires using vision and language modalities to learn multimodal representations. Images can carry multiple visual and language cues such as places, captions, and tags. In this paper, we propose a principled deep Bayesian learning framework that combines these cues to produce natural questions. We observe that with the addition of more cues, and by minimizing the uncertainty among the cues, the Bayesian network becomes more confident. We propose Minimizing Uncertainty of Mixture of Cues (MUMC), which minimizes the uncertainty present in a mixture of cue experts for generating probabilistic questions. The framework is Bayesian, and the generated questions show a remarkable similarity to natural questions, as validated by a human study. Ablation studies of our model indicate that using a subset of the cues is inferior at this task, and hence the principled fusion of cues is preferred. Further, we observe that the proposed approach substantially improves over state-of-the-art benchmarks on the quantitative metrics (BLEU-n, METEOR, ROUGE, and CIDEr). The project page for Deep Bayesian VQG is available at https://***/BVQG/. (c) 2021 Elsevier B.V. All rights reserved.
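The exact probabilistic formulation of MUMC is not given in the abstract; purely as an illustration, one common way to let less-uncertain cues dominate a fused representation is precision-weighted averaging of per-cue Gaussian estimates. Everything below (CueExpert, fuse_cues, the dimensions) is a hypothetical sketch, not the paper's model.

```python
import torch
import torch.nn as nn

class CueExpert(nn.Module):
    """Hypothetical expert: maps one cue embedding (image, place, caption,
    or tag) to a mean and log-variance over a shared latent space used to
    condition the question decoder."""
    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.mu = nn.Linear(in_dim, latent_dim)
        self.logvar = nn.Linear(in_dim, latent_dim)

    def forward(self, x):
        return self.mu(x), self.logvar(x)

def fuse_cues(experts, cues):
    """Precision-weighted fusion: cues with lower predictive variance
    (i.e., less uncertainty) contribute more to the fused latent."""
    mus, precisions = [], []
    for expert, cue in zip(experts, cues):
        mu, logvar = expert(cue)
        mus.append(mu)
        precisions.append(torch.exp(-logvar))        # 1 / sigma^2
    mus = torch.stack(mus)                            # (num_cues, B, D)
    precisions = torch.stack(precisions)
    return (precisions * mus).sum(0) / precisions.sum(0)

experts = nn.ModuleList([CueExpert(512, 256) for _ in range(3)])
cues = [torch.randn(4, 512) for _ in range(3)]        # e.g. place, caption, tag embeddings
print(fuse_cues(experts, cues).shape)                  # torch.Size([4, 256])
```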
Maritime safety is an important issue for global shipping industries. Currently, most collision accidents at sea are caused by misjudgement by ships' operators. The deployment of maritime autonomous surface ships (MASS) can greatly reduce ships' reliance on human operators by using an automated intelligent collision avoidance system to replace human decision-making. To successfully develop such a system, the capability to autonomously identify other ships and evaluate the associated encounter situation is of paramount importance. In this paper, we aim to identify ships' encounter situation modes using deep learning methods based upon Automatic Identification System (AIS) data. First, a segmentation process is developed to divide each ship's AIS data into segments that contain only one encounter situation mode. This differs from the majority of studies, which have proposed encounter situation mode classification using hand-crafted features that may not reflect the ship's actual movement states. Furthermore, many existing classification approaches rely on substantial labelled AIS data and a supervised training paradigm, which is not applicable to our dataset as it contains a large amount of unlabelled AIS data. Therefore, a method called Semi-Supervised Convolutional Encoder-Decoder Network (SCEDN) for ship encounter situation classification based on AIS data is proposed. The network is not only able to automatically extract features from AIS segments but can also share training parameters with the unlabelled data. The SCEDN uses an encoder-decoder convolutional structure with four channels for each segment: distance, speed, Time to the Closest Point of Approach (TCPA), and Distance to the Closest Point of Approach (DCPA). The performance of the SCEDN model is evaluated against several baselines, with the experimental results demonstrating that a higher accuracy can be achieved by our model.
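As a rough illustration of the semi-supervised encoder-decoder idea described above, the sketch below pairs a 1-D convolutional encoder-decoder (which can be trained to reconstruct unlabelled 4-channel AIS segments) with a small classification head on the shared encoding. Layer sizes, the segment length, and the number of encounter classes are assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class SemiSupConvED(nn.Module):
    """Sketch of a semi-supervised 1-D convolutional encoder-decoder: the
    decoder reconstructs the 4-channel AIS segment (usable on unlabelled
    data), while a small head classifies the encounter mode from the shared
    encoding (usable on labelled data)."""
    def __init__(self, n_channels=4, n_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv1d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(32, n_channels, 4, stride=2, padding=1),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, n_classes)
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.classifier(z)

model = SemiSupConvED()
segment = torch.randn(8, 4, 64)            # channels: distance, speed, TCPA, DCPA
recon, logits = model(segment)
recon_loss = nn.functional.mse_loss(recon, segment)   # unsupervised objective
print(recon.shape, logits.shape)           # (8, 4, 64) and (8, 3)
```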
In this paper, we propose a novel stroke constrained attention network (SCAN) which treats the stroke as the basic unit for encoder-decoder based online handwritten mathematical expression recognition (HMER). Unlike previous methods which use trace points or image pixels as basic units, SCAN makes full use of stroke-level information for better alignment and representation. The proposed SCAN can be adopted in both single-modal (online or offline) and multi-modal HMER. For single-modal HMER, SCAN first employs a CNN-GRU encoder to extract point-level features from input traces in online mode and a CNN encoder to extract pixel-level features from input images in offline mode, and then uses stroke-constrained information to convert them into online and offline stroke-level features. Using stroke-level features explicitly groups points or pixels belonging to the same stroke, thereby reducing the difficulty of symbol segmentation and recognition for the attention-based decoder. For multi-modal HMER, in addition to fusing multi-modal information in the decoder, SCAN can also fuse multi-modal information in the encoder by utilizing the stroke-based alignments between online and offline modalities. Encoder fusion is a better way of combining multi-modal information, as it moves the information interaction one step before decoder fusion, so that the advantages of multiple modalities can be exploited earlier and more adequately. In addition, we propose an approach combining encoder fusion and decoder fusion, namely encoder-decoder fusion, which can further improve performance. Evaluated on a benchmark published by the CROHME competition, the proposed SCAN achieves state-of-the-art performance. Furthermore, by conducting experiments on an additional task, online handwritten Chinese character recognition (HCCR), we demonstrate the generality of the proposed method. (c) 2021 Elsevier Ltd. All rights reserved.
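The key data manipulation implied by the abstract, turning point-level features into stroke-level features by grouping points that belong to the same stroke, can be sketched as a simple pooling step. The function below averages point features per stroke id; it is an assumed illustration, and the paper's actual conversion may be more elaborate.

```python
import torch

def stroke_pool(point_feats, stroke_ids):
    """Hypothetical stroke-level pooling: average the point-level features
    of all points that share a stroke id, yielding one vector per stroke.

    point_feats: (num_points, feat_dim) features from a point-level encoder.
    stroke_ids:  (num_points,) integer id of the stroke each point belongs to.
    """
    num_strokes = int(stroke_ids.max()) + 1
    feat_dim = point_feats.size(1)
    sums = torch.zeros(num_strokes, feat_dim).index_add_(0, stroke_ids, point_feats)
    counts = torch.zeros(num_strokes).index_add_(
        0, stroke_ids, torch.ones_like(stroke_ids, dtype=torch.float)
    )
    return sums / counts.unsqueeze(1)

feats = torch.randn(10, 8)                      # 10 trace points, 8-d features
ids = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2, 2, 3])
print(stroke_pool(feats, ids).shape)            # torch.Size([4, 8]) -> 4 strokes
```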
ISBN (print): 9783030606329; 9783030606336
Since the results of CNN-based methods for monocular depth estimation are often visually unsatisfying, we propose Feature Fusion GAN (FF-GAN) to address this issue. First, an end-to-end network based on an encoder-decoder structure is proposed as the generator of FF-GAN, which can exploit information at different scales. The encoder of our generator fuses features from different levels with a feature fusion module, while the main component of the decoder is a module that captures information from multi-scale receptive fields. Second, to match the generator, the discriminator of FF-GAN is designed to efficiently learn information at different scales by applying a pyramid structure. Experiments on public datasets demonstrate the effectiveness of our generator and discriminator. Compared with CNN methods, the depth maps predicted by FF-GAN show significantly less texture loss and edge blur while maintaining accuracy, and the visual quality is better.
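The abstract mentions a pyramid-structured discriminator that judges depth maps at multiple scales; one simple way to realise that idea is to run a small patch critic over the same depth map at several resolutions. The sketch below is an assumed illustration, with channel widths, the scale set, and the single shared critic being guesses rather than the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidDiscriminator(nn.Module):
    """Sketch of a multi-scale (pyramid) discriminator: the same small
    convolutional critic scores the depth map at several resolutions, so
    both global structure and local detail are judged."""
    def __init__(self, scales=(1.0, 0.5, 0.25)):
        super().__init__()
        self.scales = scales
        self.critic = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 3, padding=1),            # patch-level real/fake scores
        )

    def forward(self, depth):
        scores = []
        for s in self.scales:
            x = depth if s == 1.0 else F.interpolate(
                depth, scale_factor=s, mode="bilinear", align_corners=False)
            scores.append(self.critic(x))
        return scores

disc = PyramidDiscriminator()
outs = disc(torch.randn(2, 1, 128, 160))        # a batch of predicted depth maps
print([o.shape for o in outs])                   # one score map per scale
```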
ISBN (print): 9781728141442
Breast cancer diagnosis is based on radiology reports describing observations made from medical imagery, such as X-rays obtained during mammography. The reports are written by radiologists and contain a conclusion summarizing the observations. Manually summarizing the reports is time-consuming and leads to high text variability. This paper investigates the automated summarization of Dutch radiology reports. We propose a hybrid model consisting of a language model (encoder-decoder with attention) and a separate BI-RADS score classifier. The summarization model achieved a ROUGE-L F1 score of 51.5% on the Dutch reports, which is comparable to results in other languages and domains. For BI-RADS classification, the language model (accuracy 79.1%) was outperformed by the separate classifier (accuracy 83.3%), leading us to propose the hybrid approach for radiology report summarization. Our qualitative evaluation with experts found the generated conclusions to be comprehensible and to cover mostly relevant content; the main focus for improvement should be their factual correctness. While the current model is not accurate enough to be employed in clinical practice, our results indicate that hybrid models might be a worthwhile direction for future research.
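The division of labour in the hybrid model, with the free-text conclusion coming from the seq2seq language model and the BI-RADS score from a dedicated classifier, can be made concrete with a small wrapper. The sketch below uses toy stand-ins for both components and is not the authors' implementation; only the interface (report text in, conclusion plus score out) follows the abstract.

```python
from dataclasses import dataclass

@dataclass
class ReportSummary:
    conclusion: str
    birads: int

class HybridSummarizer:
    """Sketch of the hybrid idea: the free-text conclusion comes from a
    seq2seq summarizer, while the BI-RADS score comes from a separate
    classifier that is more accurate at that sub-task. Any summarizer and
    classifier with these call signatures could be plugged in."""
    def __init__(self, seq2seq, classifier):
        self.seq2seq = seq2seq          # report text -> conclusion text
        self.classifier = classifier    # report text -> BI-RADS score

    def summarize(self, report_text: str) -> ReportSummary:
        return ReportSummary(
            conclusion=self.seq2seq(report_text),
            birads=self.classifier(report_text),
        )

# Toy stand-ins, for illustration only.
model = HybridSummarizer(
    seq2seq=lambda text: text.split(".")[0] + ".",   # "summary" = first sentence
    classifier=lambda text: 2,                        # fixed dummy BI-RADS score
)
print(model.summarize("Dense tissue. No suspicious mass or calcifications."))
```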
Video object segmentation, which aims to segment the foreground objects given the annotation of the first frame, has been attracting increasing attention. Many state-of-the-art approaches achieve strong performance by relying on online model updating or mask-propagation techniques. However, most online models incur high computational cost due to model fine-tuning during inference, while most mask-propagation based models are faster but achieve relatively low performance because they fail to adapt to object appearance variation. In this paper, we aim to design a new model that strikes a good balance between speed and performance. We propose a model, called NPMCA-net, which directly localizes foreground objects based on mask propagation and a non-local technique by matching pixels in the reference and target frames. Since we bring in information from both the first and the previous frames, our network is robust to large object appearance variation and can better adapt to occlusions. Extensive experiments show that our approach achieves new state-of-the-art performance at fast speed (86.5% IoU on DAVIS-2016 and 72.2% IoU on DAVIS-2017, at 0.11 s per frame) under the same level of comparison. Source code is available at https://***/siyueyu/NPMCA-net.
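The pixel-matching step the abstract describes, localizing the target object by matching target-frame pixels against reference-frame pixels and carrying the reference mask over, corresponds closely to a non-local (attention-style) affinity. The function below is a generic sketch of that operation under assumed tensor shapes; it is not the exact NPMCA-net formulation.

```python
import torch

def nonlocal_match(target_feat, ref_feat, ref_mask):
    """Sketch of non-local mask propagation: every target pixel attends to
    all reference pixels, and the reference mask is carried over with the
    resulting affinity weights.

    target_feat, ref_feat: (B, C, H, W) frame features.
    ref_mask:              (B, 1, H, W) foreground mask of the reference frame.
    """
    B, C, H, W = target_feat.shape
    t = target_feat.flatten(2).transpose(1, 2)           # (B, HW, C)
    r = ref_feat.flatten(2)                              # (B, C, HW)
    affinity = torch.softmax(t @ r / C ** 0.5, dim=-1)   # (B, HW, HW)
    m = ref_mask.flatten(2).transpose(1, 2)              # (B, HW, 1)
    return (affinity @ m).transpose(1, 2).reshape(B, 1, H, W)

feat_t = torch.randn(1, 64, 30, 40)
feat_r = torch.randn(1, 64, 30, 40)
mask_r = torch.rand(1, 1, 30, 40)
print(nonlocal_match(feat_t, feat_r, mask_r).shape)      # torch.Size([1, 1, 30, 40])
```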
Neural encoder-decoder architectures have been used extensively for image captioning. Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) are popularly used as encoder and decoder models. Recurrent Neural Networks are popular architectures in natural language processing used for language modeling, but they are sequential in nature. The transformer model solves this problem of sequential dependency by using an attention mechanism. Many works are available for image captioning in the English language, but models for generating Hindi captions are limited; hence, we have tried to fill this gap. We created a Hindi dataset for image captioning by manually translating the popular MSCOCO dataset from English to Hindi. Experimental results show that our proposed model outperforms other models. The proposed model attains a BLEU-1 score of 62.9, a BLEU-2 score of 43.3, a BLEU-3 score of 29.1, and a BLEU-4 score of 19.0.
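As a generic illustration of the CNN-encoder / Transformer-decoder captioning setup (the abstract does not state the exact configuration), the sketch below decodes caption tokens while attending to projected CNN grid features. The vocabulary size, model width, layer counts, and 2048-d ResNet-style features are assumptions; positional encodings are omitted for brevity.

```python
import torch
import torch.nn as nn

class CaptionTransformer(nn.Module):
    """Sketch of a CNN-encoder / Transformer-decoder captioner: grid features
    from a CNN act as memory, and the decoder attends to them while
    generating caption tokens (e.g. Hindi word-piece ids)."""
    def __init__(self, vocab_size, d_model=256, n_layers=2, n_heads=4):
        super().__init__()
        self.feat_proj = nn.Linear(2048, d_model)        # assumed CNN grid features
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, grid_feats, tokens):
        memory = self.feat_proj(grid_feats)               # (B, regions, d_model)
        tgt = self.embed(tokens)                          # (B, seq_len, d_model)
        seq_len = tokens.size(1)
        # Causal mask so each position only attends to earlier caption tokens.
        causal = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        hidden = self.decoder(tgt, memory, tgt_mask=causal)
        return self.out(hidden)                           # next-token logits

model = CaptionTransformer(vocab_size=8000)
logits = model(torch.randn(2, 49, 2048), torch.randint(0, 8000, (2, 12)))
print(logits.shape)                                       # torch.Size([2, 12, 8000])
```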
E-Bibliotherapy deals with adolescent psychological stress by manually or automatically recommending multiple reading articles around their stressful events, using electronic devices as a medium. To make E-Bibliotherapy really useful, generating instructive questions before reading is an important step. Such a question shall (a) attract teens' attention; (b) convey the essential message of the reading materials so as to improve teens' active comprehension; and, most importantly, (c) highlight teens' stress to enable them to generate emotional resonance and thus willingness to pursue the reading. Therefore, in this paper we propose to generate instructive questions from the multiple recommended articles to guide teens to read. Four solutions based on the neural encoder-decoder model are presented to tackle the task. For model training and testing, we construct a novel large-scale QA dataset named TeenQA, which is specific to adolescent stress. Because of the variability of question expressions, we incorporate three groups of automatic evaluation metrics as well as one group of human evaluation metrics to examine the quality of the generated questions. The experimental results show that the proposed encoder-decoder with Summary on Contexts with Feature-rich embeddings (ED-SoCF) solution can generate good questions for guiding reading, achieving performance comparable to that of humans on some semantic similarity metrics.
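Only the high-level shape of ED-SoCF is given in the abstract, so the sketch below shows the plain encoder-decoder skeleton it builds on: a recurrent encoder reads the (summarised) article contexts and a recurrent decoder emits question tokens. The feature-rich embeddings and context summarisation of ED-SoCF are not reproduced; sizes and names are illustrative.

```python
import torch
import torch.nn as nn

class QuestionGenerator(nn.Module):
    """Generic sketch of the encoder-decoder idea: a GRU encoder reads the
    article contexts and a GRU decoder emits question tokens conditioned on
    the final encoder state. Plain word embeddings stand in for the
    feature-rich embeddings used by ED-SoCF."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, context_ids, question_ids):
        _, h = self.encoder(self.embed(context_ids))        # summary of the contexts
        dec_out, _ = self.decoder(self.embed(question_ids), h)
        return self.out(dec_out)                             # next-token logits

model = QuestionGenerator(vocab_size=5000)
logits = model(torch.randint(0, 5000, (4, 200)),             # concatenated contexts
               torch.randint(0, 5000, (4, 15)))               # question so far
print(logits.shape)                                           # torch.Size([4, 15, 5000])
```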
ISBN (print): 9791095546344
We investigate the effect of data augmentation on low-resource morphological segmentation. We compare two settings: the pure low-resource one, in which only 100 annotated word forms are available, and the augmented one, in which we use the original training set and 1000 unlabeled word forms to generate 1000 artificial inflected forms. Evaluating on the Sigmorphon 2018 dataset, we observe that using the better of these two models reduces the error rate of the state-of-the-art model by 6%, while for our baseline model the error reduction is 17%.
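To make the closing figures concrete: the 6% and 17% are relative error-rate reductions. The snippet below simply spells out that arithmetic; the baseline and new error rates used here are hypothetical, chosen only so the ratios come out to the reported percentages.

```python
def relative_error_reduction(baseline_err, new_err):
    """Relative error-rate reduction, as reported in the abstract."""
    return (baseline_err - new_err) / baseline_err

# Hypothetical error rates, chosen only to illustrate the arithmetic:
# a 6% relative reduction over a 20% error rate lands at 18.8%.
print(f"{relative_error_reduction(0.20, 0.188):.0%}")   # 6%
print(f"{relative_error_reduction(0.20, 0.166):.0%}")   # 17%
```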