Many image-to-image computer vision approaches have made great progress through end-to-end frameworks with encoder-decoder architectures. However, the image-to-image eye fixation prediction task differs from those tasks in that it focuses more on salient regions than on precise predictions for every pixel. Thus, it is not appropriate to apply an end-to-end encoder-decoder directly to eye fixation prediction. In addition, although high-level features are important, the contribution of low-level features should also be preserved and balanced in a computational model. Nevertheless, some low-level features that attract attention are easily lost while passing through a deep network. Therefore, effectively integrating low-level and high-level features to improve eye fixation prediction remains a challenging task. In this paper, a coarse-to-fine network (CFN) encompassing two pathways with different training strategies is proposed: the coarse perceiving network (CFN-Coarse) can be a simple encoder network or any existing pretrained network that captures the distribution of salient regions and generates high-quality feature maps; the fine integrating network (CFN-Fine) uses fixed parameters from the CFN-Coarse and combines features from deep to shallow in the deconvolution process by adding skip connections between the down-sampling and up-sampling paths to efficiently integrate deep and shallow features. The saliency maps obtained by the method are evaluated on 6 standard benchmark datasets, namely SALICON, MIT1003, MIT300, Toronto, OSIE, and SUN500. The results demonstrate that the method surpasses the state-of-the-art accuracy of eye fixation prediction and achieves competitive performance to date under most evaluation metrics on the SALICON Saliency Prediction Challenge (LSUN2017).
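The skip connections between down-sampling and up-sampling paths described above can be sketched in NumPy; the shapes and the nearest-neighbor upsampling are illustrative assumptions, not the paper's exact layers:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling for a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_fuse(decoder_feat, encoder_feat):
    """Fuse an upsampled deep decoder feature with the matching shallow encoder feature."""
    up = upsample2x(decoder_feat)
    assert up.shape == encoder_feat.shape
    return up + encoder_feat  # element-wise addition as the skip connection

# Hypothetical feature maps: deep decoder (64, 8, 8), shallow encoder (64, 16, 16)
deep = np.random.rand(64, 8, 8)
shallow = np.random.rand(64, 16, 16)
fused = skip_fuse(deep, shallow)
print(fused.shape)  # (64, 16, 16)
```

The additive fusion lets low-level detail from the encoder survive into the decoder without increasing channel count.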
Currently, SLAM (simultaneous localization and mapping) systems based on monocular cameras cannot directly obtain depth information; most of them suffer from scale uncertainty and need to be initialized. In application scenarios that require navigation and obstacle avoidance, the inability to achieve dense mapping is a further defect of monocular SLAM. In response to the above problems, this paper proposes a method that learns depth estimation with DenseNet and a CNN for a monocular SLAM system. We use an encoder-decoder architecture based on transfer learning and convolutional neural networks to estimate the depth of monocular RGB images. At the same time, through front-end ORB feature extraction and back-end direct RGB-D Bundle Adjustment optimization, it is possible to obtain accurate camera poses and achieve dense indoor mapping using the estimated depth information. The experimental results show that the monocular depth estimation model used in this paper achieves good results and is competitive with current popular methods. On this basis, the camera pose estimation error is also smaller than that of traditional monocular SLAM solutions, and the system can complete the dense indoor reconstruction task. The result is a complete SLAM system based on a monocular camera.
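Using an estimated depth map for dense mapping amounts to back-projecting each pixel through the pinhole camera model; a minimal sketch follows, with hypothetical intrinsics (not values from the paper):

```python
import numpy as np

def backproject(u, v, depth, K):
    """Back-project pixel (u, v) with estimated depth into a 3D camera-frame point."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Hypothetical intrinsics for a 640x480 camera
K = np.array([[525.0, 0.0, 320.0],
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])
p = backproject(320.0, 240.0, 2.0, K)
print(p)  # [0. 0. 2.] -- the principal point maps onto the optical axis
```

Applying this to every pixel of the predicted depth map yields the dense point cloud that a purely monocular pipeline cannot recover on its own.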
Images captured in low-brightness environments often exhibit poor visibility and artifacts such as low brightness, low contrast, and color distortion. These artifacts not only affect the visual perception of the human eye but also degrade the performance of computer vision algorithms. Existing deep learning-based image enhancement methods are quite slow and usually require extensive hardware. Conversely, lightweight enhancement approaches do not provide satisfactory performance compared to state-of-the-art methods. Therefore, we propose LiCENt (Light Channel Enhancement Network), a fast and lightweight deep learning-based algorithm for low-light image enhancement using the lightness channel of Hue Saturation Lightness (HSL). LiCENt uses a combination of an autoencoder and a convolutional neural network (CNN) to train a low-light enhancer that first improves the illumination and then the details of the low-light image in a unified framework. The method operates on the single lightness channel 'L' of the HSL color space instead of the traditional RGB color channels, which reduces the number of learnable parameters by a factor of up to 8.92. LiCENt also benefits significantly from Brilliance Perception Adjustment, which enables the model to avoid issues such as over-enhancement and color distortion. The experimental results demonstrate that our approach generalizes well to synthetic and natural low-light images and outperforms other methods in terms of qualitative and quantitative metrics.
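The HSL lightness channel the network operates on is simple to extract: per pixel, it is the mean of the largest and smallest RGB components. A sketch, assuming RGB values normalized to [0, 1]:

```python
import numpy as np

def hsl_lightness(rgb):
    """HSL lightness per pixel: L = (max(R, G, B) + min(R, G, B)) / 2."""
    return (rgb.max(axis=-1) + rgb.min(axis=-1)) / 2.0

# Hypothetical 2x2 RGB image: red, mid-gray, black, white
img = np.array([[[1.0, 0.0, 0.0], [0.5, 0.5, 0.5]],
                [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]])
L = hsl_lightness(img)
print(L)  # [[0.5 0.5]
          #  [0.  1. ]]
```

Enhancing only this one channel, then recombining with the untouched hue and saturation, is what keeps the learnable parameter count low relative to processing three RGB channels.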
In this paper, we propose a lightweight asymmetric refining fusion network (LARFNet) for real-time semantic segmentation, addressing the problem that some existing models cannot achieve good segmentation accuracy at real-time inference speed on mobile devices due to their huge computational overhead. Specifically, LARFNet adopts an asymmetric encoder-decoder structure. A depth-wise separable asymmetric interaction module (DSAI module) is designed for the encoding process; with optimized convolutions it effectively extracts local and surrounding information under different receptive fields while ensuring communication between channels. In the decoder, we design the bilateral pyramid pooling attention module (BPPA module) and the multi-stage refinement fusion module (MRF module). The BPPA module integrates the multi-scale context information of the high-level output. Based on spatial and channel attention mechanisms, the MRF module refines the feature maps of different resolutions and guides feature fusion. Experimental results show that LARFNet achieves 69.2% mIoU and 65.6% mIoU on the Cityscapes and CamVid datasets at 127 FPS and 222 FPS respectively, using only a single NVIDIA GeForce GTX2080Ti GPU and 0.72M parameters, without any pre-training or pre-processing. Compared with most existing state-of-the-art models, the proposed method makes efficient use of network parameters at a faster speed, reduces the number of network parameters, and still achieves good segmentation accuracy. (c) 2022 Elsevier Ltd. All rights reserved.
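The parameter savings behind depth-wise separable convolution, the building block of modules like DSAI, can be checked with simple arithmetic (example channel counts are illustrative):

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def dw_separable_params(k, c_in, c_out):
    """Depth-wise k x k conv per input channel, then a 1 x 1 point-wise mix."""
    return k * k * c_in + c_in * c_out

std = conv_params(3, 64, 64)          # 36864
sep = dw_separable_params(3, 64, 64)  # 4672
print(std, sep, round(std / sep, 1))
```

The roughly 8x reduction at these sizes is why such factorizations dominate lightweight segmentation backbones.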
This study investigates the capability of sequence-to-sequence machine learning (ML) architectures in an effort to develop streamflow forecasting tools for Canadian watersheds. Such tools are useful to inform local and region-specific water management and flood forecasting activities. Two powerful deep-learning variants of the Recurrent Neural Network were investigated, namely the standard and attention-based encoder-decoder long short-term memory (LSTM) models. Both models were forced with past hydro-meteorological states and daily meteorological data with a look-back time window of several days. These models were tested on 10 different watersheds from the Ottawa River watershed, located within the Great Lakes Saint-Lawrence region of Canada, an economic powerhouse of the country. The results of the training and testing phases suggest that both models simulate overall hydrograph patterns well when compared to observational records. Between the two, the attention model significantly outperforms the standard model in all watersheds, suggesting the importance and usefulness of the attention mechanism in ML architectures, not well explored for hydrological applications. The mean performance accuracy of the attention model on unseen data, assessed in terms of mean Nash-Sutcliffe Efficiency and Kling-Gupta Efficiency, is found to be 0.985 and 0.954, respectively, for these watersheds. Streamflow forecasts with lead times of up to 5 days with the attention model demonstrate overall skillful performance, well above the benchmark accuracy of 70%. The results of the study suggest that the encoder-decoder LSTM with attention mechanism is a powerful modelling choice for developing streamflow forecasting systems for Canadian watersheds.
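The attention mechanism's role over a look-back window can be sketched with plain dot-product attention in NumPy; the window length, hidden size, and scoring function are illustrative assumptions, not the study's exact configuration:

```python
import numpy as np

def attention_context(decoder_state, encoder_states):
    """Score each past time step against the decoder state, softmax, weighted sum."""
    scores = encoder_states @ decoder_state   # one score per look-back step, shape (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over time steps
    context = weights @ encoder_states        # context vector, shape (hidden,)
    return weights, context

# Hypothetical: 7-day look-back window, hidden size 4
rng = np.random.default_rng(0)
enc = rng.normal(size=(7, 4))
dec = rng.normal(size=4)
w, ctx = attention_context(dec, enc)
print(w.sum())  # 1.0 -- the weights form a distribution over look-back days
```

Rather than compressing the whole window into one fixed vector, the decoder can re-weight which past days matter at each forecast step, which is the property credited for the attention model's edge.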
In recent years, how to strike a good trade-off between accuracy, inference speed, and model size has become the core issue for real-time semantic segmentation applications, which play a vital role in real-world scenarios such as autonomous driving systems and drones. In this study, we devise a novel lightweight network using a multi-scale context fusion scheme (MSCFNet), which explores an asymmetric encoder-decoder architecture to alleviate these problems. More specifically, the encoder adopts efficient asymmetric residual (EAR) modules, which are composed of factorized depth-wise convolution and dilated convolution. Meanwhile, instead of complicated computation, simple deconvolution is applied in the decoder to further reduce the number of parameters while still maintaining high segmentation accuracy. Also, MSCFNet has branches with efficient attention modules at different stages of the network to capture multi-scale contextual information well. We then combine them before the final classification to enhance the expression of the features and improve segmentation efficiency. Comprehensive experiments on challenging datasets have demonstrated that the proposed MSCFNet, which contains only 1.15M parameters, achieves 71.9% Mean IoU on the Cityscapes testing dataset and can run at over 50 FPS on a single Titan XP GPU.
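The two tricks inside the EAR modules, asymmetric factorization and dilation, can both be quantified with short arithmetic (channel counts here are illustrative):

```python
def asym_factor_params(k, c):
    """A k x k depth-wise kernel factorized into k x 1 plus 1 x k (per channel)."""
    return 2 * k * c  # vs. k * k * c for the unfactorized kernel

def dilated_receptive_field(k, d):
    """Effective receptive field of a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)

print(asym_factor_params(3, 64), 3 * 3 * 64)  # 384 vs 576 weights
print(dilated_receptive_field(3, 4))          # 9 -- 3x3 cost, 9x9 view
```

Factorization trims parameters; dilation widens context at no extra parameter cost, which together is how such encoders stay around one million parameters.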
Context information corrupted by speckle, together with class imbalance in the labeled data, makes pixelwise classification of high-resolution (HR) synthetic aperture radar (SAR) images a challenging task. To address these issues, we propose a global-context pyramidal and class-balanced network (GPCNet) for HR SAR image classification. The proposed structure follows an encoder-decoder architecture. In the encoder module, multiscale convolutional and global-local cross-channel attention (GCA) blocks are employed to capture global-context and distinguishable deep feature statistics while reducing the impact of random fluctuations in homogeneous regions. The channel information of convolutional layers at different scales is efficiently learned by local cross-channel interaction in the GCA block. Besides, a sampled class-balanced loss based on the effective number of samples is utilized to alleviate the class imbalance of HR SAR images. Experiments carried out on a TerraSAR-X image classification dataset demonstrate that GPCNet yields superior performance compared with other related networks. (C) 2022 Society of Photo-Optical Instrumentation Engineers (SPIE)
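A class-balanced loss built on the effective number of samples typically weights each class by the inverse of (1 - beta^n) / (1 - beta); a minimal sketch, with a hypothetical class histogram and beta value:

```python
import numpy as np

def class_balanced_weights(counts, beta=0.999):
    """Per-class weights inversely proportional to the effective number (1 - beta^n) / (1 - beta)."""
    counts = np.asarray(counts, dtype=float)
    effective_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights / weights.sum() * len(counts)  # normalize to mean 1

# Hypothetical histogram: one dominant class, two increasingly rare ones
w = class_balanced_weights([100000, 5000, 500])
print(w)  # the rarest class receives the largest weight
```

Because the effective number saturates as a class grows, rare classes get markedly larger weights while abundant classes are damped only mildly, which is gentler than raw inverse-frequency weighting.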
The energy load data in a micro-energy network form a time series with sequential and nonlinear characteristics. This paper proposes a model based on the encoder-decoder architecture and ConvLSTM for multi-scale prediction of multi-energy loads in the micro-energy network. We combine ConvLSTM, LSTM, an attention mechanism, and multi-task learning to construct a model specifically for energy load forecasting in the micro-energy network. ConvLSTM encodes the input time series; the attention mechanism assigns different weights to the features, which are subsequently decoded by the decoder LSTM layer; finally, a fully connected layer interprets the output. The model is applied to forecast the multi-energy load data of a micro-energy network in a certain area of Northwest China. The test results show that our model converges and that its evaluation metrics are better than those of the multi-task FC-LSTM and the single-task FC-LSTM. In particular, the attention mechanism makes the model converge faster and with higher precision.
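At the core of both the ConvLSTM encoder and the LSTM decoder is the gated cell update; a minimal NumPy sketch of one plain-LSTM step (ConvLSTM replaces the matrix products with convolutions), with hypothetical sizes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One step of a standard LSTM cell; W, U, b pack the 4 gates (i, f, o, g)."""
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # forget old state, admit new candidate
    h_new = sigmoid(o) * np.tanh(c_new)               # gated hidden-state output
    return h_new, c_new

# Hypothetical sizes: 3 input loads (e.g. electricity, heat, gas), hidden size 5
rng = np.random.default_rng(1)
n_in, n_h = 3, 5
W = rng.normal(size=(4 * n_h, n_in))
U = rng.normal(size=(4 * n_h, n_h))
b = np.zeros(4 * n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
print(h.shape)  # (5,) -- hidden state, bounded by the tanh/sigmoid gating
```

The forget and input gates are what let the cell track long sequential dependencies in the load series without the gradients vanishing.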
With the increasing demand for application scenarios such as autonomous driving and drone aerial photography, achieving the best trade-off between segmentation accuracy and inference speed while reducing the number of parameters has become a challenging problem. In this paper, a lightweight and efficient asymmetric network (LEANet) for real-time semantic segmentation is proposed to address it. Specifically, LEANet adopts an asymmetric encoder-decoder architecture. In the encoder, a depth-wise asymmetric bottleneck module with separation and shuffling operations (SS-DAB module) is proposed to jointly extract local and context information. In the decoder, a pyramid pooling module based on channel-wise attention (CA-PP module) is proposed to aggregate multi-scale context information and guide feature selection. Without any pre-training or post-processing, LEANet achieves 71.9% and 67.5% mean Intersection over Union (mIoU) at 77.3 and 98.6 Frames Per Second (FPS) on the Cityscapes and CamVid test sets, respectively. These experimental results show that LEANet achieves an optimal trade-off between segmentation accuracy and inference speed with only 0.74 million parameters.
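The channel-shuffling operation used in modules like SS-DAB is a pure reshape-transpose trick; a sketch with a tiny illustrative tensor:

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels across groups so grouped convolutions can exchange information."""
    c, h, w = x.shape
    assert c % groups == 0
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

# 6 channels in 2 groups: [0 1 2 | 3 4 5] -> [0 3 1 4 2 5]
x = np.arange(6, dtype=float).reshape(6, 1, 1)
y = channel_shuffle(x, 2)
print(y.ravel())  # [0. 3. 1. 4. 2. 5.]
```

Without the shuffle, separated (grouped) convolutions would keep each channel group isolated; interleaving restores cross-group information flow at zero parameter cost.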
ISBN (digital): 9783031298608
ISBN (print): 9783031298592; 9783031298608
Image captioning is the process of generating a textual description of an image, integrating both computer vision and natural language processing. Approaches based on encoder-decoder architectures have recently been proposed to solve image captioning problems. The main objective of this paper is to conduct a comparative study between the two most widely used approaches for natural language processing tasks, namely LSTMs and Transformers. We used the Flickr8k dataset as the source of input images and the VGG16 model for image feature extraction. To evaluate the descriptions generated by the models, the BLEU score metric is used to measure the performance of both. Both models were able to generate grammatically correct and expressive captions.
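The BLEU metric used for evaluation combines clipped n-gram precision with a brevity penalty; a minimal unigram (BLEU-1) sketch in pure Python, with a made-up caption pair (the full metric averages precisions up to 4-grams):

```python
import math
from collections import Counter

def bleu1(candidate, reference):
    """Unigram BLEU with brevity penalty, for a single reference caption."""
    cand, ref = candidate.split(), reference.split()
    overlap = Counter(cand) & Counter(ref)          # clipped unigram matches
    precision = sum(overlap.values()) / len(cand)
    # penalize candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

print(bleu1("a dog runs on grass", "a dog runs on the grass"))
```

Clipping the match counts prevents a degenerate caption like "the the the" from scoring highly, and the brevity penalty stops models from gaming precision with very short outputs.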