Author affiliations: Nanjing Univ Posts & Telecommun, Natl Engn Res Ctr Commun & Networking, Nanjing 21003, Peoples R China; Nanjing Univ Posts & Telecommun, Dept Internet Things, Nanjing 21003, Peoples R China; Temple Univ, Dept Comp & Informat Sci, Philadelphia, PA 19122, USA
Publication: APPLIED SOFT COMPUTING
Year/Volume: 2020, Vol. 96
Pages: 106682
Subject classification: 08 [Engineering]; 0812 [Engineering - Computer Science and Technology (degrees conferrable in Engineering or Science)]
Funding: National Natural Science Foundation of China [61876093, 61801242, 61671253]; Natural Science Foundation of Jiangsu Province [BK20181393]; National Science Foundation [IIS-1302164]; China Scholarship Council
Keywords: Robot vision; Self-driving; Real-time semantic segmentation; Convolutional neural networks; Encoder-decoder networks
Abstract: The heavy computational burden limits the use of convolutional neural networks (CNNs) on edge devices for image semantic segmentation, which plays a significant role in many real-world applications such as augmented reality, robotics, and self-driving. To address this problem, this paper presents an attention-guided lightweight network, AGLNet, which employs an encoder-decoder architecture for real-time semantic segmentation. The encoder adopts a novel residual module to extract feature representations, in which two operations, channel split and shuffle, greatly reduce computation cost while maintaining high segmentation accuracy. In the decoder, instead of complicated dilated convolutions and hand-designed architectures, two types of attention mechanism are employed to upsample features to the input resolution. A factorized attention pyramid module (FAPM) explores hierarchical spatial attention from the high-level output while keeping the number of model parameters small, and a global attention upsample module (GAUM) provides global guidance for high-level features to delineate object shapes and boundaries. Comprehensive experiments demonstrate that our approach achieves state-of-the-art results in terms of speed and accuracy on three self-driving datasets: CityScapes, CamVid, and Mapillary Vistas. AGLNet achieves 71.3%, 69.4%, and 30.7% mean IoU on these datasets with only 1.12M model parameters, and reaches 52 FPS, 90 FPS, and 53 FPS inference speed, respectively, on a single GTX 1080Ti GPU. Our code is open-source and available at https://***/xiaoyufenfei/Efficient-Segmentation-Networks. (C) 2020 Elsevier B.V. All rights reserved.
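The abstract's efficiency argument rests on the channel split and shuffle operations in the encoder's residual module: convolving only part of the channels cuts computation, and shuffling restores cross-channel information flow. The sketch below illustrates that idea in PyTorch; the class name SplitShuffleBlock, the layer choices, and the two-group shuffle are illustrative assumptions, not the authors' exact AGLNet implementation (their code is at the repository cited above).

```python
# Minimal sketch of a channel-split-and-shuffle residual block (assumed design,
# not the authors' exact AGLNet module).
import torch
import torch.nn as nn


def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups so the two branches exchange information."""
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)


class SplitShuffleBlock(nn.Module):
    """Residual block that convolves only half the channels, then shuffles.

    Splitting roughly halves the convolution cost; the shuffle mixes the
    untouched and processed halves, which is the efficiency idea the
    abstract attributes to the encoder's residual module.
    """

    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        self.conv = nn.Sequential(
            nn.Conv2d(half, half, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(half),
            nn.ReLU(inplace=True),
            nn.Conv2d(half, half, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(half),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity, branch = x.chunk(2, dim=1)        # channel split
        branch = self.conv(branch)                  # convolve half the channels
        out = torch.cat([identity, branch], dim=1)  # re-assemble full width
        out = self.act(out + x)                     # residual connection
        return channel_shuffle(out, groups=2)       # mix the two halves


if __name__ == "__main__":
    # Shape check: a 64-channel feature map keeps its shape through the block.
    feat = torch.randn(1, 64, 128, 256)
    print(SplitShuffleBlock(64)(feat).shape)  # torch.Size([1, 64, 128, 256])
```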