Real-time semantic segmentation has broad application prospects in autonomous driving and robot navigation. Recent real-time semantic segmentation networks mainly adopt either an encoder-decoder architecture or a multi-branch architecture, and each approach has its own advantages and limitations. Encoder-decoder models are generally better at extracting contextual information, but may struggle to capture fine details and local spatial information. The multi-branch structure, on the other hand, excels at capturing boundary and spatial detail information, but it requires an efficient and flexible feature fusion strategy to prevent information redundancy. To leverage the strengths of both approaches, we propose a parallel segmentation network (PaSeNet), which adopts an asymmetric encoder-decoder structure and offers new ideas for research and applications in real-time semantic segmentation. Specifically, in the encoding phase we design a main branch with a spatial information enhancement path and introduce a masked autoencoder based on self-supervised learning as an auxiliary branch, which supplements the main branch in extracting details and local spatial information. Additionally, we propose the Grouped Aggregation Pyramid Pooling Module to optimize the extraction of contextual information. In the decoding phase, we introduce the Coordinate-Attention-Guided Decoder to effectively integrate the diverse information from the different branches. Extensive experiments on Cityscapes, the Cambridge-driving Labeled Video Database (CamVid), NightCity, and the Instance Segmentation in Aerial Images Dataset (iSAID) demonstrate that our method achieves competitive results. In particular, PaSeNet-Base obtains 79.9% mean Intersection over Union (mIoU) at 55.6 frames per second (FPS) on the Cityscapes test set and 80.2% mIoU at 96.8 FPS on the CamVid test set.
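
For readers unfamiliar with the accuracy metric quoted above, the following is a minimal sketch of how mean Intersection over Union is commonly computed from predicted and ground-truth label maps. It is not the authors' evaluation code: the function name `mean_iou`, the flat-list input format, and the `ignore_index=255` default (a common convention for unlabeled pixels) are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over flat sequences of integer class labels.

    Hypothetical helper for illustration only. Pixels whose ground-truth
    label equals `ignore_index` are skipped; classes that never appear in
    either the prediction or the ground truth are left out of the mean.
    """
    inter = [0] * num_classes  # per-class intersection (true positives)
    union = [0] * num_classes  # per-class union (TP + FP + FN)
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            union[t] += 1          # missed pixel of the true class
            if 0 <= p < num_classes:
                union[p] += 1      # false positive for the predicted class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Tiny example: three classes, last pixel is unlabeled and ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
```

Benchmark scores are usually obtained by accumulating the per-class intersection and union counts over the whole evaluation set before dividing, which corresponds to passing all pixels of all images through a single call here rather than averaging per-image scores.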