
Semantic Scene Completion via Semantic-Aware Guidance and Interactive Refinement Transformer

Authors: Xiao, Haihong; Kang, Wenxiong; Liu, Hao; Li, Yuqiong; He, Ying

Affiliations: South China Univ Technol, Sch Automat Sci & Engn, Guangzhou 510641, Peoples R China; Pazhou Lab, Guangzhou 510335, Peoples R China; Nanyang Technol Univ, Coll Comp & Data Science, Singapore 639798, Singapore; Chinese Acad Sci, Inst Mech, Key Lab Mech Fluid Solid Coupling Syst, Beijing 100190, Peoples R China

Published in: 《IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY》 (IEEE Trans Circuits Syst Video Technol)

Year/Volume/Issue: 2025, Vol. 35, No. 5

Pages: 4212-4225


Subject Classification: 0808 [Engineering - Electrical Engineering]; 08 [Engineering]

Funding: National Natural Science Foundation of China; Ministry of Education, Singapore [MOE-T2EP20220-0005, RT19/22]; International Science and Technology Cooperation Project of Guangzhou Economic and Technological Development District [2023GH16]; Fundamental Research Funds for the Central Universities [2024ZYGXZR104]

Keywords: Semantics; Proposals; Feature extraction; Three-dimensional displays; Transformers; Point cloud compression; Image reconstruction; Laser radar; Circuits and systems; Autonomous vehicles; 3D vision; semantic scene completion; interactive refinement transformer

Abstract: Predicting per-voxel occupancy status and corresponding semantic labels in 3D scenes is pivotal to 3D intelligent perception in autonomous driving. In this paper, we propose a novel semantic scene completion framework that can generate complete 3D volumetric semantics from a single image at a low cost. To the best of our knowledge, this is the first endeavor specifically aimed at mitigating the negative impacts of incorrect voxel query proposals caused by erroneous depth estimates and enhancing interactions for positive ones in camera-based semantic scene completion tasks. Specifically, we present a straightforward yet effective Semantic-aware Guided (SAG) module, which seamlessly integrates task-related semantic priors to facilitate effective interactions between image features and voxel query proposals in a plug-and-play manner. Furthermore, we introduce a set of learnable object queries to better perceive objects within the scene. Building on this, we propose an Interactive Refinement Transformer (IRT) block, which iteratively updates voxel query proposals to enhance the perception of semantics and objects within the scene by leveraging the interaction between object queries and voxel queries through query-to-query cross-attention. Extensive experiments demonstrate that our method outperforms existing state-of-the-art approaches, achieving overall improvements of 0.30 and 2.74 in the mIoU metric on the SemanticKITTI and SSCBench-KITTI-360 validation datasets, respectively, while also showing superior performance in small-object generation.
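The query-to-query cross-attention described in the abstract, where voxel query proposals are refined by interacting with a small set of learnable object queries, can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the authors' implementation: the module name, dimensions, head count, and residual/normalization layout are all hypothetical choices made for clarity.

```python
# Hypothetical sketch of query-to-query cross-attention between learnable
# object queries and voxel query proposals (all names and sizes are
# illustrative assumptions, not the paper's actual code).
import torch
import torch.nn as nn


class QueryToQueryCrossAttention(nn.Module):
    """Refines voxel query proposals via attention over learnable object queries."""

    def __init__(self, dim: int = 128, num_heads: int = 4, num_object_queries: int = 32):
        super().__init__()
        # A small set of learnable object queries that perceive objects in the scene.
        self.object_queries = nn.Parameter(torch.randn(num_object_queries, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, voxel_queries: torch.Tensor) -> torch.Tensor:
        # voxel_queries: (batch, num_voxels, dim)
        b = voxel_queries.shape[0]
        obj = self.object_queries.unsqueeze(0).expand(b, -1, -1)
        # Voxel queries attend to object queries (query-to-query interaction);
        # the result is added back as a residual update.
        refined, _ = self.attn(query=voxel_queries, key=obj, value=obj)
        return self.norm(voxel_queries + refined)


if __name__ == "__main__":
    block = QueryToQueryCrossAttention()
    v = torch.randn(2, 100, 128)  # 2 scenes, 100 voxel query proposals
    print(block(v).shape)
```

In the paper this interaction is applied iteratively inside the IRT block; a stack of such layers, each consuming the previous layer's refined voxel queries, would model that loop under the same assumptions.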
