
Dual-domain deformable feature fusion for multi-modal 3D object detection

Authors: Wang, Shihao; Deng, Tao

Affiliations: Chongqing Jiaotong Univ, Sch Mechatron & Vehicle Engn, Chongqing, Peoples R China; Chongqing Jiaotong Univ, Sch Aeronaut, Chongqing, Peoples R China; Chongqing Key Lab Green Aviat Energy & Power, Chongqing, Peoples R China; Chongqing Jiaotong Univ, Green Aerotech Res, Chongqing, Peoples R China

Publication: Journal of Electronic Imaging (J. Electron. Imaging)

Year, volume, issue: 2024, Vol. 33, No. 6


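Subject classification: 0808 [Engineering: Electrical Engineering]; 1002 [Medicine: Clinical Medicine]; 0809 [Engineering: Electronic Science and Technology (engineering or science degrees may be conferred)]; 08 [Engineering]; 0702 [Science: Physics]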

Funding: National Natural Science Foundation of China; Major Project of Science and Technology Research Program of Chongqing Education Commission of China [KJZD-M202400703]; Natural Science Ranking Projects of Chongqing Jiaotong University [XJ2023000701]; Team Building Project for Graduate Tutors in Chongqing [JDDSTD2022007]; Joint Training Base Construction Project for Graduate Students in Chongqing [JDLHPYJD2022001, JDLHPYJD2023002]

Keywords: 3D object detection; multi-modal fusion; feature alignment; deformable attention; autonomous driving

Abstract: Recent advancements in 3D object detection using light detection and ranging (LiDAR)-camera fusion have enhanced autonomous driving perception. However, aligning LiDAR and image data during multi-modal fusion remains a significant challenge. We propose a novel multi-modal feature alignment and fusion architecture to effectively align and fuse voxel and image data. The proposed architecture comprises four key modules. Z-axis attention aggregates voxel features along the vertical axis using self-attention. The voxel-domain deformable encoder uses deformable attention to encode voxel features, improving context understanding. Dual-domain deformable feature alignment uses deformable attention to adaptively align voxel and image features, addressing resolution mismatches. Finally, gated fusion uses a gating mechanism to dynamically fuse the aligned features. The multi-layer design further enhances feature detail retention and improves dual-domain fusion performance. Experimental results show our method increases average precision by 2.41% at the hard difficulty level for cars on the KITTI test set. On the KITTI validation set, mean average precision improves by 1.06% for cars, 6.88% for pedestrians, and 1.83% for cyclists. © 2024 SPIE and IS&T
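
The gated fusion step named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the paper's actual formulation, so everything here is an assumption made for illustration: the sigmoid gate computed from the concatenated features, the convex-combination form of the fusion, the feature shapes, and the names `gated_fusion`, `weight`, and `bias` are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function, mapping real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(voxel_feat, image_feat, weight, bias):
    """Fuse two already-aligned feature maps with a per-channel gate.

    Assumed shapes (hypothetical, not taken from the paper):
      voxel_feat, image_feat: (N, C)
      weight: (2C, C), bias: (C,)
    """
    # Concatenate both modalities so the gate can see each of them.
    stacked = np.concatenate([voxel_feat, image_feat], axis=-1)  # (N, 2C)
    # The gate decides, per location and channel, how much voxel signal to keep.
    gate = sigmoid(stacked @ weight + bias)  # (N, C)
    # Convex combination: gate near 1 favors voxels, near 0 favors the image.
    return gate * voxel_feat + (1.0 - gate) * image_feat


rng = np.random.default_rng(0)
voxel = rng.normal(size=(4, 8))
image = rng.normal(size=(4, 8))
w = rng.normal(scale=0.1, size=(16, 8))
b = np.zeros(8)
fused = gated_fusion(voxel, image, w, b)
print(fused.shape)
```

Because the gate lies strictly between 0 and 1, each fused value falls between the corresponding voxel and image values. With all-zero gate parameters the gate is 0.5 everywhere and the fusion reduces to a plain average, which makes a convenient sanity check.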
