Author Affiliations: Chongqing Jiaotong Univ, Sch Mechatron & Vehicle Engn, Chongqing, Peoples R China; Chongqing Jiaotong Univ, Sch Aeronaut, Chongqing, Peoples R China; Chongqing Key Lab Green Aviat Energy & Power, Chongqing, Peoples R China; Chongqing Jiaotong Univ, Green Aerotech Res, Chongqing, Peoples R China
Publication: JOURNAL OF ELECTRONIC IMAGING (J. Electron. Imaging)
Year/Volume/Issue: 2024, Vol. 33, No. 6
Core Indexing:
Subject Classification: 0808 [Engineering - Electrical Engineering]; 1002 [Medicine - Clinical Medicine]; 0809 [Engineering - Electronic Science and Technology (Engineering or Science degree)]; 08 [Engineering]; 0702 [Science - Physics]
Funding: National Natural Science Foundation of China; Major Project of Science and Technology Research Program of Chongqing Education Commission of China [KJZD-M202400703]; Natural Science Ranking Projects of Chongqing Jiaotong University [XJ2023000701]; Team Building Project for Graduate Tutors in Chongqing [JDDSTD2022007]; Joint Training Base Construction Project for Graduate Students in Chongqing [JDLHPYJD2022001, JDLHPYJD2023002]
Keywords: 3D object detection; multi-modal fusion; feature alignment; deformable attention; autonomous driving
Abstract: Recent advancements in 3D object detection using light detection and ranging (LiDAR)-camera fusion have enhanced autonomous driving perception. However, aligning LiDAR and image data during multi-modal fusion remains a significant challenge. We propose a novel multi-modal feature alignment and fusion architecture to effectively align and fuse voxel and image data. The proposed architecture comprises four key modules. The Z-axis attention module aggregates voxel features along the vertical axis using self-attention. The voxel-domain deformable encoder improves context understanding by encoding voxel features with deformable attention. The dual-domain deformable feature alignment module uses deformable attention to adaptively align voxel and image features, addressing resolution mismatches. Finally, the gated fusion module dynamically fuses the aligned features through a gating mechanism. The multi-layer design further enhances feature detail retention and improves dual-domain fusion performance. Experimental results show that our method increases average precision by 2.41% at the hard difficulty level for cars on the KITTI test set. On the KITTI validation set, mean average precision improves by 1.06% for cars, 6.88% for pedestrians, and 1.83% for cyclists. (c) 2024 SPIE and IS&T
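The gated fusion described in the abstract can be sketched as below. This is a minimal illustration only, assuming a learned linear gate over the concatenated aligned features; the parameter names `w` and `b` are hypothetical and are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function, squashing values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(voxel_feat, img_feat, w, b):
    """Fuse two aligned feature vectors with a learned gate.

    gate  = sigmoid([voxel; img] @ w + b)        # per-channel weight in (0, 1)
    fused = gate * voxel + (1 - gate) * img      # convex combination

    `w` (shape 2C x C) and `b` (shape C) are illustrative stand-ins for
    whatever learned projection the gating mechanism actually uses.
    """
    gate = sigmoid(np.concatenate([voxel_feat, img_feat], axis=-1) @ w + b)
    return gate * voxel_feat + (1.0 - gate) * img_feat

# Example: with zero weights the gate is 0.5 everywhere, so the fused
# feature is the element-wise mean of the voxel and image features.
C = 4
v, i = np.ones(C), np.zeros(C)
fused = gated_fusion(v, i, np.zeros((2 * C, C)), np.zeros(C))
```

Because the gate lies in (0, 1), the output is always an element-wise convex combination of the two inputs, which lets the network lean on whichever modality is more reliable per channel.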