
ContrastAlign: Toward Robust BEV Feature Alignment via Contrastive Learning for Multi-Modal 3D Object Detection

Authors: Song, Ziying; Jia, Feiyang; Pan, Hongyu; Luo, Yadan; Jia, Caiyan; Zhang, Guoxin; Liu, Lin; Ji, Yang; Yang, Lei; Wang, Li

Affiliations: School of Computer Science and Technology, Beijing Jiaotong University, China; Beijing Key Lab of Traffic Data Analysis and Mining, China; Horizon Robotics; The University of Queensland, Australia; Hebei University of Science and Technology, China; Tsinghua University, China; Beijing Institute of Technology, China

Publication: arXiv

Year: 2024

Keywords: Cameras

Abstract: In the field of 3D object detection tasks, fusing heterogeneous features from LiDAR and camera sensors into a unified Bird's Eye View (BEV) representation is a widely adopted paradigm. However, existing methods are often compromised by imprecise sensor calibration, resulting in feature misalignment in LiDAR-camera BEV fusion. Moreover, such inaccuracies cause errors in depth estimation for the camera branch, ultimately leading to misalignment between LiDAR and camera BEV features. In this work, we propose a novel ContrastAlign approach that utilizes contrastive learning to enhance the alignment of heterogeneous modalities, thereby improving the robustness of the fusion process. Specifically, our approach includes the L-Instance module, which directly outputs LiDAR instance features within LiDAR BEV features. Then, we introduce the C-Instance module, which predicts camera instance features through RoI (Region of Interest) pooling on the camera BEV features. We propose the InstanceFusion module, which utilizes contrastive learning to generate similar instance features across heterogeneous modalities. We then use graph matching to calculate the similarity between neighboring camera instance features and the matched instance features, completing the alignment of instance features. Our method achieves state-of-the-art performance, with an mAP of 70.3%, surpassing BEVFusion by 1.8% on the nuScenes validation set. Importantly, our method outperforms BEVFusion by 7.3% under conditions with misalignment noise. Copyright © 2024, The Authors. All rights reserved.
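The contrastive objective described in the abstract, pulling matched LiDAR and camera instance features together while pushing mismatched pairs apart, can be illustrated with an InfoNCE-style loss. The following is a minimal numpy sketch, not the authors' implementation; the function name, temperature value, and feature shapes are illustrative assumptions.

```python
import numpy as np

def info_nce_loss(lidar_feats, camera_feats, temperature=0.07):
    """InfoNCE-style contrastive loss over paired instance features.

    lidar_feats, camera_feats: (N, D) arrays where row i of each array
    is assumed to describe the same object instance (the positive pair);
    all other rows serve as negatives.
    """
    # L2-normalize so dot products become cosine similarities.
    l = lidar_feats / np.linalg.norm(lidar_feats, axis=1, keepdims=True)
    c = camera_feats / np.linalg.norm(camera_feats, axis=1, keepdims=True)
    # (N, N) similarity matrix; diagonal entries are the positive pairs.
    logits = (l @ c.T) / temperature
    # Cross-entropy of each row against its diagonal (matched) entry.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

Perfectly aligned features (identical pairs) yield a near-zero loss, while randomly mismatched features yield a loss near log(N), which is the kind of gradient signal that encourages the two modalities' instance features to agree.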
