Author affiliations: Tianjin Key Laboratory for Control Theory & Applications in Complicated Systems and Intelligent Robot Laboratory, Tianjin University of Technology, 391 Binshui West Road; Department of Electrical Engineering, Tshwane University of Technology
Publication: Optoelectronics Letters (光电子快报, English edition)
Year/Volume/Issue: 2025
Subject classification: 080904 [Engineering - Electromagnetic Field and Microwave Technology]; 0810 [Engineering - Information and Communication Engineering]; 0809 [Engineering - Electronic Science and Technology (Engineering or Science degree)]; 08 [Engineering]; 081105 [Engineering - Navigation, Guidance and Control]; 081001 [Engineering - Communication and Information Systems]; 081002 [Engineering - Signal and Information Processing]; 0825 [Engineering - Aeronautical and Astronautical Science and Technology]; 0811 [Engineering - Control Science and Engineering]
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62103298) and the South African National Research Foundation (Grant Nos. 132797 and 137951)
Abstract: Deep learning for point clouds faces the challenge of their inherently unordered nature, which prevents traditional CNN-like methods from being applied directly. Because of their inherent permutation invariance, however, transformers offer a solution to the unordered-point problem in LiDAR-based object detection. This paper presents a two-stage, transformer-based LiDAR 3D object detection framework, the Point-Voxel Dual Transformer (PV-DT3D), which uses fused point-voxel features for proposal refinement. Specifically, keypoints are sampled from the entire point cloud scene and used to encode representative scene features via a proposal-aware voxel set abstraction module. After the region proposal network (RPN) generates proposals, the keypoints encoded inside each proposal are fed into a dual transformer encoder-decoder architecture. PV-DT3D is the first 3D object detector to exploit both a point-wise transformer and a channel-wise architecture, capturing contextual information along the spatial and channel dimensions. Experiments on the highly competitive KITTI 3D car detection leaderboard show that PV-DT3D achieves superior detection accuracy among state-of-the-art point-voxel-based methods.
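The dual point-wise/channel-wise attention idea described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the module name (DualAttentionBlock), the keypoint feature shape (B proposals, N keypoints, C channels), and the fusion by concatenation are assumptions made only to show how one attention branch can attend across points while the other attends across channels.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAttentionBlock(nn.Module):
    """Illustrative sketch: dual attention over keypoint features (B, N, C).
    The point-wise branch attends across the N keypoints (spatial context);
    the channel-wise branch attends across the C channels (channel context)."""
    def __init__(self, channels: int):
        super().__init__()
        self.qkv_point = nn.Linear(channels, 3 * channels)
        self.qkv_channel = nn.Linear(channels, 3 * channels)
        self.proj = nn.Linear(2 * channels, channels)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, c = x.shape
        # Point-wise attention: the N keypoints are the tokens.
        q, k, v = self.qkv_point(x).chunk(3, dim=-1)
        attn_p = F.softmax(q @ k.transpose(-2, -1) / c ** 0.5, dim=-1)   # (B, N, N)
        point_feat = attn_p @ v                                          # (B, N, C)
        # Channel-wise attention: transpose so the C channels are the tokens.
        q, k, v = self.qkv_channel(x).chunk(3, dim=-1)
        attn_c = F.softmax(q.transpose(-2, -1) @ k / n ** 0.5, dim=-1)   # (B, C, C)
        chan_feat = (attn_c @ v.transpose(-2, -1)).transpose(-2, -1)     # (B, N, C)
        # Fuse the two views and add a residual connection.
        return self.norm(x + self.proj(torch.cat([point_feat, chan_feat], dim=-1)))

# Hypothetical usage: 2 proposals, 256 keypoints per proposal, 128-dim features.
block = DualAttentionBlock(channels=128)
out = block(torch.randn(2, 256, 128))   # -> (2, 256, 128)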