Enhancing realism in LiDAR scene generation with CSPA-DFN and linear cross-attention via Diffusion Transformer model

Authors: Ye, Shaoxun; Di, Xiaoguang; Liao, Ming; Li, Ximing

Affiliations: Control and Simulation Center, Harbin Institute of Technology, Harbin 150080, China; National Key Laboratory of Modeling and Simulation for Complex Systems, Harbin 150080, China

Publication: Neural Networks (Neural Netw.)

Year/Volume: 2025, Vol. 189

Pages: 107503

Subject classification: 1002 [Medicine - Clinical Medicine]; 0803 [Engineering - Optical Engineering]; 10 [Medicine]

Funding: This work was partially supported by the Aeronautical Science Foundation of China [grant number 2022Z071077002]; the Natural Science Foundation of Heilongjiang Province of China [grant number LH2021F026]; and the Fundamental Research Funds for the Central Universities [grant number HIT.NSRIF202243].

Subject: Photointerpretation

Abstract: Point cloud diffusion models have found extensive applications in autonomous driving and robotics. However, a substantial gap remains between their generated LiDAR scene samples and real-world data in terms of visual quality. This discrepancy primarily arises from the loss of detailed information during decoding from the latent space and the lack of guidance from global 3D structural information during point cloud generation, leading to distortions and artifacts in LiDAR scene samples. In this paper, we propose a novel LiDAR Diffusion Transformer Model that integrates a Channel-Spatial Parallel Attention and Dilation Fusion Network (CSPA-DFN) with a linear cross-attention post-processing module to refine the generated LiDAR scene samples. Specifically, CSPA-DFN is designed to emphasize detailed features across different channels and spatial locations in parallel, leveraging multi-scale dilated convolutions and channel grouping to preserve and enhance these features. To provide global 3D structural information while balancing performance and efficiency, we design a post-processing module that fuses voxelized features and range images using a linear ReLU cross-attention mechanism. Our approach is evaluated on the unconditional generation task using the KITTI-360 and nuScenes datasets, achieving state-of-the-art results in LiDAR scene generation quality. Furthermore, by incorporating semantic labels and camera views into the latent space, our method enhances the model's semantic understanding of LiDAR scenes and demonstrates further improvements over previous works in LiDAR scene visual quality. The code has been released at https://***/HITysx/LiDAR-Scene-Generation. © 2025 Elsevier Ltd
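The abstract names a linear ReLU cross-attention mechanism for fusing voxelized features with range-image features. As a rough illustration of how such a mechanism keeps fusion cost linear in sequence length, below is a minimal PyTorch sketch assuming the standard linear-attention factorization with a ReLU feature map; the class name, projections, and tensor shapes are hypothetical and not taken from the paper's released code.

# A minimal sketch of linear ReLU cross-attention between range-image
# queries and voxel keys/values. Assumes the standard linear-attention
# factorization phi(Q) (phi(K)^T V) with phi = ReLU; all names here are
# illustrative, not the paper's implementation.
import torch
import torch.nn as nn

class LinearReLUCrossAttention(nn.Module):
    def __init__(self, d_model: int, eps: float = 1e-6):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.eps = eps

    def forward(self, range_feats: torch.Tensor, voxel_feats: torch.Tensor) -> torch.Tensor:
        # range_feats: (B, N_r, d) flattened range-image features (queries)
        # voxel_feats: (B, N_v, d) flattened voxelized features (keys/values)
        q = torch.relu(self.q_proj(range_feats))   # phi(Q), non-negative
        k = torch.relu(self.k_proj(voxel_feats))   # phi(K), non-negative
        v = self.v_proj(voxel_feats)
        # Associativity trick: compute phi(K)^T V first, a (d x d) matrix,
        # so cost is O((N_r + N_v) * d^2) instead of O(N_r * N_v * d).
        kv = torch.einsum('bnd,bne->bde', k, v)           # (B, d, d)
        z = torch.einsum('bnd,bd->bn', q, k.sum(dim=1))   # row-wise normalizer
        out = torch.einsum('bnd,bde->bne', q, kv) / (z.unsqueeze(-1) + self.eps)
        return self.out_proj(out)

Because the (N_r x N_v) attention matrix is never materialized, a module of this shape scales to the long token sequences produced by voxel grids and high-resolution range images, which is consistent with the abstract's stated goal of balancing performance and efficiency.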
