Enhancing realism in LiDAR scene generation with CSPA-DFN and linear cross-attention via Diffusion Transformer model

Authors: Ye, Shaoxun; Di, Xiaoguang; Liao, Ming; Li, Ximing

Affiliations: Control and Simulation Center, Harbin Institute of Technology, Harbin 150080, China; National Key Laboratory of Modeling and Simulation for Complex Systems, Harbin 150080, China

Publication: Neural Networks (Neural Netw.)

Year/Volume: 2025, Vol. 189

Pages: 107503

Subject classification: 1002 [Medicine - Clinical Medicine]; 0803 [Engineering - Optical Engineering]; 10 [Medicine]

Funding: This work was partially supported by the Aeronautical Science Foundation of China [grant number 2022Z071077002]; the Natural Science Foundation of Heilongjiang Province of China [grant number LH2021F026]; and the Fundamental Research Funds for the Central Universities [grant number HIT.NSRIF202243].

Subject: Photointerpretation

Abstract: Point cloud diffusion models have found extensive applications in autonomous driving and robotics. However, a substantial gap remains between their generated LiDAR scene samples and real-world data in terms of visual quality. This discrepancy primarily arises from the loss of detailed information during decoding from the latent space and the lack of guidance from global 3D structural information during point cloud generation, leading to distortions and artifacts in LiDAR scene samples. In this paper, we propose a novel LiDAR Diffusion Transformer Model that integrates a Channel-Spatial Parallel Attention and Dilation Fusion Network (CSPA-DFN) with a linear cross-attention post-processing module to refine the generated LiDAR scene samples. Specifically, CSPA-DFN is designed to emphasize detailed features across different channels and spatial locations in parallel, leveraging multi-scale dilated convolutions and channel grouping to preserve and enhance these features. To provide global 3D structural information while balancing performance and efficiency, we design a post-processing module that fuses voxelized features and range images using a linear ReLU cross-attention mechanism. Our approach is evaluated on the unconditional generation task using the KITTI-360 and nuScenes datasets, achieving state-of-the-art results in LiDAR scene generation quality. Furthermore, by incorporating semantic labels and camera views into the latent space, our method enhances the model's semantic understanding of LiDAR scenes and demonstrates further improvements over previous works in LiDAR scene visual quality. The code has been released at https://***/HITysx/LiDAR-Scene-Generation. © 2025 Elsevier Ltd
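The abstract names a linear ReLU cross-attention mechanism for fusing voxelized features with range-image features. As a rough illustration of how such a mechanism keeps fusion cost linear in sequence length, below is a minimal PyTorch sketch assuming the standard linear-attention factorization with a ReLU feature map; the class name, projections, and tensor shapes are hypothetical and not taken from the paper's released code.

# A minimal sketch of linear ReLU cross-attention between range-image
# queries and voxel keys/values. Assumes the standard linear-attention
# factorization phi(Q) (phi(K)^T V) with phi = ReLU; all names here are
# illustrative, not the paper's implementation.
import torch
import torch.nn as nn

class LinearReLUCrossAttention(nn.Module):
    def __init__(self, d_model: int, eps: float = 1e-6):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.eps = eps

    def forward(self, range_feats: torch.Tensor, voxel_feats: torch.Tensor) -> torch.Tensor:
        # range_feats: (B, N_r, d) flattened range-image features (queries)
        # voxel_feats: (B, N_v, d) flattened voxelized features (keys/values)
        q = torch.relu(self.q_proj(range_feats))   # phi(Q), non-negative
        k = torch.relu(self.k_proj(voxel_feats))   # phi(K), non-negative
        v = self.v_proj(voxel_feats)
        # Associativity trick: compute phi(K)^T V first, a (d x d) matrix,
        # so cost is O((N_r + N_v) * d^2) instead of O(N_r * N_v * d).
        kv = torch.einsum('bnd,bne->bde', k, v)           # (B, d, d)
        z = torch.einsum('bnd,bd->bn', q, k.sum(dim=1))   # row-wise normalizer
        out = torch.einsum('bnd,bde->bne', q, kv) / (z.unsqueeze(-1) + self.eps)
        return self.out_proj(out)

Because the (N_r x N_v) attention matrix is never materialized, a module of this shape scales to the long token sequences produced by voxel grids and high-resolution range images, which is consistent with the abstract's stated goal of balancing performance and efficiency.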
