
PAVLM: Advancing Point Cloud based Affordance Understanding Via Vision-Language Model

Authors: Liu, Shang-Ching; Tran, Van Nhiem; Chen, Wenkai; Cheng, Wei-Lun; Huang, Yen-Lin; Liao, I-Bin; Li, Yung-Hui; Zhang, Jianwei

Author affiliations: Department of Informatics, University of Hamburg, Germany; Taiwan; Department of Electrical Engineering, National Taiwan University, Taiwan; Department of Computer Science and Technology, National Tsinghua University, Taiwan

Publication: arXiv

Year/Volume/Issue: 2024


Subject: Human-robot interaction

Abstract: Affordance understanding, the task of identifying actionable regions on 3D objects, plays a vital role in allowing robotic systems to engage with and operate within the physical world. Although Vision-Language Models (VLMs) have excelled in high-level reasoning and long-horizon planning for robotic manipulation, they still fall short in grasping the nuanced physical properties required for effective human-robot interaction. In this paper, we introduce PAVLM (Point cloud Affordance Vision-Language Model), an innovative framework that utilizes the extensive multimodal knowledge embedded in pre-trained language models to enhance 3D affordance understanding of point clouds. PAVLM integrates a geometric-guided propagation module with hidden embeddings from large language models (LLMs) to enrich visual semantics. On the language side, we prompt Llama-3.1 models to generate refined, context-aware text, augmenting the instructional input with deeper semantic cues. Experimental results on the 3D-AffordanceNet benchmark demonstrate that PAVLM outperforms baseline methods for both full and partial point clouds, particularly excelling in its generalization to novel open-world affordance tasks of 3D objects. For more information, visit our project site: ***. © 2024, CC BY-NC-ND.
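The abstract describes, but does not implement, the core idea of fusing geometric point-cloud features with hidden embeddings from an LLM to predict per-point affordances. The following is a minimal, hypothetical PyTorch sketch of that general idea only: per-point features (assumed to come from an unspecified point-cloud backbone) attend to language hidden states (random stand-ins for Llama-3.1 embeddings), and a small head scores each point. The module name, feature dimensions, and the cross-attention fusion are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only (not the PAVLM code). Assumptions:
#  - point_feats stands in for features from a point-cloud backbone,
#  - text_feats stands in for Llama-3.1 hidden states (dim 4096 assumed),
#  - fusion is modeled as simple cross-attention from points to text.
import torch
import torch.nn as nn


class GeometricGuidedPropagation(nn.Module):
    """Hypothetical fusion block: point features attend to language embeddings."""

    def __init__(self, point_dim: int = 256, text_dim: int = 4096, hidden: int = 256):
        super().__init__()
        self.point_proj = nn.Linear(point_dim, hidden)     # project geometry features
        self.text_proj = nn.Linear(text_dim, hidden)       # project LLM hidden states
        self.attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.head = nn.Sequential(                          # per-point affordance score
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, point_feats: torch.Tensor, text_feats: torch.Tensor) -> torch.Tensor:
        # point_feats: (B, N, point_dim); text_feats: (B, T, text_dim)
        q = self.point_proj(point_feats)                    # queries from the point cloud
        kv = self.text_proj(text_feats)                     # keys/values from language
        fused, _ = self.attn(q, kv, kv)                     # propagate semantics to points
        return torch.sigmoid(self.head(fused)).squeeze(-1)  # (B, N) affordance map


if __name__ == "__main__":
    B, N, T = 2, 2048, 16                                   # batch, points, text tokens
    point_feats = torch.randn(B, N, 256)                    # stand-in backbone features
    text_feats = torch.randn(B, T, 4096)                    # stand-in LLM hidden states
    affordance = GeometricGuidedPropagation()(point_feats, text_feats)
    print(affordance.shape)                                 # torch.Size([2, 2048])
```

The output is one score per point, which matches the abstract's framing of affordance understanding as identifying actionable regions on full or partial point clouds; how PAVLM actually conditions on the refined Llama-3.1 text is not specified here.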
