ISBN:
(Print) 9798350353006
The rapid advancement of deep learning models is often attributed to their ability to leverage massive training data. In contrast, such privilege has not yet fully benefited 3D deep learning, mainly due to the limited availability of large-scale 3D datasets. Merging multiple available data sources and letting them collaboratively train a single model is a potential solution. However, due to the large domain gap between 3D point cloud datasets, such mixed supervision could adversely affect the model and lead to degraded performance (i.e., negative transfer) compared to single-dataset training. In view of this challenge, we introduce Point Prompt Training (PPT), a novel framework for multi-dataset synergistic learning in the context of 3D representation learning that supports multiple pre-training paradigms. Based on this framework, we propose Prompt-driven Normalization, which adapts the model to different datasets with domain-specific prompts, and Language-guided Categorical Alignment, which unifies the label spaces of the multiple datasets by leveraging the relationships between label texts. Extensive experiments verify that PPT can overcome the negative transfer associated with synergistic learning and produce generalizable representations. Notably, it achieves state-of-the-art performance on each dataset using a single weight-shared model with supervised multi-dataset training. Moreover, when serving as a pre-training framework, it outperforms other pre-training approaches in representation quality and attains remarkable state-of-the-art performance across more than ten diverse downstream tasks spanning both indoor and outdoor 3D scenarios.
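As a rough illustration of the prompt-driven normalization idea described above (a minimal sketch under assumptions, not the released PPT code), the snippet below conditions a shared normalization layer on a learned per-dataset prompt: one embedding per dataset is mapped to the affine scale and shift, so a single weight-shared backbone can adapt to each domain. All class and parameter names here are hypothetical.

import torch
import torch.nn as nn

class PromptDrivenNorm(nn.Module):
    """Normalization whose affine parameters come from a per-dataset prompt."""
    def __init__(self, num_channels: int, num_datasets: int, prompt_dim: int = 64):
        super().__init__()
        self.norm = nn.BatchNorm1d(num_channels, affine=False)
        self.prompts = nn.Embedding(num_datasets, prompt_dim)     # one learned prompt per dataset
        self.to_affine = nn.Linear(prompt_dim, 2 * num_channels)  # prompt -> (gamma, beta)

    def forward(self, x: torch.Tensor, dataset_id: torch.Tensor) -> torch.Tensor:
        # x: (B, C, N) point features; dataset_id: (B,) integer domain labels
        gamma, beta = self.to_affine(self.prompts(dataset_id)).chunk(2, dim=-1)
        x = self.norm(x)
        return x * (1 + gamma.unsqueeze(-1)) + beta.unsqueeze(-1)

feats = torch.randn(8, 32, 1024)                          # toy batch: 8 clouds, 32 channels, 1024 points
layer = PromptDrivenNorm(num_channels=32, num_datasets=3)
out = layer(feats, torch.zeros(8, dtype=torch.long))      # all samples drawn from dataset 0

In this reading, the prompt modulates only the normalization statistics, leaving every other weight shared across datasets.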
The rising importance of 3D representation learning, pivotal in computer vision, autonomous driving, and robotics, is evident. However, a prevailing trend, which straightforwardly resorts to transferring 2D alignment strategies to the 3D domain, encounters three distinct challenges: (1) Information degradation: this arises from aligning 3D data with mere single-view 2D images and generic texts, neglecting the need for multi-view images and detailed subcategory texts. (2) Insufficient synergy: these strategies align 3D representations to image and text features individually, hampering the overall optimization of 3D models. (3) Underutilization: the fine-grained information inherent in the learned representations is often not fully exploited, indicating a potential loss in detail. To address these issues, we introduce JM3D, a comprehensive approach integrating point cloud, text, and image. Key contributions include the Structured Multimodal Organizer (SMO), which enriches vision-language representation with multiple views and hierarchical text, and the Joint Multi-modal Alignment (JMA), which combines language understanding with visual representation. Our advanced model, JM3D-LLM, marries 3D representation with large language models via efficient fine-tuning. Evaluations on ModelNet40 and ScanObjectNN establish JM3D's superiority. The superior performance of JM3D-LLM further underscores the effectiveness of our representation-transfer approach.
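To make the "joint" alignment concrete, here is a minimal sketch assuming a symmetric InfoNCE objective in which the 3D embedding is contrasted against a single fused image-text target rather than against each modality separately; the fusion rule (a normalized sum) and all names are illustrative guesses, not JM3D's actual JMA module.

import torch
import torch.nn.functional as F

def joint_alignment_loss(point_emb, image_emb, text_emb, temperature=0.07):
    # point_emb, image_emb, text_emb: (B, D) L2-normalized embeddings
    joint = F.normalize(image_emb + text_emb, dim=-1)     # single fused vision-language target
    logits = point_emb @ joint.t() / temperature          # (B, B) similarity matrix
    labels = torch.arange(point_emb.size(0), device=point_emb.device)
    # symmetric InfoNCE: each shape matches its own fused target, and vice versa
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

B, D = 16, 256
p, i, t = (F.normalize(torch.randn(B, D), dim=-1) for _ in range(3))
loss = joint_alignment_loss(p, i, t)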
ISBN:
(Print) 9781665405409
The success of supervised deep learning heavily depends on large labeled datasets, whose construction is often challenging in medical image analysis. Contrastive learning, a variant of self-supervised learning, is a potential solution to alleviate the strong demand for data annotation. In this work, we extend the contrastive learning framework to 3D volumetric medical imaging. Specifically, we propose (1) a multiview contrasting strategy to maximize the mutual information between three views of a 3D image to learn global representations, and (2) a long-short spatial contrasting strategy to learn local representations by matching a short spatial clip to a long spatial clip in the latent space. To combine these two strategies, we propose the multiview long-short spatial contrastive learning (MLSSCL) framework, which can effectively learn generic 3D representations. Our extensive experiments on two brain Magnetic Resonance Imaging (MRI) datasets demonstrate that MLSSCL significantly outperforms learning from scratch and other self-supervised learning methods on both classification and segmentation tasks.
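A hedged sketch of the two objectives follows: one InfoNCE term pulls together the embeddings of three orthogonal views of the same scan (multiview contrast), and a second term matches a short spatial clip's embedding to that of the long clip containing it (long-short spatial contrast). The encoders, shapes, and equal weighting of the terms are assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.1):
    # a, b: (B, D) embeddings; row i of `a` is the positive for row i of `b`
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)

def mlsscl_loss(view_ax, view_co, view_sa, short_emb, long_emb):
    # multiview term: pull axial/coronal/sagittal embeddings of the same volume together
    mv = (info_nce(view_ax, view_co) + info_nce(view_ax, view_sa)
          + info_nce(view_co, view_sa)) / 3.0
    # long-short spatial term: a short clip should match the long clip that contains it
    ls = info_nce(short_emb, long_emb)
    return mv + ls

B, D = 8, 128
loss = mlsscl_loss(*(torch.randn(B, D) for _ in range(5)))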
Most existing 3D object classification and retrieval algorithms rely on one-off supervised learning on closed 3D object sets and tend to produce rigid convolutional neural networks with little scalability. Such limitations substantially restrict their potential to continually learn newly emerging 3D object classes in the real world. Aiming to go beyond these limitations, we propose two new and challenging tasks: class-incremental 3D object classification (CI-3DOC) and class-incremental 3D object retrieval (CI-3DOR), the key to which is class-incremental 3D representation learning. It expects the network to update continually, learning new 3D class representations without forgetting previously learned ones. To this end, we design a novel balanced distillation network (BdNet) that uses a dual supervision mechanism to carefully balance consolidating old knowledge (stability) against adapting to new 3D object classes (plasticity). On the one hand, we employ stability-based supervision to retain the stable and discriminative information of old classes, which greatly benefits both classification and retrieval. On the other hand, we use plasticity-based supervision to improve the network's generalization for learning new-class 3D representations by transferring knowledge from a temporary teacher network to the current model. By properly handling the relationship between the two modules, we achieve a surprising performance improvement. Furthermore, since no dataset is available for evaluating these two new tasks, we build two 3D datasets, INOR-1 and INOR-2. Extensive experimental results demonstrate that our method significantly outperforms other state-of-the-art class-incremental learning methods. Even when storing 500-1000 fewer 3D objects than SOTA methods, BdNet still achieves comparable performance.
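The dual supervision mechanism might be sketched as below, assuming logit-level distillation: a frozen copy of the previous-stage model anchors the old-class outputs (stability), while a temporary teacher guides the new-class outputs (plasticity), with a weight lam trading the two off. The loss decomposition, temperature, and weighting are hypothetical stand-ins, not BdNet's published formulation.

import torch
import torch.nn.functional as F

def balanced_distillation_loss(student_logits, old_logits, teacher_logits,
                               labels, num_old, lam=0.5, T=2.0):
    # student_logits: (B, K_old + K_new); old_logits: frozen previous model, (B, K_old)
    # teacher_logits: temporary teacher over all classes, (B, K_old + K_new)
    ce = F.cross_entropy(student_logits, labels)   # plain classification on all seen classes
    # stability: keep old-class predictions close to the frozen old model
    stab = F.kl_div(F.log_softmax(student_logits[:, :num_old] / T, dim=-1),
                    F.softmax(old_logits / T, dim=-1), reduction="batchmean") * T * T
    # plasticity: follow the temporary teacher on the new classes
    plas = F.kl_div(F.log_softmax(student_logits[:, num_old:] / T, dim=-1),
                    F.softmax(teacher_logits[:, num_old:] / T, dim=-1),
                    reduction="batchmean") * T * T
    return ce + lam * stab + (1.0 - lam) * plas

B, k_old, k_new = 4, 10, 5
loss = balanced_distillation_loss(torch.randn(B, k_old + k_new), torch.randn(B, k_old),
                                  torch.randn(B, k_old + k_new),
                                  torch.randint(0, k_old + k_new, (B,)), num_old=k_old)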
ISBN:
(Print) 9798350362466; 9798350362459
3D scene graphs are an emerging 3D scene representation that models both the objects present in a scene and their relationships. However, learning 3D scene graphs is a challenging task because it requires not only object labels but also relationship annotations, which are very scarce in datasets. While it is widely accepted that pre-training is an effective approach to improving model performance in low-data regimes, in this paper we find that existing pre-training methods are ill-suited for 3D scene graphs. To solve this issue, we present the first language-based pre-training approach for 3D scene graphs, whereby we exploit the strong relationship between scene graphs and language. To this end, we leverage the language encoder of CLIP, a popular vision-language model, to distill its knowledge into our graph-based network. We formulate a contrastive pre-training that aligns text embeddings of relationships (subject-predicate-object triplets) with predicted 3D graph features. Our method achieves state-of-the-art results on the main semantic 3D scene graph benchmark, showing improved effectiveness over pre-training baselines and outperforming all existing fully supervised scene graph prediction methods by a significant margin. Furthermore, since our scene graph features are language-aligned, we can query them in the language space in a zero-shot manner. As an example of this property, we show how to predict the room type of a scene without further training.
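A minimal sketch of this contrastive language-graph pre-training follows. It assumes the open-source clip package for the frozen CLIP text encoder and uses random tensors as stand-ins for the graph network's predicted edge features; the triplet phrasings and temperature are illustrative.

import torch
import torch.nn.functional as F
import clip  # open-source package from https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# text embeddings of subject-predicate-object triplets (frozen CLIP text encoder)
triplets = ["chair standing on floor", "lamp attached to wall"]
with torch.no_grad():
    text_emb = F.normalize(model.encode_text(clip.tokenize(triplets).to(device)).float(), dim=-1)

# stand-in for the graph network's predicted edge features, shape (B, 512)
graph_emb = F.normalize(torch.randn(len(triplets), 512, device=device), dim=-1)

logits = graph_emb @ text_emb.t() / 0.07                  # contrastive alignment
labels = torch.arange(len(triplets), device=device)
loss = 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))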
ISBN:
(Digital) 9783031198274
ISBN:
(Print) 9783031198267; 9783031198274
Recent advances in 3D semantic segmentation with deep neural networks have shown remarkable success, with rapid performance increases on available datasets. However, current 3D semantic segmentation benchmarks contain only a small number of categories (fewer than 30 for ScanNet and SemanticKITTI, for instance), which is not enough to reflect the diversity of real environments (e.g., semantic image understanding covers hundreds to thousands of classes). Thus, we propose to study a larger vocabulary for 3D semantic segmentation with a new extended benchmark on ScanNet data with 200 class categories, an order of magnitude more than previously studied. This large number of class categories also induces a large natural class imbalance, and both properties are challenging for existing 3D semantic segmentation methods. To learn more robust 3D features in this context, we propose a language-driven pre-training method that encourages learned 3D features with limited training examples to lie close to their pre-trained text embeddings. Extensive experiments show that our approach consistently outperforms state-of-the-art 3D pre-training for 3D semantic segmentation on our proposed benchmark (+9% relative mIoU), including limited-data scenarios with +25% relative mIoU using only 5% of the annotations.
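One plausible reading of this language-driven pre-training, sketched below under assumptions: each point feature is classified against the frozen text embeddings of all 200 class names, which pushes features of rarely annotated classes toward their pre-trained text anchors. The loss form and the random placeholder embeddings are illustrative, not the paper's exact objective.

import torch
import torch.nn.functional as F

def language_anchored_loss(point_feats, point_labels, class_text_emb, temperature=0.07):
    # point_feats: (N, D) learned 3D features; class_text_emb: (K, D) frozen text embeddings
    point_feats = F.normalize(point_feats, dim=-1)
    class_text_emb = F.normalize(class_text_emb, dim=-1)
    logits = point_feats @ class_text_emb.t() / temperature   # (N, K) point-to-class similarities
    return F.cross_entropy(logits, point_labels)              # pull each point to its class text anchor

N, D, K = 4096, 512, 200
loss = language_anchored_loss(torch.randn(N, D), torch.randint(0, K, (N,)), torch.randn(K, D))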
Motivation: Protein model quality assessment (ProteinQA) is a fundamental task that is essential for biologically relevant applications, e.g., protein structure refinement and protein design. Previous works conducted ProteinQA only at the global-structure or per-residue level, ignoring potentially usable and precise cues from a fine-grained per-atom perspective. In this study, we propose an atom-level ProteinQA model, named Atom-ProteinQA, in which two innovative modules are designed to extract geometric and topological atom-level relationships, respectively. Specifically, on the one hand, a geometric perception module exploits 3D sparse convolution to capture the geometric features of the input protein, generating fine-grained atom-level predictions. On the other hand, natural chemical bonds are used to construct an atom-level graph, and message passing in a topological perception module outputs residue-level predictions in parallel. Finally, through a cross-model aggregation module, features from the different modules interact with each other, enhancing performance at both the atom and residue levels. Results: Extensive experiments show that our proposed Atom-ProteinQA outperforms previous methods by a large margin on both residue-level and atom-level assessment. Concretely, we achieve state-of-the-art performance on CATH-2084, decoy-8000, the public benchmarks CASP13 & CASP14, and CAMEO. Availability: The repository of this project is released at https://github.com/luyfcandy/Atom_ProteinQA.
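As a toy sketch of the topological branch's core idea (hypothetical shapes and names, not the released Atom_ProteinQA code), the snippet below treats atoms as nodes and chemical bonds as directed edges and runs one round of mean-aggregated message passing over the bond graph.

import torch
import torch.nn as nn

class BondMessagePassing(nn.Module):
    """One round of message passing over an atom-level bond graph."""
    def __init__(self, dim: int):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.upd = nn.Linear(2 * dim, dim)

    def forward(self, atom_feats, bonds):
        # atom_feats: (A, D) atom features; bonds: (E, 2) index pairs of bonded atoms
        src, dst = bonds[:, 0], bonds[:, 1]
        messages = self.msg(atom_feats[src])                              # (E, D) bond messages
        agg = torch.zeros_like(atom_feats).index_add_(0, dst, messages)   # sum messages per atom
        deg = torch.zeros(atom_feats.size(0), 1).index_add_(
            0, dst, torch.ones(bonds.size(0), 1)).clamp(min=1)            # in-degree for mean aggregation
        return self.upd(torch.cat([atom_feats, agg / deg], dim=-1))       # update with mean message

atoms = torch.randn(10, 32)                       # 10 atoms, 32-dim features
bonds = torch.tensor([[0, 1], [1, 2], [2, 3]])    # toy chemical-bond edge list
out = BondMessagePassing(32)(atoms, bonds)

Residue-level predictions would then follow from pooling the updated atom features within each residue, as the abstract's parallel prediction branch suggests.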