Search Results - Inner Mongolia University Library

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Hai Wu, Shijia Zhao, Xun Huang, Chenglu Wen, Xin Li, Cheng Wang (Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University; Section of Visual Computing and Interactive Media, Texas A&M University)

ISBN (electronic): 9798350353006

ISBN (print): 9798350353013

Prevalent approaches to unsupervised 3D object detection follow cluster-based pseudo-label generation and iterative self-training. However, the sparsity of LiDAR scans yields pseudo-labels with erroneous sizes and positions, resulting in subpar detection performance. To tackle this problem, this paper introduces a Commonsense Prototype-based Detector, termed CPD, for unsupervised 3D object detection. CPD first constructs a Commonsense Prototype (CProto), characterized by high-quality bounding boxes and dense points, based on commonsense intuition. Subsequently, CPD refines the low-quality pseudo-labels by leveraging the size prior from CProto. Furthermore, CPD enhances the detection accuracy of sparsely scanned objects using the geometric knowledge from CProto. CPD outperforms state-of-the-art unsupervised 3D detectors on the Waymo Open Dataset (WOD), PandaSet, and KITTI datasets by a large margin. Moreover, when trained on WOD and tested on KITTI, CPD attains 90.85% and 81.01% 3D Average Precision on the easy and moderate car classes, respectively. These results bring CPD close to fully supervised detectors, highlighting the significance of our method. The code will be available at https://***/hailanyi/CPD.

Keywords: Training; Three-dimensional displays; Accuracy; Laser radar; Prototypes; Detectors; Object detection

Commonsense Prototype for Outdoor Unsupervised 3D Object Detection

arXiv

arXiv, 2024

Authors: Wu, Hai; Zhao, Shijia; Huang, Xun; Wen, Chenglu; Li, Xin; Wang, Cheng (Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University, China; Section of Visual Computing and Interactive Media, Texas A&M University, United States)

Prevalent approaches to unsupervised 3D object detection follow cluster-based pseudo-label generation and iterative self-training. However, the sparsity of LiDAR scans yields pseudo-labels with erroneous sizes and positions, resulting in subpar detection performance. To tackle this problem, this paper introduces a Commonsense Prototype-based Detector, termed CPD, for unsupervised 3D object detection. CPD first constructs a Commonsense Prototype (CProto), characterized by high-quality bounding boxes and dense points, based on commonsense intuition. Subsequently, CPD refines the low-quality pseudo-labels by leveraging the size prior from CProto. Furthermore, CPD enhances the detection accuracy of sparsely scanned objects using the geometric knowledge from CProto. CPD outperforms state-of-the-art unsupervised 3D detectors on the Waymo Open Dataset (WOD), PandaSet, and KITTI datasets by a large margin. Moreover, when trained on WOD and tested on KITTI, CPD attains 90.85% and 81.01% 3D Average Precision on the easy and moderate car classes, respectively. These results bring CPD close to fully supervised detectors, highlighting the significance of our method. The code will be available at https://***/hailanyi/CPD. Copyright © 2024, The Authors. All rights reserved.

Keywords: Object detection

HINTED: Hard Instance Enhanced Detector with Mixed-Density Feature Fusion for Sparsely-Supervised 3D Object Detection

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Qiming Xia, Wei Ye, Hai Wu, Shijia Zhao, Leyuan Xing, Xun Huang, Jinhao Deng, Xin Li, Chenglu Wen, Cheng Wang (Fujian Key Laboratory of Sensing and Computing for Smart Cities, Xiamen University, Xiamen, China; Section of Visual Computing and Interactive Media, Texas A&M University, Texas, USA)

ISBN (electronic): 9798350353006

ISBN (print): 9798350353013

Current sparsely-supervised object detection methods largely depend on high threshold settings to derive high-quality pseudo-labels from detector predictions. However, hard instances within point clouds frequently display incomplete structures, which lowers the confidence scores of their assigned pseudo-labels. Consequently, previous methods provide inadequate positive supervision for these instances. To address this problem, we propose a novel Hard INsTance Enhanced Detector (HINTED) for sparsely-supervised 3D object detection. First, we design a self-boosting teacher (SBT) model to generate more potential pseudo-labels, enhancing the effectiveness of information transfer. Then, we introduce a mixed-density student (MDS) model that concentrates on hard instances during training, thereby improving detection accuracy. Extensive experiments on the KITTI dataset validate the superior performance of our method. Compared with leading sparsely-supervised methods, HINTED significantly improves detection performance on hard instances, notably outperforming fully supervised methods in detecting challenging categories such as cyclists. HINTED also significantly outperforms the state-of-the-art semi-supervised method on challenging categories. The code is available at https://***/xmuqimingxia/HINTED.

Keywords: Training; Point cloud compression; Computer vision; Three-dimensional displays; Codes; Accuracy; Object detection

Enhancing Weakly Supervised Semantic Segmentation with Multi-modal Foundation Models: An End-to-End Approach

arXiv

arXiv, 2024

Authors: Ravanbakhsh, Elham; Niu, Cheng; Liang, Yongqing; Ramanujam, J.; Li, Xin (Louisiana State University, Baton Rouge, LA 70803, United States; Department of Computer Science & Engineering, Texas A&M University, College Station, TX 77843, United States; Section of Visual Computing and Interactive Media, Texas A&M University, College Station, TX 77843, United States)

Semantic segmentation is a core computer vision problem, but the high cost of data annotation has hindered its wide application. Weakly-Supervised Semantic Segmentation (WSSS) offers a cost-efficient alternative to the extensive labeling required by fully-supervised methods by using partial or incomplete labels. Existing WSSS methods have difficulty learning object boundaries, which leads to poor segmentation results. We propose a novel and effective framework that addresses these issues by leveraging visual foundation models inside the bounding box. Adopting a two-stage WSSS framework, our proposed network consists of a pseudo-label generation module and a segmentation module. The first stage leverages the Segment Anything Model (SAM) to generate high-quality pseudo-labels. To delineate precise boundaries, we apply SAM inside the bounding box with the help of another pre-trained foundation model (e.g., Grounding-DINO). Furthermore, we eliminate the need for image-label supervision by employing CLIP for classification. In the second stage, the generated high-quality pseudo-labels are used to train an off-the-shelf segmenter that achieves state-of-the-art performance on PASCAL VOC 2012 and MS COCO 2014. Copyright © 2024, The Authors. All rights reserved.

Keywords: Semantic Segmentation

Deep Video Representation Learning: A Survey

arXiv

arXiv, 2024

Authors: Ravanbakhsh, Elham; Liang, Yongqing; Ramanujam, J.; Li, Xin (Division of Electrical & Computer Engineering, Center for Computation & Technology, Louisiana State University, Baton Rouge, LA 70803, United States; Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77843, United States; Section of Visual Computing and Interactive Media, Texas A&M University, College Station, TX 77843, United States)

This paper provides a review of representation learning for videos. We classify recent spatio-temporal feature learning methods for sequential visual data and compare their pros and cons for general video analysis. Building effective features for videos is a fundamental problem in computer vision tasks involving video analysis and understanding. Existing features can generally be categorized into spatial and temporal features. Their effectiveness under variations in illumination, occlusion, view, and background is discussed. Finally, we discuss the remaining challenges in existing deep video representation learning studies. © 2024, CC BY-NC-ND.

Keywords: Deep learning
