Open-vocabulary 3D segmentation enables exploration of 3D spaces using free-form text descriptions. Existing methods for open-vocabulary 3D instance segmentation primarily focus on identifying object-level instances but struggle with finer-grained scene entities such as object parts, or regions described by generic attributes. In this work, we introduce Search3D, an approach to construct hierarchical open-vocabulary 3D scene representations, enabling 3D search at multiple levels of granularity: fine-grained object parts, entire objects, or regions described by attributes like materials. Unlike prior methods, Search3D shifts towards a more flexible open-vocabulary 3D search paradigm, moving beyond explicit object-centric queries. For systematic evaluation, we further contribute a scene-scale open-vocabulary 3D part segmentation benchmark based on MultiScan, along with a set of open-vocabulary fine-grained part annotations on ScanNet++. Search3D outperforms baselines in scene-scale open-vocabulary 3D part segmentation, while maintaining strong performance in segmenting 3D objects and materials.
Semantic segmentation is crucial for autonomous navigation in off-road environments, enabling precise classification of the surroundings to identify traversable regions. However, factors distinctive to off-road conditions, such as source-target domain discrepancies and sensor corruption from rough terrain, can cause distribution shifts that move the data away from the conditions seen during training. This often leads to inaccurate semantic label predictions and subsequent failures in navigation tasks. To address this, we propose ST-Seg, a novel framework that expands the source distribution through style expansion (SE) and texture regularization (TR). Unlike prior methods that implicitly pursue generalization within a fixed source distribution, ST-Seg offers an intuitive approach to handling distribution shift. Specifically, SE broadens domain coverage by generating diverse realistic styles, augmenting the limited style information of the source domain. TR stabilizes the local texture representation affected by style-augmented learning through a deep texture manifold. Experiments across various distribution-shifted target domains demonstrate the effectiveness of ST-Seg, with substantial improvements over existing methods. These results highlight the robustness of ST-Seg, enhancing the real-world applicability of semantic segmentation for off-road navigation.
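The abstract does not detail how SE generates new styles, but a widely used building block for this kind of style augmentation is to perturb per-channel feature statistics (as in MixStyle). The sketch below illustrates that general idea only; it is not the SE module from the paper, and all parameter choices are illustrative assumptions.

```python
import numpy as np

def mix_style(feats, alpha=0.3, seed=0):
    """Toy style augmentation on a feature batch of shape (N, C, H, W):
    mix per-channel mean/std across instances in the batch. Illustrates
    the general idea of expanding style diversity; NOT ST-Seg's SE module."""
    rng = np.random.default_rng(seed)
    n = feats.shape[0]
    mu = feats.mean(axis=(2, 3), keepdims=True)           # style: channel means
    sig = feats.std(axis=(2, 3), keepdims=True) + 1e-6    # style: channel stds
    normed = (feats - mu) / sig                           # style-free "content"
    perm = rng.permutation(n)                             # donor instances
    lam = rng.beta(alpha, alpha, size=(n, 1, 1, 1))       # mixing weights
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sig_mix = lam * sig + (1 - lam) * sig[perm]
    return normed * sig_mix + mu_mix                      # re-stylized features
```

Because only the first- and second-order statistics change, the spatial structure of each feature map (the "content") is preserved while its "style" is resampled from the batch.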
As a fundamental task in various application scenarios, including autonomous driving and mobile robotic systems, 3D object detection has received extensive attention from researchers in both academia and industry. However, due to the working principle of LiDAR and external factors such as occlusion, the collected point cloud of an object is usually sparse and incomplete, which degrades the performance of 3D object detectors. In this letter, a Structure Completion and Density Awareness Network (SCDA-Net) is proposed for 3D object detection from point clouds. Specifically, a structure completion module is designed to predict the dense shapes of complete point clouds by leveraging the sequence-transduction ability of the transformer architecture. Furthermore, we propose a density-aware voxel RoI pooling strategy that introduces density features reflecting the state of the original objects into the refinement stage. By restoring the complete structure of objects and accounting for the true distribution of points in the raw point cloud, the proposed method achieves more accurate feature extraction and scene perception. Extensive experimental results on the KITTI and Waymo datasets demonstrate the effectiveness of the proposed SCDA-Net.
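The simplest way to make RoI pooling "density-aware" is to voxelize the raw points inside an RoI and carry the normalized per-voxel point count along as an extra feature. The sketch below shows that minimal version under assumed conventions (axis-aligned RoI, fixed grid); it is not the exact SCDA-Net module.

```python
import numpy as np

def voxel_density_features(points, roi_min, roi_max, grid=(4, 4, 4)):
    """Count raw points per voxel inside an axis-aligned RoI and return
    normalized counts as a density feature volume. A toy stand-in for a
    density-aware pooling step, not SCDA-Net's actual implementation."""
    points = np.asarray(points, dtype=float)
    roi_min = np.asarray(roi_min, dtype=float)
    roi_max = np.asarray(roi_max, dtype=float)
    inside = np.all((points >= roi_min) & (points < roi_max), axis=1)
    pts = points[inside]
    size = (roi_max - roi_min) / np.array(grid)           # voxel edge lengths
    idx = np.floor((pts - roi_min) / size).astype(int)
    idx = np.clip(idx, 0, np.array(grid) - 1)
    counts = np.zeros(grid)
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    return counts / max(1, len(pts))   # densities sum to 1 over the RoI
```

Normalizing by the in-RoI point count makes the feature describe the *relative* distribution of points, so a distant sparse object and a nearby dense one with the same shape yield similar density volumes.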
This article describes an algorithm to solve the real-world animal identification problem, i.e., determining the unknown number K of individual animals in a dataset of N unlabeled camera-trap images of African leopards, provided by Panthera. To determine the leopards' IDs, we propose an effective automated algorithm that consists of segmenting leopard bodies from images, scoring similarity between image pairs, and clustering followed by verification. To perform clustering, we employ a modified ternary search that uses a novel adaptive k-medoids++ clustering algorithm. The best clustering is determined using an expanded definition of the silhouette score. A new post-clustering verification procedure further improves the quality of the clustering. The algorithm was evaluated on the Panthera dataset, which contains 1555 images of 677 individual leopards, and produced a clustering with an adjusted mutual information score of 0.958, compared to 0.864 for a baseline k-medoids++ clustering algorithm. Note to Practitioners: We propose an effective automated algorithm to solve the real-world animal identification problem: identifying K unknown individual animals in N images of a given species, with most animals represented by only a single image. This differs from methods that assume all images in a dataset are from known individuals and thus treat animal identification as a retrieval task. Our approach consists of a new adaptive k-medoids++ clustering algorithm and a novel post-clustering verification procedure. Clustering is performed based on the degree of similarity between all image pairs in the dataset, with the result validated using an expanded definition of the silhouette score. The accuracy of our algorithm was demonstrated on a real-world image dataset of African leopards, a small dataset with a relatively large K/N ratio, provided by Panthera. Code has been made available at: https://***/
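The overall pipeline (cluster on a pairwise-distance matrix, pick K by ternary-searching the silhouette score) can be sketched with plain k-medoids and the standard silhouette; the paper's adaptive k-medoids++ variant and expanded silhouette definition are not reproduced here, and the unimodality assumption behind the ternary search is exactly that, an assumption.

```python
import numpy as np

def kmedoidspp_init(D, k, rng):
    """k-medoids++-style seeding: spread initial medoids apart."""
    medoids = [int(rng.integers(D.shape[0]))]
    for _ in range(k - 1):
        d = D[:, medoids].min(axis=1)                 # distance to nearest medoid
        p = d / d.sum() if d.sum() > 0 else None
        medoids.append(int(rng.choice(D.shape[0], p=p)))
    return np.array(medoids)

def kmedoids(D, k, seed=0, n_iter=50):
    """Plain k-medoids on a precomputed distance matrix D (n x n)."""
    rng = np.random.default_rng(seed)
    medoids = kmedoidspp_init(D, k, rng)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members):                           # most central member
                new[j] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(np.sort(new), np.sort(medoids)):
            break
        medoids = new
    return np.argmin(D[:, medoids], axis=1)

def silhouette(D, labels):
    """Standard mean silhouette score from a distance matrix."""
    vals = []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():                             # singleton cluster
            vals.append(0.0)
            continue
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        vals.append((b - a) / max(a, b))
    return float(np.mean(vals))

def best_k(D, lo, hi, restarts=3):
    """Ternary search over k, assuming the silhouette curve is roughly unimodal."""
    def score(k):
        return max(silhouette(D, kmedoids(D, k, seed=s)) for s in range(restarts))
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if score(m1) < score(m2):
            lo = m1
        else:
            hi = m2
    return max(range(lo, hi + 1), key=score)
```

On well-separated data the silhouette peaks sharply at the true number of clusters, which is what makes a ternary search over K viable instead of an exhaustive sweep.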
Object detection has achieved significant advancements despite the challenges posed by adverse conditions like low-light nighttime environments, where annotated data is not only scarce but also difficult to label accurately. Instead of designing a specialized network, we focus on the creation and efficient utilization of synthetic data to address the problem. We generate synthetic data by employing an enhanced generative model that adeptly transforms daytime images into low-light nighttime ones. Furthermore, we introduce a data selection scheme, named AutoSelecter, which can be flexibly integrated into the training process of an object detector, ensuring the selection of the most effective synthetic data. By efficiently utilizing synthetic data, our strategy achieves an average improvement of 5.2% and 6.1% in AP$_{50}$ on the nighttime datasets of BDD100k and Waymo, respectively, for the YOLOv7, YOLOv8, and RT-DETR object detectors. We also discovered numerous missed and mislabeled annotations in manually annotated low-light nighttime datasets, which can significantly distort nighttime evaluation results. Consequently, we provide a manually annotated and more accurate dataset, BDD100kValNight+, for better evaluation. On this refined dataset, our strategy achieves an average improvement of 5.1% in AP$_{50}$ on the three detectors.
3D object detection is an important task that has been widely applied in autonomous driving. To perform this task, a new trend is to fuse multi-modal inputs, i.e., LiDAR and camera. Under such a trend, recent methods fuse these two modalities by unifying them in the same 3D space. However, during direct fusion in a unified space, the drawbacks of both modalities (LiDAR features struggle with detailed semantic information and the camera lacks accurate 3D spatial information) are also preserved, diluting the semantic and spatial awareness of the final unified representation. To address this issue, this letter proposes a novel bidirectional complementary LiDAR-camera fusion framework, called BiCo-Fusion, that can achieve robust semantic- and spatial-aware 3D object detection. The key insight is to fuse LiDAR and camera features in a bidirectional complementary way to enhance the semantic awareness of the LiDAR and the 3D spatial awareness of the camera. The enhanced features from both modalities are then adaptively fused to build a semantic- and spatial-aware unified representation. Specifically, we introduce Pre-Fusion, consisting of a Voxel Enhancement Module (VEM) to enhance the semantic awareness of voxel features from 2D camera features and an Image Enhancement Module (IEM) to enhance the 3D spatial awareness of camera features from 3D voxel features. We then introduce Unified Fusion (U-Fusion) to adaptively fuse the enhanced features from the last stage to build a unified representation. Extensive experiments demonstrate the superiority of our BiCo-Fusion against the prior arts.
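"Adaptively fused" is commonly realized as a learned gate that weighs the two feature maps per channel. The sketch below shows that generic gated-fusion pattern in plain numpy; the gate parameters and shapes are hypothetical, and this is not the actual U-Fusion module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_fuse(f_lidar, f_cam, Wg, bg):
    """Generic channel-wise gated fusion of two aligned feature maps
    of shape (C, H, W). Wg (C x 2C) and bg (C,) are hypothetical learned
    gate parameters; this illustrates the pattern, not BiCo-Fusion itself."""
    stacked = np.concatenate([f_lidar, f_cam], axis=0)    # (2C, H, W)
    pooled = stacked.mean(axis=(1, 2))                    # global context, (2C,)
    gate = sigmoid(Wg @ pooled + bg)[:, None, None]       # per-channel weight in (0, 1)
    return gate * f_lidar + (1.0 - gate) * f_cam
```

Because the gate is computed from both inputs, each output channel can lean toward whichever modality is more informative for it, rather than using a fixed blend.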
Autonomous platforms have been widely adopted in both civilian and military contexts, achieving notable success. However, their energy resources, computational power, payload capacity, and cost are constrained and interdependent. These limitations prevent the integration of advanced threat detection systems, such as high-speed cameras and radar. Consequently, they are unable to detect high-velocity projectiles, significantly undermining their survivability in hostile environments. Developing low-power, cost-effective, and computationally efficient threat detection methods is of critical importance. This letter proposes a method for perceiving the flight trajectories of charged objects based on the principle of electrostatic induction. A physical model for electrostatic detection is established, and the effects of flight speed and trajectory on induction signals are analyzed. Inspired by sharks, a four-element array testing system with specific orientation and spatial distribution characteristics is designed. An indoor experimental setup is developed to simulate and test the flight of charged objects, generating a comprehensive dataset by varying parameters such as flight speed, incident angle, and testing distance. This dataset is then used to train a symbolic regression machine learning model, resulting in a mathematical model capable of predicting the incoming direction of charged objects with an error margin of less than 10 degrees. The model's generalization ability and the impact of the number of sensors are also discussed.
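The core inference step, recovering an incoming direction from the relative signal strengths on a small sensor array, can be illustrated with a deliberately simplified model. Everything below is an illustrative assumption: a cross-shaped four-sensor layout, a point charge whose peak signal falls off as 1/r², and a grid search in place of the paper's symbolic-regression model.

```python
import numpy as np

# Four sensors in a cross layout (loosely inspired by the shark-like array;
# positions and the 1/r^2 amplitude model are toy assumptions, not the
# paper's electrostatic-induction physics).
SENSORS = np.array([[0.5, 0.0], [-0.5, 0.0], [0.0, 0.5], [0.0, -0.5]])

def signal_amplitudes(theta_deg, dist, q=1.0):
    """Peak signal at each sensor for a charge at polar position (dist, theta)."""
    src = dist * np.array([np.cos(np.radians(theta_deg)),
                           np.sin(np.radians(theta_deg))])
    r = np.linalg.norm(SENSORS - src, axis=1)
    return q / r**2

def estimate_direction(amps):
    """Grid search over (angle, distance). Using amplitude *ratios* makes
    the estimate independent of the unknown charge magnitude q."""
    obs = amps / amps.sum()
    best, best_err = None, np.inf
    for theta in range(0, 360, 5):
        for dist in np.arange(1.0, 5.01, 0.5):
            model = signal_amplitudes(theta, dist)
            model = model / model.sum()
            err = np.abs(model - obs).sum()
            if err < best_err:
                best, best_err = theta, err
    return best
```

The key design point carried over from the abstract is that direction comes from the *pattern* across spatially distributed sensors, not from any single sensor's reading.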
In this study, we develop a model that enables mobilities to interact more naturally with users. Specifically, we focus on the referring navigable regions task, in which a model grounds navigable regions of the road using the mobility's camera image and natural language navigation instructions. This task is challenging because it requires vision-and-language comprehension in rapidly changing environments shared with other mobilities. The performance of existing methods is insufficient, partly because they do not consider features related to scene context, such as semantic segmentation information. Therefore, it is important to incorporate these features into a multimodal encoder. In this study, we propose a trimodal (language, image, and mask) encoder-decoder model called the Trimodal Navigable Region segmentation Model. We introduce the Text-Mask Encoder Block to process semantic segmentation masks and the Day-Night Classification Branch to balance the input modalities. We validated our model on the Talk2Car-RegSeg dataset. The results demonstrate that our method outperforms the baseline method on standard metrics.
Radar simulations have become an essential tool in radar algorithm development and testing, owing to the lack of available high-resolution radar datasets and the enormous difficulty of acquiring real-world data. However, simulating radar data is challenging: existing radar simulation tools are not easily accessible, require detailed mesh inputs, and take hours to run. To address these issues, we present SHENRON, an open-source framework that efficiently simulates high-fidelity MIMO radar data using only a lidar point cloud and camera images. We show that with SHENRON, one can generate simulated data on which algorithms can be evaluated as effectively as on real data. Further, one can quickly iterate through a vast radar parameter space to find the best set of parameters for any application, significantly aiding research in radar perception and sensor fusion.
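The basic idea of simulating radar returns from a point cloud can be shown with the textbook FMCW point-scatterer model: each point at range R contributes a dechirped beat tone at f_b = 2·S·R/c, and a range FFT recovers the scene. This is a far simpler single-channel stand-in for SHENRON, with an illustrative chirp configuration.

```python
import numpy as np

C = 3e8          # speed of light, m/s
SLOPE = 30e12    # chirp slope, Hz/s (30 MHz/us) -- illustrative config
FS = 10e6        # ADC sample rate, Hz
N = 256          # samples per chirp

def simulate_chirp(ranges, amps):
    """Point-scatterer FMCW simulation: each (e.g., lidar) point at range R
    contributes a beat tone at f_b = 2 * SLOPE * R / C in the dechirped signal."""
    t = np.arange(N) / FS
    sig = np.zeros(N, dtype=complex)
    for r, a in zip(ranges, amps):
        sig += a * np.exp(2j * np.pi * (2 * SLOPE * r / C) * t)
    return sig

def range_profile(sig):
    """Range FFT magnitude; bin b maps to range b * (FS / N) * C / (2 * SLOPE)."""
    return np.abs(np.fft.fft(sig))[: N // 2]
```

With this configuration the range resolution is (FS/N)·C/(2·SLOPE) ≈ 0.2 m and the maximum unambiguous range is 25 m; real simulators like SHENRON additionally model antenna arrays, Doppler, reflectivity, and multipath.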
The ability to identify important objects in a complex and dynamic driving environment is essential for autonomous driving agents to make safe and efficient driving decisions. It also helps assistive driving systems decide when to alert drivers. We tackle object importance estimation in a data-driven fashion and introduce HOIST (Human-annotated Object Importance in Simulated Traffic), which contains driving scenarios with human-annotated importance labels for vehicles and pedestrians. In addition, we propose a novel approach that relies on counterfactual reasoning to estimate an object's importance: we generate counterfactual scenarios by modifying the motion of objects and ascribe importance based on how the modifications affect the ego vehicle's driving. Our approach outperforms strong baselines for the task of object importance estimation on HOIST. We also perform ablation studies to justify our design choices and show the significance of the different components of our proposed approach.
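The counterfactual recipe, perturb or remove one object, re-roll the scenario, and score importance by how much the ego's behavior changes, can be sketched with a toy 1-D rollout. The driving policy, gap threshold, and all numbers below are toy assumptions standing in for a real simulator and planner; only the counterfactual structure mirrors the abstract.

```python
import numpy as np

def rollout(agents, ego_v0=10.0, dt=0.1, steps=50, safe_gap=15.0, accel=4.0):
    """Toy 1-D ego rollout: cruise at the desired speed, brake when the
    nearest agent ahead is closer than safe_gap. Returns final ego position."""
    ego_x, ego_v = 0.0, ego_v0
    agents = [list(a) for a in agents]          # each agent: [position, velocity]
    for _ in range(steps):
        gaps_ahead = [x - ego_x for x, v in agents if x > ego_x]
        if gaps_ahead and min(gaps_ahead) < safe_gap:
            ego_v = max(0.0, ego_v - accel * dt)       # brake for the lead agent
        else:
            ego_v = min(ego_v0, ego_v + accel * dt)    # resume cruising
        ego_x += ego_v * dt
        for a in agents:
            a[0] += a[1] * dt                          # constant-velocity agents
    return ego_x

def importance(agents):
    """Counterfactual importance: how much does removing each agent
    change where the ego ends up?"""
    base = rollout(agents)
    return [abs(rollout(agents[:i] + agents[i + 1:]) - base)
            for i in range(len(agents))]
```

As expected, a slow agent directly ahead (which forces braking) scores high, while an agent far behind the ego, whose removal changes nothing, scores zero; richer versions would compare whole trajectories rather than final positions.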