Open-vocabulary 3D segmentation enables exploration of 3D spaces using free-form text descriptions. Existing methods for open-vocabulary 3D instance segmentation primarily focus on identifying object-level instances but struggle with finer-grained scene entities such as object parts, or regions described by generic attributes. In this work, we introduce Search3D, an approach to construct hierarchical open-vocabulary 3D scene representations, enabling 3D search at multiple levels of granularity: fine-grained object parts, entire objects, or regions described by attributes like materials. Unlike prior methods, Search3D shifts towards a more flexible open-vocabulary 3D search paradigm, moving beyond explicit object-centric queries. For systematic evaluation, we further contribute a scene-scale open-vocabulary 3D part segmentation benchmark based on MultiScan, along with a set of open-vocabulary fine-grained part annotations on ScanNet++. Search3D outperforms baselines in scene-scale open-vocabulary 3D part segmentation, while maintaining strong performance in segmenting 3D objects and materials.
Semantic segmentation is crucial for autonomous navigation in off-road environments, enabling precise classification of the surroundings to identify traversable regions. However, factors distinctive to off-road conditions, such as source-target domain discrepancies and sensor corruption from rough terrain, can cause distribution shifts that move the data away from the conditions seen during training. This often leads to inaccurate semantic label predictions and subsequent failures in navigation tasks. To address this, we propose ST-Seg, a novel framework that expands the source distribution through style expansion (SE) and texture regularization (TR). Unlike prior methods that implicitly pursue generalization within a fixed source distribution, ST-Seg offers an intuitive approach to handling distribution shift. Specifically, SE broadens domain coverage by generating diverse realistic styles, augmenting the limited style information of the source domain. TR stabilizes the local texture representation affected by style-augmented learning through a deep texture manifold. Experiments across various distribution-shifted target domains demonstrate the effectiveness of ST-Seg, with substantial improvements over existing methods. These results highlight the robustness of ST-Seg, enhancing the real-world applicability of semantic segmentation for off-road navigation.
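The abstract does not detail how SE generates new styles, but a widely used building block for this kind of style augmentation is to perturb per-channel feature statistics (as in MixStyle). The sketch below illustrates that general idea only; it is not the SE module from the paper, and all parameter choices are illustrative assumptions.

```python
import numpy as np

def mix_style(feats, alpha=0.3, seed=0):
    """Toy style augmentation on a feature batch of shape (N, C, H, W):
    mix per-channel mean/std across instances in the batch. Illustrates
    the general idea of expanding style diversity; NOT ST-Seg's SE module."""
    rng = np.random.default_rng(seed)
    n = feats.shape[0]
    mu = feats.mean(axis=(2, 3), keepdims=True)           # style: channel means
    sig = feats.std(axis=(2, 3), keepdims=True) + 1e-6    # style: channel stds
    normed = (feats - mu) / sig                           # style-free "content"
    perm = rng.permutation(n)                             # donor instances
    lam = rng.beta(alpha, alpha, size=(n, 1, 1, 1))       # mixing weights
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sig_mix = lam * sig + (1 - lam) * sig[perm]
    return normed * sig_mix + mu_mix                      # re-stylized features
```

Because only the first- and second-order statistics change, the spatial structure of each feature map (the "content") is preserved while its "style" is resampled from the batch.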
As a fundamental task in various application scenarios, including autonomous driving and mobile robotic systems, 3D object detection has received extensive attention from researchers in both academia and industry. However, due to the working principle of LiDAR and external factors such as occlusion, the collected point cloud of an object is usually sparse and incomplete, which degrades the performance of 3D object detectors. In this letter, a Structure Completion and Density Awareness Network (SCDA-Net) is proposed for 3D object detection from point clouds. Specifically, a structure completion module is designed to predict the dense shapes of complete point clouds by leveraging the sequence-transduction ability of the transformer architecture. Furthermore, we propose a density-aware voxel RoI pooling strategy that introduces density features reflecting the state of the original objects into the refinement stage. By restoring the complete structure of objects and accounting for the true distribution of points in the raw point cloud, the proposed method achieves more accurate feature extraction and scene perception. Extensive experimental results on the KITTI and Waymo datasets demonstrate the effectiveness of the proposed SCDA-Net.
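The simplest way to make RoI pooling "density-aware" is to voxelize the raw points inside an RoI and carry the normalized per-voxel point count along as an extra feature. The sketch below shows that minimal version under assumed conventions (axis-aligned RoI, fixed grid); it is not the exact SCDA-Net module.

```python
import numpy as np

def voxel_density_features(points, roi_min, roi_max, grid=(4, 4, 4)):
    """Count raw points per voxel inside an axis-aligned RoI and return
    normalized counts as a density feature volume. A toy stand-in for a
    density-aware pooling step, not SCDA-Net's actual implementation."""
    points = np.asarray(points, dtype=float)
    roi_min = np.asarray(roi_min, dtype=float)
    roi_max = np.asarray(roi_max, dtype=float)
    inside = np.all((points >= roi_min) & (points < roi_max), axis=1)
    pts = points[inside]
    size = (roi_max - roi_min) / np.array(grid)           # voxel edge lengths
    idx = np.floor((pts - roi_min) / size).astype(int)
    idx = np.clip(idx, 0, np.array(grid) - 1)
    counts = np.zeros(grid)
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    return counts / max(1, len(pts))   # densities sum to 1 over the RoI
```

Normalizing by the in-RoI point count makes the feature describe the *relative* distribution of points, so a distant sparse object and a nearby dense one with the same shape yield similar density volumes.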
This article describes an algorithm to solve the real-world animal identification problem, i.e., determining the unknown number K of individual animals in a dataset of N unlabeled camera-trap images of African leopards, provided by Panthera. To determine the leopards' IDs, we propose an effective automated algorithm that consists of segmenting leopard bodies from images, scoring similarity between image pairs, and clustering followed by verification. To perform clustering, we employ a modified ternary search that uses a novel adaptive k-medoids++ clustering algorithm. The best clustering is determined using an expanded definition of the silhouette score. A new post-clustering verification procedure further improves the quality of the clustering. The algorithm was evaluated on the Panthera dataset, which contains 1555 images of 677 individual leopards, and produced a clustering with an adjusted mutual information score of 0.958, compared to 0.864 for a baseline k-medoids++ clustering algorithm. Note to Practitioners: We propose an effective automated algorithm to solve the real-world animal identification problem: identifying K unknown individual animals in N images of a given species, with most animals represented by only a single image. This differs from methods that assume all images in a dataset are from known individuals and thus treat animal identification as a retrieval task. Our approach consists of a new adaptive k-medoids++ clustering algorithm and a novel post-clustering verification procedure. Clustering is performed based on the degree of similarity between all image pairs in the dataset, with the result validated using an expanded definition of the silhouette score. The accuracy of our algorithm was demonstrated on a real-world image dataset of African leopards, a small dataset with a relatively large K/N ratio, provided by Panthera. Code has been made available at: https://***/
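The overall pipeline (cluster on a pairwise-distance matrix, pick K by ternary-searching the silhouette score) can be sketched with plain k-medoids and the standard silhouette; the paper's adaptive k-medoids++ variant and expanded silhouette definition are not reproduced here, and the unimodality assumption behind the ternary search is exactly that, an assumption.

```python
import numpy as np

def kmedoidspp_init(D, k, rng):
    """k-medoids++-style seeding: spread initial medoids apart."""
    medoids = [int(rng.integers(D.shape[0]))]
    for _ in range(k - 1):
        d = D[:, medoids].min(axis=1)                 # distance to nearest medoid
        p = d / d.sum() if d.sum() > 0 else None
        medoids.append(int(rng.choice(D.shape[0], p=p)))
    return np.array(medoids)

def kmedoids(D, k, seed=0, n_iter=50):
    """Plain k-medoids on a precomputed distance matrix D (n x n)."""
    rng = np.random.default_rng(seed)
    medoids = kmedoidspp_init(D, k, rng)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members):                           # most central member
                new[j] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(np.sort(new), np.sort(medoids)):
            break
        medoids = new
    return np.argmin(D[:, medoids], axis=1)

def silhouette(D, labels):
    """Standard mean silhouette score from a distance matrix."""
    vals = []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():                             # singleton cluster
            vals.append(0.0)
            continue
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        vals.append((b - a) / max(a, b))
    return float(np.mean(vals))

def best_k(D, lo, hi, restarts=3):
    """Ternary search over k, assuming the silhouette curve is roughly unimodal."""
    def score(k):
        return max(silhouette(D, kmedoids(D, k, seed=s)) for s in range(restarts))
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if score(m1) < score(m2):
            lo = m1
        else:
            hi = m2
    return max(range(lo, hi + 1), key=score)
```

On well-separated data the silhouette peaks sharply at the true number of clusters, which is what makes a ternary search over K viable instead of an exhaustive sweep.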
Object detection has achieved significant advancements despite the challenges posed by adverse conditions like low-light nighttime environments, where annotated data is not only scarce but also difficult to label accurately. Instead of designing a specialized network, we focus on the creation and efficient utilization of synthetic data to address the problem. We generate synthetic data by employing an enhanced generative model that adeptly transforms daytime images into low-light nighttime ones. Furthermore, we introduce a data selection scheme, named AutoSelecter, which can be flexibly integrated into the training process of an object detector, ensuring the selection of the most effective synthetic data. By efficiently utilizing synthetic data, our strategy achieves an average improvement of 5.2% and 6.1% in AP$_{50}$ on the nighttime datasets of BDD100k and Waymo, respectively, for the YOLOv7, YOLOv8, and RT-DETR object detectors. We also discovered numerous missed and mislabeled annotations in manually annotated low-light nighttime datasets, which can significantly distort nighttime evaluation results. Consequently, we provide a manually annotated and more accurate dataset, BDD100kValNight+, for better evaluation. On this refined dataset, our strategy achieves an average improvement of 5.1% in AP$_{50}$ on the three detectors.
3D object detection is an important task that has been widely applied in autonomous driving. To perform this task, a new trend is to fuse multi-modal inputs, i.e., LiDAR and camera. Under such a trend, recent methods fuse these two modalities by unifying them in the same 3D space. However, during direct fusion in a unified space, the drawbacks of both modalities (LiDAR features struggle with detailed semantic information and the camera lacks accurate 3D spatial information) are also preserved, diluting the semantic and spatial awareness of the final unified representation. To address this issue, this letter proposes a novel bidirectional complementary LiDAR-camera fusion framework, called BiCo-Fusion, that can achieve robust semantic- and spatial-aware 3D object detection. The key insight is to fuse LiDAR and camera features in a bidirectional complementary way to enhance the semantic awareness of the LiDAR and the 3D spatial awareness of the camera. The enhanced features from both modalities are then adaptively fused to build a semantic- and spatial-aware unified representation. Specifically, we introduce Pre-Fusion, consisting of a Voxel Enhancement Module (VEM) to enhance the semantic awareness of voxel features from 2D camera features and an Image Enhancement Module (IEM) to enhance the 3D spatial awareness of camera features from 3D voxel features. We then introduce Unified Fusion (U-Fusion) to adaptively fuse the enhanced features from the last stage to build a unified representation. Extensive experiments demonstrate the superiority of our BiCo-Fusion against the prior arts.
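"Adaptively fused" is commonly realized as a learned gate that weighs the two feature maps per channel. The sketch below shows that generic gated-fusion pattern in plain numpy; the gate parameters and shapes are hypothetical, and this is not the actual U-Fusion module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_fuse(f_lidar, f_cam, Wg, bg):
    """Generic channel-wise gated fusion of two aligned feature maps
    of shape (C, H, W). Wg (C x 2C) and bg (C,) are hypothetical learned
    gate parameters; this illustrates the pattern, not BiCo-Fusion itself."""
    stacked = np.concatenate([f_lidar, f_cam], axis=0)    # (2C, H, W)
    pooled = stacked.mean(axis=(1, 2))                    # global context, (2C,)
    gate = sigmoid(Wg @ pooled + bg)[:, None, None]       # per-channel weight in (0, 1)
    return gate * f_lidar + (1.0 - gate) * f_cam
```

Because the gate is computed from both inputs, each output channel can lean toward whichever modality is more informative for it, rather than using a fixed blend.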
Autonomous platforms have been widely adopted in both civilian and military contexts, achieving notable success. However, their energy resources, computational power, payload capacity, and cost are constrained and interdependent. These limitations prevent the integration of advanced threat detection systems, such as high-speed cameras and radar. Consequently, they are unable to detect high-velocity projectiles, significantly undermining their survivability in hostile environments. Developing low-power, cost-effective, and computationally efficient threat detection methods is of critical importance. This letter proposes a method for perceiving the flight trajectories of charged objects based on the principle of electrostatic induction. A physical model for electrostatic detection is established, and the effects of flight speed and trajectory on induction signals are analyzed. Inspired by sharks, a four-element array testing system with specific orientation and spatial distribution characteristics is designed. An indoor experimental setup is developed to simulate and test the flight of charged objects, generating a comprehensive dataset by varying parameters such as flight speed, incident angle, and testing distance. This dataset is then used to train a symbolic regression machine learning model, resulting in a mathematical model capable of predicting the incoming direction of charged objects with an error margin of less than 10 degrees. The model's generalization ability and the impact of the number of sensors are also discussed.
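The core inference step, recovering an incoming direction from the relative signal strengths on a small sensor array, can be illustrated with a deliberately simplified model. Everything below is an illustrative assumption: a cross-shaped four-sensor layout, a point charge whose peak signal falls off as 1/r², and a grid search in place of the paper's symbolic-regression model.

```python
import numpy as np

# Four sensors in a cross layout (loosely inspired by the shark-like array;
# positions and the 1/r^2 amplitude model are toy assumptions, not the
# paper's electrostatic-induction physics).
SENSORS = np.array([[0.5, 0.0], [-0.5, 0.0], [0.0, 0.5], [0.0, -0.5]])

def signal_amplitudes(theta_deg, dist, q=1.0):
    """Peak signal at each sensor for a charge at polar position (dist, theta)."""
    src = dist * np.array([np.cos(np.radians(theta_deg)),
                           np.sin(np.radians(theta_deg))])
    r = np.linalg.norm(SENSORS - src, axis=1)
    return q / r**2

def estimate_direction(amps):
    """Grid search over (angle, distance). Using amplitude *ratios* makes
    the estimate independent of the unknown charge magnitude q."""
    obs = amps / amps.sum()
    best, best_err = None, np.inf
    for theta in range(0, 360, 5):
        for dist in np.arange(1.0, 5.01, 0.5):
            model = signal_amplitudes(theta, dist)
            model = model / model.sum()
            err = np.abs(model - obs).sum()
            if err < best_err:
                best, best_err = theta, err
    return best
```

The key design point carried over from the abstract is that direction comes from the *pattern* across spatially distributed sensors, not from any single sensor's reading.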
In this study, we develop a model that enables mobilities to interact more naturally with users. Specifically, we focus on the referring navigable regions task, in which a model grounds navigable regions of the road using the mobility's camera image and natural language navigation instructions. This task is challenging because it requires vision-and-language comprehension in rapidly changing environments shared with other mobilities. The performance of existing methods is insufficient, partly because they do not consider features related to scene context, such as semantic segmentation information. Therefore, it is important to incorporate these features into a multimodal encoder. In this study, we propose a trimodal (language, image, and mask) encoder-decoder model called the Trimodal Navigable Region segmentation Model. We introduce the Text-Mask Encoder Block to process semantic segmentation masks and the Day-Night Classification Branch to balance the input modalities. We validated our model on the Talk2Car-RegSeg dataset. The results demonstrate that our method outperforms the baseline method on standard metrics.
Radar simulations have become an essential tool in radar algorithm development and testing, owing to the lack of available high-resolution radar datasets and the enormous difficulty of acquiring real-world data. However, simulating radar data is challenging: existing radar simulation tools are not easily accessible, require detailed mesh inputs, and take hours to run. To address these issues, we present SHENRON, an open-source framework that efficiently simulates high-fidelity MIMO radar data using only a lidar point cloud and camera images. We show that with SHENRON, one can generate simulated data on which algorithms can be evaluated as effectively as on real data. Further, one can quickly iterate through a vast radar parameter space to find the best set of parameters for any application, significantly aiding research in radar perception and sensor fusion.
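The basic idea of simulating radar returns from a point cloud can be shown with the textbook FMCW point-scatterer model: each point at range R contributes a dechirped beat tone at f_b = 2·S·R/c, and a range FFT recovers the scene. This is a far simpler single-channel stand-in for SHENRON, with an illustrative chirp configuration.

```python
import numpy as np

C = 3e8          # speed of light, m/s
SLOPE = 30e12    # chirp slope, Hz/s (30 MHz/us) -- illustrative config
FS = 10e6        # ADC sample rate, Hz
N = 256          # samples per chirp

def simulate_chirp(ranges, amps):
    """Point-scatterer FMCW simulation: each (e.g., lidar) point at range R
    contributes a beat tone at f_b = 2 * SLOPE * R / C in the dechirped signal."""
    t = np.arange(N) / FS
    sig = np.zeros(N, dtype=complex)
    for r, a in zip(ranges, amps):
        sig += a * np.exp(2j * np.pi * (2 * SLOPE * r / C) * t)
    return sig

def range_profile(sig):
    """Range FFT magnitude; bin b maps to range b * (FS / N) * C / (2 * SLOPE)."""
    return np.abs(np.fft.fft(sig))[: N // 2]
```

With this configuration the range resolution is (FS/N)·C/(2·SLOPE) ≈ 0.2 m and the maximum unambiguous range is 25 m; real simulators like SHENRON additionally model antenna arrays, Doppler, reflectivity, and multipath.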
The ability to identify important objects in a complex and dynamic driving environment is essential for autonomous driving agents to make safe and efficient driving decisions. It also helps assistive driving systems decide when to alert drivers. We tackle object importance estimation in a data-driven fashion and introduce HOIST (Human-annotated Object Importance in Simulated Traffic), which contains driving scenarios with human-annotated importance labels for vehicles and pedestrians. In addition, we propose a novel approach that relies on counterfactual reasoning to estimate an object's importance: we generate counterfactual scenarios by modifying the motion of objects and ascribe importance based on how the modifications affect the ego vehicle's driving. Our approach outperforms strong baselines for the task of object importance estimation on HOIST. We also perform ablation studies to justify our design choices and show the significance of the different components of our proposed approach.
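The counterfactual recipe, perturb or remove one object, re-roll the scenario, and score importance by how much the ego's behavior changes, can be sketched with a toy 1-D rollout. The driving policy, gap threshold, and all numbers below are toy assumptions standing in for a real simulator and planner; only the counterfactual structure mirrors the abstract.

```python
import numpy as np

def rollout(agents, ego_v0=10.0, dt=0.1, steps=50, safe_gap=15.0, accel=4.0):
    """Toy 1-D ego rollout: cruise at the desired speed, brake when the
    nearest agent ahead is closer than safe_gap. Returns final ego position."""
    ego_x, ego_v = 0.0, ego_v0
    agents = [list(a) for a in agents]          # each agent: [position, velocity]
    for _ in range(steps):
        gaps_ahead = [x - ego_x for x, v in agents if x > ego_x]
        if gaps_ahead and min(gaps_ahead) < safe_gap:
            ego_v = max(0.0, ego_v - accel * dt)       # brake for the lead agent
        else:
            ego_v = min(ego_v0, ego_v + accel * dt)    # resume cruising
        ego_x += ego_v * dt
        for a in agents:
            a[0] += a[1] * dt                          # constant-velocity agents
    return ego_x

def importance(agents):
    """Counterfactual importance: how much does removing each agent
    change where the ego ends up?"""
    base = rollout(agents)
    return [abs(rollout(agents[:i] + agents[i + 1:]) - base)
            for i in range(len(agents))]
```

As expected, a slow agent directly ahead (which forces braking) scores high, while an agent far behind the ego, whose removal changes nothing, scores zero; richer versions would compare whole trajectories rather than final positions.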