ISBN (digital): 9798331536459
ISBN (print): 9798331536466
Advancements in 3D rendering like Gaussian Splatting (GS) allow novel view synthesis and real-time rendering in virtual reality (VR). However, GS-created 3D environments are often difficult to edit. For scene enhancement or to incorporate 3D assets, segmenting Gaussians by class is essential. Existing segmentation approaches are typically limited to certain types of scenes, e.g., "circular" scenes, to determine clear object boundaries. However, these approaches are ineffective when removing large objects in non-"circling" scenes such as large outdoor scenes. We propose Semantics-Controlled GS (SCGS), a segmentation-driven GS approach that enables the separation of large scene parts in uncontrolled, natural environments. SCGS allows scene editing and the extraction of scene parts for VR. Additionally, we introduce a challenging outdoor dataset, overcoming the "circling" setup. We outperform the state of the art in visual quality on our dataset and in segmentation quality on the 3D-OVS dataset. We conducted an exploratory user study comparing a 360° video, plain GS, and SCGS in VR with a fixed viewpoint. In our subsequent main study, users were allowed to move freely while evaluating plain GS and SCGS. Our main study results show that participants clearly prefer SCGS over plain GS. Overall, we present an innovative approach that surpasses the state of the art both technically and in user experience.
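As a rough illustration of what class-based editing of a splat scene involves, the following minimal Python sketch filters Gaussians by a per-splat semantic label. This is an assumption for illustration, not the SCGS implementation; the names (extract_class, labels) are hypothetical, and the sketch presumes labels have already been produced by a segmentation model.

```python
# Minimal sketch (not the authors' code): class-based extraction of Gaussians,
# assuming each splat already carries a semantic label from a segmentation head.
import numpy as np

def extract_class(means, covs, colors, labels, target_class):
    """Return only the Gaussians whose semantic label matches target_class.

    means:  (N, 3) splat centers
    covs:   (N, 3, 3) splat covariances
    colors: (N, 3) RGB per splat
    labels: (N,) integer class id per splat (assumed precomputed)
    """
    mask = labels == target_class
    return means[mask], covs[mask], colors[mask]

# Usage: extract one scene part (e.g., class id 2) for editing or VR export.
means = np.random.rand(1000, 3)
covs = np.tile(np.eye(3) * 1e-4, (1000, 1, 1))
colors = np.random.rand(1000, 3)
labels = np.random.randint(0, 5, size=1000)

kept = extract_class(means, covs, colors, labels, target_class=2)
print(kept[0].shape)  # subset of splat centers belonging to class 2
```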
Integrating the real world through the power of 3D reconstruction plays an essential role in extended reality teleconsultation. Video streams and voxel-based representations are common techniques for teleconsultation visualization. However, video streaming applications miss 3D content information, and voxel-based representations can only provide visual detail at increasing computational cost. To enhance the classical voxel-based representation and address the timeliness of voxels, we present a modular approach that provides a point cloud overlay to highlight the latest updates in the current camera frustum. Our approach applies state-of-the-art algorithms and custom packages within ROS to enable 3D reconstruction with dynamic cameras. The modular design allows the flexible exchange of individual components, i.e., the camera or the SLAM algorithm. This work presents results from a pilot study involving five participants to gain an impression of which kind of visualization would be preferred and whether the point cloud overlay would assist the user in recognizing changes in the surroundings. The point cloud overlay enabled participants to perceive more changes, and 60% of the participants preferred the point cloud overlay over the pure mesh representation.
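To make the overlay idea concrete, here is a hedged sketch of the underlying geometry: selecting recently updated points that fall inside the current pinhole camera frustum so they can be highlighted on top of the voxel representation. The function names, thresholds, and the pinhole model are illustrative assumptions, not the paper's ROS implementation.

```python
# Illustrative sketch (assumed, not the paper's code): select recently updated
# points inside the current camera frustum for rendering as a highlight overlay.
import numpy as np

def in_frustum(points_world, T_cam_world, K, img_size, near=0.1, far=5.0):
    """Boolean mask of points visible in the pinhole camera frustum.

    points_world: (N, 3), T_cam_world: (4, 4) world-to-camera transform,
    K: (3, 3) intrinsics, img_size: (width, height).
    """
    n = points_world.shape[0]
    pts_h = np.hstack([points_world, np.ones((n, 1))])
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]
    z = pts_cam[:, 2]
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / np.clip(z[:, None], 1e-6, None)
    w, h = img_size
    return (z > near) & (z < far) & \
           (uv[:, 0] >= 0) & (uv[:, 0] < w) & \
           (uv[:, 1] >= 0) & (uv[:, 1] < h)

# Highlight only points updated within the last second (timestamps assumed).
points = np.random.rand(500, 3) * 2
stamps = np.random.rand(500) * 3.0   # seconds since each point's last update
K = np.array([[525.0, 0, 320], [0, 525.0, 240], [0, 0, 1]])
mask = in_frustum(points, np.eye(4), K, (640, 480)) & (stamps < 1.0)
overlay_points = points[mask]
```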
Guidance for assemblable parts is a promising field for augmented reality. Augmented reality assembly guidance requires 6D object poses of target objects in real time. Especially in time-critical medical or industrial settings, continuous and markerless tracking of individual parts is essential to visualize instructions superimposed on or next to the target object parts. In this regard, occlusions by the user's hand or other objects and the complexity of different assembly states complicate robust and real-time markerless multi-object tracking. To address this problem, we present Graph-based Object Tracking (GBOT), a novel graph-based single-view RGB-D tracking approach. The real-time markerless multi-object tracking is initialized via 6D pose estimation and updates the graph-based assembly poses. Tracking through the various assembly states is achieved by our novel multi-state assembly graph, which we update using the relative poses of the individual assembly parts. Linking the individual objects in this graph enables more robust object tracking during the assembly process. For evaluation, we introduce a synthetic dataset of publicly available and 3D-printable assembly assets as a benchmark for future work. Quantitative experiments on synthetic data and a further qualitative study on real test data show that GBOT can outperform existing work towards enabling context-aware augmented reality assembly guidance. Dataset and code will be made publicly available at https://***/roth-hex-lab/gbot
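A multi-state assembly graph of this kind can be pictured as parts (nodes) linked by relative poses (edges): once two parts are assembled, the pose of an occluded part can be propagated from a tracked anchor part. The sketch below is a hypothetical minimal version of that idea; the class and method names are invented and do not reflect the released GBOT code.

```python
# Hypothetical sketch of a multi-state assembly graph: nodes are parts, edges
# store relative poses; an assembled part's pose follows its tracked anchor.
import numpy as np

class AssemblyGraph:
    def __init__(self):
        self.poses = {}   # part name -> 4x4 world pose
        self.edges = {}   # (child, parent) -> 4x4 relative pose T_parent_child

    def set_pose(self, name, T):
        self.poses[name] = T

    def attach(self, child, parent):
        """Record the current relative pose when two parts are assembled."""
        T_rel = np.linalg.inv(self.poses[parent]) @ self.poses[child]
        self.edges[(child, parent)] = T_rel

    def update_from_anchor(self, child, parent):
        """Propagate the tracked anchor pose to an occluded assembled part."""
        self.poses[child] = self.poses[parent] @ self.edges[(child, parent)]

g = AssemblyGraph()
g.set_pose("base", np.eye(4))
T_screw = np.eye(4); T_screw[:3, 3] = [0.02, 0.0, 0.01]
g.set_pose("screw", T_screw)
g.attach("screw", "base")                # assembly step detected

T_base_new = np.eye(4); T_base_new[:3, 3] = [0.1, 0.0, 0.0]
g.set_pose("base", T_base_new)           # new tracked pose of the anchor
g.update_from_anchor("screw", "base")    # screw pose follows the base
```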
The accurate reconstruction of dynamic scenes with neural radiance fields is significantly dependent on the estimation of camera poses. Widely used structure-from-motion pipelines encounter difficulties in accurately ...
ISBN (digital): 9798331516475
ISBN (print): 9798331516482
In medical and industrial domains, providing guidance for assembly processes can be critical to ensure efficiency and safety. Errors in assembly can lead to significant consequences such as extended surgery times and prolonged manufacturing or maintenance times in industry. Assembly scenarios can benefit from in-situ augmented reality visualization, i.e., augmentations in close proximity to the target object, to provide guidance, reduce assembly times, and minimize errors. To enable in-situ visualization, 6D pose estimation can be leveraged to identify the correct location for an augmentation. Existing 6D pose estimation techniques primarily focus on individual objects and static captures. However, assembly scenarios exhibit various dynamics, including occlusion during assembly and changes in the appearance of assembly objects. Existing work focuses either on object detection combined with state detection or purely on pose estimation. To address the challenges of 6D pose estimation combined with assembly state detection, our approach ASDF builds upon the strengths of YOLOv8, a real-time capable object detection framework. We extend this framework, refine the object pose, and fuse pose knowledge with network-detected pose information. The late fusion in our Pose2State module results in refined 6D pose estimation and assembly state detection. By combining both pose and state information, the Pose2State module predicts the final assembly state with precision. The evaluation on our ASDF dataset shows that the Pose2State module leads to improved assembly state detection and that the improved assembly state in turn leads to more robust 6D pose estimation. Moreover, on the GBOT dataset, we outperform the pure deep-learning-based network and even the hybrid and pure tracking-based approaches.
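The late-fusion idea can be sketched as combining two sources of evidence per assembly state: the probabilities from the detection network and a geometric score measuring how well the observed relative pose of two parts matches each state's expected relative pose. The following Python sketch is an illustrative assumption; the scoring function, weighting, and names are not taken from ASDF.

```python
# Hedged sketch of a late-fusion step in the spirit of a Pose2State module.
# The 1 cm falloff, the equal weighting, and all names are assumptions.
import numpy as np

def pose_consistency(rel_pose, expected_rel_poses):
    """Score each assembly state by how well the observed relative pose matches
    the state's expected relative pose (translation only, for brevity)."""
    t = rel_pose[:3, 3]
    dists = np.array([np.linalg.norm(t - T[:3, 3]) for T in expected_rel_poses])
    scores = np.exp(-dists / 0.01)            # 1 cm falloff, assumed
    return scores / scores.sum()

def fuse_state(net_probs, pose_probs, alpha=0.5):
    """Late fusion: weighted combination of network and pose-based evidence."""
    fused = alpha * net_probs + (1 - alpha) * pose_probs
    return int(np.argmax(fused)), fused

net_probs = np.array([0.2, 0.5, 0.3])          # from the detection head
expected = [np.eye(4) for _ in range(3)]
expected[2][:3, 3] = [0.02, 0.0, 0.0]          # state 2: parts 2 cm apart
obs = np.eye(4); obs[:3, 3] = [0.019, 0.001, 0.0]
state, fused = fuse_state(net_probs, pose_consistency(obs, expected))
print(state, fused)  # geometric evidence overrides the network's guess
```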
Alignment between real and virtual objects is a challenging task required for the deployment of Mixed reality (MR) into manufacturing, medical, and construction applications. To face this challenge, a series of method...
Robust environment perception for autonomous vehicles is a tremendous challenge, which makes a diverse sensor set with, e.g., camera, lidar, and radar crucial. In the process of understanding the recorded sensor data, 3D semantic segmentation plays an important role. Therefore, this work presents a pyramid-based deep fusion architecture for lidar and camera to improve 3D semantic segmentation of traffic scenes. Individual sensor backbones extract feature maps of camera images and lidar point clouds. A novel Pyramid Fusion Backbone fuses these feature maps at different scales and combines the multimodal features in a feature pyramid to compute valuable multimodal, multi-scale features. The Pyramid Fusion Head aggregates these pyramid features and further refines them in a late fusion step, incorporating the final features of the sensor backbones. The approach is evaluated on two challenging outdoor datasets, and different fusion strategies and setups are investigated. It outperforms recent range-view-based lidar approaches as well as all previously proposed fusion strategies and architectures.
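A minimal sketch of per-scale multimodal fusion, assuming lidar features have already been projected into a view that aligns with the camera feature maps: concatenate camera and lidar features at each pyramid level and reduce them back with a convolution. This is a simplified stand-in, not the paper's Pyramid Fusion Backbone; channel counts and module names are assumptions.

```python
# Simplified sketch (assumed, not the paper's exact architecture) of fusing
# camera and lidar feature maps at multiple scales into a feature pyramid.
import torch
import torch.nn as nn

class PyramidFusion(nn.Module):
    """Fuse per-scale camera/lidar features; lidar features are assumed to be
    already projected into a view with sizes matching the camera features."""
    def __init__(self, channels=(64, 128, 256)):
        super().__init__()
        self.fuse = nn.ModuleList(
            [nn.Conv2d(2 * c, c, kernel_size=3, padding=1) for c in channels]
        )

    def forward(self, cam_feats, lidar_feats):
        # Concatenate modalities per scale, then reduce back to c channels.
        return [conv(torch.cat([c, l], dim=1))
                for conv, c, l in zip(self.fuse, cam_feats, lidar_feats)]

cam = [torch.randn(1, c, 64 // s, 64 // s) for c, s in [(64, 1), (128, 2), (256, 4)]]
lidar = [torch.randn_like(f) for f in cam]
pyramid = PyramidFusion()(cam, lidar)
print([f.shape for f in pyramid])  # multimodal features at three scales
```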