We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization. This so-called Augmented Autoencoder has several advantages over existing methods: it does not require real, pose-annotated training data, generalizes to various test sensors, and inherently handles object and view symmetries. Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Our pipeline achieves state-of-the-art performance on the T-LESS dataset in both the RGB and RGB-D domains. We also evaluate on the LineMOD dataset, where we can compete with other synthetically trained approaches. We further increase performance by correcting 3D orientation estimates to account for perspective errors when the object deviates from the image center, and show extended results. Our code is available at https://***/dLR-RM/AugmentedAutoencoder.
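To illustrate what an implicit orientation representation "defined by samples in a latent space" could look like at inference time, the following is a minimal sketch rather than the authors' implementation. It assumes that latent codes of rendered views are stored in a codebook next to their orientation labels and that a query code is matched by cosine similarity; the similarity measure, the function names, and the toy three-dimensional codes are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two latent vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def lookup_orientation(query_code, codebook):
    """Return the label of the codebook entry most similar to the query.

    `codebook` is a list of (latent_code, orientation_label) pairs. In a real
    system the labels would be rotation matrices of rendered views; strings
    are used here to keep the example short (hypothetical data).
    """
    best = max(codebook, key=lambda entry: cosine_similarity(query_code, entry[0]))
    return best[1]


# Toy codebook: three latent codes, each tagged with a hypothetical view label.
codebook = [
    ([1.0, 0.0, 0.0], "view_a"),
    ([0.0, 1.0, 0.0], "view_b"),
    ([0.0, 0.0, 1.0], "view_c"),
]

query = [0.1, 0.9, 0.2]  # stands in for the encoder output of a test image
result = lookup_orientation(query, codebook)
print(result)
```

Because the lookup compares directions rather than magnitudes, rescaling the query code does not change which entry is selected.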
ISBN: (print) 9783030012311; 9783030012304
We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization. This so-called Augmented Autoencoder has several advantages over existing methods: it does not require real, pose-annotated training data, generalizes to various test sensors, and inherently handles object and view symmetries. Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Experiments on the T-LESS and LineMOD datasets show that our method outperforms similar model-based approaches and competes with state-of-the-art approaches that require real pose-annotated images.
Object pose estimation based on a single RGB image has wide application potential but is difficult to achieve. Existing pose estimation involves various inference pipelines. One popular pipeline first uses Convolutional Neural Networks (CNNs) to predict 2D projections of 3D keypoints in a single RGB image and then calculates the 6D pose via a Perspective-n-Point (PnP) solver. Due to the gap between synthetic and real data, a model trained on synthetic data has difficulty predicting the 6D pose accurately when applied to real data. To address this problem, we propose a two-stage pipeline for object pose estimation based on multi-precision vectors and segmentation-driven (Seg-driven) PnP. In the keypoint localization stage, we first develop a CNN-based three-branch network to predict multi-precision 2D vectors pointing to 2D keypoints. We then introduce an accurate and fast Keypoint Voting scheme of Multi-precision vectors (KVM), which computes low-precision 2D keypoints from low-precision vectors and refines them using mid- and high-precision vectors. In the pose calculation stage, we propose Seg-driven PnP to refine the 3D translation of poses and obtain the optimal pose by minimizing the non-overlapping area between segmented and rendered masks. The Seg-driven PnP leverages 2D segmentation trained on real images to improve the accuracy of pose estimation trained on synthetic data, thereby reducing the synthetic-to-real gap. Extensive experiments show that our approach materially outperforms state-of-the-art methods on the LM and HB datasets. Importantly, our proposed method works reasonably well for weakly textured and occluded objects in diverse scenes.
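The selection criterion of the pose calculation stage, minimizing the non-overlapping area between segmented and rendered masks, can be sketched as follows. This is only an illustration under stated assumptions, not the paper's procedure: the masks are small binary grids, the rendered masks for each candidate translation are assumed to be precomputed by some renderer, and the names `non_overlap_area` and `select_best_candidate` are hypothetical.

```python
def non_overlap_area(mask_a, mask_b):
    """Count pixels that are set in exactly one of two equally sized binary masks."""
    return sum(
        1
        for row_a, row_b in zip(mask_a, mask_b)
        for a, b in zip(row_a, row_b)
        if bool(a) != bool(b)
    )


def select_best_candidate(segmented, rendered_candidates):
    """Return (index, area) of the rendered mask with the smallest non-overlap."""
    areas = [non_overlap_area(segmented, rendered) for rendered in rendered_candidates]
    best = min(range(len(areas)), key=areas.__getitem__)
    return best, areas[best]


# Toy segmented mask (assumed to come from a 2D segmentation network).
segmented = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

# Toy rendered masks, one per hypothetical candidate translation.
candidates = [
    [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]],  # object shifted left
    [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]],  # object aligned
    [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0]],  # object shifted down
]

best_index, best_area = select_best_candidate(segmented, candidates)
print(best_index, best_area)
```

In this toy case the aligned candidate has zero non-overlapping pixels, so it is the one selected; a full system would render a mask for each refined translation instead of using fixed grids.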
This paper addresses the problem of 6D pose tracking of plane segments from point clouds acquired from a mobile camera. This is motivated by manual packing operations, where an opportunity exists to enhance performance by aiding operators with instructions based on augmented reality. The approach takes point clouds as input because of their advantages for extracting geometric information relevant to estimating the 6D pose of rigid objects. The proposed algorithm begins with a RANSAC fitting stage on the raw point cloud. It then implements strategies to compute the 2D size and 6D pose of plane segments from geometric analysis of the fitted point cloud. Redundant detections are combined using a new quality factor that predicts point cloud mapping density and allows the selection of the most accurate detection. The algorithm is designed for dynamic scenes, employing a novel particle concept in the point cloud space to track the validity of detections over time. A variant of the algorithm uses box size priors (available in most packing operations) to filter out irrelevant detections. The impact of this prior knowledge is evaluated through an experimental design that compares the performance of a plane segment tracking system, considering variations in the tracking algorithm and camera speed (onboard the packing operator). The tracking algorithm varies at two levels: algorithm $A_{wpk}$, which integrates prior knowledge of box sizes, and algorithm $A_{woutpk}$, which assumes ignorance of box pro
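As an illustration of the RANSAC fitting stage mentioned in this abstract, here is a generic textbook-style RANSAC plane fit on a toy point cloud. It is a sketch under assumptions, not the paper's algorithm: the distance threshold, iteration count, function names, and sample data are all chosen for the example.

```python
import random


def plane_from_points(p, q, r):
    """Return (normal, d) with normal . x + d = 0, or None if the points are collinear."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    length = sum(c * c for c in n) ** 0.5
    if length < 1e-12:
        return None
    n = [c / length for c in n]
    d = -sum(n[i] * p[i] for i in range(3))
    return n, d


def ransac_plane(points, threshold=0.01, iterations=200, seed=0):
    """Fit one plane by keeping the random 3-point hypothesis with the most inliers."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue  # degenerate sample, try again
        n, d = plane
        inliers = [
            pt for pt in points
            if abs(sum(n[i] * pt[i] for i in range(3)) + d) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers


# Toy cloud: a 5 x 5 grid on the plane z = 0.5 plus five off-plane outliers.
points = [[x * 0.25, y * 0.25, 0.5] for x in range(5) for y in range(5)]
points += [
    [0.1, 0.2, 1.3], [0.7, 0.3, 0.9], [0.4, 0.8, 1.7],
    [0.9, 0.9, 0.1], [0.2, 0.6, 2.0],
]

plane, inliers = ransac_plane(points)
print(plane, len(inliers))
```

A pipeline like the one described would presumably run such a fit repeatedly to extract several plane segments and then derive their size and pose from the inlier sets; that part is not shown here.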