Scanned 3D point clouds of real-world scenes often suffer from noise and incompleteness. Observing that prior point cloud shape completion networks overlook local geometric features, we propose ECG, an Edge-aware point cloud Completion network with Graph convolution, which facilitates fine-grained 3D point cloud shape generation with multi-scale edge features. Our ECG consists of two consecutive stages: 1) skeleton generation and 2) detail refinement. Each stage is a generation sub-network conditioned on the input incomplete point cloud. The first stage generates coarse skeletons that facilitate capturing useful edge features despite noisy measurements. Subsequently, we design a deep hierarchical encoder with graph convolution to propagate multi-scale edge features for refining local geometric details. To preserve local geometric details while upsampling, we propose the Edge-aware Feature Expansion (EFE) module, which smoothly expands/upsamples point features by emphasizing their local edges. Extensive experiments show that our ECG significantly outperforms previous state-of-the-art (SOTA) methods for point cloud completion.
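Since the abstract does not detail its graph convolution, the sketch below uses a standard EdgeConv-style operator (k-NN grouping plus an MLP over center/offset edge features) as a plausible stand-in for how edge features could be extracted from point features; the choice of k, layer sizes, and max-aggregation are assumptions, not ECG's exact EFE module.

```python
# Minimal sketch of an edge-aware graph convolution over point features,
# in the spirit of the multi-scale edge features the abstract describes.
import torch
import torch.nn as nn

def knn_indices(x, k):
    # x: (B, N, C) point features; returns (B, N, k) nearest-neighbor indices
    dist = torch.cdist(x, x)                                  # (B, N, N)
    return dist.topk(k + 1, largest=False).indices[..., 1:]   # drop self

class EdgeConv(nn.Module):
    def __init__(self, in_dim, out_dim, k=16):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(
            nn.Conv2d(2 * in_dim, out_dim, 1),
            nn.BatchNorm2d(out_dim), nn.ReLU())

    def forward(self, x):                                     # x: (B, N, C)
        B, N, C = x.shape
        idx = knn_indices(x, self.k)                          # (B, N, k)
        nbrs = torch.gather(
            x.unsqueeze(1).expand(B, N, N, C), 2,
            idx.unsqueeze(-1).expand(B, N, self.k, C))        # (B, N, k, C)
        center = x.unsqueeze(2).expand_as(nbrs)
        edge = torch.cat([center, nbrs - center], dim=-1)     # edge features
        edge = edge.permute(0, 3, 1, 2)                       # (B, 2C, N, k)
        # Max over the k neighbors keeps the strongest edge response per point.
        return self.mlp(edge).max(dim=-1).values.permute(0, 2, 1)
```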
Motivated by the astonishing capabilities of natural intelligent agents and inspired by theories from psychology, this paper explores the idea that perception gets coupled to 3D properties of the world via interaction with the environment. Existing works for depth estimation require either massive amounts of annotated training data or some form of hard-coded geometrical constraint. This paper explores a new approach to learning depth perception requiring neither of those. Specifically, we propose a novel global-local network architecture that can be trained with the data observed by a robot exploring an environment: images and extremely sparse depth measurements, down to even a single pixel per image. From a pair of consecutive images, the proposed network outputs a latent representation of the camera's and scene's parameters, and a dense depth map. Experiments on several datasets show that, when ground truth is available even for just one of the image pixels, the proposed network can learn monocular dense depth estimation up to 22.5% more accurately than state-of-the-art approaches. We believe that this work, in addition to its scientific interest, lays the foundations to learn depth with extremely sparse supervision, which can be valuable to all robotic systems acting under severe bandwidth or sensing constraints.
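As an illustration of training with such extreme sparsity, here is a minimal masked-loss sketch: the dense prediction is penalized only at pixels where a measurement exists, down to a single pixel per image. The L1 form and normalization are assumptions; the paper's actual objective may differ.

```python
# Hedged sketch of supervising dense depth with extremely sparse ground truth.
import torch

def sparse_depth_loss(pred, gt, valid_mask):
    """pred, gt: (B, 1, H, W) depth maps; valid_mask: (B, 1, H, W) boolean,
    True only where a depth measurement exists (possibly a single pixel)."""
    diff = (pred - gt).abs() * valid_mask
    # Normalize by the number of valid pixels so even one measurement
    # per image yields a usable gradient.
    return diff.sum() / valid_mask.sum().clamp(min=1)
```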
Semantic segmentation is key in autonomous driving. Using deep visual learning architectures is not trivial in this context, because of the challenges in creating suitable large-scale annotated datasets. This issue has traditionally been circumvented through the use of synthetic datasets, which have become a popular resource in this field. They have been released to meet the need for semantic segmentation algorithms able to close the visual domain shift between training and test data. Although exacerbated by the use of artificial data, the problem is extremely relevant in this field even when training on real data. Indeed, weather conditions, viewpoints, and city appearances can vary considerably from car to car, and even at test time for a single, specific vehicle. How to deal with domain adaptation in semantic segmentation, and how to effectively leverage several different data distributions (source domains), are important research questions in this field. To support work in this direction, this letter contributes a new large-scale synthetic dataset for semantic segmentation with more than 100 different source visual domains. The dataset has been created to explicitly address the challenges of domain shift between training and test data under various weather and viewpoint conditions, in seven different city types. Extensive benchmark experiments assess the dataset, showcasing open challenges for the current state of the art. The dataset will be available at: https://***/home/.
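One simple way to exercise a dataset with 100+ source domains is to sample each training batch from a randomly chosen domain, so a segmentation model sees many weather, viewpoint, and city-type conditions per epoch. The loader structure below is a hypothetical illustration, not the released dataset's API.

```python
# Hedged sketch of round-robin-style multi-source-domain batch sampling.
import random

def multi_domain_batches(loaders_by_domain):
    """loaders_by_domain: dict mapping a domain name (e.g. a weather /
    viewpoint / city-type combination) to an iterable of (image, label)
    batches. Yields (domain, batch) pairs until all domains are exhausted."""
    iters = {d: iter(l) for d, l in loaders_by_domain.items()}
    while iters:
        domain = random.choice(list(iters))
        try:
            yield domain, next(iters[domain])
        except StopIteration:
            del iters[domain]   # this domain is exhausted for the epoch
```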
Domestic service robots (DSRs) are a promising solution to the shortage of home care workers. However, one of the main limitations of DSRs is their inability to interact naturally through language. Recently, data-driven approaches have been shown to be effective for tackling this limitation; however, they often require large-scale datasets, which are costly to build. Against this background, we aim to perform automatic sentence generation for fetching instructions: for example, "Bring me a green tea bottle on the table." This is particularly challenging because appropriate expressions depend on the target object as well as its surroundings. In this letter, we propose the attention branch encoder-decoder network (ABEN) to generate sentences from visual inputs. Unlike other approaches, the ABEN has multimodal attention branches that use subword-level attention and generate sentences based on subword embeddings. In experiments, we compared the ABEN with a baseline method using four standard metrics in image captioning. Results show that the ABEN outperformed the baseline in terms of these metrics.
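To make subword-level attention concrete, the sketch below shows a single decoding step that embeds a subword, attends over visual region features, and predicts the next subword. The single attention head, GRU cell, and dimensions are assumptions; the ABEN's multimodal attention branches are more elaborate than this.

```python
# Hedged sketch of one subword-attention decoding step for caption generation.
import torch
import torch.nn as nn

class SubwordAttentionDecoder(nn.Module):
    def __init__(self, vocab_size, d_model=256, d_visual=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)   # subword embeddings
        self.attn = nn.Linear(d_model + d_visual, 1)
        self.gru = nn.GRUCell(d_model + d_visual, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def step(self, subword_ids, h, visual):
        # visual: (B, R, d_visual) region features; h: (B, d_model) state
        e = self.embed(subword_ids)                      # (B, d_model)
        q = e.unsqueeze(1).expand(-1, visual.size(1), -1)
        scores = self.attn(torch.cat([q, visual], -1)).squeeze(-1)  # (B, R)
        # Attention-weighted visual context conditioned on the current subword.
        ctx = (scores.softmax(-1).unsqueeze(-1) * visual).sum(1)
        h = self.gru(torch.cat([e, ctx], -1), h)
        return self.out(h), h                            # next-subword logits
```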
Autonomous cars need continuously updated depth information. Thus far, depth has mostly been estimated independently for a single frame at a time, even if the method starts from video input. Our method produces a time series of depth maps, which makes it an ideal candidate for online learning approaches. In particular, we put three different types of depth estimation (supervised depth prediction, self-supervised depth prediction, and self-supervised depth completion) into a common framework. We integrate the corresponding networks with a ConvLSTM such that the spatiotemporal structure of depth across frames can be exploited to yield more accurate depth estimation. Our method is flexible: it can be applied to monocular videos only or be combined with different types of sparse depth patterns. We carefully study the architecture of the recurrent network and its training strategy. We are the first to successfully exploit recurrent networks for real-time self-supervised monocular depth estimation and completion. Extensive experiments show that our recurrent method consistently and significantly outperforms its image-based counterpart in both self-supervised scenarios. It also outperforms previous depth estimation methods from the three popular groups. Please refer to our webpage for details.
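For reference, a standard ConvLSTM cell of the kind such a framework could integrate is sketched below: hidden and cell states carry depth features across frames while convolutions preserve spatial structure. Where exactly the cell sits inside the depth networks, and its kernel size, are assumptions here.

```python
# Minimal standard ConvLSTM cell for propagating depth features across frames.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hidden_ch, k=3):
        super().__init__()
        self.hidden_ch = hidden_ch
        # One convolution produces all four gates at once.
        self.gates = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch, k,
                               padding=k // 2)

    def forward(self, x, state=None):
        # x: (B, in_ch, H, W) encoder features for the current frame
        if state is None:
            h = x.new_zeros(x.size(0), self.hidden_ch, *x.shape[2:])
            c = h.clone()
        else:
            h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], 1)).chunk(4, dim=1)
        c = f.sigmoid() * c + i.sigmoid() * g.tanh()   # update cell state
        h = o.sigmoid() * c.tanh()                     # emit hidden state
        return h, (h, c)   # h feeds the decoder; (h, c) carries over frames
```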
In this letter, we introduce 6D object pose estimation based on rotational primitive prediction, using a single image as input. We solve for the 6D pose of a known object relative to the camera from a single image with occlusion. Many recent state-of-the-art (SOTA) two-step approaches extract image keypoints and then apply PnP regression for pose estimation. Instead of relying on a bounding box or keypoints on the object, we propose to learn an orientation-induced primitive so as to achieve accurate pose estimation regardless of object size. We leverage a Variational AutoEncoder (VAE) to learn this underlying primitive and its associated keypoints. The keypoints inferred from the reconstructed primitive image are then used to regress the rotation with PnP. Lastly, we compute the translation in a separate localization module to complete the full 6D pose estimation. When evaluated on public datasets, the proposed method yields notable improvements on the LINEMOD, Occlusion LINEMOD, and YCB-Video datasets. We further show that a model trained only on synthetic data performs comparably to existing methods that require real images in the training phase.
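The PnP step can be illustrated with OpenCV: given 3D keypoints on the primitive model and their inferred 2D locations, cv2.solvePnP recovers the rotation. The keypoint and calibration inputs below are placeholders; note that the paper computes translation in a separate localization module, so the tvec returned by PnP here would not be its final translation.

```python
# Hedged sketch of regressing rotation from inferred primitive keypoints.
import cv2
import numpy as np

def rotation_from_keypoints(obj_pts, img_pts, K):
    """obj_pts: (N, 3) 3D keypoints on the primitive model;
    img_pts: (N, 2) their inferred 2D image locations;
    K: (3, 3) camera intrinsics."""
    ok, rvec, tvec = cv2.solvePnP(
        obj_pts.astype(np.float64), img_pts.astype(np.float64),
        K.astype(np.float64), distCoeffs=None)
    assert ok, "PnP failed"
    R, _ = cv2.Rodrigues(rvec)   # axis-angle vector to 3x3 rotation matrix
    return R, tvec               # tvec is PnP's translation, used here
                                 # only for illustration
```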