One of the major open challenges in self-driving cars is the ability to detect cars and pedestrians in order to safely navigate in the world. Deep learning-based object detector approaches have enabled great advances in using camera imagery to detect and classify objects. But for a safety-critical application such as autonomous driving, the error rates of the current state of the art are still too high to enable safe operation. Moreover, the characterization of object detector performance is primarily limited to testing on prerecorded datasets. Errors that occur on novel data go undetected without additional human labels. In this letter, we propose an automated method to identify mistakes made by object detectors without ground truth labels. We show that inconsistencies in the object detector output between a pair of similar images can be used as hypotheses for false negatives (i.e., missed detections), and that, using a novel set of features for each hypothesis, an off-the-shelf binary classifier can be used to find valid errors. In particular, we study two distinct cues, temporal and stereo inconsistencies, using data that are readily available on most autonomous vehicles. Our method can be used with any camera-based object detector, and we illustrate the technique on several sets of real-world data. We show that a state-of-the-art detector, tracker, and our classifier trained only on synthetic data can identify valid errors on the KITTI tracking dataset with an average precision of 0.94. We also release a new tracking dataset with 104 sequences totaling 80,655 labeled pairs of stereo images, along with ground truth disparity from a game engine, to facilitate further research.
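A minimal sketch (not the authors' code) of the temporal-inconsistency cue: tracked objects that no current detection overlaps become false-negative hypotheses. All names and thresholds here are illustrative assumptions.

```python
# Hedged sketch: generating false-negative hypotheses from the
# disagreement between a tracker and a detector in one frame.
# Boxes are (x1, y1, x2, y2); the 0.3 IoU threshold is illustrative.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def false_negative_hypotheses(tracked_boxes, detected_boxes, thresh=0.3):
    """Tracked objects with no matching detection in the current frame
    are hypotheses for missed detections (false negatives)."""
    return [t for t in tracked_boxes
            if all(iou(t, d) < thresh for d in detected_boxes)]

tracks = [(10, 10, 50, 50), (100, 100, 140, 160)]
dets = [(12, 11, 49, 52)]  # the detector missed the second object
print(false_negative_hypotheses(tracks, dets))  # → [(100, 100, 140, 160)]
```

In the letter, each such hypothesis is then scored by a binary classifier over a feature set; here only the hypothesis-generation step is shown.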
Determining the types of vegetation present in an image is a core step in many precision agriculture tasks. In this letter, we focus on pixel-based approaches for the classification of crops versus weeds, especially for complex cases involving overlapping plants and partial occlusion. We examine the benefits of multiscale and content-driven morphology-based descriptors called attribute profiles. These are compared to the state-of-the-art keypoint descriptors with a fixed neighborhood previously used in precision agriculture, namely histograms of oriented gradients and local binary patterns. The proposed classification technique is especially advantageous when coupled with morphology-based segmentation on a max-tree structure, as the same representation can be reused for feature extraction. The robustness of the approach is demonstrated by an experimental evaluation on two datasets with different crop types. The proposed approach compares favorably to state-of-the-art approaches without an increase in computational complexity, while providing descriptors at a higher resolution.
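To give a flavor of attribute profiles, here is a toy sketch (my own illustration, not the letter's method): a 1-D binary area opening removes connected components smaller than a size threshold, and stacking the responses at increasing thresholds yields a per-pixel multiscale descriptor. The real attribute profiles operate on grayscale images via a max-tree.

```python
# Toy 1-D binary area opening: runs of 1s shorter than `lam` are removed.

def area_opening_1d(signal, lam):
    out, i, n = list(signal), 0, len(signal)
    while i < n:
        if signal[i]:
            j = i
            while j < n and signal[j]:
                j += 1
            if j - i < lam:          # component too small: suppress it
                out[i:j] = [0] * (j - i)
            i = j
        else:
            i += 1
    return out

def attribute_profile(signal, lams=(2, 4)):
    """Stack of progressively filtered signals: a per-pixel multiscale
    descriptor in the spirit of attribute profiles."""
    return [signal] + [area_opening_1d(signal, l) for l in lams]

sig = [1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1]
print(attribute_profile(sig))
```

Each pixel's column through the stack records at which scale its component disappears, which is the kind of content-driven, multiscale information the descriptors exploit.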
Methods for capturing and modeling vegetation, such as trees or plants, typically distinguish between two components: the branch skeleton and the foliage. Current methods do not provide the quantitatively accurate tree structure and foliage density needed for applications such as visualization, inspection, or the estimation of vegetation parameters. This letter describes an automatic method for segmenting three-dimensional point cloud data of vegetation, acquired from commodity scanners, into its two main components, branches and leaves, by using geometric features computed directly on the point cloud. In this letter, the specific type of vegetation considered is broadleaf trees. We present a data-driven approach, where a Random Forest classifier is used for segmentation. In contrast to state-of-the-art methods, the point cloud is not reduced to a set of primitives such as cylinders. Instead, the algorithm works at the level of the input point cloud itself, preserving quantitative accuracy in the resulting model. Computation of typical vegetation metrics follows naturally from this model. We achieve an average classification accuracy of 91% on simulated data across three different species of broadleaf trees. Qualitative results on real data are also presented.
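One common family of geometric point features comes from the eigenvalues of a neighborhood's covariance matrix. The sketch below (shown in 2-D for brevity; the letter works in 3-D, and its exact feature set is not reproduced here) computes a "linearity" score: points on thin branch-like structures score high, scattered leaf-like neighborhoods score low.

```python
import math

# Hedged illustration of a covariance-eigenvalue point feature in 2-D.
# For a 2x2 covariance [[a, b], [b, c]] the eigenvalues have a closed form.

def linearity(points):
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / n
    c = sum((p[1] - my) ** 2 for p in points) / n
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    half = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = (a + c) / 2 + half, (a + c) / 2 - half
    return (lam1 - lam2) / lam1 if lam1 > 0 else 0.0

branch = [(t, 0.0) for t in range(5)]                 # colinear neighborhood
leaf = [(0, 0), (1, 1), (1, -1), (-1, 1), (-1, -1)]   # scattered neighborhood
print(linearity(branch) > 0.9, linearity(leaf) < 0.5)  # → True True
```

Features of this kind, computed per point, are exactly the sort of input a Random Forest classifier can use to label each point as branch or leaf.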
This letter presents a novel semantic mapping approach, Recurrent-OctoMap, learned from long-term three-dimensional (3-D) Lidar data. Most existing semantic mapping approaches focus on improving semantic understanding of single frames, rather than 3-D refinement of semantic maps (i.e., fusing semantic observations). The most widely used approach for 3-D semantic map refinement is "Bayes update," which fuses the consecutive predictive probabilities following a Markov-chain model. Instead, we propose a learning approach to fuse the semantic features, rather than simply fusing predictions from a classifier. In our approach, we represent and maintain our 3-D map as an OctoMap, and model each cell as a recurrent neural network, to obtain a Recurrent-OctoMap. In this case, the semantic mapping process can be formulated as a sequence-to-sequence encoding-decoding problem. Moreover, in order to extend the duration of observations in our Recurrent-OctoMap, we developed a robust 3-D localization and mapping system for successively mapping a dynamic environment using more than two weeks of data, and the system can be trained and deployed with arbitrary memory length. We validate our approach on the ETH long-term 3-D Lidar dataset. The experimental results show that our proposed approach outperforms the conventional "Bayes update" approach.
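For context, the "Bayes update" baseline that the letter improves upon can be sketched in a few lines: each map cell keeps a class posterior, and every new observation's per-class likelihood is fused multiplicatively and renormalized. This is a generic illustration, not the letter's implementation.

```python
# Minimal sketch of per-cell "Bayes update" semantic fusion.

def bayes_update(prior, likelihood):
    """Fuse a new per-class likelihood into the cell's running posterior."""
    posterior = [p * l for p, l in zip(prior, likelihood)]
    total = sum(posterior)
    return [p / total for p in posterior]

cell = [0.5, 0.5]  # uniform prior over two semantic classes
for obs in [[0.7, 0.3], [0.6, 0.4], [0.8, 0.2]]:  # consecutive predictions
    cell = bayes_update(cell, obs)
print(cell)  # posterior concentrates on class 0
```

Recurrent-OctoMap replaces this fixed multiplicative rule with a learned recurrent model per cell, fusing features rather than final class probabilities.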
Object retrieval and classification in point cloud data are challenged by noise, irregular sampling density, and occlusion. To address these issues, we propose a point pair descriptor that is robust to noise and occlusion and achieves high retrieval accuracy. We further show how the proposed descriptor can be used in a four-dimensional (4-D) convolutional neural network for the task of object classification. We propose a novel 4-D convolutional layer that is able to learn class-specific clusters in the descriptor histograms. Finally, we provide experimental validation on three benchmark datasets, which confirms the superiority of the proposed approach.
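As background, the classic point pair feature (PPF) over which such histogram descriptors are typically built combines the pairwise distance with three normal angles. The sketch below shows this standard formulation for illustration; the letter's descriptor and its 4-D histogram construction are not reproduced here.

```python
import math

def point_pair_feature(p1, n1, p2, n2):
    """Classic PPF for two oriented points: (distance, angle of each
    normal to the connecting vector, angle between the normals)."""
    d = [b - a for a, b in zip(p1, p2)]
    dist = math.sqrt(sum(c * c for c in d))
    angle = lambda u, v: math.acos(
        max(-1.0, min(1.0,
            sum(a * b for a, b in zip(u, v)) /
            (math.sqrt(sum(a * a for a in u)) *
             math.sqrt(sum(b * b for b in v))))))
    return (dist, angle(n1, d), angle(n2, d), angle(n1, n2))

f = point_pair_feature((0, 0, 0), (0, 0, 1), (1, 0, 0), (0, 0, 1))
print(f)  # distance 1.0; both normals perpendicular to the connecting vector
```

Histograms of such 4-D features over many point pairs give a descriptor that degrades gracefully under occlusion, since each pair contributes independently.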
Given two consecutive RGB-D images, we propose a model that estimates a dense three-dimensional (3D) motion field, also known as scene flow. We take advantage of the fact that in robot manipulation scenarios, scenes often consist of a set of rigidly moving objects. Our model jointly estimates the following: First, the segmentation of the scene into an unknown but finite number of objects; second, the motion trajectories of these objects; and finally, the object scene flow. We employ an hourglass deep neural network architecture. In the encoding stage, the RGB and depth images undergo spatial compression and correlation. In the decoding stage, the model outputs three images containing a per-pixel estimate of the corresponding object center as well as object translation and rotation. This forms the basis for inferring the object segmentation and final object scene flow. To evaluate our model, we generated a new and challenging, large-scale, synthetic dataset that is specifically targeted at robotic manipulation: It contains a large number of scenes with a very diverse set of simultaneously moving 3D objects and is recorded with a simulated, static RGB-D camera. In quantitative experiments, we show that we outperform state-of-the-art scene flow and motion-segmentation methods on this dataset. In qualitative experiments, we show how our learned model transfers to challenging real-world scenes, visually generating better results than existing methods.
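The step from a per-pixel object-center image to a segmentation amounts to clustering pixels whose predicted centers agree. The sketch below is a deliberately crude stand-in (grid-snapping predicted centers), purely to illustrate the idea; the model's actual inference procedure is not reproduced here.

```python
# Crude illustration: pixels whose predicted 2-D object centers snap to
# the same grid cell receive the same object label.

def segment_by_center(centers, cell=0.5):
    labels, seen = [], {}
    for cx, cy in centers:
        key = (round(cx / cell), round(cy / cell))
        labels.append(seen.setdefault(key, len(seen)))
    return labels

# Four pixels: two vote for a center near (1, 1), two near (5, 5).
preds = [(1.0, 1.1), (1.1, 0.9), (5.0, 5.0), (4.9, 5.1)]
print(segment_by_center(preds))  # → [0, 0, 1, 1]
```

Because every pixel of a rigid object should point at the same center, agreement in center predictions is a natural grouping signal for the segmentation.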
ISBN: (print) 9781538680940
We present a fast and very effective method for object classification that is particularly suited for robotic applications such as grasping and semantic mapping. Our approach is based on a Random Forest classifier that can be trained incrementally. This has the major benefit that semantic information from new data samples can be incorporated without retraining the entire model. Even if new samples from a previously unseen class are presented, our method is able to perform efficient updates and learn a sustainable representation for this new class. Further features of our method include a very fast and memory-efficient implementation, as well as the ability to interrupt the learning process at any time without a significant performance degradation. Experiments on benchmark data for robotic applications show the clear benefits of our incremental approach and its competitiveness with standard offline methods in terms of classification accuracy.
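To illustrate the "no full retraining" idea, here is a toy incremental ensemble (my own sketch, not the paper's algorithm, which also refines existing trees): each new batch of labeled samples trains one decision stump that is appended to the forest, leaving earlier trees untouched.

```python
# Toy sketch of incremental ensemble learning with decision stumps.

def train_stump(X, y):
    """Best axis-aligned threshold split by training accuracy."""
    best = None
    for f in range(len(X[0])):
        for t in sorted(set(x[f] for x in X)):
            for left, right in ((0, 1), (1, 0)):
                pred = [left if x[f] <= t else right for x in X]
                acc = sum(p == v for p, v in zip(pred, y)) / len(y)
                if best is None or acc > best[0]:
                    best = (acc, f, t, left, right)
    _, f, t, left, right = best
    return lambda x: left if x[f] <= t else right

class IncrementalForest:
    def __init__(self):
        self.trees = []

    def update(self, X, y):
        """One new stump per batch; existing trees are never retrained."""
        self.trees.append(train_stump(X, y))

    def predict(self, x):
        """Majority vote over all stumps trained so far."""
        votes = [t(x) for t in self.trees]
        return max(set(votes), key=votes.count)

forest = IncrementalForest()
forest.update([[0.1], [0.2], [0.9], [0.8]], [0, 0, 1, 1])
forest.update([[0.3], [0.7]], [0, 1])  # later batch, no retraining
print(forest.predict([0.15]), forest.predict([0.85]))  # → 0 1
```

The appeal for robotics is the same as in the paper: new semantic information, even for previously unseen classes, can be folded in at low cost, and learning can stop at any time with a usable model.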