The problems of dense stereo reconstruction and object class segmentation can both be formulated as Random Field labeling problems, in which every pixel in the image is assigned a label corresponding to either its disparity or an object class such as road or building. While these two problems are mutually informative, no attempt has been made to jointly optimize their labelings. In this work we provide a flexible framework, configured via cross-validation, that unifies the two problems. We demonstrate that joint optimization substantially improves performance by resolving ambiguities that would be present in real-world data if the two problems were considered separately. To evaluate our method, we augment the Leuven data set (http://***/research/visiongroup/files/***), a stereo video shot from a car driving around the streets of Leuven, with 70 hand-labeled object class and disparity maps. We hope that the release of these annotations will stimulate further work in the challenging domain of street-view analysis. Complete source code is publicly available (http://***/staff/Philip-Torr/***).
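What a joint energy over the product label space might look like can be sketched as follows. This is a minimal illustration under stated assumptions: a 1-D chain of pixels, Potts smoothness on each label component, and a hypothetical coupling table `joint[o, d]` that scores how compatible object class `o` is with disparity `d`. The abstract does not describe the framework's actual potentials or its inference method. Exhaustive search is used here only because the toy problem is tiny.

```python
import itertools

import numpy as np


def joint_energy(obj, disp, u_obj, u_disp, joint, w_obj=1.0, w_disp=1.0):
    """Energy of a joint (object class, disparity) labelling on a 1-D chain.

    obj[i], disp[i] -- object-class and disparity labels of pixel i
    u_obj[i, o]     -- unary cost of class o at pixel i
    u_disp[i, d]    -- unary cost of disparity d at pixel i
    joint[o, d]     -- hypothetical cost coupling a class with a disparity
    """
    n = len(obj)
    e = sum(u_obj[i, obj[i]] + u_disp[i, disp[i]] + joint[obj[i], disp[i]]
            for i in range(n))
    # Potts smoothness on each label component between chain neighbours.
    for i in range(n - 1):
        e += w_obj * (obj[i] != obj[i + 1]) + w_disp * (disp[i] != disp[i + 1])
    return float(e)


def brute_force_min(u_obj, u_disp, joint, **kw):
    """Exhaustive search over the product label space (toy sizes only)."""
    n, n_obj = u_obj.shape
    n_disp = u_disp.shape[1]
    best = None
    for obj in itertools.product(range(n_obj), repeat=n):
        for disp in itertools.product(range(n_disp), repeat=n):
            e = joint_energy(obj, disp, u_obj, u_disp, joint, **kw)
            if best is None or e < best[0]:
                best = (e, obj, disp)
    return best


# One pixel: the class is clear (0), the disparity unary is a tie,
# and the coupling term says class 0 prefers disparity 0.
u_obj = np.array([[0.0, 5.0]])
u_disp = np.array([[1.0, 1.0]])
joint = np.array([[0.0, 2.0], [2.0, 0.0]])
print(brute_force_min(u_obj, u_disp, joint))  # (1.0, (0,), (0,))
```

In the example, the disparity unary alone cannot choose between the two disparities. The coupling term breaks the tie, which illustrates the kind of ambiguity that joint optimization is meant to resolve.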
ISBN: 9781479942602 (print)
In this paper we propose an interactive approach for object class segmentation of natural images on touch-screen mobile devices. The key research question this paper addresses is: can we effectively correct the errors made by an automatic or semi-automatic figure-ground segmentation algorithm while also providing real-time feedback to the user on a mobile device with low computational power? Many research works have focused on improving automatic or semi-automatic figure-ground segmentation algorithms, but none has tried to take advantage of the touch-screen technology integrated in most modern mobile devices to optimize the results of these algorithms. Our key idea is to use superpixels as interactive buttons that the user can quickly tap to add them to, or remove them from, an initial low-quality segmentation mask, thereby correcting segmentation errors and producing a satisfying final result. We performed an extensive analysis of the proposed approach by implementing it both on a desktop computer and on a mid-range Android device. Even though our method is extremely simple, the results we obtained are comparable with those achieved by other state-of-the-art interactive segmentation algorithms. We therefore believe that the proposed approach can be exploited by most mobile image-editing applications to provide a simple but highly effective method for interactive object class segmentation.
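The "superpixels as buttons" idea can be sketched as a mask update triggered by a tap. This sketch assumes a precomputed integer superpixel label map from any over-segmentation. The add/remove rule is a hypothetical choice: a superpixel that is at least half inside the mask is removed, and otherwise it is added. The abstract does not specify how the paper decides this.

```python
import numpy as np


def tap_superpixel(mask, superpixels, x, y):
    """Return a copy of `mask` with the tapped superpixel added or removed.

    mask        -- (H, W) bool figure-ground mask
    superpixels -- (H, W) int superpixel label map
    (x, y)      -- tap position in pixel coordinates (column, row)
    Assumed rule: a superpixel at least half inside the mask is removed,
    otherwise it is added.
    """
    region = superpixels == superpixels[y, x]
    inside = mask[region].mean() >= 0.5
    new_mask = mask.copy()
    new_mask[region] = not inside
    return new_mask


superpixels = np.array([[0, 0, 1],
                        [0, 1, 1]])
mask = np.zeros((2, 3), dtype=bool)
mask = tap_superpixel(mask, superpixels, x=2, y=0)  # adds superpixel 1
print(mask.astype(int))
```

Because each tap touches only one precomputed region, the update is a single boolean assignment. This is consistent with the abstract's goal of real-time feedback on low-power hardware.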
ISBN: 9781467310680 (print)
The task of associating a semantic class with the objects present in an image is challenging because it involves the joint segmentation and recognition of those objects. In this work, we use a recent approach that embeds several optimization algorithms into a common framework named Power watershed to perform this task. We show how the fast watershed algorithm can be used to minimize an energy function whose minimizer corresponds to the desired object class segmentation. Higher-order potentials are then added to improve label consistency. We also demonstrate that the random walker algorithm can be successfully applied to semantic class segmentation problems. Comparisons with the Graph Cuts algorithm show that the proposed approaches yield better segmentation results, obtained up to twelve times faster, on a very challenging indoor-scenes dataset.
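The random walker mentioned above can be illustrated in its classic seeded form. Each unseeded node's class probabilities come from a linear system in the graph Laplacian. This is a minimal sketch on an explicit weight matrix. The edge weights, the seeding scheme, and how the paper obtains class evidence for semantic segmentation are not given in the abstract and are assumptions here. A labeling would take the class with the highest probability at each node.

```python
import numpy as np


def random_walker(W, seeds):
    """Seeded random walker probabilities on a weighted graph.

    W     -- (n, n) symmetric non-negative edge weights
    seeds -- dict {node index: class index in 0..K-1}; every connected
             component must contain a seed, otherwise L_U is singular
    Returns an (n, K) array whose row i holds the probability that a walker
    starting at node i first reaches a seed of each class.
    """
    n = W.shape[0]
    K = max(seeds.values()) + 1
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian
    seeded = np.array(sorted(seeds), dtype=int)
    free = np.array([i for i in range(n) if i not in seeds], dtype=int)
    M = np.zeros((len(seeded), K))          # one-hot labels of the seeds
    M[np.arange(len(seeded)), [seeds[int(s)] for s in seeded]] = 1.0
    # Dirichlet problem: L_U X_U = -B M, with B the free-to-seeded block.
    L_U = L[np.ix_(free, free)]
    B = L[np.ix_(free, seeded)]
    P = np.zeros((n, K))
    P[seeded] = M
    P[free] = np.linalg.solve(L_U, -B @ M)
    return P


# A 5-node chain with unit weights, class-0 seed at node 0, class-1 seed at node 4.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
P = random_walker(W, {0: 0, 4: 1})
print(P[:, 0])  # probabilities fall off linearly: 1, 0.75, 0.5, 0.25, 0
```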
The Markov and Conditional Random Fields (MRFs and CRFs) used in computer vision typically model only local interactions between variables, as this is generally thought to be the only computationally tractable case. In this paper we consider a class of global potentials defined over all variables in the CRF. We show how they can be readily optimised using standard graph cut algorithms at little extra expense compared to a standard pairwise field. This result can be applied directly to class-based image segmentation, which has seen increasing recent interest within computer vision. Here the aim is to assign each pixel of a given image a label from a set of possible object classes. Typically these methods use random fields to model local interactions between pixels or super-pixels. One cue that helps recognition is global object co-occurrence statistics, a measure of which classes (such as chair or motorbike) are likely to occur together in the same image. Several approaches have been proposed to exploit this property, but all of them suffer from different limitations and typically carry a high computational cost, preventing their application to large images. We find that the proposed model produces a significant improvement in the labelling compared to using a pairwise model alone, and that this improvement grows as the number of labels increases.
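A global co-occurrence potential depends only on the set of labels present anywhere in the image, added on top of the usual unary and pairwise terms. The sketch below assumes a hypothetical form for that cost, C(S) = sum of per-label costs plus a penalty for each co-occurring pair. It only evaluates the energy. The graph-cut optimisation described in the abstract is not reproduced.

```python
import itertools

import numpy as np


def cooccurrence_cost(present, label_cost, pair_cost):
    """Hypothetical global cost C(S) of the set S of labels used in an image:
    a per-label term plus a penalty pair_cost[a, b] (a < b) for every pair
    of labels that co-occur."""
    present = sorted(present)
    cost = sum(label_cost[l] for l in present)
    cost += sum(pair_cost[a, b] for a, b in itertools.combinations(present, 2))
    return float(cost)


def energy(labels, unary, edges, w, label_cost, pair_cost):
    """Unary + Potts pairwise + one global term over all variables."""
    e = sum(unary[i, l] for i, l in enumerate(labels))
    e += sum(w * (labels[i] != labels[j]) for i, j in edges)
    e += cooccurrence_cost(set(labels), label_cost, pair_cost)
    return float(e)


unary = np.array([[0.0, 1.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 1.0, 0.0]])
edges = [(0, 1), (1, 2)]
label_cost = [1.0, 1.0, 1.0]
pair_cost = np.zeros((3, 3))
pair_cost[0, 2] = 5.0  # labels 0 and 2 rarely appear in the same image
print(energy([0, 0, 2], unary, edges, 0.5, label_cost, pair_cost))  # 7.5
print(energy([0, 0, 0], unary, edges, 0.5, label_cost, pair_cost))  # 2.0
```

Pixel 2's unary prefers label 2, but the global term makes the labelling that mixes labels 0 and 2 far more expensive than relabelling pixel 2 as 0. A purely local pairwise model cannot express this preference.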
ISBN: 9781450333313 (print)
Humans describe images in terms of nouns and adjectives, while algorithms operate on images represented as sets of pixels. Bridging this gap between how humans would like to access images and how images are typically represented is the goal of image parsing, which involves assigning object and attribute labels to pixels. In this article we propose treating nouns as object labels and adjectives as visual attribute labels. This allows us to formulate the image parsing problem as one of jointly estimating per-pixel object and attribute labels from a set of training images. We propose an efficient (interactive-time) solution. Using the extracted labels as handles, our system enables a user to verbally refine the results. This allows hands-free parsing of an image into pixel-wise object/attribute labels that correspond to human semantics. Verbally selecting objects of interest enables a novel and natural interaction modality that could be used with new-generation devices (e.g., smartphones, Google Glass, living-room devices). We demonstrate our system on a large number of real-world images of varying complexity. To help understand the trade-offs compared to traditional mouse-based interactions, results are reported for both a large-scale quantitative evaluation and a user study.
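Once every pixel carries an object label and a set of attribute labels, those labels can act as verbal handles. A phrase such as "wooden table" becomes a pixel mask. This is a minimal sketch assuming a hypothetical representation: an integer object map, a boolean per-attribute map, and vocabularies given as lists. Speech recognition and the joint estimation of the labels themselves are out of scope.

```python
import numpy as np


def select_by_phrase(object_map, attr_map, objects, attributes, noun, adjectives=()):
    """Pixels whose object label is `noun` and that carry every adjective.

    object_map -- (H, W) int, index into `objects`
    attr_map   -- (H, W, A) bool, attr_map[..., a] marks attribute attributes[a]
    """
    mask = object_map == objects.index(noun)
    for adj in adjectives:
        mask &= attr_map[..., attributes.index(adj)]
    return mask


objects = ["table", "chair"]
attributes = ["wooden", "red"]
object_map = np.array([[0, 0, 1]])
attr_map = np.zeros((1, 3, 2), dtype=bool)
attr_map[0, 0, 0] = True   # pixel 0: wooden
attr_map[0, 2, 0] = True   # pixel 2: wooden
attr_map[0, 1, 1] = True   # pixel 1: red
print(select_by_phrase(object_map, attr_map, objects, attributes,
                       "table", ["wooden"]))
```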
Recently, a number of cross bilateral filtering methods have been proposed for solving multi-label problems in computer vision, such as stereo, optical flow and object class segmentation, that show an order-of-magnitude improvement in speed over previous methods. These methods have achieved good results despite using models with only unary and/or pairwise terms. However, previous work has shown the value of using models with higher-order terms, e.g., to represent label consistency over large regions or global co-occurrence relations. We show how these higher-order terms can be formulated such that filter-based inference remains possible. We demonstrate our techniques on joint stereo and object labelling problems, as well as on object class segmentation. For joint object-stereo labelling, we additionally show how our method provides an efficient approach to inference in product label-spaces. We are able to speed up inference in these models by around 10-30 times with respect to competing graph-cut/move-making methods, while maintaining or improving accuracy in all cases. We show results on PascalVOC-10 for object class segmentation, and on Leuven for joint object-stereo labelling.
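The filter-based methods discussed above start from a baseline: mean-field inference for a fully connected CRF with a Gaussian-kernel Potts pairwise term. The sketch below shows only that pairwise baseline, under stated assumptions. The kernel message `k @ Q` is computed as an explicit O(N^2) matrix product, where a real implementation would use a fast approximate filter. The higher-order terms and product label-spaces contributed by the abstract are not shown.

```python
import numpy as np


def softmax(x):
    x = x - x.max(axis=1, keepdims=True)   # subtract row max for stability
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)


def mean_field_potts(unary, feats, w=1.0, theta=1.0, iters=10):
    """Naive mean-field for a fully connected CRF with a Gaussian Potts term.

    unary -- (N, K) label costs; feats -- (N, D) per-pixel features.
    The message k @ Q is the step that filter-based methods accelerate;
    here it is a brute-force matrix product.
    """
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(axis=-1)
    k = np.exp(-d2 / (2.0 * theta ** 2))
    np.fill_diagonal(k, 0.0)               # no self-interaction
    Q = softmax(-unary)
    for _ in range(iters):
        msg = k @ Q                        # msg[i, l] = sum_j k_ij Q_j(l)
        # Potts cost of label l at i is w * sum_j k_ij (1 - Q_j(l)); the
        # sum_j k_ij part is constant over l and cancels in the softmax.
        Q = softmax(-(unary - w * msg))
    return Q


feats = np.array([[0.0], [0.1], [5.0], [5.1]])
unary = np.array([[0.0, 3.0], [0.5, 0.5], [3.0, 0.0], [0.5, 0.5]])
Q = mean_field_potts(unary, feats)
print(Q.argmax(axis=1))  # [0 0 1 1]
```

The two uncertain pixels take the label of the confident pixel nearest in feature space. This kind of smoothing is what the pairwise term provides, and higher-order terms would extend it.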