In this paper we introduce the Boosted Random Ferns (BRFs) to rapidly build discriminative classifiers for learning and detecting object categories. At the core of our approach we use standard random ferns, but we introduce four main innovations that let us bring ferns from an instance to a category level, and still retain efficiency. First, we define binary features on the histogram of oriented gradients-domain (as opposed to intensity-), allowing for a better representation of intra-class variability. Second, both the positions where ferns are evaluated within the sliding window, and the location of the binary features for each fern are not chosen completely at random, but instead we use a boosting strategy to pick the most discriminative combination of them. This is further enhanced by our third contribution, that is to adapt the boosting strategy to enable sharing of binary features among different ferns, yielding high recognition rates at a low computational cost. And finally, we show that training can be performed online, for sequentially arriving images. Overall, the resulting classifier can be very efficiently trained, densely evaluated for all image locations in about 0.1 seconds, and provides detection rates similar to competing approaches that require expensive and significantly slower processing times. We demonstrate the effectiveness of our approach by thorough experimentation in publicly available datasets in which we compare against state-of-the-art, and for tasks of both 2D detection and 3D multi-view estimation.
We address the problem of autocalibration of a moving camera with unknown constant intrinsic parameters. Existing autocalibration techniques use numerical optimization algorithms whose convergence to the correct result cannot be guaranteed, in general. To address this problem, we have developed a method where an interval branch-and-bound method is employed for numerical minimization. Thanks to the properties of Interval Analysis this method converges to the global solution with mathematical certainty and arbitrary accuracy and the only input information it requires from the user are a set of point correspondences and a search interval. The cost function is based on the Huang-Faugeras constraint of the essential matrix. A recently proposed interval extension based on Bernstein polynomial forms has been investigated to speed up the search for the solution. Finally, experimental results are presented.
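The interval branch-and-bound idea in the abstract above can be illustrated with a minimal one-dimensional sketch. Everything here is an assumption made for illustration: the toy cost function `f(x) = (x^2 - 2)^2`, the helper names, and the tolerance. It does not use the Huang-Faugeras cost or Bernstein forms, and it ignores the outward rounding that a rigorous interval implementation would need.

```python
import heapq


def sqr_interval(lo, hi):
    """Exact range of x**2 for x in [lo, hi]."""
    if lo >= 0:
        return lo * lo, hi * hi
    if hi <= 0:
        return hi * hi, lo * lo
    return 0.0, max(lo * lo, hi * hi)


def f(x):
    """Toy cost function (hypothetical), minimal at x = sqrt(2) on [0, 3]."""
    return (x * x - 2.0) ** 2


def f_interval(lo, hi):
    """Interval extension of f: bounds on f over the whole box [lo, hi]."""
    a, b = sqr_interval(lo, hi)
    return sqr_interval(a - 2.0, b - 2.0)


def branch_and_bound(lo, hi, tol=1e-6):
    """Shrink [lo, hi] around a global minimizer of f.

    Boxes are processed in order of their lower bound; a box is discarded
    when its lower bound exceeds the best function value seen so far.
    """
    best_upper = f(0.5 * (lo + hi))
    heap = [(f_interval(lo, hi)[0], lo, hi)]
    while heap:
        lower, a, b = heapq.heappop(heap)
        if lower > best_upper:
            continue  # this box cannot contain the global minimum
        if b - a < tol:
            return a, b, lower, best_upper
        mid = 0.5 * (a + b)
        best_upper = min(best_upper, f(mid))
        for c, d in ((a, mid), (mid, b)):
            child_lower = f_interval(c, d)[0]
            if child_lower <= best_upper:
                heapq.heappush(heap, (child_lower, c, d))
    raise ValueError("no box survived pruning")


a, b, lower, upper = branch_and_bound(0.0, 3.0)
print(f"minimizer lies in [{a:.7f}, {b:.7f}]")
```

As in the abstract, the only inputs are the search interval and the cost function; the returned box is guaranteed (up to floating-point rounding in this sketch) to contain a global minimizer.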
Confluence is a novel non-Intersection over Union (IoU) alternative to Non-Maxima Suppression (NMS) in bounding box post-processing in object detection. It overcomes the inherent limitations of IoU-based NMS variants and provides a more stable, consistent predictor of bounding box clustering by using a normalized, Manhattan-distance-inspired proximity metric. Unlike Greedy and Soft NMS, it does not rely solely on classification confidence scores to select optimal bounding boxes; instead, it selects the box that is closest to every other box within a given cluster and removes highly confluent neighboring boxes. Confluence is experimentally validated on the MS COCO and CrowdHuman benchmarks, improving Average Precision by 0.2–2.7% and 1–3.8% respectively and Average Recall by 1.3–9.3% and 2.4–7.3% respectively when compared against Greedy and Soft-NMS variants. Quantitative results are supported by extensive qualitative analysis, and threshold sensitivity experiments support the conclusion that Confluence is more robust than NMS variants. Confluence represents a paradigm shift in bounding box processing, with potential to replace IoU in bounding box regression processes.
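A rough sketch of what a normalized Manhattan-distance proximity between two boxes could look like is given below. The normalization (rescaling each pair of boxes to the unit square they jointly span) and the function names `proximity` and `most_central` are assumptions made for illustration; the abstract does not spell out the metric, and this sketch leaves out confidence weighting and the suppression step.

```python
def proximity(box_a, box_b):
    """Manhattan distance between two (x1, y1, x2, y2) boxes after rescaling
    both to the unit square spanned by the pair (assumed normalization)."""
    xs = (box_a[0], box_a[2], box_b[0], box_b[2])
    ys = (box_a[1], box_a[3], box_b[1], box_b[3])
    x_min, x_span = min(xs), (max(xs) - min(xs)) or 1.0
    y_min, y_span = min(ys), (max(ys) - min(ys)) or 1.0

    def norm(box):
        return (
            (box[0] - x_min) / x_span,
            (box[1] - y_min) / y_span,
            (box[2] - x_min) / x_span,
            (box[3] - y_min) / y_span,
        )

    return sum(abs(p - q) for p, q in zip(norm(box_a), norm(box_b)))


def most_central(boxes):
    """Index of the box with the smallest total proximity to all others."""
    totals = [
        sum(proximity(box, other) for j, other in enumerate(boxes) if j != i)
        for i, box in enumerate(boxes)
    ]
    return min(range(len(boxes)), key=totals.__getitem__)


cluster = [(0, 0, 10, 10), (1, 1, 11, 11), (2, 2, 12, 12)]
print(most_central(cluster))
```

Because the coordinates are rescaled per pair, the measure does not depend on the absolute size of the boxes, which is one plausible reading of "normalized" in the abstract.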
Many computer vision problems are formulated as the optimization of a cost function. This approach faces two main challenges: designing a cost function with a local optimum at an acceptable solution, and developing an efficient numerical method to search for this optimum. While designing such functions is feasible in the noiseless case, the stability and location of local optima are mostly unknown under noise, occlusion, or missing data. In practice, this can result in undesirable local optima or in no local optimum at the expected place. Moreover, numerical optimization algorithms in high-dimensional spaces are typically local and often rely on expensive first- or second-order information to guide the search. To overcome these limitations, we propose Discriminative Optimization (DO), a method that learns search directions from data without the need for a cost function. DO explicitly learns a sequence of updates in the search space that leads to stationary points that correspond to the desired solutions. We provide a formal analysis of DO and illustrate its benefits on the problems of 3D registration, camera pose estimation, and image denoising. We show that DO outperformed or matched state-of-the-art algorithms in terms of accuracy, robustness, and computational efficiency.
In this paper, we introduce a local image descriptor, DAISY, which is very efficient to compute densely. We also present an EM-based algorithm to compute dense depth and occlusion maps from wide-baseline image pairs using this descriptor. This yields much better results in wide-baseline situations than the pixel and correlation-based algorithms that are commonly used in narrow-baseline stereo. Also, using a descriptor makes our algorithm robust against many photometric and geometric transformations. Our descriptor is inspired by earlier ones such as SIFT and GLOH but can be computed much faster for our purposes. Unlike SURF, which can also be computed efficiently at every pixel, it does not introduce artifacts that degrade the matching performance when used densely. It is important to note that our approach is the first algorithm that attempts to estimate dense depth maps from wide-baseline image pairs, and we show that it performs well in many experiments that measure depth estimation accuracy and occlusion detection and that compare it against other descriptors on laser-scanned ground truth scenes. We also tested our approach on a variety of indoor and outdoor scenes with different photometric and geometric transformations, and our experiments support our claim of robustness against these.
The development of customized image processing applications is time-consuming and requires high-level skills. This paper describes the design of an interactive application generation system oriented towards producing image processing software programs. The description is focused on two models which constitute the core of the human-computer interaction. First, the formulation model identifies and organizes information that is assumed necessary and sufficient for developing image processing applications. This model is represented as a domain ontology which provides primitives for the formulation language. Second, the interaction model defines ways to acquire such information from end-users. The result of the interaction is an application ontology from which suitable software is generated. This model emphasizes the gradual emergence of a semantics of the problem through purely symbolic representations. Based on these two models, a prototype system has been implemented to conduct experiments. (C) 2010 Elsevier Ltd. All rights reserved.
We propose a learning approach to tracking that explicitly minimizes the computational complexity of the tracking process subject to a user-defined probability of failure (loss-of-lock) and precision. The tracker is formed by a Number of Sequences of Learned Linear Predictors (NoSLLiP). Robustness of NoSLLiP is achieved by modeling the object as a collection of local motion predictors; object motion is estimated by the outlier-tolerant RANSAC algorithm from local predictions. The efficiency of the NoSLLiP tracker stems 1) from the simplicity of the local predictors and 2) from the fact that all design decisions (the number of local predictors used by the tracker, their computational complexity, i.e., the number of observations the prediction is based on, their locations, and the number of RANSAC iterations) are subject to the optimization (learning) process. All time-consuming operations are performed during the learning stage; tracking is reduced to only a few hundred integer multiplications in each step. On a PC with 1xK8 3200+, a predictor evaluation requires about 30 μs. The proposed approach is verified on publicly available sequences with approximately 12,000 frames with ground truth. Experiments demonstrate superiority in frame rates and robustness with respect to the SIFT detector, Lucas-Kanade tracker, and other trackers.
Nestor is a real-time recognition and camera pose estimation system for planar shapes. The system allows shapes that carry contextual meanings for humans to be used as Augmented Reality (AR) tracking targets. The user can teach the system new shapes in real time. New shapes can be shown to the system frontally, or they can be automatically rectified according to previously learned shapes. Shapes can be automatically assigned virtual content by classification according to a shape class library. Nestor performs shape recognition by analyzing contour structures and generating projective-invariant signatures from their concavities. The concavities are further used to extract features for pose estimation and tracking. Pose refinement is carried out by minimizing the reprojection error between sample points on each image contour and its library counterpart. Sample points are matched by evolving an active contour in real time. Our experiments show that the system provides stable and accurate registration, and runs at interactive frame rates on a Nokia N95 mobile phone.
The uidA gene encodes a glucuronidase (GUS) enzyme which has been used as a biotechnological tool in recent years. When the uidA gene is fused to a gene's promoter region, the activity of that promoter in response to a stimulus can be evaluated. Arabidopsis thaliana has served as the biological platform to elucidate molecular and regulatory signaling responses in plants. Transgenic lines of A. thaliana, tagged with the uidA gene, have helped explain how plants modify their hormonal pathways depending on the environmental conditions. The information extracted from microscopic images of these transgenic plants is often qualitative, and in many publications it is not subjected to quantification. In this paper we report the development of an informatics tool based on computer vision for processing and analyzing digital images in order to quantify the expression of the GUS signal in A. thaliana roots, which is strongly correlated with the intensity of the grayscale images. This means that the presence of the GUS-induced color indicates where the gene has been actively expressed, as our statistical analysis has demonstrated after treatment of A. thaliana DR5::GUS with naphthaleneacetic acid (0.0001 mM and 1 mM). GUSignal is a free informatics tool that aims to make image analysis fast and systematic, since it executes specific and ordered instructions, and to offer a segmented analysis by areas or regions of interest, providing quantitative results of the image intensity levels.
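The region-based intensity measurement described in the abstract above might be sketched as follows. This is not GUSignal's code: the function names, the region format `(top, bottom, left, right)`, and the use of the ITU-R BT.601 luma weights for grayscale conversion are all assumptions made for illustration.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) pixels to grayscale values.

    Uses the ITU-R BT.601 luma weights; whether GUSignal uses the same
    conversion is an assumption.
    """
    return [
        [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
        for row in rgb_image
    ]


def mean_intensity(gray, roi):
    """Mean grayscale value inside roi = (top, bottom, left, right).

    Bounds follow Python slicing: top and left inclusive, bottom and
    right exclusive.
    """
    top, bottom, left, right = roi
    values = [v for row in gray[top:bottom] for v in row[left:right]]
    if not values:
        raise ValueError("region of interest is empty")
    return sum(values) / len(values)


# Tiny synthetic 2x2 image: one white pixel, three black pixels.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(0, 0, 0), (0, 0, 0)],
]
gray = to_gray(image)
print(mean_intensity(gray, (0, 2, 0, 2)))
```

Comparing the mean intensity of the same region across treatments gives a single quantitative value per image, which is the kind of output the abstract attributes to the tool.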
This paper describes and analyses the performance of a novel feature extraction technique for the recognition of segmented/cursive characters that may be used in the context of a segmentation-based handwritten word recognition system. The modified direction feature (MDF) extraction technique builds upon the direction feature (DF) technique proposed previously that extracts direction information from the structure of character contours. This principle was extended so that the direction information is integrated with a technique for detecting transitions between background and foreground pixels in the character image. In order to improve on the DF extraction technique, a number of modifications were undertaken. With a view to describing the character contour more effectively, a re-design of the direction number determination technique was performed. Also, an additional global feature was introduced to improve the recognition accuracy for those characters that were most frequently confused with patterns of similar appearance. MDF was tested using a neural network-based classifier and compared to the DF and transition feature (TF) extraction techniques. MDF outperformed both DF and TF techniques using a benchmark dataset and compared favourably with the top results in the literature. A recognition accuracy of above 89% is reported on characters from the CEDAR dataset. (c) 2006 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
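The transition detection that the abstract above mentions can be illustrated with a very small sketch. It only finds the columns where a row of a binary character image switches from background to foreground; the function name and the 0/1 pixel encoding are assumptions, and the direction values and normalization that MDF combines with these transitions are not reproduced here.

```python
def transition_columns(row):
    """Columns where a row switches from background (0) to foreground (1).

    The area to the left of the image is treated as background, so a row
    that starts with a foreground pixel has a transition at column 0.
    """
    columns = []
    previous = 0
    for col, pixel in enumerate(row):
        if pixel and not previous:
            columns.append(col)
        previous = pixel
    return columns


# One row of a hypothetical binary character image.
print(transition_columns([0, 1, 1, 0, 0, 1]))
```

Scanning every row and column in both directions with a function like this gives the raw transition locations from which a fixed-length feature vector could then be built.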