Local descriptors are the ground layer of feature-based recognition systems for still images and video. We propose a new framework for the design of local descriptors and their evaluation. This framework is based on decomposing descriptors into three levels: primitive extraction, primitive coding and code aggregation. With this framework, we are able to explain most of the popular descriptors in the literature, such as HOG, HOF or SURF. The framework provides an efficient and rigorous approach for the evaluation of local descriptors, and allows us to uncover the best parameters for each descriptor family. Moreover, we are able to extend usual descriptors by changing the code aggregation or adding new primitive coding methods. The experiments are carried out on image (VOC 2007) and video datasets (KTH, Hollywood2, UCF11 and UCF101), and achieve performance equal to or better than the literature. (C) 2014 Elsevier Ltd. All rights reserved.
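To make the three-level decomposition concrete, the following minimal sketch builds a HOG-like descriptor in NumPy: gradients are the extracted primitives, magnitude-weighted orientation binning is the primitive coding, and sum-pooling over spatial cells is the code aggregation. The function names, hard-assignment coding and absence of block normalization are illustrative simplifications, not the paper's reference implementation.

```python
import numpy as np

def extract_primitives(img):
    """Primitive extraction: per-pixel gradient magnitude and orientation."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.arctan2(gy, gx) % np.pi            # unsigned orientation in [0, pi)
    return mag, ori

def code_primitives(mag, ori, n_bins=9):
    """Primitive coding: hard-assign each orientation to a histogram bin."""
    bins = np.minimum((ori / np.pi * n_bins).astype(int), n_bins - 1)
    codes = np.zeros(mag.shape + (n_bins,))
    rows, cols = np.indices(mag.shape)
    codes[rows, cols, bins] = mag               # magnitude-weighted one-hot code
    return codes

def aggregate_codes(codes, cell=8):
    """Code aggregation: sum-pool the codes over non-overlapping spatial cells."""
    h, w, n_bins = codes.shape
    h, w = h - h % cell, w - w % cell
    blocks = codes[:h, :w].reshape(h // cell, cell, w // cell, cell, n_bins)
    return blocks.sum(axis=(1, 3)).reshape(-1)  # flattened descriptor

# Usage: descriptor = aggregate_codes(code_primitives(*extract_primitives(patch)))
```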
Many computer vision problems are formulated as the optimization of a cost function. This approach faces two main challenges: designing a cost function with a local optimum at an acceptable solution, and developing an efficient numerical method to search for this optimum. While designing such functions is feasible in the noiseless case, the stability and location of local optima are mostly unknown under noise, occlusion, or missing data. In practice, this can result in undesirable local optima or in no local optimum in the expected place. On the other hand, numerical optimization algorithms in high-dimensional spaces are typically local and often rely on expensive first- or second-order information to guide the search. To overcome these limitations, we propose Discriminative Optimization (DO), a method that learns search directions from data without the need for a cost function. DO explicitly learns a sequence of updates in the search space that leads to stationary points corresponding to the desired solutions. We provide a formal analysis of DO and illustrate its benefits on the problems of 3D registration, camera pose estimation, and image denoising. We show that DO outperformed or matched state-of-the-art algorithms in terms of accuracy, robustness, and computational efficiency.
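The core mechanism can be sketched as a sequence of learned linear update maps applied to hand-crafted features of the current estimate. The sketch below is a hedged illustration of that idea using ridge regression toward ground-truth parameters; the feature function, regularization and training details are assumptions, not the paper's exact formulation.

```python
import numpy as np

def train_do(feature_fn, X0, X_true, n_steps=10, lam=1e-3):
    """Learn a sequence of linear update maps D_k so that
       x <- x - D_k @ feature_fn(x) moves samples toward their ground truth.
       X0, X_true: (N, d_param) initial and target parameter vectors."""
    maps, X = [], X0.copy()
    for _ in range(n_steps):
        H = np.stack([feature_fn(x) for x in X])            # (N, d_feat)
        R = X - X_true                                       # residuals (N, d_param)
        # ridge regression: D minimizes ||H @ D.T - R||^2 + lam * ||D||^2
        D = np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ R).T
        maps.append(D)
        X = X - H @ D.T                                      # apply the learned update
    return maps

def run_do(feature_fn, x, maps):
    """Inference: apply the learned update maps in sequence (no cost function)."""
    for D in maps:
        x = x - D @ feature_fn(x)
    return x
```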
Confluence is a novel non-Intersection-over-Union (IoU) alternative to Non-Maxima Suppression (NMS) for bounding box post-processing in object detection. It overcomes the inherent limitations of IoU-based NMS variants and provides a more stable, consistent predictor of bounding box clustering by using a proximity metric inspired by the normalized Manhattan distance to represent bounding box clustering. Unlike Greedy and Soft NMS, it does not rely solely on classification confidence scores to select optimal bounding boxes; instead, it selects the box that is closest to every other box within a given cluster and removes highly confluent neighboring boxes. Confluence is experimentally validated on the MS COCO and CrowdHuman benchmarks, improving Average Precision by 0.2--2.7% and 1--3.8%, respectively, and Average Recall by 1.3--9.3% and 2.4--7.3%, when compared against Greedy and Soft-NMS variants. Quantitative results are supported by extensive qualitative analysis, and threshold sensitivity analysis experiments support the conclusion that Confluence is more robust than NMS variants. Confluence represents a paradigm shift in bounding box processing, with the potential to replace IoU in bounding box regression processes.
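As a hedged illustration of the kind of proximity measure described, the sketch below computes a min-max-normalized Manhattan distance between two corner-parameterized boxes; the exact normalization, thresholding and confidence weighting used by Confluence may differ.

```python
import numpy as np

def confluence_proximity(box_a, box_b):
    """Normalized Manhattan-distance proximity between two boxes given as
       (x1, y1, x2, y2). Coordinates are min-max normalized over the pair so
       the measure is scale-invariant; this is a sketch, not the reference code."""
    pts = np.array([box_a, box_b], dtype=float)              # (2, 4)
    xs, ys = pts[:, [0, 2]], pts[:, [1, 3]]
    xs = (xs - xs.min()) / max(xs.max() - xs.min(), 1e-9)    # normalize x corners
    ys = (ys - ys.min()) / max(ys.max() - ys.min(), 1e-9)    # normalize y corners
    return np.abs(xs[0] - xs[1]).sum() + np.abs(ys[0] - ys[1]).sum()

# Boxes whose proximity falls below a chosen threshold are treated as one
# cluster; the retained box is the one closest (most confluent) to the rest,
# optionally weighted by its classification confidence.
```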
In many 3D object-detection and pose-estimation problems, runtime performance is of critical importance. However, there usually is time to train the system, which we will show to be very useful. Assuming that several registered images of the target object are available, we developed a keypoint-based approach that is effective in this context by formulating wide-baseline matching of keypoints extracted from the input images to those found in the model images as a classification problem. This shifts much of the computational burden to a training phase, without sacrificing recognition performance. As a result, the algorithm is robust, accurate, and fast enough for frame-rate performance. This reduction in runtime computational complexity is our first contribution. Our second contribution is to show that, in this context, a simple and fast keypoint detector suffices to support detection and tracking even under large perspective and scale variations. While earlier methods require a detector that produces very repeatable results in general, which usually is very time-consuming, we simply find the most repeatable keypoints for the specific target object during the training phase. We have incorporated these ideas into a real-time system that detects planar, nonplanar, and deformable objects. It then estimates the pose of the rigid ones and the deformations of the others.
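A hedged sketch of the classification view of matching is given below: each model keypoint becomes a class, training patches come from synthetically warped views collected offline, and at runtime a detected patch is simply classified. The random forest over raw intensities and the 16-pixel patch size are illustrative stand-ins for the paper's classifier.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_keypoint_classifier(warped_views, keypoints, patch=16):
    """Each model keypoint is a class; samples are patches around its location
       in many warped views of the model image (offline training phase)."""
    X, y = [], []
    h = patch // 2
    for img, kps in zip(warped_views, keypoints):   # kps: list of (class_id, x, y)
        for cls, x, yy in kps:
            p = img[yy - h:yy + h, x - h:x + h]
            if p.shape == (patch, patch):
                X.append(p.ravel())
                y.append(cls)
    return RandomForestClassifier(n_estimators=50).fit(np.array(X), np.array(y))

def match_keypoint(clf, img, x, y, patch=16):
    """Runtime: classify a detected keypoint patch into one of the model keypoints."""
    h = patch // 2
    p = img[y - h:y + h, x - h:x + h].ravel()[None, :]
    return clf.predict(p)[0], clf.predict_proba(p).max()
```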
Wide field-of-view (FOV) cameras, which capture a larger scene area than narrow-FOV cameras, are used in many applications including 3D reconstruction, autonomous driving, and video surveillance. However, wide-angle images contain distortions that violate the assumptions underlying pinhole camera models, resulting in object distortion, difficulties in estimating scene distance, area, and direction, and preventing the use of off-the-shelf deep models trained on undistorted images for downstream computer vision tasks. Image rectification, which aims to correct these distortions, can solve these problems. In this paper, we comprehensively survey progress in wide-angle image rectification, from transformation models to rectification methods. Specifically, we first present a detailed description and discussion of the camera models used in different approaches. Then, we summarize several distortion models, including radial distortion and projection distortion. Next, we review both traditional geometry-based image rectification methods and deep learning-based methods, where the former formulate distortion parameter estimation as an optimization problem and the latter treat it as a regression problem by leveraging the power of deep neural networks. We evaluate the performance of state-of-the-art methods on public datasets and show that although both kinds of methods can achieve good results, these methods only work well for specific camera models and distortion types. We also provide a strong baseline model and carry out an empirical study of different distortion models on synthetic datasets and real-world wide-angle images. Finally, we discuss several potential research directions that are expected to further advance this area in the future.
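As a small worked example of one of the distortion models discussed, the sketch below applies and inverts a two-coefficient polynomial radial distortion in normalized coordinates; fisheye and division models, also covered by the survey, require different formulas.

```python
import numpy as np

def distort(xy, k1, k2):
    """Polynomial radial distortion in normalized image coordinates:
       x_d = x_u * (1 + k1*r^2 + k2*r^4)."""
    r2 = np.sum(xy ** 2, axis=-1, keepdims=True)
    return xy * (1.0 + k1 * r2 + k2 * r2 ** 2)

def undistort(xy_d, k1, k2, n_iter=20):
    """Invert the distortion by fixed-point iteration: divide the distorted point
       by the distortion factor evaluated at the current undistorted estimate.
       Converges for moderate distortion; a sketch, not a production solver."""
    xy_u = xy_d.copy()
    for _ in range(n_iter):
        r2 = np.sum(xy_u ** 2, axis=-1, keepdims=True)
        xy_u = xy_d / (1.0 + k1 * r2 + k2 * r2 ** 2)
    return xy_u

# Usage: points are in normalized coordinates (principal point subtracted,
# divided by focal length), e.g. undistort(np.array([[0.3, -0.2]]), 0.1, 0.01)
```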
Experiments on public datasets demonstrate the effectiveness of this method, which reaches human-level performance and outperforms current state-of-the-art methods with 92.8% on the extended Cohn-Kanade (CK+) dataset and 87.0% on FERPLUS.
“A locally-processed light-weight deep neural network for detecting colorectal polyps in wireless capsule endoscopes” proposes a light-weight DNN model that has the potential to run locally in the WCE [2].
[...]only images indicating potential diseases are transmitted, saving energy on data transmission.
Background subtraction is an important video processing task that aims to separate the foreground from a video in order to make post-processing tasks efficient.
[...]several different techniques have been proposed for this task, but most of them do not perform well on videos with variations in both the foreground and the background.
In “Background subtraction in videos using LRMF and CWM algorithm,” a novel background subtraction technique is proposed that progressively fits a subspace to the background, obtained from L1 low-rank matrix factorization using the cyclic weighted median algorithm, while the foreground is modeled with a mixture-of-Gaussians noise distribution [3].
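To illustrate why weighted medians arise in L1 low-rank fitting, the hedged sketch below fits a rank-1 background model under an L1 loss by cyclic weighted-median updates; the actual method in [3] uses a richer noise model and a progressively fitted subspace rather than this simplified rank-1 version.

```python
import numpy as np

def weighted_median(values, weights):
    """Return the point m minimizing sum_j weights[j] * |values[j] - m|."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cdf = np.cumsum(w)
    return v[np.searchsorted(cdf, 0.5 * cdf[-1])]

def l1_rank1_background(X, n_iter=10):
    """Fit a rank-1 background X ~ u v^T under an L1 loss with cyclic
       weighted-median updates. X is (pixels x frames); the residual
       X - u v^T is taken as the foreground."""
    n, m = X.shape
    u = np.median(X, axis=1)                  # initial background appearance
    v = np.ones(m)
    for _ in range(n_iter):
        for j in range(m):                    # update temporal coefficients
            mask = np.abs(u) > 1e-9
            v[j] = weighted_median(X[mask, j] / u[mask], np.abs(u[mask]))
        for i in range(n):                    # update spatial background
            mask = np.abs(v) > 1e-9
            u[i] = weighted_median(X[i, mask] / v[mask], np.abs(v[mask]))
    background = np.outer(u, v)
    return background, X - background         # background, foreground residual
```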
Preventing accidental injuries to toddlers requires thorough, consistent supervision, but this isn't always practical. A proposed vision-based system detects three fall risk factors in the home environment to help caregivers supervise nearby toddlers when they can't give them continuous attention. The crucial technical challenge is to differentiate a human from other foreground objects in the images. Unlike previous systems, this one uses multiple dynamic motion cues for human detection in addition to cues related to human appearance.
Many different deep networks have been used to approximate, accelerate or improve traditional image operators. Many of these traditional operators contain parameters that need to be tweaked to obtain satisfactory results; we refer to them as "parameterized image operators". However, most existing deep networks trained for these operators are designed for only one specific parameter configuration, which does not meet the needs of real scenarios that usually require flexible parameter settings. To overcome this limitation, we propose a new decoupled learning algorithm that learns from the operator parameters to dynamically adjust the weights of a deep network for image operators, denoted as the base network. The learned algorithm takes the form of another network, namely the weight learning network, which can be jointly trained end-to-end with the base network. Experiments demonstrate that the proposed framework can be successfully applied to many traditional parameterized image operators. To accelerate parameter tuning in practical scenarios, the proposed framework can be further extended to dynamically change the weights of only a single layer of the base network while sharing most of the computation cost. We demonstrate that this cheap parameter-tuning extension of the proposed decoupled learning framework even outperforms state-of-the-art alternative approaches.
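A minimal sketch of the decoupled idea, assuming a PyTorch-style hypernetwork: a small weight-learning MLP maps the operator parameter to the kernel of a convolution in the base network, and the two are trained jointly. The layer sizes, class name and single-layer scope are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightLearningConv(nn.Module):
    """A conv layer whose kernel is produced by a small 'weight learning' MLP
       conditioned on the operator parameter(s) gamma, so one model covers a
       whole family of parameter settings."""
    def __init__(self, in_ch, out_ch, k=3, n_params=1):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, k
        self.weight_net = nn.Sequential(
            nn.Linear(n_params, 64), nn.ReLU(),
            nn.Linear(64, out_ch * in_ch * k * k),
        )

    def forward(self, x, gamma):
        # gamma: (n_params,) tensor holding the operator parameters for this input
        w = self.weight_net(gamma).view(self.out_ch, self.in_ch, self.k, self.k)
        return F.conv2d(x, w, padding=self.k // 2)

# Usage (img_batch is a hypothetical (N, 3, H, W) tensor):
# layer = WeightLearningConv(3, 16); y = layer(img_batch, torch.tensor([0.5]))
```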
We propose to model the statistics of natural images with the large class of stochastic processes called Infinitely Divisible Cascades (IDCs). IDCs were first introduced in one dimension to provide multifractal time series to model the so-called intermittency phenomenon in hydrodynamical turbulence. We have extended the definition of scalar IDCs from one to N dimensions and commented on the relevance of such a model to fully developed turbulence in [1]. In this paper, we focus on the particular 2D case. IDCs appear to be good candidates to model the statistics of natural images. They share most of the usual properties of natural images and appear to be consistent with several independent theoretical and experimental approaches in the literature. We point out the interest of IDCs for applications to procedural texture synthesis.
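As a rough illustration of cascade-based texture synthesis, the sketch below generates a 2D texture with a discrete dyadic log-normal multiplicative cascade. This is a simpler relative of IDCs, not the continuous IDC construction from the paper; parameter names and values are illustrative.

```python
import numpy as np

def lognormal_cascade_2d(n_levels=8, sigma=0.2, seed=0):
    """Discrete dyadic multiplicative cascade: at each level the field is
       upsampled by 2 and multiplied by i.i.d. mean-one log-normal factors,
       producing a multifractal texture; sigma controls the intermittency."""
    rng = np.random.default_rng(seed)
    field = np.ones((1, 1))
    for _ in range(n_levels):
        field = np.kron(field, np.ones((2, 2)))             # upsample by 2 in each axis
        mult = rng.lognormal(mean=-0.5 * sigma**2, sigma=sigma, size=field.shape)
        field *= mult                                        # mean-one multipliers
    return field                                             # (2**n_levels)^2 texture

# Usage: texture = lognormal_cascade_2d(n_levels=8, sigma=0.3)
```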