This paper proposes the pairwise linear regression classification (PLRC) for image set retrieval. In PLRC, we first define a new concept of the unrelated subspace and introduce two strategies to constitute the unrelat...
详细信息
ISBN:
(纸本)9781467388511
This paper proposes the pairwise linear regression classification (PLRC) for image set retrieval. In PLRC, we first define a new concept of the unrelated subspace and introduce two strategies to constitute the unrelated subspace. In order to increase the information of maximizing the query set and the unrelated image set, we introduce a combination metric for two new classifiers based on two constitution strategies of the unrelated subspace. Extensive experiments on six well-known databases prove that the performance of PLRC is better than that of DLRC and several state-of-theart classifiers for different visionrecognition tasks: clusterbased face recognition, video-based face recognition, object recognition and action recognition.
Incremental Structure-from-Motion is a prevalent strategy for 3D reconstruction from unordered image collections. While incremental reconstruction systems have tremendously advanced in all regards, robustness, accurac...
详细信息
ISBN:
(纸本)9781467388511
Incremental Structure-from-Motion is a prevalent strategy for 3D reconstruction from unordered image collections. While incremental reconstruction systems have tremendously advanced in all regards, robustness, accuracy, completeness, and scalability remain the key problems towards building a truly general-purpose pipeline. We propose a new SfM technique that improves upon the state of the art to make a further step towards this ultimate goal. The full reconstruction pipeline is released to the public as an open-source implementation.
We present an approach to long-range spatio-temporal regularization in semantic video segmentation. Temporal regularization in video is challenging because both the camera and the scene may be in motion. Thus Euclidea...
详细信息
ISBN:
(纸本)9781467388511
We present an approach to long-range spatio-temporal regularization in semantic video segmentation. Temporal regularization in video is challenging because both the camera and the scene may be in motion. Thus Euclidean distance in the space-time volume is not a good proxy for correspondence. We optimize the mapping of pixels to a Euclidean feature space so as to minimize distances between corresponding points. Structured prediction is performed by a dense CRF that operates on the optimized features. Experimental results demonstrate that the presented approach increases the accuracy and temporal consistency of semantic video segmentation.
Recently, algorithms for object recognition and related tasks have become sufficiently proficient that new vision tasks can now be pursued. In this paper, we build a system capable of answering open-ended text-based q...
详细信息
ISBN:
(纸本)9781467388511
Recently, algorithms for object recognition and related tasks have become sufficiently proficient that new vision tasks can now be pursued. In this paper, we build a system capable of answering open-ended text-based questions about images, which is known as Visual Question Answering (VQA). Our approach's key insight is that we can predict the form of the answer from the question. We formulate our solution in a Bayesian framework. When our approach is combined with a discriminative model, the combined model achieves state-of-the-art results on four benchmark datasets for open-ended VQA: DAQUAR, COCO-QA, The VQA Dataset, and Visual7W.
A novel method for visual place recognition is introduced and evaluated, demonstrating robustness to perceptual aliasing and observation noise. This is achieved by increasing discrimination through a more structured r...
详细信息
ISBN:
(纸本)9781467388511
A novel method for visual place recognition is introduced and evaluated, demonstrating robustness to perceptual aliasing and observation noise. This is achieved by increasing discrimination through a more structured representation of visual observations. Estimation of observation likelihoods are based on graph kernel formulations, utilizing both the structural and visual information encoded in covisibility graphs. The proposed probabilistic model is able to circumvent the typically difficult and expensive posterior normalization procedure by exploiting the information available in visual observations. Furthermore, the place recognition complexity is independent of the size of the map. Results show improvements over the state-of-theart on a diverse set of both public datasets and novel experiments, highlighting the benefit of the approach.
In this paper, we present a part-based sparse tracker in a particle filter framework where both the motion and appearance model are formulated in 3D. The motion model is adaptive and directed according to a simple yet...
详细信息
ISBN:
(纸本)9781467388511
In this paper, we present a part-based sparse tracker in a particle filter framework where both the motion and appearance model are formulated in 3D. The motion model is adaptive and directed according to a simple yet powerful occlusion handling paradigm, which is intrinsically fused in the motion model. Also, since 3D trackers are sensitive to synchronization and registration noise in the RGB and depth streams, we propose automated methods to solve these two issues. Extensive experiments are conducted on a popular RGBD tracking benchmark, which demonstrate that our tracker can achieve superior results, outperforming many other recent and state-of-the-art RGBD trackers.
We present a method that learns to answer visual questions by selecting image regions relevant to the text-based query. Our method maps textual queries and visual features from various regions into a shared space wher...
详细信息
ISBN:
(纸本)9781467388511
We present a method that learns to answer visual questions by selecting image regions relevant to the text-based query. Our method maps textual queries and visual features from various regions into a shared space where they are compared for relevance with an inner product. Our method exhibits significant improvements in answering questions such as "what color," where it is necessary to evaluate a specific location, and "what room," where it selectively identifies informative image regions. Our model is tested on the recently released VQA [1] dataset, which features free-form human-annotated questions and answers.
A methodology for 3D surface modeling from a single image is proposed. The principal novelty is concave and specular surface modeling without any externally imposed prior. The main idea of the method is to use BRDFs a...
详细信息
ISBN:
(纸本)9781467388511
A methodology for 3D surface modeling from a single image is proposed. The principal novelty is concave and specular surface modeling without any externally imposed prior. The main idea of the method is to use BRDFs and generated rendered surfaces, to transfer the normal field, computed for the generated samples, to the unknown surface. The transferred information is adequate to blow and sculpt the segmented image mask in to a bas-relief of the object. The object surface is further refined basing on a photo-consistency formulation that relates for error minimization the original image and the modeled object.
Recently, convolutional neural networks (CNN) have demonstrated impressive performance in various computervision tasks. However, high performance hardware is typically indispensable for the application of CNN models ...
详细信息
ISBN:
(纸本)9781467388511
Recently, convolutional neural networks (CNN) have demonstrated impressive performance in various computervision tasks. However, high performance hardware is typically indispensable for the application of CNN models due to the high computation complexity, which prohibits their further extensions. In this paper, we propose an efficient framework, namely Quantized CNN, to simultaneously speed-up the computation and reduce the storage and memory overhead of CNN models. Both filter kernels in convolutional layers and weighting matrices in fully-connected layers are quantized, aiming at minimizing the estimation error of each layer's response. Extensive experiments on the ILSVRC-12 benchmark demonstrate 4 ∼ 6× speed-up and 15 ∼ 20× compression with merely one percentage loss of classification accuracy. With our quantized CNN model, even mobile devices can accurately classify images within one second.
In this paper, we present a new algorithm for finding all intersections of three quadrics. The proposed method is algebraic in nature and it is considerably more efficient than the Gröbner basis and resultant-bas...
详细信息
ISBN:
(纸本)9781467388511
In this paper, we present a new algorithm for finding all intersections of three quadrics. The proposed method is algebraic in nature and it is considerably more efficient than the Gröbner basis and resultant-based solutions previously used in computervision applications. We identify several computervision problems that are formulated and solved as systems of three quadratic equations and for which our algorithm readily delivers considerably faster results. Also, we propose new formulations of three important vision problems: absolute camera pose with unknown focal length, generalized pose-and-scale, and hand-eye calibration with known translation. These new formulations allow our algorithm to significantly outperform the state-of-theart in speed.
暂无评论