This paper takes a step forward in image and video coding by extending the well-known Vector of Locally Aggregated Descriptors (VLAD) onto an extensive space of curved Riemannian manifolds. We provide a comprehensive ...
详细信息
ISBN:
(纸本)9781467369640
This paper takes a step forward in image and video coding by extending the well-known Vector of Locally Aggregated Descriptors (VLAD) onto an extensive space of curved Riemannian manifolds. We provide a comprehensive mathematical framework that formulates the aggregation problem of such manifold data into an elegant solution. In particular, we consider structured descriptors from visual data, namely Region Covariance Descriptors and linear subspaces that reside on the manifold of Symmetric Positive Definite matrices and the Grassmannian manifolds, respectively. Through rigorous experimental validation, we demonstrate the superior performance of this novel Riemannian VLAD descriptor on several visual classification tasks including video-based face recognition, dynamic scene recognition, and head pose classification.
The next wave of micro and nano devices will create a world with trillions of small networked cameras. This will lead to increased concerns about privacy and security. Most privacy preserving algorithms for computer v...
详细信息
ISBN:
(纸本)9781467369640
The next wave of micro and nano devices will create a world with trillions of small networked cameras. This will lead to increased concerns about privacy and security. Most privacy preserving algorithms for computervision are applied after image/video data has been captured. We propose to use privacy preserving optics that filter or block sensitive information directly from the incident lightfield before sensor measurements are made, adding a new layer of privacy. In addition to balancing the privacy and utility of the captured data, we address trade-offs unique to miniature vision sensors, such as achieving high-quality field-of-view and resolution within the constraints of mass and volume. Our privacy preserving optics enable applications such as depth sensing, full-body motion tracking, people counting, blob detection and privacy preserving face recognition. While we demonstrate applications on macroscale devices (smartphones, webcams, etc.) our theory has impact for smaller devices.
We present a multi-instance object segmentation algorithm to tackle occlusions. As an object is split into two parts by an occluder, it is nearly impossible to group the two separate regions into an instance by purely...
详细信息
ISBN:
(纸本)9781467369640
We present a multi-instance object segmentation algorithm to tackle occlusions. As an object is split into two parts by an occluder, it is nearly impossible to group the two separate regions into an instance by purely bottom-up schemes. To address this problem, we propose to incorporate top-down category specific reasoning and shape prediction through exemplars into an intuitive energy minimization framework. We perform extensive evaluations of our method on the challenging PASCAL VOC 2012 segmentation set. The proposed algorithm achieves favorable results on the joint detection and segmentation task against the state-of-the-art method both quantitatively and qualitatively.
Shape cues play an important role in computervision, but shape is not the only information available in images. Materials, such as fabric and plastic, are discernible in images even when shapes, such as those of an o...
详细信息
ISBN:
(纸本)9781467369640
Shape cues play an important role in computervision, but shape is not the only information available in images. Materials, such as fabric and plastic, are discernible in images even when shapes, such as those of an object, are not. We argue that it would be ideal to recognize materials without relying on object cues such as shape. This would allow us to use materials as a context for other vision tasks, such as object recognition. Humans are intuitively able to find visual cues that describe materials. Previous frameworks attempt to recognize these cues (as visual material traits) using fully-supervised learning. This requirement is not feasible when multiple annotators and large quantities of images are involved. In this paper, we derive a framework that allows us to discover locally-recognizable material attributes from crowdsourced perceptual material distances. We show that the attributes we discover do in fact separate material categories. Our learned attributes exhibit the same desirable properties as material traits, despite the fact that they are discovered using only partial supervision.
The nine papers in this special section focus on the development of new computervision techniques for the interpretation of remote sensing images. These papers represent a follow-up of two workshops held in conjuncti...
详细信息
The nine papers in this special section focus on the development of new computervision techniques for the interpretation of remote sensing images. These papers represent a follow-up of two workshops held in conjunction with the ieee conference on computer vision and pattern recognition (cvpr) 2015, that was held in Boston, MA, EARTHvision2015 and MSF 2015. The purpose of both workshops and of this special issue is to foster fruitful collaboration of computervision, Earth observation, and geospatial analysis communities.
We propose new dense descriptors for texture segmentation. Given a region of arbitrary shape in an image, these descriptors are formed from shape-dependent scale spaces of oriented gradients. These scale spaces are de...
详细信息
ISBN:
(纸本)9781467369640
We propose new dense descriptors for texture segmentation. Given a region of arbitrary shape in an image, these descriptors are formed from shape-dependent scale spaces of oriented gradients. These scale spaces are defined by Poisson-like partial differential equations. A key property of our new descriptors is that they do not aggregate image data across the boundary of the region, in contrast to existing descriptors based on aggregation of oriented gradients. As an example, we show how the descriptor can be incorporated in a Mumford-Shah energy for texture segmentation. We test our method on several challenging datasets for texture segmentation and textured object tracking. Experiments indicate that our descriptors lead to more accurate segmentation than non-shape dependent descriptors and the state-of-the-art in texture segmentation.
Many computervision problems arise from information processing of data sources with nuisance variances like scale, orientation, contrast, perspective foreshortening or in medical imaging - staining and local warping....
详细信息
ISBN:
(纸本)9781467369640
Many computervision problems arise from information processing of data sources with nuisance variances like scale, orientation, contrast, perspective foreshortening or in medical imaging - staining and local warping. In most cases these variances can be stated a priori and can be used to improve the generalization of recognition algorithms. We propose a novel supervised feature learning approach, which efficiently extracts information from these constraints to produce interpretable, transformation-invariant features. The proposed method can incorporate a large class of transformations, e.g., shifts, rotations, change of scale, morphological operations, non-linear distortions, photometric transformations, etc. These features boost the discrimination power of a novel image classification and segmentation method, which we call Transformation-Invariant Convolutional Jungles (TICJ). We test the algorithm on two benchmarks in face recognition and medical imaging, where it achieves state of the art results, while being computationally significantly more efficient than Deep Neural Networks.
We present a new approach to wide baseline matching. We propose to use a hierarchical decomposition of the image domain and coarse-to-fine selection of regions to match. In contrast to interest point matching methods,...
详细信息
ISBN:
(纸本)9781467369640
We present a new approach to wide baseline matching. We propose to use a hierarchical decomposition of the image domain and coarse-to-fine selection of regions to match. In contrast to interest point matching methods, which sample salient regions to reduce the cost of comparing all regions in two images, our method eliminates regions systematically to achieve efficiency. One advantage of our approach is that it is not restricted to covariant salient regions, which is too restrictive under large viewpoint and leads to few corresponding regions. Affine invariant matching of regions in the hierarchy is achieved efficiently by a coarse-to-fine search of the affine space. Experiments on two benchmark datasets shows that our method finds more correct correspondence of the image (with fewer false alarms) than other wide baseline methods on large viewpoint change.
Though recent advanced convolutional neural networks (CNNs) have been improving the image recognition accuracy, the models are getting more complex and time-consuming. For real-world applications in industrial and com...
详细信息
ISBN:
(纸本)9781467369640
Though recent advanced convolutional neural networks (CNNs) have been improving the image recognition accuracy, the models are getting more complex and time-consuming. For real-world applications in industrial and commercial scenarios, engineers and developers are often faced with the requirement of constrained time budget. In this paper, we investigate the accuracy of CNNs under constrained time cost. Under this constraint, the designs of the network architectures should exhibit as trade-offs among the factors like depth, numbers of filters, filter sizes, etc. With a series of controlled comparisons, we progressively modify a baseline model while preserving its time complexity. This is also helpful for understanding the importance of the factors in network designs. We present an architecture that achieves very competitive accuracy in the ImageNet dataset (11.8% top-5 error, 10-view test), yet is 20% faster than "AlexNet" [14] (16.0% top-5 error, 10-view test).
The estimation of material properties is important for scene understanding, with many applications in vision, robotics, and structural engineering. This paper connects fundamentals of vibration mechanics with computer...
详细信息
ISBN:
(纸本)9781467369640
The estimation of material properties is important for scene understanding, with many applications in vision, robotics, and structural engineering. This paper connects fundamentals of vibration mechanics with computervision techniques in order to infer material properties from small, often imperceptible motion in video. Objects tend to vibrate in a set of preferred modes. The shapes and frequencies of these modes depend on the structure and material properties of an object. Focusing on the case where geometry is known or fixed, we show how information about an object's modes of vibration can be extracted from video and used to make inferences about that object's material properties. We demonstrate our approach by estimating material properties for a variety of rods and fabrics by passively observing their motion in high-speed and regular framerate video.
暂无评论