ISBN (print): 9781467369640
Scaling up fine-grained recognition to all domains of fine-grained objects is a challenge the computer vision community will need to face in order to realize its goal of recognizing all object categories. Current state-of-the-art techniques rely heavily upon the use of keypoint or part annotations, but scaling up to hundreds or thousands of domains renders this annotation cost-prohibitive for all but the most important categories. In this work we propose a method for fine-grained recognition that uses no part annotations. Our method is based on generating parts using co-segmentation and alignment, which we combine in a discriminative mixture. Experimental results show its efficacy, demonstrating state-of-the-art results even when compared to methods that use part annotations during training.
ISBN (print): 9781479951178
Capturing and understanding visual signals is one of the core interests of computer vision. Much progress has been made on many aspects of imaging, but the reconstruction of refractive phenomena, such as turbulence, gas and heat flows, liquids, or transparent solids, has remained a challenging problem. In this paper, we derive an intuitive formulation of light transport in refractive media using light fields and the transport of intensity equation. We show how coded illumination in combination with pairs of recorded images allows for robust computational reconstruction of dynamic two- and three-dimensional refractive phenomena.
ISBN (print): 9781665445092
CoMoGAN is a continuous GAN relying on the unsupervised reorganization of the target data on a functional manifold. To this end, we introduce a new Functional Instance Normalization layer and residual mechanism, which together disentangle image content from position on the target manifold. We rely on naive physics-inspired models to guide the training while allowing private model/translation features. CoMoGAN can be used with any GAN backbone and allows new types of image translation, such as cyclic image translation like timelapse generation, or detached linear translation. On all datasets, it outperforms methods from the literature.
ISBN (print): 0818672587
The problem of finding the closest point in high-dimensional spaces is common in computational vision. Unfortunately, the complexity of most existing search algorithms, such as k-d trees and R-trees, grows exponentially with dimension, making them impractical for dimensionality above 15. In nearly all applications, the closest point is of interest only if it lies within a user-specified distance ε. We present a simple and practical algorithm to efficiently search for the nearest neighbor within Euclidean distance ε. Our algorithm uses a projection search technique along with a novel data structure to dramatically improve performance in high dimensions. A complexity analysis is presented which can help determine ε in structured problems. Benchmarks clearly show the superiority of the proposed algorithm for high-dimensional search problems frequently encountered in machine vision, such as real-time object recognition.
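The idea of searching by projection within distance ε can be illustrated with a minimal sketch. This is not the paper's data structure: it assumes one sorted list per coordinate, intersects the slabs of width 2ε around the query, and then checks the surviving candidates with the true Euclidean distance. The names `build_index` and `nearest_within` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from math import dist


def build_index(points):
    """Sort point indices along each coordinate once, ahead of any query."""
    dims = len(points[0])
    index = []
    for d in range(dims):
        order = sorted(range(len(points)), key=lambda i: points[i][d])
        coords = [points[i][d] for i in order]
        index.append((coords, order))
    return index


def nearest_within(points, index, query, eps):
    """Return (point index, distance) of the closest point within eps, else None."""
    candidates = None
    for d, (coords, order) in enumerate(index):
        # Binary search for the slab [query[d] - eps, query[d] + eps].
        lo = bisect_left(coords, query[d] - eps)
        hi = bisect_right(coords, query[d] + eps)
        slab = set(order[lo:hi])
        # Keep only points that fall inside every slab seen so far.
        candidates = slab if candidates is None else candidates & slab
        if not candidates:
            return None
    # The survivors lie in a hypercube of side 2*eps; verify the true distance.
    best = min(candidates, key=lambda i: dist(points[i], query))
    best_distance = dist(points[best], query)
    return (best, best_distance) if best_distance <= eps else None


points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
index = build_index(points)
print(nearest_within(points, index, (0.9, 1.2), 0.5))
```

The final distance check matters because a point can sit in a corner of the hypercube and still be farther than ε from the query.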
ISBN (print): 0818672587
Automatic video browsing requires algorithms for detecting a variety of events, including production effects (e.g., scene breaks and captions) and moving objects. We present new methods that use edges and motion for detecting production effects and computing motion segmentation. Production effects, such as cuts, dissolves, wipes and captions, can be detected by looking for new edges that are far from previous edges. A global motion computation is used to register consecutive images. We have also developed a method for motion segmentation, which does not require computing local optical flow. Our methods run at several frames per second on a Sparc workstation, and tolerate compression artifacts.
ISBN (print): 0818672587
The paper presents an analysis of the stability of pose estimation. The investigated pose estimation technique is based on the orientations of three edge segments and provides the rotation part of the object pose. The specific emphasis of the analysis is on determining how the stability varies with viewpoint relative to an object. The stability investigation propagates the uncertainty in edge segment orientations to the resulting effect on the pose parameters. It is shown that there is a very strong variation in noise sensitivity over the range of viewpoints and that the viewpoints offering the highest robustness to noise can be determined in advance. Experiments on real images verify the theoretical results and show that, depending on viewpoint, pose parameter variance varies from 0.05 to 20 (degrees squared).
ISBN (print): 9780769549897
Large-scale recognition problems with thousands of classes pose a particular challenge because applying the classifier requires more computation as the number of classes grows. The label tree model integrates classification with the traversal of the tree so that complexity grows logarithmically. In this paper we show how the parameters of the label tree can be found using maximum likelihood estimation. This new probabilistic learning technique produces a label tree with significantly improved recognition accuracy.
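A minimal sketch of why tree traversal keeps the cost logarithmic, assuming a greedy descent with one linear scorer per node. The `Node` class, the `score` and `classify` helpers, and the toy weights are hypothetical; the paper's maximum likelihood training of these parameters is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    weights: List[float]                      # linear scorer for this node (unused at the root)
    children: List["Node"] = field(default_factory=list)
    label: Optional[str] = None               # set only on leaves


def score(node, x):
    """Dot product between the node's weights and the input features."""
    return sum(w * xi for w, xi in zip(node.weights, x))


def classify(root, x):
    """Descend to the best-scoring child at each level; count scorer evaluations."""
    node, evaluations = root, 0
    while node.children:
        evaluations += len(node.children)
        node = max(node.children, key=lambda child: score(child, x))
    return node.label, evaluations


# Toy tree with four classes: only 2 + 2 scorers are evaluated per input.
animals = Node([1.0, 0.0], [Node([0.0, 1.0], label="cat"), Node([0.0, -1.0], label="dog")])
vehicles = Node([-1.0, 0.0], [Node([0.0, 1.0], label="car"), Node([0.0, -1.0], label="bus")])
root = Node([], [animals, vehicles])
print(classify(root, (2.0, -1.0)))
```

With a balanced binary tree over K classes, this descent evaluates about 2·log2(K) scorers instead of K.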
ISBN (print): 0780342364
We present a new approach to the tracking of highly non-rigid patterns of motion, such as water flowing down a stream. The algorithm is based on a "disturbance map," which is obtained by linearly subtracting the temporal average of the previous frames from the new frame. Every local motion creates a disturbance having the form of a wave, with a "head" at the present position of the motion and a historical "tail" that indicates the previous locations of that motion. These disturbances serve as loci of attraction for "tracking particles" that are scattered throughout the image. The algorithm is very fast and can be performed in real time. We provide excellent tracking results on various complex sequences, using both stabilized and moving cameras, showing: a busy ant column, waterfalls, rapids and flowing streams, shoppers in a mall, and cars in a traffic intersection.
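The disturbance map described above can be sketched as follows. This assumes grayscale frames stored as nested lists and an unweighted mean over the previous frames; the abstract does not say how the temporal average is weighted, and the function name `disturbance_map` is hypothetical. The tracking particles are not modelled here.

```python
def disturbance_map(previous_frames, new_frame):
    """Subtract the per-pixel mean of the previous frames from the new frame."""
    n = len(previous_frames)
    rows, cols = len(new_frame), len(new_frame[0])
    return [
        [
            new_frame[r][c] - sum(frame[r][c] for frame in previous_frames) / n
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Two 2x2 history frames and one new frame (toy intensities).
history = [
    [[0, 0], [0, 0]],
    [[2, 0], [0, 0]],
]
current = [[5, 0], [0, 1]]
print(disturbance_map(history, current))
```

Pixels where motion has just arrived get a large positive value, while pixels the motion recently left keep a nonzero value through the history average, which corresponds to the "head" and "tail" in the description.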
ISBN (print): 9781509014378
In this paper, a new skeleton-based approach is proposed for 3D hand gesture recognition. Specifically, we exploit the geometric shape of the hand to extract an effective descriptor from the connected hand skeleton joints returned by the Intel RealSense depth camera. Each descriptor is then encoded by a Fisher Vector representation obtained using a Gaussian Mixture Model. A temporal pyramid builds a multi-level representation of Fisher Vectors and other skeleton-based geometric features to obtain the final feature vector, which is then classified by a linear SVM. The proposed approach is evaluated on a challenging hand gesture dataset containing 14 gestures, each performed by 20 participants with two different numbers of fingers. Experimental results show that our skeleton-based approach consistently achieves superior performance over a depth-based approach.
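A minimal sketch of the temporal pyramid step, under loud assumptions: mean pooling stands in for the Fisher Vector encoding, the number of levels is arbitrary, and the function name `temporal_pyramid` is hypothetical. It shows only how a sequence is split into 1, 2, 4, ... segments whose pooled descriptors are concatenated.

```python
def temporal_pyramid(frames, levels=3):
    """Concatenate pooled descriptors of 1, 2, 4, ... temporal segments."""
    dims = len(frames[0])
    feature = []
    for level in range(levels):
        parts = 2 ** level
        for p in range(parts):
            start = p * len(frames) // parts
            stop = (p + 1) * len(frames) // parts
            segment = frames[start:stop]
            if segment:
                # Mean pooling is an assumption; the paper encodes with Fisher Vectors.
                pooled = [sum(f[d] for f in segment) / len(segment) for d in range(dims)]
            else:
                pooled = [0.0] * dims  # sequence shorter than the segment count
            feature.extend(pooled)
    return feature


# Four frames with a one-dimensional descriptor each.
sequence = [[1.0], [2.0], [3.0], [4.0]]
print(temporal_pyramid(sequence, levels=2))
```

The resulting vector has (2^levels − 1) × dims entries and could then be passed to a linear classifier.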
ISBN (print): 9781509014378
In this paper, we introduce a novel framework for video-based action recognition, which combines sequential information with spatiotemporal features. Specifically, the spatiotemporal features are extracted from sliced clips of the videos, and then a recurrent neural network is applied to embed the sequential information into the final feature representation of the video. In contrast to most current deep learning methods for video-based tasks, our framework incorporates both long-term dependencies and spatiotemporal information of the clips in the video. To extract the spatiotemporal features from the clips, both dense trajectories (DT) and a newly proposed 3D neural network, C3D, are applied in our experiments. Our proposed framework is evaluated on the benchmark datasets UCF101 and HMDB51, and achieves performance comparable to the state of the art.