ISBN: 9781479951178 (print)
Ubiquitous image blur raises a practically important question: which features effectively differentiate blurred from unblurred image regions? We address it by studying several blur feature representations in the image gradient domain, the Fourier domain, and data-driven local filters. Unlike previous methods, which are often based on restoration mechanisms, our features are constructed to enhance discriminative power and adapt to the various blur scales in images. To enable evaluation, we build a new blur perception dataset containing thousands of images with labeled ground truth. Our results are applied to several tasks, including blur region segmentation, deblurring, and blur magnification.
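As a rough illustration of why gradient-domain statistics can separate blurred from sharp regions, the sketch below compares the mean gradient magnitude of a sharp patch with that of the same patch after a 3x3 box blur. The helpers `mean_gradient_magnitude` and `box_blur` are hypothetical names introduced here for illustration; they are not the features proposed in the paper, which the abstract does not specify in detail.

```python
import math

def mean_gradient_magnitude(patch):
    """Average forward-difference gradient magnitude of a 2D list of floats."""
    h, w = len(patch), len(patch[0])
    total, count = 0.0, 0
    for y in range(h - 1):
        for x in range(w - 1):
            gx = patch[y][x + 1] - patch[y][x]  # horizontal difference
            gy = patch[y + 1][x] - patch[y][x]  # vertical difference
            total += math.hypot(gx, gy)
            count += 1
    return total / count

def box_blur(patch):
    """3x3 box blur; pixels outside the patch are clamped to the border."""
    h, w = len(patch), len(patch[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += patch[yy][xx]
            out[y][x] = acc / 9.0
    return out

# Assumed toy input: an 8x8 checkerboard stands in for a sharp, textured patch.
sharp = [[float((x + y) % 2) for x in range(8)] for y in range(8)]
blurred = box_blur(sharp)

sharp_score = mean_gradient_magnitude(sharp)
blurred_score = mean_gradient_magnitude(blurred)
print(sharp_score, blurred_score)  # the blurred patch scores lower
```

A single threshold on such a score would be sensitive to image content and blur scale, which is the kind of limitation that motivates more discriminative, scale-adaptive features.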
ISBN: 9781479951178 (print)
In this paper, we propose a robust method for visual tracking that relies on mean shift, sparse coding, and spatial pyramids. First, we extend the original mean shift approach to handle orientation space and scale space, and name the new method mean transform. The mean transform method estimates the motion of the object window of interest, including its location, orientation, and scale, simultaneously and effectively. Second, we introduce a pixel-wise dense patch sampling technique and a region-wise trivial template design scheme, which enable our approach to run accurately and efficiently. In addition, instead of using only a holistic or only a local representation, we apply spatial pyramids that combine the two representations to handle partial occlusion robustly. Experimental results show that our approach outperforms state-of-the-art methods on many benchmark sequences.
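For context, the original mean shift procedure that the abstract builds on can be sketched as follows. This is a minimal location-only version with a flat kernel and uniform weights on a 2D point set; it is not the mean transform described above, which also estimates orientation and scale. The function name `mean_shift` and the sample points are assumptions made for illustration.

```python
import math

def mean_shift(points, start, bandwidth, max_iter=100, tol=1e-6):
    """Move a window centre to the mean of the points inside it until it stops moving."""
    cx, cy = start
    for _ in range(max_iter):
        inside = [(x, y) for x, y in points
                  if (x - cx) ** 2 + (y - cy) ** 2 <= bandwidth ** 2]
        if not inside:
            break  # empty window: nothing to shift towards
        nx = sum(x for x, _ in inside) / len(inside)
        ny = sum(y for _, y in inside) / len(inside)
        shift = math.hypot(nx - cx, ny - cy)
        cx, cy = nx, ny
        if shift < tol:
            break  # converged
    return cx, cy

# Assumed toy data: a small cluster around (5, 5) plus one distant outlier.
points = [(4, 5), (6, 5), (5, 4), (5, 6), (5, 5), (0, 0)]
center = mean_shift(points, start=(3.5, 4.5), bandwidth=2.0)
print(center)  # converges to (5.0, 5.0)
```

In a tracker, the uniform weights would typically be replaced by per-pixel weights derived from an appearance model, so that the window drifts towards the region that best matches the target.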
ISBN: 9781479951178 (print)
In this paper, we propose a Switchable Deep Network (SDN) for pedestrian detection. The SDN automatically learns hierarchical features, saliency maps, and mixture representations of different body parts. Pedestrian detection faces the challenges of background clutter and large variations in pedestrian appearance due to pose and viewpoint changes and other factors. One of our key contributions is a Switchable Restricted Boltzmann Machine (SRBM) that explicitly models the complex mixture of visual variations at multiple levels. At the feature level, it automatically estimates saliency maps for each test sample in order to separate background clutter from discriminative regions for pedestrian detection. At the part and body levels, it infers the most appropriate template for the mixture models of each part and of the whole body. We have devised a new generative algorithm to effectively pretrain the SDN, which is then fine-tuned with back-propagation. Our approach is evaluated on the Caltech and ETH datasets and achieves state-of-the-art detection performance.
ISBN: 9781479951178 (print)
In this paper, we focus on the problem of point-to-set classification, where single points are matched against sets of correlated points. Since the points commonly lie in Euclidean space while the sets are typically modeled as elements on a Riemannian manifold, they can be treated as Euclidean points and Riemannian points respectively. To learn a metric between the heterogeneous points, we propose a novel Euclidean-to-Riemannian metric learning framework. Specifically, by exploiting typical Riemannian metrics, the Riemannian manifold is first embedded into a high-dimensional Hilbert space to reduce the gap between the heterogeneous spaces while respecting the Riemannian geometry of the manifold. The final distance metric is then learned by pursuing multiple transformations from the Hilbert space and the original Euclidean space (or its corresponding Hilbert space) to a common Euclidean subspace, where classical Euclidean distances between the transformed heterogeneous points can be measured. Extensive experiments clearly demonstrate the superiority of our proposed approach over state-of-the-art methods.
ISBN: 9781479951178 (print)
Groups are the primary entities that make up a crowd. Understanding group-level dynamics and properties is thus scientifically important and practically useful in a wide range of applications, especially for crowd understanding. In this study we show that fundamental group-level properties, such as intra-group stability and inter-group conflict, can be systematically quantified by visual descriptors. This is made possible by learning a novel Collective Transition prior, which leads to a robust approach for group segregation in public spaces. From the prior, we further devise a rich set of visual descriptors of group properties. These descriptors are scene-independent and can be effectively applied to public scenes with a variety of crowd densities and distributions. Extensive experiments on hundreds of public-scene video clips demonstrate that such property descriptors are not only useful but also necessary for group state analysis and crowd scene understanding.
ISBN: 9781479951178 (print)
Image matching is one of the most challenging stages in 3D reconstruction: it usually accounts for half of the computational cost, and inaccurate matching may cause the reconstruction to fail. Therefore, fast and accurate image matching is crucial for 3D reconstruction. In this paper, we propose a Cascade Hashing strategy to speed up image matching. The proposed Cascade Hashing method has a three-layer structure: hashing lookup, hashing remapping, and hashing ranking. Each layer adopts different measures and filtering strategies, which is demonstrated to make the method less sensitive to noise. Extensive experiments show that our approach accelerates image matching by hundreds of times compared with brute-force matching, and by ten times or more compared with Kd-tree based matching, while retaining comparable accuracy.
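The coarse-to-fine idea behind hashing-based matching can be sketched in a simplified two-stage form: a short binary code selects a bucket of candidates (lookup), and a longer binary code ranks those candidates by Hamming distance (ranking). This sketch omits the remapping layer and uses random-hyperplane codes, which is an assumption made here; the abstract does not state how the codes are computed. All function names are hypothetical.

```python
import random

def make_planes(n_bits, dim, rng):
    """Random hyperplanes; each one contributes a single bit to a code."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def binary_code(vec, planes):
    """Pack the signs of the projections onto each hyperplane into an int."""
    code = 0
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, vec))
        code = (code << 1) | (1 if dot > 0 else 0)
    return code

def build_index(descriptors, short_planes, long_planes):
    """Bucket descriptors by short code and store a long code for each one."""
    buckets, long_codes = {}, []
    for i, d in enumerate(descriptors):
        buckets.setdefault(binary_code(d, short_planes), []).append(i)
        long_codes.append(binary_code(d, long_planes))
    return buckets, long_codes

def match(query, buckets, long_codes, short_planes, long_planes):
    """Stage 1: bucket lookup. Stage 2: rank candidates by Hamming distance."""
    candidates = buckets.get(binary_code(query, short_planes), [])
    if not candidates:
        return None
    q = binary_code(query, long_planes)
    return min(candidates, key=lambda i: bin(q ^ long_codes[i]).count("1"))

# Assumed toy data: 50 random 16-dimensional descriptors.
rng = random.Random(0)
descriptors = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(50)]
short_planes = make_planes(8, 16, rng)
long_planes = make_planes(64, 16, rng)
buckets, long_codes = build_index(descriptors, short_planes, long_planes)

query = descriptors[3]
best = match(query, buckets, long_codes, short_planes, long_planes)
print(best)
```

Only the descriptors that share the query's bucket are compared, which is where the speed-up over brute-force matching comes from; the cost is that a true match falling into a different bucket is missed, which a remapping step or multiple tables can mitigate.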
ISBN: 9781479951178 (print)
This work proposes a novel framework for optimization in the constrained diffeomorphism space for deformable surface registration. First, the diffeomorphism space is modeled as a special complex functional space on the source surface, the Beltrami coefficient space. The physically plausible constraints, in terms of feature landmarks and deformation types, define subspaces in the Beltrami coefficient space. Then the harmonic energy of the registration is minimized in the constrained subspaces. The minimization is achieved by alternating two steps: 1) optimization: diffuse the Beltrami coefficient, and 2) projection: first deform the conformal structure by the current Beltrami coefficient and then compose with a harmonic map from the deformed conformal structure to the target. The registration result is diffeomorphic, satisfies the physical landmark and deformation constraints, and minimizes the conformality distortion. Experiments on human facial surfaces demonstrate the efficiency and efficacy of the proposed registration framework.
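For readers unfamiliar with the terms, the standard definitions behind the Beltrami coefficient and the harmonic energy are summarized below. The notation ($f$, $\mu$, $z$, $S$) is assumed here for illustration, and the paper's exact formulation may differ.

```latex
% Beltrami equation: \mu measures how far f is from being conformal
% (\mu = 0 everywhere exactly when f is conformal).
\frac{\partial f}{\partial \bar{z}} = \mu(z)\,\frac{\partial f}{\partial z}

% The Jacobian determinant of f in terms of \mu; an orientation-preserving
% diffeomorphism has J_f > 0, hence |\mu(z)| < 1 everywhere.
J_f = \left|\frac{\partial f}{\partial z}\right|^{2}\left(1 - |\mu(z)|^{2}\right)

% Harmonic (Dirichlet) energy of a map f defined on the source surface S.
E(f) = \frac{1}{2}\int_{S} \lVert \nabla f \rVert^{2}\, dA
```

The bound $|\mu| < 1$ is what allows a space of diffeomorphisms to be parameterized by Beltrami coefficients, so that constraints on the map become constraints on $\mu$.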
ISBN: 9781479943098 (print)
Brain-inspired computer vision (BICV) has evolved rapidly in recent years and is now competitive with traditional CV approaches. However, most BICV algorithms have been developed on high power-and-performance platforms (e.g. workstations) or special-purpose hardware. We propose two different algorithms for counting people in a classroom, both based on Convolutional Neural Networks (CNNs), a state-of-the-art deep learning model inspired by the structure of the human visual cortex. Furthermore, we provide a standalone parallel C library that implements CNNs and use it to deploy our algorithms on the embedded mobile ARM big.LITTLE-based Odroid-XU platform. Our performance and power measurements show that neuromorphic vision is feasible on off-the-shelf embedded mobile platforms and that it can reach very good energy efficiency for non-time-critical tasks such as people counting.
ISBN: 9781479951178 (print)
When one records a video or image sequence through a transparent medium (e.g. glass), the image is often a superposition of a transmitted layer (the scene behind the medium) and a reflected layer. Recovering the two layers from such images is a highly ill-posed problem, since the number of unknowns to recover is twice the number of given measurements. In this paper, we propose a robust method to separate these two layers from multiple images, which exploits the correlation of the transmitted layer across multiple images, and the sparsity and independence of the gradient fields of the two layers. A novel Augmented Lagrangian Multiplier based algorithm is designed to solve the decomposition problem efficiently and effectively. The experimental results on both simulated and real data demonstrate the superior performance of the proposed method over the state of the art, in terms of accuracy and simplicity.
ISBN: 9781479951178 (print)
Document images captured by a digital camera often suffer from serious geometric distortions. In this paper, we propose an active method to correct geometric distortions in a camera-captured document image. Unlike many passive rectification methods that rely on text lines or features extracted from images, our method uses two structured beams projected onto the document page to recover two spatial curves. A developable surface is then interpolated through the curves by finding the correspondence between them. The developable surface is finally flattened onto a plane by solving a system of ordinary differential equations. Our method is content-independent and can restore a highly accurate corrected document image with undistorted contents. Experimental results on a variety of real captured document images demonstrate the effectiveness and efficiency of the proposed method.