ISBN: 9781479951178 (print)
We tackle stationary crowd analysis in this paper, which is as important as modeling mobile groups in crowd scenes and finds many applications in surveillance. Our key contribution is a robust algorithm for estimating how long a foreground pixel has been stationary. This is much more challenging than background subtraction alone, because failure at a single frame due to local movement of objects, lighting variation, or occlusion could lead to large errors in stationary time estimation. To achieve good results, sparse constraints along the spatial and temporal dimensions are jointly added via mixed partials to shape a 3D stationary time map. The problem is formulated as an L0 optimization problem. Besides background subtraction, the method distinguishes among different foreground objects that are close or overlapping in the spatio-temporal space by using a locally shared foreground codebook. The proposed techniques are used to detect four types of stationary group activities and to analyze crowd scene structures. We provide the first public benchmark dataset for stationary time estimation and stationary group analysis.
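To illustrate why a single-frame failure is so damaging, the following is a minimal sketch of a naive baseline, not the algorithm of the abstract above. It assumes that a boolean foreground mask per frame is already available and simply counts consecutive foreground frames per pixel. The function name `naive_stationary_time` and the toy data are hypothetical.

```python
import numpy as np

def naive_stationary_time(foreground_masks):
    """Per-pixel count of consecutive frames a pixel has been foreground.

    foreground_masks: boolean array of shape (T, H, W), assumed to come
    from some background-subtraction step (hypothetical input).
    Returns an integer array of shape (T, H, W).
    """
    masks = np.asarray(foreground_masks, dtype=bool)
    times = np.zeros(masks.shape, dtype=np.int64)
    running = np.zeros(masks.shape[1:], dtype=np.int64)
    for t, mask in enumerate(masks):
        # Extend the run where the pixel is foreground; reset it elsewhere.
        running = np.where(mask, running + 1, 0)
        times[t] = running
    return times

# One pixel that stays foreground for 10 frames ...
clean = np.ones((10, 1, 1), dtype=bool)
# ... and the same pixel with a single missed detection at frame 5.
noisy = clean.copy()
noisy[5] = False

clean_final = int(naive_stationary_time(clean)[-1, 0, 0])
noisy_final = int(naive_stationary_time(noisy)[-1, 0, 0])
print(clean_final, noisy_final)
```

In this toy case one dropped frame resets the counter, so the naive estimate at the last frame falls from 10 to 4. This is the kind of error that joint spatial and temporal constraints are meant to suppress.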
ISBN: 9781538637883 (print)
Head detection and localization are among the most investigated and demanding tasks of the computer vision community. They are also a key element for many disciplines, such as Human-Computer Interaction, Human Behavior Understanding, Face Analysis, and Video Surveillance. In recent decades, much effort has gone into developing accurate and reliable head or face detectors for standard RGB images, but only a few solutions address other types of images, such as depth maps. In this paper, we propose a novel method for head detection on depth images, based on a deep learning approach. In particular, the presented system replaces the classic sliding-window approach, which is often the main computational bottleneck of many object detectors, with a Fully Convolutional Network. Two public datasets, namely Pandora and Watch-n-Patch, are used to train and test the proposed network. Experimental results confirm the effectiveness of the method, which outperforms all state-of-the-art methods based on depth images and runs in real time.
Face recognition is a research problem across multiple disciplines such as computer vision, pattern recognition, artificial intelligence, psychology, healthcare, etc. At present, the rehabilitation centers in hospitals...
ISBN: 9781538637883 (print)
Video compression algorithms have been designed to please human viewers and are driven by video quality metrics that account for the capabilities of the human visual system. However, thanks to advances in computer vision systems, more and more videos are going to be watched by algorithms, e.g. in video surveillance systems or automatic video tagging. This paper describes an adaptive video coding approach for computer vision-based systems. We show how to control the quality of video compression so that automatic object detectors can still process the resulting video, improving their detection performance by preserving the elements of the scene that are more likely to contain meaningful content. Our approach is based on the computation of saliency maps exploiting a fast objectness measure. The computational efficiency of this approach makes it usable in a real-time video coding pipeline. Experiments show that our technique outperforms standard H.265 in speed and coding efficiency, and can be applied to different types of video domains, from surveillance to web videos.
ISBN: 9781538604571 (print)
This paper studies visual pattern discovery in large-scale image collections via binarized mode seeking, where images can only be represented as binary codes for efficient storage and computation. We address this problem from the perspective of binary space mode seeking. First, a binary mean shift (bMS) is proposed to discover frequent patterns via mode seeking directly in binary space. A binomial-based kernel and a binary constraint are introduced for binarized analysis. Second, we extend bMS to a more general form, namely contrastive binary mean shift (cbMS), which maximizes the contrastive density in binary space to find informative patterns that are both frequent and discriminative for the dataset. With the binarized algorithm and optimization, our methods demonstrate significant computation (50x) and storage (32x) improvements over standard techniques operating in Euclidean space, while performance does not degrade significantly. Furthermore, cbMS discovers more informative patterns by suppressing modes with low discriminative power. We evaluate our methods on both the annotated ILSVRC (1M images) and the un-annotated blind Flickr (10M images) datasets, which demonstrates both the scalability and the effectiveness of our algorithms for discovering frequent and informative patterns in large-scale collections.
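The idea of mode seeking directly in binary space can be sketched as follows. This is only an illustration under stated assumptions, not the bMS algorithm of the abstract: it uses Hamming distance, an exponential kernel chosen here for simplicity (the abstract mentions a binomial-based kernel whose form is not given), and a per-bit 0.5 threshold to keep the estimate binary. The function name `hamming_mode_seek`, the `bandwidth` parameter, and the toy codes are hypothetical.

```python
import numpy as np

def hamming_mode_seek(codes, start, bandwidth=2.0, max_iter=20):
    """Iterate a kernel-weighted bitwise vote until the code stops changing.

    codes: (N, B) array of 0/1 values, one binary code per row.
    start: (B,) array of 0/1 values, the initial mode estimate.
    """
    codes = np.asarray(codes, dtype=np.uint8)
    mode = np.asarray(start, dtype=np.uint8)
    for _ in range(max_iter):
        # Hamming distance from the current estimate to every code.
        dist = np.count_nonzero(codes != mode, axis=1)
        # Assumed kernel: weights decay exponentially with distance.
        weights = np.exp(-dist / bandwidth)
        # Weighted mean of each bit, then threshold back to {0, 1}.
        mean = weights @ codes / weights.sum()
        new_mode = (mean > 0.5).astype(np.uint8)
        if np.array_equal(new_mode, mode):
            break
        mode = new_mode
    return mode

a = [1, 1, 1, 1, 0, 0, 0, 0]
b = [0, 0, 0, 0, 1, 1, 1, 1]
codes = np.array([
    a, a, a,
    [0, 1, 1, 1, 0, 0, 0, 0],  # a with its first bit flipped
    [1, 1, 1, 1, 0, 0, 0, 1],  # a with its last bit flipped
    b, b, b,
], dtype=np.uint8)

found = hamming_mode_seek(codes, start=codes[3])
print(found.tolist())
```

Starting from a noisy copy of `a`, the vote is dominated by nearby codes, so the estimate moves to `a` and stays there, while the distant cluster around `b` contributes almost nothing.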
ISBN: 9781479951178 (print)
Weighted median, in the form of either a solver or a filter, has been employed in a wide range of computer vision solutions for its beneficial properties in sparsity representation. But it is hard to accelerate because of the spatially varying weights and the median property. We propose a few efficient schemes to reduce computational complexity from O(r^2) to O(r), where r is the kernel size. Our contributions are a new joint-histogram representation, median tracking, and a new data structure that enables fast data access. The effectiveness of these schemes is demonstrated on optical flow estimation, stereo matching, structure-texture separation, and image filtering, to name a few. The running time is shortened from several minutes to less than 1 second. The source code is provided on the project website.
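For reference, the weighted median itself can be sketched as a brute-force computation: sort the values and return the first one at which the cumulative weight reaches half of the total weight. This is the standard definition only, not the accelerated schemes of the abstract; the function name `weighted_median` and the sample values are hypothetical.

```python
import numpy as np

def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Sort values and carry their weights along.
    order = np.argsort(values, kind="stable")
    cumulative = np.cumsum(weights[order])
    half = 0.5 * cumulative[-1]
    # First sorted position where the cumulative weight is >= half.
    idx = int(np.searchsorted(cumulative, half, side="left"))
    return float(values[order][idx])

values = [1, 2, 3, 4, 100]
uniform = weighted_median(values, [1, 1, 1, 1, 1])
skewed = weighted_median(values, [1, 1, 1, 1, 10])
print(uniform, skewed)
```

With uniform weights this reduces to the ordinary median (3.0 here), while a large weight on one sample pulls the result to that sample (100.0). A filter built this way must gather and sort every pixel of the window at each position, because the weights change from pixel to pixel, which is the cost that motivates faster schemes.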
ISBN: 9781479951178 (print)
In this paper, we propose a novel approach for learning mid-level filters from automatically discovered patch clusters for person re-identification. It is motivated by our study of what makes a good filter for person re-identification. Our mid-level filters are discriminatively learned to identify specific visual patterns and distinguish persons, and they have good cross-view invariance. First, local patches are qualitatively measured and classified by their discriminative power. Discriminative and representative patches are collected for filter learning. Second, patch clusters with coherent appearance are obtained by pruning hierarchical clustering trees, and a simple but effective cross-view training strategy is proposed to learn filters that are view-invariant and discriminative. Third, filter responses are integrated with patch matching scores in RankSVM training. The effectiveness of our approach is validated on the VIPeR dataset and the CUHK01 dataset. The learned mid-level features are complementary to existing handcrafted low-level features, and improve the best Rank-1 matching rate on the VIPeR dataset by 14%.
ISBN: 9781467312288 (print)
The explosive growth of surveillance video data presents formidable challenges for browsing, retrieval, and storage. Video synopsis, an innovation proposed by Peleg and his colleagues, aims at fast browsing by shortening the video into a synopsis while keeping the activities captured by the camera. However, current techniques are offline methods that require all the video data to be available before processing, and they are expensive in time and space. In this paper, we propose an online and efficient solution, together with its supporting algorithms, to overcome these problems. The method adopts an online content-aware approach in a stepwise manner and is hence applicable to endless video at lower computational cost. Moreover, we propose a novel tracking method, called sticky tracking, to achieve high-quality visualization. The system achieves faster-than-real-time speed with a multi-core CPU implementation. The advantages are demonstrated by extensive experiments on a wide variety of videos. The proposed solution and algorithms could be integrated with surveillance cameras and change the way surveillance videos are recorded.
ISBN: 9780769549897 (print)
We show in this paper that the success of previous maximum a posteriori (MAP) based blur removal methods partly stems from their respective intermediate steps, which implicitly or explicitly create an unnatural representation containing salient image structures. We propose a generalized and mathematically sound L0 sparse expression, together with a new effective method, for motion deblurring. Our system does not require extra filtering during optimization and shows a fast decrease in energy, so a small number of iterations is enough for convergence. It also provides a unified framework for both uniform and non-uniform motion deblurring. We extensively validate our method and compare it with other approaches with respect to convergence speed, running time, and result quality.
ISBN: 9781479951178 (print)
Document images captured by a digital camera often suffer from serious geometric distortions. In this paper, we propose an active method to correct geometric distortions in a camera-captured document image. Unlike many passive rectification methods that rely on text lines or features extracted from images, our method uses two structured beams projected onto the document page to recover two spatial curves. A developable surface is then interpolated to the curves by finding the correspondence between them. The developable surface is finally flattened onto a plane by solving a system of ordinary differential equations. Our method is content-independent and can restore a corrected document image of high accuracy with undistorted contents. Experimental results on a variety of real captured document images demonstrate the effectiveness and efficiency of the proposed method.