ISBN (print): 9780769549897
Detecting pedestrians in cluttered scenes is a challenging problem in computer vision. The difficulty increases when several pedestrians overlap in an image and occlude each other. We observe, however, that the occlusion/visibility statuses of overlapping pedestrians provide a useful mutual relationship for visibility estimation: the visibility estimate of one pedestrian facilitates the visibility estimate of another. In this paper, we propose a mutual visibility deep model that jointly estimates the visibility statuses of overlapping pedestrians. The visibility relationship among pedestrians is learned by the deep model for recognizing co-existing pedestrians. Experimental results show that the mutual visibility deep model effectively improves pedestrian detection results. Compared with existing image-based pedestrian detection approaches, our approach has the lowest average miss rate on the Caltech-Train dataset, the Caltech-Test dataset and the ETH dataset. Including mutual visibility leads to improvements of 4% to 8% on multiple benchmark datasets.
ISBN (print): 9781479951178
Groups are the primary entities that make up a crowd. Understanding group-level dynamics and properties is thus scientifically important and practically useful in a wide range of applications, especially for crowd understanding. In this study we show that fundamental group-level properties, such as intra-group stability and inter-group conflict, can be systematically quantified by visual descriptors. This is made possible by learning a novel Collective Transition prior, which leads to a robust approach for group segregation in public spaces. From the prior, we further devise a rich set of group property visual descriptors. These descriptors are scene-independent and can be effectively applied to public scenes with a variety of crowd densities and distributions. Extensive experiments on hundreds of public scene video clips demonstrate that such property descriptors are not only useful but also necessary for group state analysis and crowd scene understanding.
ISBN (print): 9781538604571
Relationships among objects play a crucial role in image understanding. Despite the great success of deep learning techniques in recognizing individual objects, reasoning about the relationships among objects remains a challenging task. Previous methods often treat this as a classification problem, considering each type of relationship (e.g. "ride") or each distinct visual phrase (e.g. "person-ride-horse") as a category. Such approaches face significant difficulties caused by the high diversity of visual appearance for each kind of relationship or by the large number of distinct visual phrases. We propose an integrated framework to tackle this problem. At the heart of this framework is the Deep Relational Network, a novel formulation designed specifically for exploiting the statistical dependencies between objects and their relationships. On two large data sets, the proposed method achieves substantial improvement over the state of the art.
ISBN (print): 9780769549897
Computational color constancy is an important topic in computer vision and has attracted the attention of many researchers. Recently, much research has shown the benefit of using high-level visual content cues for improving illumination estimation. However, nearly all existing methods are essentially combinational strategies in which image content analysis is only used to guide the combination of, or selection from, a variety of individual illumination estimation methods. In this paper, we propose a novel bilayer sparse coding model for illumination estimation that considers image similarity in terms of both low-level color distribution and high-level image scene content simultaneously. For this purpose, the image's scene content information is integrated with its color distribution to obtain an optimal illumination estimation model. The experimental results on real-world image sets show that our algorithm is superior to some prevailing illumination estimation methods and even outperforms some combinational methods.
ISBN (print): 9781538604571
We present a new distance measure between sequences that can tackle local temporal distortion and periodic sequences with arbitrary starting points. By viewing the instances of sequences as empirical samples of an unknown distribution, we cast the calculation of the distance between sequences as an optimal transport problem. To preserve the inherent temporal relationships of the instances in sequences, we smooth the optimal transport problem with two novel temporal regularization terms. The inverse difference moment regularization enforces transport with local homogeneous structures, and the KL-divergence with a prior distribution regularization prevents transport between instances at distant temporal positions. We show that this problem can be efficiently optimized through the matrix scaling algorithm. Extensive experiments on different datasets with different classifiers show that the proposed distance outperforms the traditional DTW variants and the smoothed optimal transport distance without temporal regularization.
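The matrix scaling step mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes uniform marginals and keeps only a KL-to-prior term, whose minimizer has the form diag(u) (P * exp(-D / eps)) diag(v); the inverse difference moment term is omitted. The Gaussian prior, the names `temporal_prior` and `sinkhorn_with_prior`, and the parameters `sigma`, `eps` and `iters` are all hypothetical choices made for illustration.

```python
import math

def temporal_prior(n, m, sigma=0.2):
    """Hypothetical prior: large where relative positions in the two
    sequences are close, small where they are far apart."""
    return [[math.exp(-(((i + 0.5) / n - (j + 0.5) / m) ** 2) / (2 * sigma ** 2))
             for j in range(m)] for i in range(n)]

def sinkhorn_with_prior(cost, prior, eps=0.5, iters=300):
    """Matrix scaling for min <T, D> + eps * KL(T || P) with uniform marginals.
    Returns the transport plan T and the transport cost <T, D>."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # assumed uniform mass on the first sequence
    b = [1.0 / m] * m  # assumed uniform mass on the second sequence
    # Kernel: prior reweighted by the exponentiated negative cost.
    K = [[prior[i][j] * math.exp(-cost[i][j] / eps) for j in range(m)]
         for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    T = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    dist = sum(T[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return T, dist

# Usage: a scalar sequence compared with itself and with its reversal.
seq = [0.0, 1.0, 2.0]
cost_same = [[(p - q) ** 2 for q in seq] for p in seq]
cost_reversed = [[(p - q) ** 2 for q in reversed(seq)] for p in seq]
prior = temporal_prior(len(seq), len(seq))
_, d_same = sinkhorn_with_prior(cost_same, prior)
_, d_reversed = sinkhorn_with_prior(cost_reversed, prior)
```

Without the prior, optimal transport ignores ordering, so a sequence and its reversal would be equally close. With the temporal prior, the reversed sequence receives a larger distance in this sketch.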
ISBN (print): 0769519504
A video-based handwritten character recognition (VCR) system is a new type of character recognition system with many unique advantages over on-line character recognition systems. Its main problem is to effectively extract stroke dynamic information from video data for character recognition. In this paper, we propose a new stroke extraction algorithm based on dynamic stroke information analysis for a VCR system. The experimental results on over 3000 video character sequences show that our system can extract Chinese character stroke dynamic information similar to that of an on-line system.
ISBN (print): 9781538604571
Scene parsing is challenging for unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information by different-region-based context aggregation through our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective in producing good quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction. The proposed approach achieves state-of-the-art performance on various datasets. It came first in the ImageNet scene parsing challenge 2016, the PASCAL VOC 2012 benchmark and the Cityscapes benchmark. A single PSPNet yields a new record mIoU accuracy of 85.4% on PASCAL VOC 2012 and 80.2% on Cityscapes.
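The idea of region-based context aggregation can be sketched on a single-channel feature map. This is not the authors' network. The bin sizes `(1, 2, 3, 6)`, the nearest-neighbour upsampling, and the function names are assumptions chosen for illustration, and the learned convolution layers of a real model are left out.

```python
def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D map (list of rows) into a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for bi in range(bins):
        # Floor for the start index, ceiling for the end index.
        r0, r1 = (bi * h) // bins, -(-((bi + 1) * h) // bins)
        row = []
        for bj in range(bins):
            c0, c1 = (bj * w) // bins, -(-((bj + 1) * w) // bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled

def upsample_nearest(pooled, h, w):
    """Resize a bins x bins grid back to h x w by nearest-neighbour lookup."""
    bins = len(pooled)
    return [[pooled[(r * bins) // h][(c * bins) // w] for c in range(w)]
            for r in range(h)]

def pyramid_pool(fmap, bin_sizes=(1, 2, 3, 6)):
    """Return the original map plus one context channel per pyramid level."""
    h, w = len(fmap), len(fmap[0])
    channels = [fmap]
    for bins in bin_sizes:
        channels.append(upsample_nearest(adaptive_avg_pool(fmap, bins), h, w))
    return channels

# Usage: a 6 x 6 map yields 5 channels (the input plus 4 context levels).
feature_map = [[r * 6 + c for c in range(6)] for r in range(6)]
channels = pyramid_pool(feature_map)
```

The coarsest level (one bin) carries a single global average to every pixel, while finer levels carry averages over progressively smaller regions. Stacking them gives each pixel context at several scales.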
ISBN (print): 354029032X
Real-life applications of neural networks require a high degree of success, usability and reliability. Image processing is important for both data preparation and human vision in increasing the success and reliability of pattern recognition applications. The combination of image processing and neural networks can provide sufficient and robust solutions to problems where intelligent recognition is required. This paper presents an implementation of neural networks for the recognition of various banknotes. One combined neural network will be trained to recognize all the banknotes of the Turkish Lira and the Cyprus Pound, as they are the main currencies used in Cyprus. The flexibility, usability and reliability of this Intelligent Banknote Identification System (IBIS) will be shown through the results, and a comparison will be drawn between using separate neural networks or a combined neural network for each currency.
The article discusses the development of an information system for organizing the work of a production workshop, capable of automatically assessing the quality of products, recognizing defects on products using comput...
ISBN (print): 9781479951178
Images and videos are often characterized by multiple types of local descriptors such as SIFT, HOG and HOF, each of which describes certain aspects of object features. Recognition systems benefit from fusing multiple types of these descriptors. Two widely applied fusion pipelines are descriptor concatenation and kernel average. The first is effective when different descriptors are strongly correlated, while the second is probably better when descriptors are relatively independent. In practice, however, different descriptors are neither fully independent nor fully correlated, and previous fusion methods may not be satisfactory. In this paper, we propose a new global representation, Multi-View Super Vector (MVSV), which is composed of relatively independent components derived from a pair of descriptors. Kernel average is then applied to these components to produce the recognition result. To obtain MVSV, we develop a generative mixture model of probabilistic canonical correlation analyzers (M-PCCA), and utilize the hidden factors and gradient vectors of M-PCCA to construct MVSV for video representation. Experiments on video-based action recognition tasks show that MVSV achieves promising results, and outperforms FV and VLAD with descriptor concatenation or kernel average fusion strategies.
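The two baseline fusion pipelines named in this abstract can be contrasted with a small sketch. It does not implement MVSV or M-PCCA. The RBF kernel, the `gamma` value, and the toy two-dimensional descriptors are assumptions made purely for illustration.

```python
import math

def rbf(x, y, gamma=0.5):
    """RBF kernel between two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def concat_kernel(desc_x, desc_y, gamma=0.5):
    """Descriptor concatenation: join all descriptor types, then one kernel."""
    x = [v for d in desc_x for v in d]
    y = [v for d in desc_y for v in d]
    return rbf(x, y, gamma)

def average_kernel(desc_x, desc_y, gamma=0.5):
    """Kernel average: one kernel per descriptor type, then the mean."""
    ks = [rbf(dx, dy, gamma) for dx, dy in zip(desc_x, desc_y)]
    return sum(ks) / len(ks)

# Usage: two toy videos, each with two descriptor types (hypothetical values).
video_a = ([1.0, 0.0], [0.0, 1.0])
video_b = ([1.0, 0.0], [1.0, 0.0])  # agrees on the first type only
k_concat = concat_kernel(video_a, video_b)
k_average = average_kernel(video_a, video_b)
```

With a shared `gamma`, the RBF kernel on concatenated descriptors equals the product of the per-descriptor kernels, so one disagreeing descriptor pulls the similarity down multiplicatively. The average degrades more gently, which is one way to see why the two pipelines suit different degrees of descriptor dependence.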