Textural patterns are often complex, exhibit scale-dependent changes in structure and are difficult to identify and describe. Lacunarity has been proposed as a general method for the analysis of several spatial patterns. Lacunarity data can designate a mathematical index of spatial heterogeneity; therefore, the corresponding feature vectors should possess the necessary inter-class statistical properties that would enable them to be used for pattern recognition purposes. The objective of this work is to construct a supervised classification model of binary lacunarity data - computed by Valous et al. (2009) - from pork ham slice (three qualities) surface images, with the aid of kernel principal component analysis (KPCA) and a multilayer perceptron (MLP) neural network, using a portion of informative salient features. According to the principle of parsimony, the smallest possible number of features should be used so as to give an adequate representation of the feature space. Therefore, the dimension of the initial space, comprising 510 features, was reduced by 90% in order to avoid any noise effects in the subsequent classification. Then, using KPCA, the first nineteen kernel principal components (99.04% of total variance) were extracted from the reduced feature space and were used as input to the MLP. The correct classification percentages for the training, test and validation sets using the neural classifier were 86.7%, 86.7%, and 85.0%, respectively. The binary lacunarity spatial metric captured relevant information that provided a good level of differentiation among pork ham slice images.
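A minimal sketch of the KPCA-plus-MLP pipeline described above, using scikit-learn; the random arrays stand in for the reduced lacunarity feature vectors and ham quality labels, and the layer sizes and kernel choice are illustrative assumptions rather than the settings used in the study.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Placeholder data: rows stand in for the (already reduced) lacunarity feature
# vectors, labels for the three ham quality classes.
rng = np.random.default_rng(0)
X = rng.random((180, 51))
y = rng.integers(0, 3, 180)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Extract the leading kernel principal components (nineteen in the paper).
kpca = KernelPCA(n_components=19, kernel="rbf")
Z_train = kpca.fit_transform(X_train)
Z_test = kpca.transform(X_test)

# Multilayer perceptron trained on the kernel principal components.
mlp = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
mlp.fit(Z_train, y_train)
print("test accuracy:", mlp.score(Z_test, y_test))
```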
Visual vocabularies are now widely used in many video analysis tasks, such as event detection, video retrieval and video classification. In most approaches the vocabularies are based solely on statistics of visual features and generated by clustering. Little attention has been paid to the interclass similarity among different events or actions. In this paper, we present a novel approach to mine the interclass visual similarity statistically and then use it to supervise the generation of the visual vocabulary. We construct a measurement of interclass similarity, embed the similarity into the Euclidean distance and use the refined distance to generate the visual vocabulary iteratively. Experiments on the Weizmann and KTH datasets show that our approach outperforms the traditional vocabulary-based approach by about 5%.
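One possible reading of the refined-distance idea, sketched below: an interclass similarity term (the class_sim matrix here is hypothetical) shrinks the Euclidean distance between a descriptor and a cluster centre when their classes are similar, and the vocabulary is regenerated iteratively with k-means-style updates. This is an interpretation under stated assumptions, not the authors' formulation.

```python
import numpy as np

def refined_distance(x, c, cls_x, cls_c, class_sim, alpha=0.5):
    """Euclidean distance, shrunk when descriptor and centre belong to similar classes."""
    d = np.linalg.norm(x - c)
    return d * (1.0 - alpha * class_sim[cls_x, cls_c])

def build_vocabulary(X, labels, class_sim, k=100, iters=10, seed=0):
    """k-means-style vocabulary generation using the refined distance for assignment."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=k, replace=False)
    centres, centre_cls = X[idx].copy(), labels[idx].copy()
    for _ in range(iters):
        # Assign each descriptor to the closest centre under the refined distance.
        assign = np.array([
            np.argmin([refined_distance(x, c, lx, lc, class_sim)
                       for c, lc in zip(centres, centre_cls)])
            for x, lx in zip(X, labels)
        ])
        # Update each centre and its dominant class label from the assignments.
        for j in range(k):
            members = assign == j
            if members.any():
                centres[j] = X[members].mean(axis=0)
                centre_cls[j] = np.bincount(labels[members]).argmax()
    return centres

# Toy usage: 200 random 32-d descriptors from 3 classes and a made-up similarity matrix.
rng = np.random.default_rng(1)
X = rng.random((200, 32))
labels = rng.integers(0, 3, 200)
class_sim = np.array([[1.0, 0.6, 0.2], [0.6, 1.0, 0.3], [0.2, 0.3, 1.0]])
print("vocabulary shape:", build_vocabulary(X, labels, class_sim, k=10, iters=3).shape)
```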
Local space-time features and bag-of-feature (BOF) representations are often used for action recognition in previous approaches. For complicated human activities, however, the limitations of these approaches become apparent because of the local properties of the features and the lack of context. This paper addresses the problem by exploiting the spatio-temporal context information between features. We first define a spatio-temporal context, which combines the scale-invariant spatio-temporal neighborhood of local features with the spatio-temporal relationships between them. Then, we introduce a spatio-temporal context kernel (STCK), which not only takes into account the local properties of features but also considers their spatial and temporal context information. STCK has a promising generalization property and can be plugged into SVMs for activity recognition. The experimental results on challenging activity datasets show that, compared to a context-free model, the spatio-temporal context kernel improves the recognition performance.
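A rough sketch of how a combined local-plus-context kernel can be plugged into an SVM via a precomputed Gram matrix; the RBF base kernels, the weighting scheme and the toy descriptors are assumptions and do not reproduce the STCK definition from the paper.

```python
import numpy as np
from sklearn.svm import SVC

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def combined_kernel(local_A, local_B, ctx_A, ctx_B, w=0.5):
    # Weighted sum of a kernel on local BOF histograms and one on context descriptors.
    return (1.0 - w) * rbf_kernel(local_A, local_B) + w * rbf_kernel(ctx_A, ctx_B)

# Toy data: 40 clips with 100-d local histograms, 50-d context descriptors, 2 classes.
rng = np.random.default_rng(0)
local, ctx = rng.random((40, 100)), rng.random((40, 50))
y = rng.integers(0, 2, 40)

G = combined_kernel(local, local, ctx, ctx)          # training Gram matrix
clf = SVC(kernel="precomputed").fit(G, y)
print(clf.predict(G[:5]))                            # kernel rows of five clips vs. the training set
```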
ISBN: (Print) 9781457720086
Due to its invariance to monotonic grayscale transformations and its simple computation, the Local Binary Pattern (LBP) has been broadly used as a feature extractor in face recognition tasks in recent years [3]. In previous work, people have proposed methods that use Adaboost to select the most representative features from samples. Zhang et al. proposed a method that applies the Adaboost algorithm to select the most distinctive features from which LBP features are extracted. Although the LBP features selected by Adaboost represent local textures effectively, their method neglects the holistic spatial information inherent in the image samples. To solve this problem, we propose spatially enhanced multi-level boosting using uniform LBP and a multi-level Adaboost algorithm. In this paper, we select the most distinctive features, which are then concatenated to represent spatial information using the multi-level boosting algorithm. Experiments on the ORL database yielded a recognition rate of 98.96%.
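A hedged sketch of the general LBP-plus-boosting idea: uniform LBP histograms are extracted over a grid of blocks and an AdaBoost ensemble of decision stumps implicitly selects the most discriminative bins. The block layout, parameters and random placeholder images are illustrative, not those of the cited method.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.ensemble import AdaBoostClassifier

def block_lbp_histogram(img, P=8, R=1, grid=4):
    """Concatenate uniform-LBP histograms over a grid of non-overlapping blocks."""
    lbp = local_binary_pattern(img, P, R, method="uniform")
    n_bins = P + 2                       # number of uniform LBP labels
    h, w = img.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            block = lbp[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
            hist, _ = np.histogram(block, bins=n_bins, range=(0, n_bins), density=True)
            feats.append(hist)
    return np.concatenate(feats)

# Toy usage on random images standing in for ORL face samples.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(60, 64, 64), dtype=np.uint8)
labels = rng.integers(0, 3, 60)
X = np.array([block_lbp_histogram(im) for im in images])
clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```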
Mean shift is a popular method used in object tracking. The method, which tracks objects between consecutive frames by shifting the search area to the weight center of a generated “weight image”, has acquired a classifier-based framework in which classifiers generate the weight image. In this work, we generate the weight image with multiple classifiers and calculate the contribution of each independent classifier dynamically, using the correlation between the histogram of its weight image and the histogram of a defined ideal weight image.
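The sketch below illustrates one way such a fusion could look: each classifier's weight image contributes in proportion to the correlation between its histogram and the histogram of the ideal weight image. The histogram binning, the Pearson correlation and the clipping are assumptions rather than the paper's exact procedure.

```python
import numpy as np

def hist(img, bins=32):
    h, _ = np.histogram(img, bins=bins, range=(0.0, 1.0), density=True)
    return h

def correlation(a, b):
    # Pearson correlation between two histograms.
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def fuse_weight_images(weight_images, ideal):
    """Combine per-classifier weight images; contributions follow histogram correlation with the ideal."""
    ideal_h = hist(ideal)
    scores = np.array([max(correlation(hist(w), ideal_h), 0.0) for w in weight_images])
    if scores.sum() == 0:
        scores = np.ones_like(scores)
    weights = scores / scores.sum()
    return np.tensordot(weights, np.stack(weight_images), axes=1)

# Toy usage: three noisy weight images and an ideal image that is 1 inside the target box.
rng = np.random.default_rng(0)
ideal = np.zeros((60, 60))
ideal[20:40, 20:40] = 1.0
wis = [np.clip(ideal + rng.normal(0, s, ideal.shape), 0, 1) for s in (0.1, 0.3, 0.6)]
fused = fuse_weight_images(wis, ideal)
print("fused weight image range:", float(fused.min()), float(fused.max()))
```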
To better represent images, we propose a position-aware Bag of Words approach that uses saliency-based segmentation to build a separate vocabulary for each segment. We also apply the pLSA algorithm to find hidden topics in each scene image. The experiments show that separately representing the regions found by a rough segmentation increases classification performance.
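A sketch of segment-specific vocabularies under stated assumptions: descriptors from each salient segment get their own codebook, and per-segment word histograms are concatenated before topic modelling. LatentDirichletAllocation is used here only as a readily available stand-in for pLSA, and the toy descriptors and segment labels are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
n_images, n_segments, vocab_size = 30, 2, 20

# Toy local descriptors: for each image, a list of (segment id, 64-d descriptor) pairs.
descriptors = [
    [(int(rng.integers(0, n_segments)), rng.random(64)) for _ in range(50)]
    for _ in range(n_images)
]

# One codebook per segment, trained only on descriptors falling inside that segment.
codebooks = []
for s in range(n_segments):
    pool = np.array([d for img in descriptors for seg, d in img if seg == s])
    codebooks.append(KMeans(n_clusters=vocab_size, n_init=5, random_state=0).fit(pool))

def image_histogram(img):
    """Per-image representation: concatenated per-segment visual word histograms."""
    parts = []
    for s, cb in enumerate(codebooks):
        seg_desc = np.array([d for seg, d in img if seg == s])
        counts = np.zeros(vocab_size)
        if len(seg_desc):
            words, n = np.unique(cb.predict(seg_desc), return_counts=True)
            counts[words] = n
        parts.append(counts)
    return np.concatenate(parts)

X = np.array([image_histogram(img) for img in descriptors])
topics = LatentDirichletAllocation(n_components=5, random_state=0).fit_transform(X)
print("topic mixture shape:", topics.shape)
```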
Expression variation in facial images is one of the most crucial and difficult problems in face-based computer vision applications. Although numerous systems have been proposed for robustness against facial expressions, it still remains an open problem. Given that knowledge of the type of expression in a facial image would greatly facilitate the solution of this issue, in this paper we present an analysis of facial expression classification in 2D frontal views. Motivated by the success that sparse coding has achieved in face recognition, similar principles are applied to both the original and dimension-reduced (via PCA) images, and the resulting codes are classified using two different approaches: minimum residual error and maximum interclass summation of the coefficients. Extensive tests are conducted on the Bosphorus database, in which different expressions are available for 105 persons.
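A minimal sketch of sparse-coding classification with the minimum-residual rule: a test face is coded over the matrix of PCA-reduced training faces and assigned to the class whose coefficients reconstruct it with the smallest residual. The Lasso coder, the dimensions and the random data are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_classes, per_class, dim = 5, 8, 200
train = rng.random((n_classes * per_class, dim))
train_labels = np.repeat(np.arange(n_classes), per_class)
test = train[3] + 0.05 * rng.standard_normal(dim)      # noisy copy of a class-0 sample

pca = PCA(n_components=30).fit(train)
D = pca.transform(train)                 # dictionary: one atom per training image
x = pca.transform(test[None, :])[0]

# Sparse code of x over the training dictionary (the columns of D.T are the atoms).
coder = Lasso(alpha=0.01, fit_intercept=False, max_iter=5000).fit(D.T, x)
code = coder.coef_

# Minimum class-wise reconstruction residual decides the label.
residuals = []
for c in range(n_classes):
    mask = (train_labels == c).astype(float)
    residuals.append(np.linalg.norm(x - D.T @ (code * mask)))
print("predicted class:", int(np.argmin(residuals)))
```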
State-of-the-art local stereo correspondence algorithms that adapt their supports to image content can infer very accurate disparity maps, often comparable to those of algorithms based on global disparity optimization methods. However, despite their effectiveness, accurate local approaches based on this methodology are also computationally expensive, and several simplifications aimed at reducing their computational load have been proposed. Unfortunately, compared to the original approaches, the effectiveness of most of these simplified techniques is significantly reduced. In this paper, we consider an efficient and accurate algorithm, referred to as Fast Bilateral Stereo (FBS), that efficiently obtains results comparable to state-of-the-art local approaches, and we describe its mapping onto GPUs with CUDA. Experimental results on two NVIDIA GPUs show that our CUDA implementation delivers accurate and dense disparity maps on standard stereo pairs in near real-time, achieving a speedup greater than 100X with respect to the equivalent CPU-based implementation.
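To make the aggregation idea concrete, the sketch below shows bilateral-weighted cost aggregation for a single pixel and disparity hypothesis: matching costs inside a support window are weighted by spatial distance and intensity similarity to the window centre. This is a simplified illustration of adaptive-support matching, not the FBS algorithm or its CUDA mapping.

```python
import numpy as np

def bilateral_weights(patch, sigma_s=5.0, sigma_c=10.0):
    """Bilateral weights for a square support window centred on the middle pixel."""
    r = patch.shape[0] // 2
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    spatial = np.exp(-(yy**2 + xx**2) / (2 * sigma_s**2))
    similarity = np.exp(-((patch - patch[r, r])**2) / (2 * sigma_c**2))
    return spatial * similarity

def aggregated_cost(left, right, y, x, d, radius=4):
    """Bilateral-weighted sum of absolute differences for disparity hypothesis d at (y, x)."""
    lp = left[y - radius:y + radius + 1, x - radius:x + radius + 1]
    rp = right[y - radius:y + radius + 1, x - d - radius:x - d + radius + 1]
    w = bilateral_weights(lp) * bilateral_weights(rp)
    return float((w * np.abs(lp - rp)).sum() / w.sum())

# Toy usage on random grayscale images; the winning disparity minimizes the aggregated cost.
rng = np.random.default_rng(0)
L, R = rng.random((60, 80)) * 255, rng.random((60, 80)) * 255
costs = [aggregated_cost(L, R, y=30, x=40, d=d) for d in range(10)]
print("best disparity at (30, 40):", int(np.argmin(costs)))
```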
We propose a novel framework to recognize human-vehicle interactions from aerial video. In this scenario, the object resolution is low, the visual cues are vague, and the detection and tracking of objects are less reliable as a consequence. Any method that requires accurate tracking of objects or exact matching of an event definition is therefore better avoided. To address these issues, we present a temporal logic based approach which does not require training from event examples. At the low level, we employ dynamic programming to perform fast model fitting between the tracked vehicle and rendered 3-D vehicle models. At the semantic level, given the localized event region of interest (ROI), we verify the time series of human-vehicle relationships against the pre-specified event definitions in a piecewise fashion. With special interest in recognizing a person getting into and out of a vehicle, we have tested our method on a subset of the VIRAT Aerial Video dataset [ ] and achieved superior results. Our framework can be easily extended to recognize other types of human-vehicle interactions.
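A hedged sketch of piecewise verification of an event definition against a frame-wise series of person-vehicle relationships; the state names and the "get into vehicle" phase sequence are illustrative, not the definitions used in the paper.

```python
from itertools import groupby

# "Get into vehicle" as an ordered sequence of required relationship phases (illustrative).
GET_INTO_VEHICLE = ["approaching", "adjacent", "inside"]

def verify_event(relationship_series, definition, min_phase_len=3):
    """Check that the definition's phases occur in order, each lasting a minimum number of frames."""
    # Collapse the frame-wise series into (state, run length) segments.
    runs = [(state, len(list(g))) for state, g in groupby(relationship_series)]
    phases = iter(definition)
    wanted = next(phases)
    for state, length in runs:
        if state == wanted and length >= min_phase_len:
            wanted = next(phases, None)
            if wanted is None:
                return True
    return False

# Toy usage: frame-wise person-vehicle relationships inside a localized event ROI.
series = ["far"] * 5 + ["approaching"] * 6 + ["adjacent"] * 4 + ["inside"] * 10
print("get-into-vehicle detected:", verify_event(series, GET_INTO_VEHICLE))
```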