ISBN: (print) 9781467373104
Motivated by the need to automate plant species recognition and by the availability of digital plant databases, we propose image-based identification of plant species. The images may show different organs of the plant, such as leaf, stem or bark, flower, and fruit. Different recognition methods are used depending on the plant part shown in the image. For the flower category, a fusion of shape, color, and texture features is used. For other categories such as stem, fruit, leaf, and leaf scan, sparsely coded SIFT features pooled with the spatial pyramid matching approach are used. To account for seasonal and topographic influences on the appearance of the plant, our system also uses the metadata associated with the images (content, date, time, latitude, and longitude) to aid identification and obtain more accurate results. Given a plant image and its associated metadata, the system recognizes the species and outputs the family, genus, and species name. The proposed framework is implemented and tested on ImageCLEF data with 50 different species classes. The maximum accuracy, 98%, is attained in the leaf scan sub-category, whereas the minimum accuracy, 67.3%, is obtained in the fruit sub-category.
Auditory neurons can be characterized by a spectro-temporal receptive field, the kernel of a linear filter model describing the neuronal response to a stimulus. With a view to better understanding the tuning properties of these cells, the receptive fields of neurons in the zebra finch auditory forebrain are compared to a set of artificial kernels generated under the assumption of sparseness; that is, the assumption that in the sensory pathway only a small number of neurons need be highly active at any time. The sparse kernels are calculated by finding a sparse basis for a corpus of zebra finch songs. This calculation is complicated by the highly structured nature of the songs and requires regularization. The sparse kernels and the receptive fields, though differing in some respects, display several significant similarities, which are described by computing quantitative properties such as the separability index and Q-factor. By comparison, an identical calculation performed on human speech recordings yields a set of kernels which exhibit widely different tuning. These findings imply that Field L neurons are specifically adapted to sparsely encode birdsong and support the idea that sparsification may be an important element of early sensory processing.
ISBN: (print) 9781479957514
Texture parsing benefits attribute-based clothing analysis and related applications, such as clothing retrieval and recognition. To deal with the large variations of clothing textures, in this paper, a new method is presented in which refined texture attributes are parsed. Based on the characteristics of clothing textures, refined texture attributes are proposed and parameterized. To estimate the attribute parameters, we exploit the discriminative meanings of sparse codes: the underlying connections between the attribute parameters and each component of sparse codes. The attribute parameters are mapped from the dominating components of sparse codes. Our experiments demonstrate the effectiveness of the proposed method.
In crowded scenes, the extracted low-level features, such as optical flow or spatio-temporal interest points, are inevitably noisy and uncertain. In this paper, we propose a fully unsupervised approach based on non-negative sparse coding for abnormal event detection in crowded scenes, which is specifically tailored to cope with feature noise and uncertainty. The abnormality of a query sample is decided by the sparse reconstruction cost from an atomically learned event dictionary, which forms a sparse coding basis. In our algorithm, we formulate the task of dictionary learning as a non-negative matrix factorization (NMF) problem with a sparsity constraint. We take the robust Earth Mover's Distance (EMD), instead of the traditional Euclidean distance, as the distance metric in the reconstruction cost function. To reduce the computational complexity of EMD, an approximate EMD, namely wavelet EMD, is introduced and integrated into our approach without losing performance. In addition, the combination of wavelet EMD with our approach guarantees the convexity of the optimization in dictionary learning. To handle both local abnormality detection (LAD) and global abnormality detection, we adopt two different types of spatio-temporal basis. Experiments conducted on four publicly available datasets demonstrate the promising performance of our work against state-of-the-art methods. (C) 2013 Elsevier Ltd. All rights reserved.
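The idea of scoring a query sample by its sparse reconstruction cost can be illustrated with a minimal sketch. It makes several assumptions that are not in the abstract: the dictionary is given rather than learned by NMF, the plain Euclidean residual stands in for the EMD and wavelet EMD used by the authors, the code is found by projected gradient descent, and the names `nonneg_sparse_code`, `reconstruction_cost`, and the value of `lam` are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def nonneg_sparse_code(atoms, x, lam=0.1, iters=200):
    """Projected gradient descent for
    min_{a >= 0} 0.5 * ||x - sum_k a_k * atoms[k]||^2 + lam * sum_k a_k.
    Euclidean residual is an assumption; the abstract uses (wavelet) EMD."""
    k = len(atoms)
    gram = [[dot(p, q) for q in atoms] for p in atoms]
    b = [dot(p, x) for p in atoms]
    # 1 / trace(G) never exceeds 1 / lambda_max(G), so the step is safe.
    step = 1.0 / sum(gram[i][i] for i in range(k))
    a = [0.0] * k
    for _ in range(iters):
        grad = [sum(gram[i][j] * a[j] for j in range(k)) - b[i] + lam
                for i in range(k)]
        # Gradient step followed by projection onto the non-negative orthant.
        a = [max(0.0, a[i] - step * grad[i]) for i in range(k)]
    return a


def reconstruction_cost(atoms, x, lam=0.1):
    """Objective value at the sparse code; larger means 'more abnormal'."""
    a = nonneg_sparse_code(atoms, x, lam)
    recon = [sum(a[k] * atoms[k][i] for k in range(len(atoms)))
             for i in range(len(x))]
    residual = sum((xi - ri) ** 2 for xi, ri in zip(x, recon))
    return 0.5 * residual + lam * sum(a)
```

In this sketch, a sample that the non-negative dictionary can explain receives a low cost, while a sample that would need negative coefficients cannot be reconstructed and receives a high cost; a threshold on the cost would then flag abnormal samples.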
Dynamic scene analysis has become a popular research area, especially in video surveillance. The goal of this paper is to mine semantic motion patterns and to detect abnormalities that deviate from the normal patterns occurring in complex dynamic scenarios. To address this problem, we propose a data-driven and scene-independent approach, namely the Bilayer sparse topic model (BiSTM), where a given surveillance video is represented by a word-document hierarchical generative process. In the BiSTM, motion patterns are treated as latent topics sparsely distributed over low-level motion vectors, whereas a video clip can be sparsely reconstructed by a mixture of topics (motion patterns). In addition, to capture the extreme imbalance between numerous typical normal activities and few rare abnormalities in surveillance video data, a one-class constraint is directly imposed on the distribution of documents as a discriminant prior. By jointly learning topics and a one-class document representation within a discriminative framework, the topic (pattern) space becomes more specific and explicit. An effective alternating iteration algorithm is presented for model learning. Experimental results and comparisons on various public data sets demonstrate the promise of the proposed approach.
This paper presents the first theoretical results showing that stable identification of overcomplete µ-coherent dictionaries Φ ∈ R^{d×K} is locally possible from training signals with sparsity levels S up to the order O(µ^{-2}) and signal to noise ratios up to O(√d). In particular the dictionary is recoverable as the local maximum of a new maximization criterion that generalizes the K-means criterion. For this maximization criterion results for asymptotic exact recovery for sparsity levels up to O(µ^{-1}) and stable recovery for sparsity levels up to O(µ^{-2}) as well as signal to noise ratios up to O(√d) are provided. These asymptotic results translate to finite sample size recovery results with high probability as long as the sample size N scales as O(K^3 d S ε^{-2}), where the recovery precision ε can go down to the asymptotically achievable precision. Further, to actually find the local maxima of the new criterion, a very simple Iterative Thresholding and K (signed) Means algorithm (ITKM), which has complexity O(dKN) in each iteration, is presented and its local efficiency is demonstrated in several experiments.
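One iteration of a thresholding and signed-means update of the kind the name ITKM suggests can be sketched as follows. This is an illustrative reading of the abstract, not the authors' reference implementation: for each signal it selects the S atoms with the largest absolute inner products (thresholding), then replaces each atom by the normalized sum of the sign-corrected signals that selected it (signed means). The function names and the handling of unused atoms are assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def itkm_step(atoms, signals, sparsity):
    """One thresholding + signed-means update (hypothetical sketch).
    atoms: K unit-norm vectors of length d; signals: N vectors of length d.
    The inner products dominate the cost, giving O(d*K*N) per iteration."""
    d = len(signals[0])
    sums = [[0.0] * d for _ in atoms]
    for y in signals:
        corr = [dot(atom, y) for atom in atoms]
        # Thresholding: keep the indices of the S largest |<atom, y>|.
        top = sorted(range(len(atoms)), key=lambda k: abs(corr[k]),
                     reverse=True)[:sparsity]
        for k in top:
            sign = 1.0 if corr[k] >= 0 else -1.0
            for i in range(d):
                sums[k][i] += sign * y[i]
    # Signed means: renormalize; an atom no signal selected is kept as-is
    # (this fallback is an assumption of the sketch).
    return [normalize(s) if any(s) else list(atom)
            for s, atom in zip(sums, atoms)]
```

Starting from a dictionary close to the generating one and repeating `itkm_step` is the setting in which the abstract's local recovery results apply; the sketch makes no claim about behavior from arbitrary initializations.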
Sparse coding is an efficient way of coding information. In a sparse code most of the code elements are zero; very few are active. Sparse codes are intended to correspond to the spike trains with which biological neurons communicate. In this article, we show how sparse codes can be used to do continuous speech recognition. We use the TIDIGITS dataset to illustrate the process. First a waveform is transformed into a spectrogram, and a sparse code for the spectrogram is found by means of a linear generative model. The spike train is classified by making use of a spike train model and dynamic programming. It is computationally expensive to find a sparse code. We use an iterative subset selection algorithm with quadratic programming for this process. This algorithm finds a sparse code in reasonable time if the input is limited to a fairly coarse spectral resolution. At this resolution, our system achieves a word error rate of 19%, whereas a system based on Hidden Markov Models achieves a word error rate of 15% at the same resolution. (C) 2008 Elsevier Ltd. All rights reserved.
Sparse approximation is a hypothesized coding strategy where a population of sensory neurons (e.g., V1) encodes a stimulus using as few active neurons as possible. We present the Spiking LCA (locally competitive algorithm), a rate-encoded Spiking Neural Network (SNN) of integrate-and-fire neurons that calculates sparse approximations. The Spiking LCA is designed to be equivalent to the nonspiking LCA, an analog dynamical system that converges exponentially to an ℓ1-norm sparse approximation. We show that the firing rate of the Spiking LCA converges on the same solution as the analog LCA, with an error inversely proportional to the sampling time. We simulate in NEURON a network of 128 neuron pairs that encode 8x8 pixel image patches, demonstrating that the network converges to nearly optimal encodings within 20 ms of biological time. We also show that when using more biophysically realistic parameters in the neurons, the gain function encourages additional ℓ0-norm sparsity in the encoding, relative both to ideal neurons and digital solvers.
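The nonspiking (analog) LCA that the Spiking LCA is compared against can be sketched with a simple Euler integration. This is a minimal illustration in the commonly cited form of the dynamics, tau * du/dt = b - u - (G - I) a with a = soft_threshold(u, lam), b = Phi^T x and G = Phi^T Phi; it is not the spiking network and not the NEURON simulation from the abstract. The parameter values (`lam`, `tau`, `dt`, `steps`) and function names are assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def soft_threshold(u, lam):
    """Shrink u toward zero by lam; values in [-lam, lam] map to 0."""
    if u > lam:
        return u - lam
    if u < -lam:
        return u + lam
    return 0.0


def lca(atoms, x, lam=0.1, tau=10.0, dt=1.0, steps=500):
    """Euler-integrated analog LCA (sketch). atoms: unit-norm vectors.
    Returns coefficients a that approximately minimize
    0.5 * ||x - sum_k a_k * atoms[k]||^2 + lam * sum_k |a_k|."""
    k = len(atoms)
    b = [dot(atom, x) for atom in atoms]                 # feed-forward drive
    gram = [[dot(p, q) for q in atoms] for p in atoms]   # lateral weights
    u = [0.0] * k                                        # internal states
    for _ in range(steps):
        a = [soft_threshold(ui, lam) for ui in u]
        # Each active unit inhibits the others in proportion to atom overlap.
        inhibition = [sum(gram[i][j] * a[j] for j in range(k) if j != i)
                      for i in range(k)]
        u = [u[i] + (dt / tau) * (b[i] - u[i] - inhibition[i])
             for i in range(k)]
    return [soft_threshold(ui, lam) for ui in u]
```

With two overlapping atoms and a stimulus aligned with the first, the lateral inhibition suppresses the second unit below threshold, which is the competitive mechanism that produces sparsity.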
Image denoising is a well explored topic in the field of image processing. In the past several decades, the progress made in image denoising has benefited from the improved modeling of natural images. In this paper, we introduce a new taxonomy based on image representations for a better understanding of state-of-the-art image denoising techniques. Within each category, several representative algorithms are selected for evaluation and comparison. The experimental results are discussed and analyzed to determine the overall advantages and disadvantages of each category. In general, the nonlocal methods within each category produce better denoising results than local ones. In addition, methods based on overcomplete representations using learned dictionaries perform better than others. The comprehensive study in this paper would serve as a good reference and stimulate new research ideas in image denoising.
In complex visual recognition tasks, it is typical to adopt multiple descriptors, which describe different aspects of the images, to obtain improved recognition performance. Descriptors that have diverse forms can be fused into a unified feature space in a principled manner using kernel methods. Sparse models that generalize well to the test data can be learned in the unified kernel space, and appropriate constraints can be incorporated for application in supervised and unsupervised learning. In this paper, we propose to perform sparse coding and dictionary learning in the multiple kernel space, where the weights of the ensemble kernel are tuned based on graph-embedding principles such that class discrimination is maximized. In our proposed algorithm, dictionaries are inferred using multiple levels of 1D subspace clustering in the kernel space, and the sparse codes are obtained using a simple levelwise pursuit scheme. Empirical results for object recognition and image clustering show that our algorithm outperforms existing sparse coding based approaches, and compares favorably to other state-of-the-art methods.