Content fingerprinting is a powerful solution for media indexing, searching, and digital rights management, in which the perceptual content of digital media is summarized into a robust and discriminative digest. In this letter, we develop a general paradigm for image fingerprinting by exploiting the capability of sparse coding to capture the visual characteristics of digital images. Furthermore, the impact of the sparse coding dictionary on the performance of the fingerprinting algorithm is analyzed. Accordingly, the problem of dictionary learning is studied in the context of content fingerprinting by incorporating the robustness and discriminability requirements. Comparative experiments indicate that the proposed work achieves much higher content identification accuracy than state-of-the-art methods, and that the dictionary learned by the proposed work can substantially improve the performance of the fingerprinting algorithm. In addition, our algorithm is highly efficient: its average fingerprint computation time is less than 0.024 s.
Sign language plays an important role in communication for deaf and hearing-impaired people. In this paper, we devise a Taiwan Sign Language recognition system. We use the Kinect2 sensor to capture data for 94 sign morphemes, each performed once by 4 people, and extract hand shape features and trajectory features from depth images and the joints of the body skeleton. Finally, we train a dictionary for each sign morpheme with the label consistent K-SVD (LC-KSVD) sparse coding algorithm for recognition. Experiments show that our system performs well, with accuracy reaching 99.47% in the closed test.
Manifold regularized sparse coding shows promising performance for various applications. The key issue in applying it is how to adaptively select suitable graph hyper-parameters in manifold learning for the sparse coding task. Usually, cross validation is applied, but it does not necessarily scale up and easily leads to overfitting. In this article, multiple graph sparse coding (MGrSc) and multiple Hypergraph sparse coding (MHGrSc) for image representation are proposed. Inspired by the Ensemble Manifold Regularizer, we formulate multiple graph and multiple Hypergraph regularizers to guarantee the smoothness of sparse codes along the geodesics of a data manifold, which is characterized by fusing multiple previously given graph Laplacians or Hypergraph Laplacians. The proposed regularizers are then each incorporated into the traditional sparse coding framework, which results in two unified objective functions for sparse coding. Alternating optimization is used to optimize the objective functions, and two novel manifold regularized sparse coding algorithms are presented. The two proposed sparse coding methods learn the composite manifold and the sparse codes jointly, and they learn the graph hyper-parameters of the manifold learning fully automatically. Image clustering tests on real-world datasets demonstrate that the proposed sparse coding methods are superior to the state-of-the-art methods. (C) 2014 Elsevier B.V. All rights reserved.
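The idea of fusing several candidate graph Laplacians into one manifold regularizer can be illustrated with a minimal sketch. It assumes unnormalized Laplacians, fixed combination weights on the simplex, and the usual trace penalty tr(S L S^T); the function names and toy graphs are hypothetical, and the abstract's methods learn the weights during alternating optimization rather than fixing them as done here.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W for a symmetric affinity matrix W."""
    return np.diag(W.sum(axis=1)) - W

def fuse_laplacians(laplacians, weights):
    """Convex combination of candidate Laplacians (weights on the simplex)."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * L for w, L in zip(weights, laplacians))

def smoothness(S, L):
    """Manifold penalty tr(S L S^T); S holds one sparse code per column."""
    return float(np.trace(S @ L @ S.T))

# Toy data: 4 samples and two candidate graphs, standing in for graphs
# built with different hyper-parameters (illustrative values only).
W1 = np.array([[0, 1, 0, 0],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [0, 0, 1, 0]], dtype=float)
W2 = np.array([[0, 1, 1, 0],
               [1, 0, 1, 1],
               [1, 1, 0, 1],
               [0, 1, 1, 0]], dtype=float)

L_fused = fuse_laplacians([graph_laplacian(W1), graph_laplacian(W2)], [0.7, 0.3])

# Two-dimensional codes for the 4 samples, one column per sample.
S = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 2.0]])
penalty = smoothness(S, L_fused)
```

The penalty grows when samples connected by heavily weighted edges receive different codes, which is the sense in which the regularizer keeps codes smooth along the data manifold.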
Although sparse coding has emerged as an extremely powerful tool for texture and image classification, it neglects the relationship among coding coefficients from the same class in the training stage, which may degrade classification performance. In this paper, we propose a novel coding strategy named compact sparse coding for ground-based cloud classification. We add a constraint on the coding coefficients to the objective function of traditional sparse coding. In this way, coding coefficients from the same class are pulled toward their mean vector, making them more compact and discriminative. Experiments demonstrate that our method achieves better performance than the state-of-the-art methods.
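One plausible way to write such an objective is sketched below. This is an illustrative form, not the formulation from the paper: it assumes a squared Euclidean compactness penalty, and the symbols X (training samples), D (dictionary), a_i (code of sample i), I_c (index set of class c), m_c (class mean code), and the trade-off weights lambda and beta are all notation introduced here.

```latex
\min_{D,\,A}\; \|X - DA\|_F^2
  + \lambda \sum_{i} \|a_i\|_1
  + \beta \sum_{c=1}^{C} \sum_{i \in \mathcal{I}_c} \|a_i - m_c\|_2^2,
\qquad
m_c = \frac{1}{|\mathcal{I}_c|} \sum_{i \in \mathcal{I}_c} a_i
```

The first two terms are the standard reconstruction error and sparsity penalty; the third shrinks within-class scatter of the codes, which is what makes them compact.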
To classify polarimetric synthetic aperture radar (PolSAR) images, a supervised classification approach based on sparse coding of covariance matrices is proposed in this paper. Unlike traditional classification methods, which are based on polarization feature extraction or statistical distribution models, our method studies the sparse coding algorithm for covariance matrices on a Riemannian manifold. The proposed method first obtains the coding dictionary by k-means clustering. Then, each covariance matrix is decomposed into a sparse linear combination of the atoms in the coding dictionary via a Riemannian sparse coding approach. Finally, the sparse coding coefficients are used as feature vectors to obtain the final classification results with a support vector machine (SVM) classifier. Experimental results on different real PolSAR images demonstrate that our method is effective.
Many approaches that transform classification problems from non-linear to linear by feature transformation have recently been presented in the literature. These notably include sparse coding methods and deep neural networks. However, many of these approaches require the repeated application of a learning process upon the presentation of unseen data input vectors, or else involve the use of large numbers of parameters and hyper-parameters, which must be chosen through cross-validation, thus increasing running time dramatically. In this paper, we propose and experimentally investigate a new approach for the purpose of overcoming limitations of both kinds. The proposed approach makes use of a linear auto-associative network (called SCNN) with just one hidden layer. The combination of this architecture with a specific error function to be minimized enables one to learn a linear encoder computing a sparse code which turns out to be as similar as possible to the sparse coding that one obtains by re-training the neural network. Importantly, the linearity of SCNN and the choice of the error function allow one to achieve reduced running time in the learning phase. The proposed architecture is evaluated on two standard machine learning tasks. Its performance is compared with that of recently proposed non-linear auto-associative neural networks. The overall results suggest that linear encoders can be profitably used to obtain sparse data representations in the context of machine learning problems, provided that an appropriate error function is used during the learning phase. (c) 2014 Elsevier B.V. All rights reserved.
Recently, sparse coding has attracted considerable attention in speech processing. As a promising technique, sparse coding can be widely used for analysis, representation, compression, denoising, and separation of speech. To represent signals accurately and sparsely, a good dictionary containing elemental signals is preferred, and many methods have been proposed to learn such a dictionary. However, there is a lack of reasonable evaluation methods to judge whether a dictionary is good enough. To solve this problem, we define a group of measures for dictionary evaluation. These measures not only address the sparseness and reconstruction error of the signal representation, but also consider denoising and separation performance. We show how to evaluate dictionaries with these measures, and further propose two methods to optimize dictionaries by improving the relevant measures. The first method improves the efficiency of sparse coding by removing unimportant atoms; the second improves the denoising performance of dictionaries by removing harmful atoms. Experimental results show that the measures can provide reasonable evaluations and that the proposed optimization methods can further improve given dictionaries. (C) 2015 Published by Elsevier Inc.
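Removing unimportant atoms from a dictionary can be sketched as follows. The importance score used here, the total absolute coefficient mass each atom receives over a training set, is an assumption made for illustration and is not one of the measures defined in the paper; the function name `prune_atoms` and the toy data are likewise hypothetical.

```python
import numpy as np

def prune_atoms(D, A, keep):
    """Keep the `keep` most-used atoms of a dictionary.

    D: (signal_dim, n_atoms) dictionary, one atom per column.
    A: (n_atoms, n_signals) sparse codes of a training set under D.
    Importance is the total absolute coefficient mass per atom, an
    illustrative stand-in for a proper dictionary evaluation measure.
    """
    importance = np.abs(A).sum(axis=1)
    # Indices of the `keep` highest-scoring atoms, returned in ascending order.
    kept = np.sort(np.argsort(importance)[::-1][:keep])
    return D[:, kept], kept

rng = np.random.default_rng(0)
D = rng.standard_normal((8, 5))   # 5 atoms of dimension 8
A = np.zeros((5, 6))              # codes for 6 training signals
A[0, :] = 1.0                     # atom 0 is used by every signal
A[3, :3] = -2.0                   # atom 3 is used by half of them, with large weights
A[4, 0] = 0.1                     # atom 4 is barely used; atoms 1 and 2 never are

D_small, kept = prune_atoms(D, A, keep=2)
```

A smaller dictionary makes each sparse coding step cheaper, but the codes must be recomputed against the pruned dictionary, since the surviving atoms have to absorb whatever the removed ones represented.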
In this paper, we investigate order-preserving sparse coding for classifying structured data whose atomic features possess ordering relationships. Examples include time sequences where individual frame-wise features are temporally ordered, as well as still images (landscape, street view, etc.) where different regions of the image are spatially ordered. Classification of these structured data is often tackled by first decomposing the input data into individual atomic features, then performing sparse coding or other processing for each atomic feature vector independently, and finally aggregating individual responses to classify the input data. However, this heuristic approach ignores the underlying order of the individual atomic features within the input data, and results in suboptimal discriminative capability. In this work, we introduce an order preserving regularizer which aims to preserve the ordering structure of the reconstruction coefficients within the sparse coding framework. An efficient Nesterov-type smooth approximation method is developed for optimization of the new regularization criterion, with theoretically guaranteed error bound. We perform extensive experiments for time series classification on a synthetic dataset, several machine learning benchmarks, and an RGB-D human activity dataset. We also report experiments for scene classification on a benchmark image dataset. The encoded representation is discriminative and robust, and our classifier outperforms state-of-the-art methods on these tasks.
ISBN: 9781467388528 (print)
This work presents an approach to category-based action recognition in video using sparse coding techniques. The proposed approach includes two main contributions: i) a new method to handle intra-class variations by decomposing each video into a reduced set of representative atomic action acts, or key-sequences, and ii) a new video descriptor, ITRA (Inter-Temporal Relational Act Descriptor), that exploits the power of comparative reasoning to capture relative similarity relations among key-sequences. To obtain key-sequences, we introduce a loss function that, for each video, leads to the identification of a sparse set of representative key-frames capturing both the relevant particularities of the input video and the relevant generalities of the complete class collection. To obtain the ITRA descriptor, we introduce a novel scheme to quantify relative intra- and inter-class similarities among local temporal patterns arising in the videos. The resulting ITRA descriptor proves highly effective at discriminating among action categories. As a result, the proposed approach reaches remarkable action recognition performance on several popular benchmark datasets, outperforming alternative state-of-the-art techniques by a large margin.
Hardware-based computer vision accelerators will be an essential part of future mobile devices to meet low-power and real-time processing requirements. To realize high energy efficiency and high throughput, the accelerator architecture can be massively parallelized and tailored to vision processing, which is an advantage over software-based solutions and general-purpose hardware. In this work, we present an ASIC that is designed to learn and extract features from images and videos. The ASIC contains 256 leaky integrate-and-fire neurons connected in a scalable two-layer network of 8 × 8 grids linked in a 4-stage ring. Sparse neuron activation and the relatively small grid keep the spike collision probability low to save access arbitration. The weight memory is divided into core memory and auxiliary memory, such that the auxiliary memory is only powered on for learning to save inference power. High-throughput inference is accomplished by the parallel operation of neurons. Efficient learning is implemented by passing parameter update messages, which is further simplified by an approximation technique. A 3.06 mm² 65 nm CMOS ASIC test chip is designed to achieve a maximum inference throughput of 1.24 Gpixel/s at 1.0 V and 310 MHz, and on-chip learning can be completed in seconds. To improve power consumption and energy efficiency, the core memory supply voltage can be reduced to 440 mV to take advantage of the error resilience of the algorithm, reducing the inference power to 6.67 mW for a 140 Mpixel/s throughput at 35 MHz.
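For readers unfamiliar with the neuron model named above, a leaky integrate-and-fire neuron can be sketched in a few lines. This is a generic discrete-time software model under assumed parameters (`leak`, `threshold`, reset to zero); it does not describe the chip's actual circuit, fixed-point arithmetic, or learning rule.

```python
def simulate_lif(inputs, leak=0.9, threshold=1.0):
    """Discrete-time leaky integrate-and-fire neuron.

    At each step the membrane potential is scaled by `leak`, the input is
    added, and the neuron emits a spike (then resets to 0) once the
    potential reaches `threshold`. Returns a list of 0/1 spike flags.
    """
    potential = 0.0
    spikes = []
    for current in inputs:
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0  # reset after firing
        else:
            spikes.append(0)
    return spikes

# Three weak inputs accumulate into one spike; a single strong input fires at once.
spike_train = simulate_lif([0.5, 0.5, 0.5, 0.0, 0.0, 1.2])
```

Because a neuron only communicates when it crosses the threshold, most neurons are silent at any given step, which is the sparse activation the abstract relies on to keep spike collisions rare.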