Recently, a trend in speech recognition is to introduce sparse coding for noise robustness. Although several methods have been proposed, the performance of sparse coding in speech denoising is not so optimistic. One a...
详细信息
Recently, a trend in speech recognition is to introduce sparse coding for noise robustness. Although several methods have been proposed, the performance of sparse coding in speech denoising is not so optimistic. One assumption with sparse coding is that the representation of speech over the speech dictionary is sparse, while that of the noise is dense. This assumption is obviously not sustained in the speech denoising scenario. Many noises are also sparse over the speech dictionary. In such a condition, the representation of noisy speech still contains noise components, resulting in degraded performance. To solve this problem, we first analyze the assumption of sparse coding and then propose a novel method to enhance speech spectrum. This method first finds out the atoms which represent the noise sparsely, and then selectively ignores them in the reconstruction of speech to reduce the residual noise. Speech features are then extracted from the enhanced spectrum for speech recognition. Experimental results show that the proposed method can improve the noise robustness of a speech recognition system substantially. (C) 2015 Elsevier Inc. All rights reserved.
The scarcity of labeled data and the high-dimensionality of multimedia data are the major obstacles for image classification. Due to these concerns, this paper proposes a novel algorithm, Iterative Semi-supervised Spa...
详细信息
The scarcity of labeled data and the high-dimensionality of multimedia data are the major obstacles for image classification. Due to these concerns, this paper proposes a novel algorithm, Iterative Semi-supervised sparse coding (ISSC), which jointly explores the advantages of both sparse coding and graph-based semi-supervised learning in order to learn discriminative sparse codes as well as an effective classification function. The ISSC algorithm fully exploits initial labels and the subsequently predicted labels for sparse codes learning. At the same time, during the graph-based semi-supervised learning stage, similarity matrix is firstly adjusted through the latest learned sparse codes, and then is utilized to obtain a better classification function. To make the ISSC scale up to larger databases, a novel online dictionary learning algorithm is also proposed to update the dictionary incrementally. In particular, by solving quadratic optimization, the ISSC approach can give rise to closed-form solutions for sparse codes and classification function, respectively. It has been extensively evaluated over three widely used datasets for image classification task. The experimental results in terms of classification accuracy demonstrate the proposed ISSC approach can achieve significant performance improvements with respect to the state-of-the-arts.
Recently, different methods for obtaining sparse representations of a signal using dictionaries of waveforms have been studied. They are often motivated by the way the brain seems to process certain sensory signals. A...
详细信息
Recently, different methods for obtaining sparse representations of a signal using dictionaries of waveforms have been studied. They are often motivated by the way the brain seems to process certain sensory signals. Algorithms have been developed using a specific criterion to choose the waveforms occurring in the representation. The waveforms are choosen from a fixed dictionary and some algorithms also construct them as a part of the method. In the case of speech signals, most approaches do not take into consideration the important temporal correlations that are exhibited. It is known that these correlations are well approximated by linear models. Incorporating this a priori knowledge of the signal can facilitate the search for a suitable representation solution and also can help with its interpretation. Lewicki proposed a method to solve the noisy and overcomplete independent component analysis problem. In the present paper we propose a modification of this statistical technique for obtaining a sparse representation using a generative parametric model. The representations obtained with the method proposed here and other techniques are applied to artificial data and real speech signals, and compared using different coding costs and sparsity measures. The results show that the proposed method achieves more efficient representations of these signals compared to the others. A qualitative analysis of these results is also presented, which suggests that the restriction imposed by the parametric model is helpful in discovering meaningful characteristics of the signals. (c) 2005 Elsevier B.V. All rights reserved.
Most face recognition approaches developed so far regard the sparse coding as one of the essential means, while the sparse coding models have been hampered by the extremely expensive computational cost in the implemen...
详细信息
Most face recognition approaches developed so far regard the sparse coding as one of the essential means, while the sparse coding models have been hampered by the extremely expensive computational cost in the implementation. In this paper, a novel scheme for the fast face recognition is presented via extreme learning machine (ELM) and sparse coding. The common feature hypothesis is first introduced to extract the basis function from the local universal images, and then the single hidden layer feedforward network (SLFN) is established to simulate the sparse coding process for the face images by ELM algorithm. Some developments have been done to maintain the efficient inherent information embedding in the ELM learning. The resulting local sparse coding coefficient will then be grouped into the global representation and further fed into the ELM ensemble which is composed of a number of SLFNs for face recognition. The simulation results have shown the good performance in the proposed approach that could be comparable to the state-of-the-art techniques at a much higher speed.
Some linear sparse coding models have been proposed for modeling responses in the early stage of the visual system, but nonlinear operations are ubiquitous in visual cortex. So we put forward an associative sparse cod...
详细信息
Some linear sparse coding models have been proposed for modeling responses in the early stage of the visual system, but nonlinear operations are ubiquitous in visual cortex. So we put forward an associative sparse coding neural network (ASCNN) with nonlinear response property in top-layer coding units. In our ASCNN model, the choices of sparseness function must correspond to the characters of activation function. This paper gives several reasonable and efficient methods for constructing sparseness functions and activation functions. Experiment on benchmark natural image dataset shows that our model can successfully simulate receptive field and nonlinear sparse response property of simple cells. Moreover, in two recognition tasks on face images and handwritten digits, experimental results show that our model works much better than linear sparse coding model (sparsenet) by combining with linear neural network classifier. (C) 2009 Elsevier B.V. Ali rights reserved.
In this paper, we propose an adaptive face and ear based bimodal recognition framework using sparse coding, namely ABSRC, which can effectively reduce the adverse effect of degraded modality. A unified and reliable bi...
详细信息
In this paper, we propose an adaptive face and ear based bimodal recognition framework using sparse coding, namely ABSRC, which can effectively reduce the adverse effect of degraded modality. A unified and reliable biometric quality measure based on sparse coding is presented for both face and ear, which relies On the collaborative representation by all classes. For adaptive feature fusion, a flexible piecewise function is carefully designed to select feature weights based on their qualities. ABSRC utilizes a two-phase sparse coding strategy. At first, face and ear features are separately coded on their associated dictionaries for individual quality assessments. Secondly, the weighted features are concatenated to form a unique feature vector, which is then coded and classified in multimodal feature space. Experiments demonstrate that ABSRC achieves quite encouraging robustness against image degeneration, and outperforms many up-to-date methods. Very impressively, even when query sample of one modality is extremely degraded by random pixel corruption, illumination variation, etc., ABSRC can still get performance comparable to the unimodal recognition based on the other modality. (C) 2014 Elsevier B.V. All rights reserved.
We present the learning algorithm orthogonal sparse coding (OSC) to find an orthogonal basis in which a given data set has a maximally sparse representation. OSC is based on stochastic descent by Hebbian-like updates ...
详细信息
We present the learning algorithm orthogonal sparse coding (OSC) to find an orthogonal basis in which a given data set has a maximally sparse representation. OSC is based on stochastic descent by Hebbian-like updates and Gram-Schmidt orthogonalizations, and is motivated by an algorithm that we introduce as the canonical approach (CA). First, we evaluate how well OSC can recover a generating basis from synthetic data. We show that, in contrast to competing methods, OSC can recover the generating basis for quite low and, remarkably, unknown sparsity levels. Moreover, on natural image patches and on images of handwritten digits, OSC learns orthogonal bases that attain significantly sparser representations compared to alternative orthogonal transforms. Furthermore, we demonstrate an application of OSC for image compression by showing that the rate-distortion performance can be improved relative to the JPEG standard. Finally, we demonstrate the state-of-the-art image denoising performance of OSC dictionaries. Our results demonstrate the potential of OSC for feature extraction, data compression, and image denoising, which is due to two important aspects: 1) the learned bases are adapted to the signal class, and 2) the sparse approximation problem can be solved efficiently and exactly.
Cognitive radio is an intelligent and adaptive radio that improves the utilization of the spectrum by its opportunistic sharing. However, it is inherently vulnerable to primary user emulation and jamming attacks that ...
详细信息
Cognitive radio is an intelligent and adaptive radio that improves the utilization of the spectrum by its opportunistic sharing. However, it is inherently vulnerable to primary user emulation and jamming attacks that degrade the spectrum utilization. In this paper, an algorithm for the detection of primary user emulation and jamming attacks in cognitive radio is proposed. The proposed algorithm is based on the sparse coding of the compressed received signal over a channel-dependent dictionary. More specifically, the convergence patterns in sparse coding according to such a dictionary are used to distinguish between a spectrum hole, a legitimate primary user, and an emulator or a jammer. The process of decision-making is carried out as a machine learning-based classification operation. Extensive numerical experiments show the effectiveness of the proposed algorithm in detecting the aforementioned attacks with high success rates. This is validated in terms of the confusion matrix quality metric. Besides, the proposed algorithm is shown to be superior to energy detection-based machine learning techniques in terms of receiver operating characteristics curves and the areas under these curves.
In this paper, we present a novel bottom-up saliency detection algorithm from the perspective of covariance matrices on a Riemannian manifold. Each superpixel is described by a region covariance matrix on Riemannian M...
详细信息
In this paper, we present a novel bottom-up saliency detection algorithm from the perspective of covariance matrices on a Riemannian manifold. Each superpixel is described by a region covariance matrix on Riemannian Manifolds. We carry out a two-stage sparse coding scheme via Log-Euclidean kernels to extract salient objects efficiently. In the first stage, given background dictionary on image borders, sparse coding of each region covariance via Log-Euclidean kernels is performed. The reconstruction error on the background dictionary is regarded as the initial saliency of each superpixel. In the second stage, an improvement of the initial result is achieved by calculating reconstruction errors of the superpixels on foreground dictionary, which is extracted from the first stage saliency map. The sparse coding in the second stage is similar to the first stage, but is able to effectively highlight the salient objects uniformly from the background. Finally, three post-processing methods - highlight-inhibition function, context based saliency weighting, and the graph cut - are adopted to further refine the saliency map. Experiments on four public benchmark datasets show that the proposed algorithm outperforms the state-of-the-art methods in terms of precision, recall and mean absolute error, and demonstrate the robustness and efficiency of the proposed method. (C) 2017 Elsevier Ltd. All rights reserved.
Local spatio-temporal features are popular in the human action recognition task. In practice, they are usually coupled with a feature encoding approach, which helps to obtain the video-level vector representations tha...
详细信息
Local spatio-temporal features are popular in the human action recognition task. In practice, they are usually coupled with a feature encoding approach, which helps to obtain the video-level vector representations that can be used in learning and recognition. In this paper, we present an efficient local feature encoding approach, which is called Approximate sparse coding (ASC). ASC computes the sparse codes for a large collection of prototype local feature descriptors in the off-line learning phase using sparse coding (SC) and look up the nearest prototype's precomputed sparse code for each to-be-encoded local feature in the encoding phase using Approximate Nearest Neighbour (ANN) search. It shares the low dimensionality of SC and the high speed of ANN, which are both desired properties for a local feature encoding approach. ASC has been excessively evaluated on the KTH dataset and the HMDB51 dataset. We confirmed that it is able to encode large quantity of local video features into discriminative low dimensional representations efficiently.
暂无评论