Data provided by most optical satellites, such as QuickBird, IKONOS, GeoEye-1 and WorldView-2, consist of several Low-Resolution Multispectral (LRMS) bands and a High-Resolution Panchromatic (HRP) image. The process of combining an LRMS image with its corresponding HRP image is called pan-sharpening or image fusion, and it aims to obtain a High-Resolution Multispectral (HRMS) image. In this paper, we propose a new pan-sharpening method based on sparse coding. We assume that the HRMS and LRMS image pairs share the same sparse coefficients over the dictionary pairs constructed from the HRP image and the Low-Resolution Panchromatic (LRP) image. For each patch of the LRMS image, the low-resolution dictionary is constructed by selecting the K closest image patches from the LRP image patches. In addition, we add some feature vectors of the selected image patches into the low-resolution ***, the corresponding high-resolution dictionary with the same form is designed for each image patch of HRMS image. The sparse vector for each patch of the LRMS image is obtained by sparse coding. Finally, the HRMS image patches are obtained by multiplying the sparse vectors by the corresponding high-resolution dictionaries. The proposed method is compared with state-of-the-art fusion methods. Quantitative assessment results demonstrate that the proposed method has superior performance.
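The per-patch step described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes patches are flattened vectors, that "closest" means Euclidean distance, and that the sparse vector is found with a simple orthogonal matching pursuit (the abstract does not name a solver). The extra feature vectors added to the dictionaries are omitted, and the names `omp` and `fuse_patch` are hypothetical.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Greedy orthogonal matching pursuit (assumed solver, n_nonzero >= 1).

    Approximates y as a combination of at most n_nonzero columns of D.
    """
    residual = y.astype(float).copy()
    support = []
    sol = np.zeros(0)
    col_norms = np.linalg.norm(D, axis=0) + 1e-12
    for _ in range(n_nonzero):
        # Pick the column most correlated with the current residual.
        corr = np.abs(D.T @ residual) / col_norms
        if support:
            corr[support] = -np.inf  # never pick the same column twice
        support.append(int(np.argmax(corr)))
        # Refit all selected coefficients by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    coef = np.zeros(D.shape[1])
    if support:
        coef[support] = sol
    return coef

def fuse_patch(lrms_patch, lrp_patches, hrp_patches, k=8, n_nonzero=3):
    """Estimate one high-resolution patch from one low-resolution patch.

    lrp_patches: (n, d_low) and hrp_patches: (n, d_high); row i of each
    is assumed to cover the same image location.
    """
    # Low-resolution dictionary: the K panchromatic patches closest to the input.
    dist = np.linalg.norm(lrp_patches - lrms_patch, axis=1)
    nearest = np.argsort(dist)[:k]
    D_low = lrp_patches[nearest].T    # shape (d_low, k)
    D_high = hrp_patches[nearest].T   # shape (d_high, k)
    # Shared sparse coefficients: code at low resolution, synthesize at high.
    alpha = omp(D_low, lrms_patch, n_nonzero)
    return D_high @ alpha
```

Building a small dictionary per patch keeps each sparse coding problem tiny, at the cost of one nearest-neighbor search per patch.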
Sparse coding models of natural images and sounds have been able to predict several response properties of neurons in the visual and auditory systems. While the success of these models suggests that the structure they capture is universal across domains to some degree, it is not yet clear which aspects of this structure are universal and which vary across sensory modalities. To address this, we fit complete and highly overcomplete sparse coding models to natural images and spectrograms of speech and report on differences in the statistics learned by these models. We find several types of sparse features in natural images, which all follow similar, approximately Laplace distributions, whereas the many types of sparse features in speech exhibit a broad range of sparse distributions, many of which are highly asymmetric. Moreover, individual sparse coding units tend to exhibit higher lifetime sparseness for overcomplete models trained on images than for those trained on speech. Conversely, population sparseness tends to be greater for networks trained on speech than for sparse coding models of natural images. To illustrate the relevance of these findings to neural coding, we studied how they impact a biologically plausible sparse coding network's representations in each sensory modality. In particular, a sparse coding network with synaptically local plasticity rules learns different sparse features from speech data than are found by more conventional sparse coding algorithms, but the learned features are qualitatively the same for these models when trained on natural images.
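The distinction between lifetime and population sparseness can be illustrated with a short sketch. The abstract does not say which sparseness measure was used; the code below assumes the Treves-Rolls / Vinje-Gallant index, which is 0 for a uniform response and 1 for a single active entry. The function name `sparseness` and the `(n_stimuli, n_units)` layout are illustrative choices.

```python
import numpy as np

def sparseness(activations, axis):
    """Treves-Rolls style sparseness along one axis (assumed measure).

    Returns 0 for perfectly uniform responses and 1 when a single entry
    is active. All-zero slices are reported as 0 by convention.
    """
    a = np.abs(np.asarray(activations, dtype=float))
    n = a.shape[axis]
    sq_mean = np.mean(a, axis=axis) ** 2
    mean_sq = np.mean(a ** 2, axis=axis)
    ratio = np.divide(sq_mean, mean_sq,
                      out=np.ones_like(sq_mean), where=mean_sq > 0)
    return (1.0 - ratio) / (1.0 - 1.0 / n)

# Rows are stimuli, columns are coding units.
A = np.array([[1.0, 0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0]])

lifetime = sparseness(A, axis=0)    # one value per unit, across stimuli
population = sparseness(A, axis=1)  # one value per stimulus, across units
```

In this toy matrix one unit responds to every stimulus, so population sparseness is maximal while lifetime sparseness is zero, which shows that the two quantities can diverge.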
ISBN: (print) 9781479957521
In this paper, we propose an effective scene text recognition method using sparse coding based features, called Histograms of Sparse Codes (HSC) features. For character detection, we use HSC features instead of Histograms of Oriented Gradients (HOG) features. HSC features are extracted by computing sparse codes with dictionaries, which are learned from data using K-SVD, and aggregating per-pixel sparse codes to form local histograms. For word recognition, we integrate multiple cues, including character detection scores and geometric contexts, in an objective function. The final recognition result is obtained by searching for the word that maximizes the objective function. The parameters of the objective function are learned using the Minimum Classification Error (MCE) training method. Experiments on the ICDAR2003 and SVT datasets demonstrate that the HSC-based scene text recognition method significantly outperforms the HOG-based method and achieves state-of-the-art performance.
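The aggregation of per-pixel sparse codes into local histograms can be sketched as below. This is a simplified illustration under stated assumptions, not the paper's feature pipeline: the per-pixel codes are taken as given (dictionary learning with K-SVD is not shown), cells are non-overlapping squares, absolute code values are summed, and each cell is L2-normalized. The name `hsc_cells` and the cell size are hypothetical.

```python
import numpy as np

def hsc_cells(codes, cell=8):
    """Aggregate per-pixel sparse codes into per-cell histograms.

    codes: array of shape (height, width, n_atoms), one sparse code per pixel.
    Returns an array of shape (height // cell, width // cell, n_atoms).
    """
    h, w, k = codes.shape
    hc, wc = h // cell, w // cell
    # Drop border pixels that do not fill a whole cell.
    mag = np.abs(codes[:hc * cell, :wc * cell])
    # Sum the absolute activation of every atom inside each cell.
    hist = mag.reshape(hc, cell, wc, cell, k).sum(axis=(1, 3))
    # L2-normalize each cell histogram; empty cells stay all-zero.
    norm = np.linalg.norm(hist, axis=-1, keepdims=True)
    return hist / np.maximum(norm, 1e-12)
```

Each histogram bin corresponds to a dictionary atom, in the same way that each HOG bin corresponds to a gradient orientation.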
ISBN: (print) 9781479928941
Generating a visual codebook requires a quantization step. Several works have demonstrated the effectiveness of sparse coding for feature quantization in BoW-based image representation, where it encodes the original signal in a sparse signal space. However, this method neglects the relationships among features. To reduce the impact of this issue, we propose in this paper a Laplacian Tensor sparse coding method that exploits the relationships among local features. Specifically, we use the similarity of tensor descriptors to build a Laplacian Tensor similarity matrix, which better represents both the closeness of local features in the data space and the topological relationship among spatially neighboring local descriptors. Moreover, we integrate a statistical analysis of the local features assigned to each visual word into the pooling step. Our experimental results show that our method matches or exceeds existing results.
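How a similarity matrix can tie the sparse codes of related features together is sketched below. The abstract does not give the objective function, so this only shows the generic graph Laplacian penalty used in Laplacian sparse coding. The Gaussian similarity on plain feature vectors stands in for the tensor-descriptor similarity, and the names `gaussian_similarity` and `laplacian_penalty` are hypothetical.

```python
import numpy as np

def gaussian_similarity(features, sigma=1.0):
    """Symmetric similarity matrix W from pairwise squared distances.

    Assumed stand-in for the tensor-descriptor similarity of the paper.
    """
    diff = features[:, None, :] - features[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-similarity
    return W

def laplacian_penalty(codes, W):
    """Graph Laplacian regularizer trace(S^T L S) with L = D - W.

    codes: (n_features, n_atoms), row i is the sparse code of feature i.
    Equals 0.5 * sum_ij W[i, j] * ||codes[i] - codes[j]||^2, so similar
    features are penalized for having different codes.
    """
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(codes.T @ L @ codes))
```

Adding such a term to a sparse coding objective encourages features that are close under W to receive similar codes, which is the kind of relationship plain sparse coding ignores.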
Image registration, a basic task in image processing, has been studied widely in the literature. It is an important preprocessing step in various applications such as medical imaging, super resolution, and remote sensing. In this paper, we propose a novel dense registration method based on sparse coding and belief propagation. We use image blocks as features and employ sparse coding to find a set of candidate points. To select the optimum matches, belief propagation is then applied to these candidate points. Experimental results show that the proposed approach is able to robustly register scenes and is competitive with the high-accuracy optical flow of Brox et al. (2004) [1] and the SIFT flow of Liu et al. [2]. (C) 2012 Elsevier Inc. All rights reserved.
ISBN: (print) 9781479957521
Gaze mismatch is a common problem in video conferencing, where the viewpoint captured by a camera (usually located above or below a display monitor) is not aligned with the gaze direction of the human subject, who typically looks at their counterpart in the center of the screen. This means that the two parties cannot converse eye-to-eye, hampering the quality of visual communication. One conventional approach to the gaze mismatch problem is to synthesize a gaze-corrected face image as viewed from the center of the screen via depth-image-based rendering (DIBR), assuming texture and depth maps are available at the camera-captured viewpoint(s). Due to self-occlusion, however, there will be missing pixels in the DIBR-synthesized view image that require satisfactory filling. In this paper, we propose to jointly solve the hole-filling problem and the face beautification problem (subtle modifications of facial features to enhance attractiveness of the rendered face) via a unified dual sparse coding framework. Specifically, we first train two dictionaries separately: one for face images of the intended conference subject, and one for images of "beautiful" human faces. During synthesis, we simultaneously seek two code vectors: one is sparse in the first dictionary and explains the available DIBR-synthesized pixels, and the other is sparse in the second dictionary and matches the first vector well up to a restricted linear transform. This ensures a good match with the intended target face, while increasing proximity to "beautiful" facial features to improve attractiveness. Experimental results show naturally rendered human faces with noticeably improved attractiveness.
Vision is the main sense through which human beings perceive and understand the objective world. Various statistics indicate that more than 60% of the external information humans obtain comes through the visual system. Vision is therefore the most important human sense and is essential for obtaining the information needed for survival. With the rapid growth of computer technology, image processing, pattern recognition, and related disciplines have been widely applied. Traditional image processing algorithms have limitations when dealing with complex images. To address these problems, scholars have proposed various new methods, most of which are based on statistical models or artificial neural networks. Although they meet the requirements of modern computer vision systems for feature extraction algorithms with high accuracy, high speed, and low complexity, these algorithms still have many shortcomings. For example, many researchers have used different methods for feature extraction and segmentation to get better segmentation results. Scale-invariant feature transform (SIFT) is a local feature descriptor used in image processing; it is scale-invariant and can detect key points in an image. Sparse coding is an unsupervised learning method that finds a set of overcomplete basis vectors to represent sample data more efficiently. Therefore, combining SIFT and sparse coding, this article proposes an image feature extraction algorithm based on visual information. The results showed that the feature extraction time of X algorithm for different targets was within 0.5 s when the other conditions were the same. The feature matching time was within 1 s, and the correct matching rate was more than 90%. The feature extraction time of Y algorithm for different targets was within 2 s. The feature match
ISBN: (print) 9781450381048
In this paper, we address the problem of non-rigid 3D shape retrieval. The proposed method extracts high-level features that are invariant to non-rigid shape deformations by integrating deep dictionary learning and a sparse coding approach. A stacked sparse coding network is constructed to perform multi-layer dictionary learning instead of single-level dictionary learning. Then, for a given 3D query, a 3D shape descriptor is calculated, providing a multi-scale shape representation. This descriptor is then used to access the deep learned dictionary. The proposed method is validated on two benchmarks, namely Shrec'11 and Shrec'15, for 3D non-rigid object retrieval and compared with existing deep learning-based approaches.
ISBN: (print) 9781728176055
This work proposes a method for estimating dynamics on a graph by using dynamic mode decomposition (DMD) and sparse approximation with graph filter banks (GFBs). The motivation for introducing DMD on graphs is to predict multi-point river water levels in order to forecast river floods and issue proper evacuation warnings. The proposed method represents the spatio-temporal variation of physical quantities on a graph as a time-evolution equation. Specifically, water level observation data available on the Internet are collected by web scraping. In addition, the graph structure is defined based on numerical river information published by the Ministry of Land, Infrastructure, Transport and Tourism (MLIT) of Japan, and the graph is used to construct GFBs for analyzing and synthesizing the water level data. The GFBs work in combination with a sparse approximation algorithm to extract features of the water level distribution. The features are exploited to derive the time-evolution equation through the extended DMD (EDMD) framework. The time-evolution equation is then applied to predict the river water level distribution. To verify the significance of the proposed method, river water level prediction is conducted on real web-scraped data. The performance evaluation shows that the method is superior to the standard DMD approach.
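For orientation, the standard DMD approach that this abstract uses as its baseline can be sketched as follows. This is a generic exact-DMD illustration on raw snapshots, not the proposed EDMD with graph filter bank features. The function names `dmd_fit` and `dmd_predict`, the `(n_nodes, n_times)` layout, and the choice of rank are assumptions.

```python
import numpy as np

def dmd_fit(snapshots, rank):
    """Exact DMD on a snapshot matrix of shape (n_nodes, n_times).

    Fits a linear time-evolution x[t+1] ~ A x[t] and returns the
    eigenvalues and modes of a rank-truncated estimate of A.
    """
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    # Y V S^{-1}: dividing by s scales each column by 1 / singular value.
    YVS = (Y @ Vh.conj().T) / s
    A_tilde = U.conj().T @ YVS          # A projected onto the leading subspace
    eigvals, W = np.linalg.eig(A_tilde)
    modes = YVS @ W
    return eigvals, modes

def dmd_predict(eigvals, modes, x0, steps):
    """Advance the state x0 by the given number of time steps."""
    # Express x0 in the mode basis, then scale each mode by eigenvalue**steps.
    b = np.linalg.lstsq(modes, x0.astype(complex), rcond=None)[0]
    return (modes @ (eigvals ** steps * b)).real
```

For water level data, each row of `snapshots` would hold the level at one observation point over time. Eigenvalues with magnitude below 1 correspond to decaying patterns, and those above 1 to growing ones.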