ISBN (print): 9781479983407
This paper proposes to employ a deep learning model to encode local descriptors for image classification. Previous works that use deep architectures to obtain higher-level representations typically operate at the pixel level, which limits their generalization to large and complex images due to the computational burden and the difficulty of capturing the underlying structure. Our method escapes this limitation by starting from local descriptors, thereby leveraging semantically richer inputs. We investigate the use of two layers of Restricted Boltzmann Machines (RBMs) to encode different local descriptors with a novel group sparse learning (GSL) scheme inspired by the recent success of sparse coding. In addition, unlike most existing purely unsupervised feature coding strategies, we use another RBM corresponding to the semantic labels to perform supervised fine-tuning, which makes our model more suitable for classification tasks. Experimental results on the Caltech-256 and Indoor-67 datasets demonstrate the effectiveness of our method.
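As a rough illustration of the encoding idea in this abstract, the sketch below passes toy local descriptors through a single RBM-style hidden layer and evaluates a group-sparsity penalty over groups of hidden units; the weights, group layout, and penalty strength are placeholder assumptions, not the authors' trained model.

# Minimal sketch (not the authors' implementation): encoding local descriptors
# with an RBM-style hidden layer plus a group-sparsity penalty. Weight values,
# group layout, and the penalty weight `lam` are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
D, H, G = 128, 256, 16                     # descriptor dim, hidden units, groups of hidden units

W = rng.normal(scale=0.01, size=(D, H))    # RBM weights (assumed pretrained)
b = np.zeros(H)                            # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(descriptors):
    """Hidden-unit activation probabilities serve as the code of each descriptor."""
    return sigmoid(descriptors @ W + b)

def group_sparsity(codes, lam=0.1):
    """L2,1-style penalty: L2 norm inside each group of hidden units, L1 across groups."""
    grouped = codes.reshape(codes.shape[0], G, H // G)
    return lam * np.linalg.norm(grouped, axis=2).sum()

sift_like = rng.random((500, D))           # toy stand-in for dense local descriptors
codes = encode(sift_like)
print(codes.shape, group_sparsity(codes))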
To reduce quantization error, preserve the manifold of local features, disambiguate ambiguous features, and model the spatial configuration of features for Bag-of-Features (BoF) model-based human action recognition, a novel feature coding method called spatially regularized and locality-constrained linear coding (SLLC) is proposed. The spatial regularization and locality constraint are introduced into the feature coding phase to model the spatial configuration of features and preserve their nonlinear manifold. Action recognition experiments on benchmark datasets show that SLLC achieves better performance than state-of-the-art feature coding methods such as soft vector quantization, sparse coding, and locality-constrained linear coding. (C) 2014 The Japan Society of Applied Physics
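For context, the following sketch implements only the locality-constrained linear coding (LLC) step that SLLC builds on; the paper's spatial regularization term is not reproduced, and the codebook and features are random placeholders.

# Minimal sketch of the locality-constrained linear coding (LLC) step underlying
# SLLC; the spatial regularization of SLLC is omitted here.
import numpy as np

rng = np.random.default_rng(0)
K, D, k = 64, 32, 5                      # codebook size, feature dim, neighbors

codebook = rng.random((K, D))            # visual codewords (assumed pre-learned)

def llc_code(x, eps=1e-6):
    """Encode feature x over its k nearest codewords (LLC closed-form solution)."""
    dists = np.linalg.norm(codebook - x, axis=1)
    idx = np.argsort(dists)[:k]          # locality constraint: nearest codewords only
    z = codebook[idx] - x                # shift neighbors to the feature
    C = z @ z.T + eps * np.eye(k)        # local covariance, regularized
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                         # enforce the sum-to-one constraint
    code = np.zeros(K)
    code[idx] = w
    return code

feature = rng.random(D)
print(llc_code(feature).nonzero()[0])    # only k entries are active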
ISBN (print): 9780992862619
In visual sensor networks, local feature descriptors can be computed at the sensing nodes, which collaborate on the acquired data to perform efficient visual analysis. In fact, with a minimal amount of computational effort, the detection and extraction of local features, such as binary descriptors, can provide a reliable and compact image representation. In this paper, we propose to extract and code binary descriptors to meet the energy and bandwidth constraints at each sensing node. The major contribution is a binary descriptor coding technique that exploits correlation using two different coding modes: Intra, which exploits the correlation between the elements that compose a descriptor; and Inter, which exploits the correlation between descriptors of the same image. The experimental results show bitrate savings of up to 35% without any impact on the performance of the image retrieval task.
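A toy illustration of the Intra/Inter idea: each binary descriptor is coded either on its own (Intra) or as the XOR residual against the previously coded descriptor of the same image (Inter), keeping whichever leaves fewer 1-bits for a downstream entropy coder. The mode-selection rule used here is a simplifying assumption, not the paper's rate model.

# Illustrative sketch of Intra/Inter mode selection for binary descriptors.
import numpy as np

rng = np.random.default_rng(0)
descriptors = rng.integers(0, 2, size=(10, 256), dtype=np.uint8)  # e.g. BRISK/ORB bits

coded, prev = [], None
for d in descriptors:
    if prev is None:
        coded.append(("intra", d))
    else:
        residual = d ^ prev                       # Inter: exploit inter-descriptor correlation
        if residual.sum() < d.sum():
            coded.append(("inter", residual))
        else:
            coded.append(("intra", d))            # Intra: code the descriptor on its own
    prev = d

print([mode for mode, _ in coded])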
ISBN (print): 9781479939909
Representing images with descriptive features is a fundamental problem in CBIR. Feature coding, as a key step in feature description, has attracted attention in recent years. Among the proposed coding strategies, Bag-of-Words (BoW) is the most widely used model. Recently, saliency has been identified as a fundamental characteristic of BoW, and based on this idea Salient Coding (SaC) was introduced. Empirical studies show that SaC is not able to represent the global structure of the data with a small number of codewords. In this paper, we remedy this limitation by introducing Locally Linear Salient Coding (LLSaC). This method discovers the global structure of the data by exploiting local linear reconstructions of the data points. This knowledge, in addition to the salient responses provided by SaC, helps describe the structure of the data even with few codewords. Experimental results show that LLSaC obtains state-of-the-art results on various data types, such as multimedia and Earth Observation data.
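The sketch below reproduces only the salient coding (SaC) response that LLSaC extends; the locally linear reconstruction part is omitted, and the codebook, feature dimension, and neighborhood size are illustrative assumptions.

# Minimal sketch of the salient coding (SaC) response.
import numpy as np

rng = np.random.default_rng(0)
K, D, k = 32, 16, 5
codebook = rng.random((K, D))

def salient_code(x):
    """Respond only on the closest codeword, weighted by how salient (unambiguous) it is."""
    dists = np.linalg.norm(codebook - x, axis=1)
    order = np.argsort(dists)
    nearest, rivals = order[0], order[1:k]
    saliency = 1.0 - dists[nearest] / dists[rivals].mean()  # near 1 when unambiguous, near 0 when not
    code = np.zeros(K)
    code[nearest] = max(saliency, 0.0)
    return code

print(salient_code(rng.random(D)).max())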
We propose to learn an extremely compact visual descriptor from mobile contexts for low bit rate mobile location search. Our scheme combines location-related side information from the mobile device to adaptively supervise the compact visual descriptor design in a flexible manner, which is well suited to searching for locations or landmarks over a bandwidth-constrained wireless link. Along with the proposed compact descriptor learning, a large-scale, context-aware mobile visual search benchmark dataset, PKUBench, is also introduced, which serves as the first comprehensive benchmark for quantitatively evaluating how cheaply available mobile contexts can help mobile visual search systems. Our proposed contextual-learning-based compact descriptor is shown to outperform existing works in terms of compression rate and retrieval effectiveness. (c) 2012 Elsevier B.V. All rights reserved.
ISBN (print): 9781479933433
Rapid advancements in communication technologies have enabled the digital community to benefit from fast and simple digital information exchange over the internet. Such benefits, however, come hand-in-hand with the problems and threats of ensuring digital copyright protection: preventing digital counterfeiting, and providing proof of authentication, tamper detection, and content-originality verification for multimedia digital content. These issues have been largely addressed in the literature for image, audio, and video, with notably less emphasis on the challenge of text media. With text being the predominant communication medium on the internet, it is clear that more attention is required to secure and protect text documents. In this work, an invisible watermarking technique based on Kashida marks is proposed. The watermarking key is predefined, whereby a Kashida (a redundant Arabic character extension) is placed for a bit 1 and omitted for a bit 0. Scanning through the document, Kashidas are inserted before characters from a specific list until the end of the key is reached; if the end of the document has not yet been reached, the key embedding is repeated over the remainder of the document in round-robin fashion. In comparison to other Kashida methods in the literature, our proposed technique achieves document protection and authentication with enhanced robustness and improved perceptual similarity to the original cover text.
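A minimal sketch of the embedding rule as described: walking through the text, a Kashida (U+0640) is inserted before a character from a predefined list when the current key bit is 1 and omitted when it is 0, cycling through the key round-robin. The list of extendable letters used here is an assumption for illustration.

# Illustrative sketch of Kashida-based key embedding.
KASHIDA = "\u0640"
EXTENDABLE = set("بتثسشصضطظعغفقكلمنهي")      # assumed list of extendable letters

def embed(text, key_bits):
    out, i = [], 0
    for ch in text:
        if ch in EXTENDABLE:
            if key_bits[i % len(key_bits)] == 1:
                out.append(KASHIDA)          # bit 1 -> insert a Kashida before this character
            i += 1                           # bit 0 -> Kashida omitted
        out.append(ch)
    return "".join(out)

watermarked = embed("بسم الله الرحمن الرحيم", [1, 0, 1, 1])
print(watermarked)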
ISBN (print): 9781467358057
Nowadays, visual sensor networks have emerged as an important research area for distributed signal processing, with unique challenges in terms of performance, complexity, and resource allocation. In visual sensor networks, energy consumption must be kept low to extend the lifetime of each battery-operated camera node. Thus, considering the large amount of data that visual sensors can generate, all sensing, processing, and transmission operations must be optimized under strict energy constraints. In this paper, camera nodes sense the visual scene but, instead of transmitting the pixel-coded representation, which demands high computation and bandwidth, a compact yet rich visual representation is created and transmitted. This representation consists of discriminative visual features offering tremendous potential for several image analysis tasks. Among the available low-level image features, the novel class of binary features, which are very fast to compute and match, is well suited for visual sensor networks. In this paper, lossless compression of binary image features is proposed to further lower the energy and bandwidth requirements. The coding solution exploits the redundancy between descriptors of an image by sorting the descriptors and applying DPCM and arithmetic coding. Experimental results show improvements of up to 32% in terms of bitrate savings without any impact on the accuracy of the final image retrieval task.
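A toy sketch of the coding pipeline described above: the binary descriptors are sorted so that neighbors are similar, consecutive descriptors are differenced by XOR (the DPCM step), and the number of remaining 1-bits serves as a rough proxy for the cost an arithmetic coder would see. The arithmetic coder itself and the paper's exact sorting criterion are omitted; the descriptor data are random placeholders.

# Illustrative sort + DPCM pipeline for binary descriptors.
import numpy as np

rng = np.random.default_rng(0)
desc = rng.integers(0, 2, size=(50, 128), dtype=np.uint8)

# Sort descriptors lexicographically so consecutive ones share many bits.
order = np.lexsort(desc.T[::-1])
sorted_desc = desc[order]

# DPCM in the binary domain: XOR each descriptor with its predecessor.
residuals = sorted_desc.copy()
residuals[1:] ^= sorted_desc[:-1]

print("1-bits before:", desc.sum(), "after sort+DPCM:", residuals.sum())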
ISBN (print): 9780819495037
Palm vein recognition is a relatively new method in biometrics. This paper presents an effective palm vein feature extraction approach for improving the efficiency of palm vein identification. Relevant preprocessing steps, such as rotation and extraction of the region of interest, are presented. In feature extraction, multiple 2D Gabor filters with 4 orientations are employed to extract the phase information of a palm vein image, which is then merged into a unique feature according to an encoding rule. Hamming distance is used for vein recognition. Experiments are carried out on a self-built palm vein database. Experimental results show that the proposed method achieves a higher correct recognition rate and a faster speed.
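An illustrative sketch, not the paper's exact encoding rule: a palm-vein region of interest is filtered with 2D Gabor filters at 4 orientations, the responses are binarized into a phase code, and two codes are compared by normalized Hamming distance. Filter parameters and the toy images are assumptions; SciPy's fftconvolve is used for the filtering.

# Gabor phase coding and Hamming matching, sketched on toy data.
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(theta, sigma=3.0, lam=8.0, size=15):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / lam), envelope * np.sin(2 * np.pi * xr / lam)

def phase_code(roi):
    bits = []
    for theta in np.arange(4) * np.pi / 4:         # 4 orientations
        even, odd = gabor_kernel(theta)
        re = fftconvolve(roi, even, mode="same")
        im = fftconvolve(roi, odd, mode="same")
        bits.append(re > 0)                        # quantize the phase into bits
        bits.append(im > 0)
    return np.stack(bits)

rng = np.random.default_rng(0)
roi_a = rng.random((64, 64))
roi_b = roi_a + 0.05 * rng.standard_normal((64, 64))    # slightly perturbed copy
dist = np.mean(phase_code(roi_a) != phase_code(roi_b))  # normalized Hamming distance
print(f"Hamming distance: {dist:.3f}")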
ISBN (print): 9783037858462
To address the limitation of the bag-of-features (BoF) model, which ignores the spatial and temporal relationships of local features in human action recognition in video, a local spatiotemporal coding (LSC) is proposed. Unlike existing methods that use only feature appearance information for coding, LSC encodes feature appearance and spatiotemporal position information simultaneously with vector quantization (VQ), so it directly models the spatiotemporal relationships of local features in the space-time volume (STV). In implementation, the local features are projected into sub-space-time volumes (sub-STVs) and encoded with LSC. In addition, a multi-level LSC is also provided. A group of sub-STV descriptors obtained from videos with multi-level LSC and average pooling is then used for action video classification. A sparse-representation-based classification method is adopted to classify action videos using these sub-STV descriptors. Experimental results on the KTH, Weizmann, and UCF Sports datasets show that our method achieves better performance than previous local spatiotemporal feature-based human action recognition methods.
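A minimal sketch of the joint quantization idea: each local feature is the concatenation of its appearance descriptor and its normalized (x, y, t) position, the concatenated vector is assigned to the nearest joint codeword, and a simple histogram stands in for average pooling over a sub-STV. The position weighting and the random codebook are illustrative assumptions.

# Joint appearance + position vector quantization, sketched on toy data.
import numpy as np

rng = np.random.default_rng(0)
n, D, K, beta = 200, 64, 100, 0.5            # features, appearance dim, codewords, position weight

appearance = rng.random((n, D))
positions = rng.random((n, 3))               # (x, y, t) normalized within the sub-volume
features = np.hstack([appearance, beta * positions])

codebook = features[rng.choice(n, K, replace=False)]    # stand-in for a k-means codebook

assign = np.argmin(
    ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1), axis=1
)
histogram = np.bincount(assign, minlength=K) / n        # pooled sub-STV descriptor
print(histogram.shape, histogram.sum())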
State-of-the-art text clustering methods suffer from the huge size of document collections with high-dimensional features. In this paper, we study fast SOM (self-organizing map) clustering technology for text information. Our focus is on how to enhance the efficiency of a text clustering system while maintaining high clustering quality. To achieve this goal, we separate the system into two stages: offline and online. To make the text clustering system more efficient, feature extraction and semantic quantization are done offline. Although neurons are represented as numerical vectors in a high-dimensional space, documents are represented as collections of important keywords, which differs from many related works; thus the time and space requirements of the offline stage can be alleviated. Based on this scenario, fast clustering techniques for the online stage are proposed, including how to project documents onto the output layer of the SOM, a fast similarity computation method, and an incremental clustering scheme for real-time processing. We tested the system on different datasets, and the practical results demonstrate that our approach is much superior in clustering efficiency while the clustering quality is comparable to traditional methods. (C) 2011 Elsevier Ltd. All rights reserved.
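A minimal sketch of the offline/online split described above: a tiny self-organizing map is trained offline on placeholder document vectors, and online each incoming document is simply projected onto its best-matching neuron, which is the cheap operation that makes the online stage fast. Vocabulary size, map size, and the learning schedule are assumptions.

# Offline SOM training and online document projection, sketched on toy data.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, grid, epochs = 50, (4, 4), 200

# Offline stage: train the SOM on pre-extracted document vectors.
docs = rng.random((100, vocab_size))                # stand-in for TF-IDF keyword vectors
weights = rng.random((grid[0] * grid[1], vocab_size))
coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])])

for t in range(epochs):
    lr = 0.5 * (1 - t / epochs)
    radius = 2.0 * (1 - t / epochs) + 0.5
    x = docs[rng.integers(len(docs))]
    bmu = np.argmin(((weights - x) ** 2).sum(1))    # best-matching unit
    dist = ((coords - coords[bmu]) ** 2).sum(1)
    h = np.exp(-dist / (2 * radius**2))[:, None]    # neighborhood function
    weights += lr * h * (x - weights)

# Online stage: projecting a new document is a single nearest-neuron lookup.
new_doc = rng.random(vocab_size)
cluster = np.argmin(((weights - new_doc) ** 2).sum(1))
print("assigned to neuron", divmod(int(cluster), grid[1]))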