Generative adversarial networks (GANs) have shown remarkable success in generating realistic data from a predefined prior distribution (e.g., Gaussian noise). However, such a prior distribution is often independent of the real data and thus may lose semantic information (e.g., the geometric structure or content of images). In practice, the semantic information might be represented by a latent distribution learned from the data. However, such a latent distribution may make data sampling difficult for GAN methods. In this paper, rather than sampling from the predefined prior distribution, we propose a GAN model with local coordinate coding (LCC), termed LCCGAN, to improve the performance of image generation. First, we propose an LCC sampling method in LCCGAN to sample meaningful points from the latent manifold. With the LCC sampling method, we can explicitly exploit the local information on the latent manifold and thus produce new data of promising quality. Second, we propose an improved version, namely LCCGAN++, by introducing a higher-order term into the generator approximation. This term achieves a better approximation and thus further improves the performance. More critically, we derive generalization bounds for both LCCGAN and LCCGAN++ and prove that a low-dimensional input is sufficient to achieve good generalization performance. Extensive experiments on several benchmark datasets demonstrate the superiority of the proposed method over existing GAN methods.
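The LCC sampling idea above can be sketched in a few lines: a latent point is represented as an affine combination of its nearest anchor points on the latent manifold, and new points are sampled as convex combinations of anchors. The code below is an illustrative toy, not the paper's implementation; the function names, the choice of k, and the use of a Dirichlet draw for convex weights are all assumptions.

```python
import numpy as np

def lcc_code(x, bases, k=3):
    """Local coordinate coding: approximate x as an affine combination
    of its k nearest basis points (anchors on the latent manifold)."""
    d = np.linalg.norm(bases - x, axis=1)
    idx = np.argsort(d)[:k]            # k nearest anchors
    B = bases[idx]                     # (k, dim)
    # solve min ||x - w @ B||^2 subject to sum(w) = 1
    A = np.vstack([B.T, np.ones(k)])   # (dim + 1, k)
    b = np.append(x, 1.0)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    coeffs = np.zeros(len(bases))
    coeffs[idx] = w
    return coeffs

def lcc_sample(bases, rng, k=3):
    """Sample a new latent point as a random convex combination of
    k anchors, so it stays close to the latent manifold."""
    idx = rng.choice(len(bases), size=k, replace=False)
    w = rng.dirichlet(np.ones(k))      # convex weights
    return w @ bases[idx]

rng = np.random.default_rng(0)
bases = rng.normal(size=(10, 2))       # toy anchor set in a 2-D latent space
z = lcc_sample(bases, rng)
c = lcc_code(z, bases)
```

In a full pipeline, `z` (or its code `c` mapped through the anchor set) would be fed to the generator instead of raw Gaussian noise.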
The dictionary in local coordinate coding (LCC) is important for approximating a non-linear function with linear ones. Optimizing a dictionary from predefined coding schemes is a challenging task. This paper focuses on learning dictionaries from two locality coding adaptors (LCAs), i.e., the locality Gaussian adaptor (GA) and the locality Euclidean adaptor (EA), for large-scale and high-dimensional datasets. Online dictionary learning is formulated as two alternating steps: local coding and dictionary updating. Both steps scale gracefully to datasets with millions of samples. Experiments on different applications demonstrate that our method leads to faster dictionary learning than classical or state-of-the-art methods. (C) 2015 Elsevier B.V. All rights reserved.
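The two alternating steps can be sketched as an online loop over samples: code each sample on its nearest atoms, then nudge those atoms toward the residual. This is a minimal sketch under assumed choices (inverse-distance weights, a plain SGD update); the paper's GA/EA adaptors are omitted.

```python
import numpy as np

def online_lcc_dict(X, n_atoms=4, k=2, lr=0.1, epochs=5, seed=0):
    """Toy online dictionary learning with two alternating steps per sample:
    (1) local coding: weight the k nearest atoms by inverse distance;
    (2) dictionary updating: move those atoms toward the residual."""
    rng = np.random.default_rng(seed)
    D = X[rng.choice(len(X), n_atoms, replace=False)].copy()  # init from data
    for _ in range(epochs):
        for x in X:
            dist = np.linalg.norm(D - x, axis=1)
            idx = np.argsort(dist)[:k]
            w = 1.0 / (dist[idx] + 1e-8)      # local coding step
            w /= w.sum()
            recon = w @ D[idx]
            D[idx] += lr * np.outer(w, x - recon)  # dictionary update step
    return D

rng = np.random.default_rng(3)
# two well-separated clusters; atoms should stay near the data
X = np.vstack([rng.normal(0, 0.1, (30, 2)), rng.normal(5, 0.1, (30, 2))])
D = online_lcc_dict(X)
```

Because each update touches only the k atoms used by the current sample, the cost per sample is independent of the dataset size, which is what makes the online formulation scale.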
Authors:
Xiao, Wei; Liu, Hong; Tang, Hao; Liu, Huaping
Peking Univ, Engn Lab Intelligent Percept Internet Things (ELIP), Key Lab Machine Percept, Shenzhen Grad Sch, Beijing 100871, Peoples R China
Tsinghua Univ, State Key Lab Intelligent Technol & Syst, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
ISBN:
(digital) 9783662485583
ISBN:
(print) 9783662485583; 9783662485576
Extracting informative regularized representations of input signals plays a key role in artificial intelligence, including machine learning and robotics. Traditional approaches feature l(2)-norm and sparsity-inducing l(p)-norm (0 <= p <= 1) optimization methods, imposing strict regularization on the representations. However, these approaches overlook the fact that the signals and the atoms of overcomplete dictionaries usually contain a wealth of structural information that could improve the representations. This paper systematically exploits the geometric structure of the data manifold on which signals and atoms reside, and thus presents a principled extension of sparse coding, i.e., two-layer local coordinate coding, which demonstrates that a high-dimensional nonlinear function can be locally approximated by a global linear function with quadratic approximation power. Moreover, to learn each latent layer, corresponding patterned optimization approaches are developed, encoding distance information between signals and atoms into the representations. Experimental results demonstrate the significance of this extension in improving image classification performance, and its potential applications to object recognition in robot systems are also explored.
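The two-layer structure can be sketched as coding a signal on a first-layer dictionary, then coding the active first-layer atoms on a second-layer dictionary and concatenating both codes. This is a toy illustration under assumptions (inverse-distance weights, simple concatenation), not the paper's patterned optimization.

```python
import numpy as np

def knn_code(x, atoms, k):
    """One LCC layer: inverse-distance weights on the k nearest atoms."""
    dist = np.linalg.norm(atoms - x, axis=1)
    idx = np.argsort(dist)[:k]
    w = 1.0 / (dist[idx] + 1e-8)
    w /= w.sum()
    code = np.zeros(len(atoms))
    code[idx] = w
    return code

def two_layer_lcc(x, atoms1, atoms2, k1=3, k2=2):
    """Two-layer LCC: code x on layer-1 atoms, then code each active
    layer-1 atom on layer-2 atoms; concatenate both layers' codes."""
    c1 = knn_code(x, atoms1, k1)
    active = np.flatnonzero(c1)
    # layer 2: c1-weighted sum of the active atoms' own codes
    c2 = sum(c1[i] * knn_code(atoms1[i], atoms2, k2) for i in active)
    return np.concatenate([c1, c2])

rng = np.random.default_rng(1)
atoms1 = rng.normal(size=(6, 3))   # first-layer dictionary
atoms2 = rng.normal(size=(4, 3))   # second-layer dictionary
rep = two_layer_lcc(rng.normal(size=3), atoms1, atoms2)
```

The second layer encodes where the used atoms themselves sit on the manifold, which is the source of the quadratic approximation power claimed above.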
In the 3D facial animation and synthesis community, input faces are usually required to be labeled with a set of landmarks for parameterization. Because of variations in pose, expression, and resolution, automatic 3D face landmark localization remains a challenge. In this paper, a novel landmark localization approach is presented. The approach is based on local coordinate coding (LCC) and consists of two stages. In the first stage, we perform nose detection, relying on the fact that the nose shape is usually invariant under variations in pose, expression, and resolution. Then, we use the iterative closest points algorithm to find a 3D affine transformation that aligns the input face to a reference face. In the second stage, we perform resampling to build correspondences between the input 3D face and the training faces. Then, an LCC-based localization algorithm is proposed to obtain the positions of the landmarks in the input face. Experimental results show that the proposed method is comparable to state-of-the-art methods in terms of its robustness, flexibility, and accuracy.
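The second stage can be sketched as a nearest-neighbor regression: code the aligned input face on its nearest training faces, then transfer the same weights to those faces' landmark sets. This toy assumes alignment and resampling are already done, and the inverse-distance weighting stands in for the paper's LCC solver.

```python
import numpy as np

def lcc_landmarks(face, train_faces, train_landmarks, k=3):
    """Stage-2 sketch: LCC-style weights over the k nearest training faces,
    transferred to their landmark coordinates."""
    dist = np.linalg.norm(train_faces - face, axis=1)
    idx = np.argsort(dist)[:k]
    w = 1.0 / (dist[idx] + 1e-8)
    w /= w.sum()
    # weighted combination of the neighbors' landmark sets
    return np.tensordot(w, train_landmarks[idx], axes=1)   # (n_landmarks, 3)

rng = np.random.default_rng(4)
train_faces = rng.normal(size=(20, 30))        # 20 faces, 30-dim features
train_landmarks = rng.normal(size=(20, 5, 3))  # 5 landmarks each, in 3-D
pred = lcc_landmarks(train_faces[7], train_faces, train_landmarks)
```

When the query coincides with a training face, the weights concentrate on that face and the prediction reduces to its known landmarks, which is a useful sanity check.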
Human activity recognition (HAR) is the task of automatically analyzing and recognizing human body gestures or actions. HAR using time-series multi-modal sensory data is a challenging and important task in machine learning and feature engineering due to its increasing demand in numerous real-world applications such as healthcare, sports, and surveillance. Numerous everyday wearable devices, e.g., smartphones, smartwatches, and smart glasses, can be used to collect and analyze human activities on an unprecedented scale. This paper presents a generic framework to recognize different human activities using the continuous time-series multimodal sensory data of these smart gadgets. The proposed framework follows the Bag-of-Features pipeline, which consists of four steps: (i) data acquisition and pre-processing, (ii) codebook computation, (iii) feature encoding, and (iv) classification. Each step plays a significant role in generating an appropriate feature representation of the raw sensory data for efficient activity recognition. In the first step, we employ a simple overlapped-window sampling approach to segment the continuous time-series sensory data and make it suitable for activity recognition. Secondly, we build a codebook using the k-means clustering algorithm to group similar sub-sequences; the center of each group is known as a codeword, and we assume that it represents a specific movement in the activity sequence. The third step is feature encoding, which transforms the raw sensory data of an activity sequence into a high-level representation for classification. Specifically, we present three reconstruction-based encoding techniques: sparse coding, local coordinate coding, and locality-constrained linear coding. The segmented activity sub-sequences are transformed into high-level representations using these techniques and the previously computed codebook. Finally, the encoded features are classified u…
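The first three steps of the pipeline can be sketched end to end on a toy signal: overlapped-window segmentation, a small k-means codebook, and (as the simplest stand-in for the three reconstruction-based encoders) a hard-assignment histogram over codewords. Window sizes, k, and the hard encoder are illustrative assumptions.

```python
import numpy as np

def sliding_windows(series, width, step):
    """Step (i): segment a continuous series with overlapped windows."""
    return np.array([series[s:s + width]
                     for s in range(0, len(series) - width + 1, step)])

def kmeans(X, k, iters=20, seed=0):
    """Step (ii): tiny k-means to build the codebook of sub-sequences."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        lab = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(lab == j):
                C[j] = X[lab == j].mean(axis=0)
    return C

def encode(X, C):
    """Step (iii), simplified: normalized hard-assignment histogram."""
    lab = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(lab, minlength=len(C)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(2)
series = rng.normal(size=200)                    # toy 1-D sensor stream
wins = sliding_windows(series, width=20, step=10)
codebook = kmeans(wins, k=4)
feat = encode(wins, codebook)
```

In the paper's setting the hard histogram would be replaced by sparse coding, LCC, or locality-constrained linear coding, and `feat` would be fed to step (iv), the classifier.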
Non-negative matrix factorization (NMF) is an effective model for converting data into a non-negative coefficient representation, whose discriminative ability is usually enhanced for diverse pattern recognition tasks. In NMF-based clustering, we often need to run K-means on the learned coefficients as a postprocessing step to obtain the final cluster assignments, which breaks the connection between the feature learning and recognition stages. In this paper, we propose to learn the non-negative coefficient matrix and jointly perform fuzzy clustering on it, viewing each column of the dictionary matrix as the concept of a cluster. As a result, we formulate a new fuzzy clustering model, termed Joint Non-negative and Fuzzy coding with Graph regularization (G-JNFC), and design an effective optimization method to solve it under the alternating direction optimization framework. Besides convergence and computational complexity analysis of G-JNFC, we conduct extensive experiments on both synthetic and representative benchmark datasets. The results show that the proposed G-JNFC model is effective in data clustering. (C) 2020 THE AUTHORS. Published by Elsevier BV on behalf of the Faculty of Computers and Artificial Intelligence, Cairo University.
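The joint idea can be illustrated with plain multiplicative-update NMF, reading each column of W as a cluster concept and the column-normalized coefficients as fuzzy memberships; no separate K-means pass is needed. This sketch omits the graph regularizer and the alternating-direction solver of the full G-JNFC model.

```python
import numpy as np

def nmf_fuzzy(X, k, iters=200, seed=0):
    """Sketch: NMF X ≈ W H with standard multiplicative updates; the
    column-normalized H is read directly as fuzzy cluster memberships."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + 1e-9)   # coefficient update
        W *= (X @ H.T) / (W @ H @ H.T + 1e-9)   # dictionary update
    U = H / H.sum(axis=0, keepdims=True)        # fuzzy membership per sample
    return W, U

rng = np.random.default_rng(5)
X = rng.random((6, 8))        # toy non-negative data matrix
W, U = nmf_fuzzy(X, k=3)
```

Each column of U sums to one, so a sample's memberships across the k concepts can be read off directly, keeping feature learning and cluster assignment in one model.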
Existing matrix factorization techniques, such as non-negative matrix factorization and concept factorization, have been widely applied for data representation. To make the obtained concepts as close to the original data points as possible, a state-of-the-art method called locality-constrained concept factorization was put forward, which represents the data by a linear combination of only a few nearby basis concepts. However, its locality constraint does not fully reveal the intrinsic data structure, since it only requires the concepts to be close to the original data points. To address these problems, by considering the geometric structure of the data manifold in local concept factorization via graph-based learning, we propose a novel algorithm called graph-regularized local coordinate concept factorization (GRLCF). By constructing a parameter-free graph using the constrained Laplacian rank (CLR) algorithm, we also present an extension of the GRLCF algorithm. Moreover, we develop iterative updating optimization schemes and provide a convergence proof for our optimization scheme. Since GRLCF simultaneously considers the geometric structure of the data manifold and the locality conditions as additional constraints, it can obtain a more compact and better-structured data representation. Experimental results on the ORL, Yale, and MNIST image datasets demonstrate the effectiveness of our proposed algorithm.
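The graph-regularization ingredient can be shown in isolation: build a k-NN affinity graph over the samples and evaluate the Laplacian smoothness term Tr(H L Hᵀ) that penalizes codes differing across connected samples. This is a generic sketch of the regularizer, not the CLR graph construction or the full GRLCF updates.

```python
import numpy as np

def knn_graph(X, k=2):
    """Symmetric k-NN affinity graph over the samples (rows of X)."""
    d = np.linalg.norm(X[:, None] - X[None], axis=-1)
    W = np.zeros_like(d)
    for i in range(len(X)):
        nn = np.argsort(d[i])[1:k + 1]   # k nearest neighbors, skipping self
        W[i, nn] = 1.0
    return np.maximum(W, W.T)

def graph_smoothness(H, W):
    """Tr(H L Hᵀ): the graph-regularization term, zero when connected
    samples share identical codes and positive otherwise."""
    L = np.diag(W.sum(axis=1)) - W       # unnormalized graph Laplacian
    return np.trace(H @ L @ H.T)

rng = np.random.default_rng(6)
X = rng.normal(size=(8, 4))              # 8 samples in 4-D
W = knn_graph(X)
flat = np.ones((2, 8))                   # identical codes everywhere
vary = np.vstack([np.arange(8.0), np.arange(8.0)])
```

Adding this term to the factorization objective is what pulls the learned codes toward the manifold structure that the locality constraint alone misses.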
ISBN:
(print) 9781538627266
Traffic signs are characterized by wide variability in their visual appearance in real-world environments. Supervised algorithms have achieved superior results on the German Traffic Sign Recognition Benchmark (GTSRB) database. However, these models cannot transfer knowledge across domains, e.g., transfer knowledge learned from the Synthetic Signs database to recognize the traffic signs in the GTSRB database. Although the Synthetic Signs database shares exactly the same class labels as GTSRB, the data distributions of the two are divergent. Such a task is called transfer learning, which is a basic ability for humans but a challenging problem for machines. To give these algorithms the ability to transfer knowledge between domains, we propose a variant of the Generalized Auto-Encoder (GAE) in this paper. Traditional transfer learning algorithms, e.g., the Stacked Autoencoder (SA), usually attempt to reconstruct target data from source data or artificially corrupted data. In contrast, we assume the source and target data are two different corrupted versions of domain-invariant data, and that there is a latent subspace that can reconstruct the domain-invariant data as well as preserve its local manifold. Therefore, the domain-invariant data can be obtained not only by denoising from the nearest source and target data but also by reconstruction from the latent subspace. To preserve the statistical and geometric properties simultaneously, we additionally propose a local coordinate coding (LCC)-based relational function to construct the deep nonlinear architecture. Experimental results on several benchmark datasets demonstrate the effectiveness of our proposed approach in comparison with several traditional methods.
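The core assumption (source and target as two corrupted views of shared clean data recoverable from a latent subspace) can be illustrated with a linear stand-in: project the pooled data onto a common low-dimensional subspace via PCA and average the two projected views. Paired samples, PCA in place of the learned GAE subspace, and the toy dimensions are all assumptions.

```python
import numpy as np

def domain_invariant(xs, xt, n_components=2):
    """Sketch: treat xs and xt as paired corrupted views of the same clean
    data; project the pooled data onto a shared principal subspace and
    denoise each pair by averaging the two projected views."""
    X = np.vstack([xs, xt])
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    P = Vt[:n_components]                      # shared latent subspace
    recon = (X - mu) @ P.T @ P + mu            # project-and-reconstruct
    ns = len(xs)
    return (recon[:ns] + recon[ns:]) / 2.0

rng = np.random.default_rng(7)
Z = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 5))  # clean data on a 2-D subspace of 5-D
xs = Z + 0.2 * rng.normal(size=Z.shape)                 # "source" corruption
xt = Z + 0.2 * rng.normal(size=Z.shape)                 # "target" corruption
Zhat = domain_invariant(xs, xt)
err_noisy = np.linalg.norm(xs - Z, axis=1).mean()
err_clean = np.linalg.norm(Zhat - Z, axis=1).mean()
```

The recovered points sit closer to the clean data than either corrupted view, which is the behavior the GAE variant above seeks with a nonlinear, LCC-regularized subspace.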
Feature pooling in the majority of sparse-coding-based tracking algorithms computes the final feature vector only from low-order statistics or extreme responses of the sparse codes; the high-order statistics and the correlations between responses to different dictionary items are neglected. We present a more generalized feature pooling method for visual tracking that uses a probabilistic function to model the statistical distribution of sparse codes. Since direct matching between two distributions usually incurs high computational costs, we introduce the Fisher vector to derive a more compact and discriminative representation of the sparse codes of the visual target. We encode target patches by local coordinate coding, use a Gaussian mixture model to compute Fisher vectors, and finally train semi-supervised linear-kernel classifiers for visual tracking. To handle the drifting problem during tracking, these classifiers are updated online with the current tracking results. Experimental results on two challenging tracking benchmarks demonstrate that the proposed approach outperforms state-of-the-art tracking algorithms.
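The Fisher-vector pooling step can be sketched with a fixed diagonal GMM: compute each code's responsibilities, then pool the mean-gradients into one fixed-length descriptor. A full Fisher vector also includes variance-gradients, and the GMM here is hand-set rather than fitted; the random codes stand in for LCC codes of target patches.

```python
import numpy as np

def fisher_vector(codes, means, sigmas, priors):
    """Simplified Fisher vector (gradients w.r.t. GMM means only), pooling
    a set of local codes into one descriptor of length K * d."""
    diff = codes[:, None] - means[None]                  # (n, K, d)
    logp = (-0.5 * ((diff / sigmas) ** 2).sum(-1)
            - np.log(sigmas).sum(-1) + np.log(priors))   # per-component log-likelihood
    logp -= logp.max(axis=1, keepdims=True)
    gamma = np.exp(logp)
    gamma /= gamma.sum(axis=1, keepdims=True)            # responsibilities
    # average mean-gradient per component, normalized by the prior
    fv = (gamma[..., None] * diff / sigmas).mean(0) / np.sqrt(priors)[:, None]
    return fv.ravel()

rng = np.random.default_rng(8)
means = np.array([[0.0, 0.0], [5.0, 5.0]])   # toy 2-component diagonal GMM
sigmas = np.ones((2, 2))
priors = np.array([0.5, 0.5])
codes = rng.normal(size=(30, 2))             # stand-in for LCC codes of patches
fv = fisher_vector(codes, means, priors=priors, sigmas=sigmas)
```

Because the descriptor length is K * d regardless of how many patches are pooled, it can feed a linear-kernel classifier directly, which is what makes the approach fast enough for online tracking.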