Dictionary generation is a core technique of bag-of-visual-words (BOV) models for image categorization. Most previous approaches generate dictionaries with unsupervised clustering techniques such as k-means. However, the features obtained from such dictionaries may not be optimal for image classification. In this paper, we propose a probabilistic model for supervised dictionary learning (SDLM) that seamlessly combines an unsupervised model (a Gaussian mixture model) and a supervised model (a logistic regression model) in a single probabilistic framework. In this model, image category information directly affects the generation of the dictionary. A dictionary obtained by this approach is a trade-off between minimizing cluster distortion and maximizing the discriminative power of image-wise representations, i.e. histogram representations of images. We further extend the model to incorporate spatial information during dictionary learning, in a manner similar to spatial pyramid matching. We extensively evaluated the two models on various benchmark datasets and obtained promising results.
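To make the trade-off concrete, here is a minimal sketch of an objective with the two ingredients the abstract names. The first is the log-likelihood of the descriptors under a GMM dictionary, which plays the role of cluster distortion. The second is the logistic-regression log-likelihood of image labels given soft BOV histograms. This is not the paper's SDLM formulation. The isotropic shared variance, the averaging of posteriors into a histogram, binary labels, and the weight `lam` are all illustrative assumptions, and every function name here is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def log_gauss_iso(x: Point, mean: Point, var: float) -> float:
    """Log-density of an isotropic Gaussian N(mean, var * I) at x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (d * math.log(2 * math.pi * var) + sq / var)


def logsumexp(vals: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def responsibilities(x: Point, means, var: float, weights):
    """Return (log p(x), posterior over visual words) under the GMM dictionary."""
    logs = [math.log(w) + log_gauss_iso(x, mu, var) for w, mu in zip(weights, means)]
    lse = logsumexp(logs)
    return lse, [math.exp(l - lse) for l in logs]


def image_histogram(descs: Sequence[Point], means, var: float, weights):
    """Soft BOV histogram (mean posterior per visual word) plus the image's GMM log-likelihood."""
    k = len(means)
    hist: List[float] = [0.0] * k
    loglik = 0.0
    for x in descs:
        lp, r = responsibilities(x, means, var, weights)
        loglik += lp
        for j in range(k):
            hist[j] += r[j]
    return loglik, [h / len(descs) for h in hist]


def sdlm_style_objective(images, labels, means, var, weights, w, b, lam=1.0) -> float:
    """Hypothetical trade-off: GMM fit + lam * logistic log-likelihood (labels in {0, 1})."""
    total_gmm, total_cls = 0.0, 0.0
    for descs, y in zip(images, labels):
        ll, h = image_histogram(descs, means, var, weights)
        total_gmm += ll
        z = b + sum(wi * hi for wi, hi in zip(w, h))
        s = z if y == 1 else -z  # log p(y|h) = log sigmoid(s)
        # Stable log-sigmoid for either sign of s.
        total_cls += -math.log1p(math.exp(-s)) if s >= 0 else s - math.log1p(math.exp(s))
    return total_gmm + lam * total_cls
```

A learner in this spirit would adjust `means` (the dictionary) together with `w` and `b` to increase this objective. With `lam = 0` it reduces to fitting a plain GMM, and larger `lam` pulls the visual words toward positions that make the histograms separable.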
In most large-scale real-world pattern classification problems, there is always some explicit information beyond the given training data, namely prior knowledge, according to which the training data are organized. In this paper, we propose a framework for incorporating this kind of prior knowledge into the training of min-max modular (M3) classifiers to improve learning performance. To evaluate the proposed method, we perform experiments on a large-scale Japanese patent classification problem and consider two kinds of prior knowledge contained in patent documents: the patent's publication date and the hierarchical structure of the patent classification system. In the experiments, a traditional support vector machine (SVM) and M3-SVM without prior knowledge are adopted as baseline classifiers. Experimental results demonstrate that the proposed method is superior to the baseline classifiers in terms of both training cost and generalization accuracy. Moreover, M3-SVM with prior knowledge is found to be much more robust than a traditional SVM to noisy, dated patent samples, which is crucial for incremental learning.
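As a rough illustration of how prior knowledge can drive an M3 decomposition, the sketch below does the following:

1. It splits the positive and negative training sets into subsets by a prior-knowledge key. Here the key is a hypothetical publication year bucketed by decade.
2. It trains one module per (positive subset, negative subset) pair.
3. It combines the module outputs with the usual min-max rule: MIN over negative subsets, then MAX over positive subsets.

The base learner is a nearest-centroid scorer swapped in for the SVMs used in the paper, purely to keep the example self-contained. All names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Vec = Tuple[float, ...]
Sample = Tuple[Vec, int]  # (feature vector, hypothetical prior-knowledge value, e.g. publication year)


def centroid(xs: Sequence[Vec]) -> Vec:
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(c) / len(xs) for c in zip(*xs))


def train_module(pos: Sequence[Vec], neg: Sequence[Vec]) -> Callable[[Vec], float]:
    """Stand-in base classifier (nearest centroid, NOT an SVM): score > 0 means positive."""
    cp, cn = centroid(pos), centroid(neg)

    def score(x: Vec) -> float:
        dp = sum((a - b) ** 2 for a, b in zip(x, cp))
        dn = sum((a - b) ** 2 for a, b in zip(x, cn))
        return dn - dp

    return score


def partition(samples: Sequence[Sample], key: Callable[[int], int]) -> List[List[Vec]]:
    """Group feature vectors by a prior-knowledge key (e.g. publication decade)."""
    groups: Dict[int, List[Vec]] = defaultdict(list)
    for x, meta in samples:
        groups[key(meta)].append(x)
    return [groups[k] for k in sorted(groups)]


def train_m3(pos: Sequence[Sample], neg: Sequence[Sample], key: Callable[[int], int]):
    """One module per (positive subset, negative subset) pair."""
    pos_parts, neg_parts = partition(pos, key), partition(neg, key)
    return [[train_module(p, n) for n in neg_parts] for p in pos_parts]


def predict_m3(modules, x: Vec) -> float:
    """MIN over negative subsets for each positive subset, then MAX over positive subsets."""
    return max(min(m(x) for m in row) for row in modules)
```

Because every module sees only one slice of each class, the modules can be trained independently and in parallel. New dated samples can also be handled by adding modules rather than retraining everything, which is one plausible reading of why this setup suits incremental learning.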