Author affiliation: School of Computer Science and Engineering, Interdisciplinary Center for Neural Computation, Hebrew University, Jerusalem 91904, Israel
Publication: Journal of Machine Learning Research (J. Mach. Learn. Res.)
Year/Volume: 2003, Vol. 3
Pages: 1307-1331
Subject classification: 1205 [Management - Library, Information and Archival Science]; 0810 [Engineering - Information and Communication Engineering]; 080701 [Engineering - Engineering Thermophysics]; 0817 [Engineering - Chemical Engineering and Technology]; 08 [Engineering]; 0807 [Engineering - Power Engineering and Engineering Thermophysics]; 0703 [Science - Chemistry]; 0835 [Engineering - Software Engineering]; 0714 [Science - Statistics (degree in Science or Economics)]; 0701 [Science - Mathematics]; 0812 [Engineering - Computer Science and Technology (degree in Engineering or Science)]
Subject: Feature extraction
Abstract: Dimensionality reduction of empirical co-occurrence data is a fundamental problem in unsupervised learning. It is also a well-studied problem in statistics, known as the analysis of cross-classified data. One principled approach to this problem is to represent the data in low dimension with minimal loss of the (mutual) information contained in the original data. In this paper we introduce an information-theoretic nonlinear method for finding such a most informative dimension reduction. In contrast with previously introduced clustering-based approaches, here we extract continuous feature functions directly from the co-occurrence matrix. In a sense, we automatically extract functions of the variables that serve as approximate sufficient statistics for a sample of one variable about the other one. Our method differs from dimensionality reduction methods that are based on a specific, sometimes arbitrary, metric or embedding. Another interpretation of our method is as generalized, multi-dimensional, nonlinear regression, where rather than fitting one regression function through two-dimensional data, we extract d regression functions whose expectation values capture the information among the variables. It thus presents a new learning paradigm that unifies aspects of both supervised and unsupervised learning. The resulting dimension reduction can be described by two conjugate d-dimensional differential manifolds that are coupled through Maximum Entropy I-projections. The Riemannian metrics of these manifolds are determined by the observed expectation values of our extracted features. Following this geometric interpretation, we present an iterative information projection algorithm for finding such features and prove its convergence. Our algorithm is similar to the method of association analysis in statistics, though the feature extraction context as well as the information-theoretic and geometric interpretation are new. The algorithm is illustrated on various synthetic co-occurrence data.
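The sketch below is not the paper's maximum-entropy I-projection algorithm; it is a minimal, hypothetical illustration of the association-analysis connection the abstract mentions: extracting d continuous feature functions phi(x), psi(y) from a co-occurrence matrix by alternating conditional expectations, so that each side's features are averages of the other side's features under the empirical conditionals. All names (extract_features, counts, d, n_iter) are illustrative assumptions, and the input is assumed to have no all-zero rows or columns.

```python
import numpy as np

def extract_features(counts, d=2, n_iter=200, seed=0):
    """Alternating-conditional-expectation sketch (association-analysis style),
    not the paper's exact algorithm: returns d feature functions over the rows
    (phi) and columns (psi) of an empirical co-occurrence matrix."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    P = counts / counts.sum()                  # empirical joint p(x, y)
    px = P.sum(axis=1, keepdims=True)          # marginal p(x), shape (nx, 1)
    py = P.sum(axis=0, keepdims=True)          # marginal p(y), shape (1, ny)
    p_y_given_x = P / px                       # row x holds p(y | x)
    p_x_given_y = (P / py).T                   # row y holds p(x | y)
    psi = rng.standard_normal((counts.shape[1], d))
    for _ in range(n_iter):
        phi = p_y_given_x @ psi                # phi(x) = E[ psi(Y) | X = x ]
        psi = p_x_given_y @ phi                # psi(y) = E[ phi(X) | Y = y ]
        psi = psi - py @ psi                   # drop the trivial constant direction
        q, _ = np.linalg.qr(psi * np.sqrt(py.T))
        psi = q / np.sqrt(py.T)                # keep columns orthonormal under p(y)
    phi = p_y_given_x @ psi
    return phi, psi

# Example on a small synthetic co-occurrence table:
counts = np.array([[8, 1, 1], [1, 8, 1], [1, 1, 8], [4, 4, 1]])
phi, psi = extract_features(counts, d=2)
```

The centering and weighted re-orthonormalization make this a subspace iteration that converges to the leading non-trivial pairs of conditional-expectation feature functions; the paper's method instead couples the two feature sets through maximum-entropy I-projections and is optimized for preserved mutual information.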