We investigate the online version of Principal Component Analysis (PCA), where in each trial t the learning algorithm chooses a k-dimensional subspace and, upon receiving the next instance vector x_t, suffers the "compression loss", which is the squared Euclidean distance between this instance and its projection onto the chosen subspace. When viewed in the right parameterization, this compression loss is linear, i.e. it can be rewritten as tr(W_t x_t x_t^⊤), where W_t is the parameter of the algorithm and the outer product x_t x_t^⊤ (with ||x_t|| ≤ 1) is the instance matrix. In this paper we generalize PCA to arbitrary positive definite instance matrices X_t with the linear loss tr(W_t X_t). We evaluate online algorithms in terms of their worst-case regret, which is a bound on the additional total loss of the online algorithm on all instance matrices over the compression loss of the best k-dimensional subspace (chosen in hindsight). We focus on two popular online algorithms for generalized PCA: the Gradient Descent (GD) and Matrix Exponentiated Gradient (MEG) algorithms. We show that if the regret is expressed as a function of the number of trials, then both algorithms are optimal to within a constant factor on worst-case sequences of positive definite instance matrices with trace norm at most one (which subsumes the original PCA problem with outer products). This is surprising because MEG is believed to be suboptimal in this case. We also show that when regret bounds are considered as a function of a loss budget, MEG remains optimal and strictly outperforms GD when the instance matrices are trace norm bounded. Next, we consider online PCA when the adversary is allowed to present the algorithm with positive semidefinite instance matrices whose largest eigenvalue is bounded (rather than their trace, which is the sum of their eigenvalues). Again we show that MEG is optimal and strictly better than GD in this setting.
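To make the linear reparameterization concrete, here is a minimal numerical sketch (illustrative only, not code from the paper) that checks the identity ||x − Px||² = tr(W x x^⊤) with W = I − P, where P is the orthogonal projection onto the chosen k-dimensional subspace; the dimensions, the random subspace, and all variable names are assumptions made for the example.

```python
import numpy as np

# Illustrative check (not from the paper): the compression loss ||x - Px||^2 of an
# instance x onto a k-dimensional subspace with projection matrix P equals the
# linear loss tr(W x x^T) with W = I - P. All quantities below are made up.
rng = np.random.default_rng(0)
n, k = 5, 2

U, _ = np.linalg.qr(rng.standard_normal((n, k)))  # orthonormal basis of a random k-dim subspace
P = U @ U.T                                       # rank-k orthogonal projection onto the subspace
W = np.eye(n) - P                                 # complementary rank-(n-k) projection (the parameter)

x = rng.standard_normal(n)
x /= np.linalg.norm(x)                            # instance with ||x|| <= 1, as in the abstract

compression_loss = np.linalg.norm(x - P @ x) ** 2
linear_loss = np.trace(W @ np.outer(x, x))
print(compression_loss, linear_loss)              # agree up to floating-point error
assert np.isclose(compression_loss, linear_loss)
```

The GD and MEG algorithms discussed in the abstract maintain and update such a parameter matrix W_t over the trials; those updates, and the sampling of a concrete subspace from W_t, are outside the scope of this sketch.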
We consider the following type of online variance minimization problem: in every trial t our algorithms get a covariance matrix C_t and try to select a parameter vector w_{t-1} such that the total variance over a sequence of trials, Σ_{t=1}^T w_{t-1}^⊤ C_t w_{t-1}, is not much larger than the total variance of the best parameter vector u chosen in hindsight. Two parameter spaces in R^n are considered: the probability simplex and the unit sphere. The first space is associated with the problem of minimizing risk in stock portfolios, and the second space leads to an online calculation of the eigenvector with minimum eigenvalue of the total covariance matrix Σ_{t=1}^T C_t. For the first parameter space we apply the exponentiated gradient algorithm, which is motivated by a relative entropy regularization. In the second case, the algorithm has to maintain uncertainty information over all unit directions u. For this purpose, directions are represented as dyads u u^⊤ and the uncertainty over all directions as a mixture of dyads, which is a density matrix. The motivating divergence for density matrices is the quantum version of the relative entropy, and the resulting algorithm is a special case of the matrix exponentiated gradient algorithm. In each of the two cases we prove bounds on the additional total variance incurred by the online algorithm over the best offline parameter.
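As a minimal sketch of the first case (the probability simplex), the following code applies a standard exponentiated gradient update: each weight is multiplied by exp(−η · gradient), with gradient 2 C_t w of the variance, and the weight vector is renormalized. The learning rate, the synthetic covariance stream, and all names below are illustrative assumptions rather than the paper's actual setup.

```python
import numpy as np

# Minimal sketch (assumed setup, not the paper's code): exponentiated gradient (EG)
# on the probability simplex for online variance minimization.

def eg_update(w, C, eta=0.1):
    """One EG step: multiplicative update with the gradient of w^T C w, then renormalize."""
    grad = 2.0 * C @ w                  # gradient of the variance w^T C w at the current w
    w_new = w * np.exp(-eta * grad)     # multiplicative (relative-entropy-regularized) step
    return w_new / w_new.sum()          # renormalize back onto the probability simplex

rng = np.random.default_rng(0)
n, T = 4, 100
w = np.full(n, 1.0 / n)                 # w_0: uniform distribution over the n coordinates
online_variance = 0.0
for t in range(1, T + 1):
    A = rng.standard_normal((n, n))
    C_t = A @ A.T / n                   # a positive semidefinite covariance instance for trial t
    online_variance += w @ C_t @ w      # loss of the w chosen before C_t was revealed
    w = eg_update(w, C_t)               # update after seeing C_t
print("total online variance:", online_variance)
```

In the density-matrix case the weight vector is replaced by a mixture of dyads W, and the multiplicative step becomes the matrix exponentiated gradient update W ← exp(log W − η C_t) normalized to unit trace; that case is omitted from this sketch.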