This paper presents an online support vector machine (SVM) that uses the stochastic meta-descent (SMD) algorithm to adapt its step size automatically. We formulate the online learning problem as a stochastic gradient ...
详细信息
This paper presents an online support vector machine (SVM) that uses the stochastic meta-descent (SMD) algorithm to adapt its step size automatically. We formulate the online learning problem as a stochastic gradient descent in reproducing kernel Hilbert space (RKHS) and translate SMD to the nonparametric setting, where its gradient trace parameter is no longer a coefficient vector but an element of the RKHS. We derive efficient updates that allow us to perform the step size adaptation in linear time. We apply the online SVM framework to a variety of loss functions, and in particular show how to handle structured output spaces and achieve efficient online multiclass classification. Experiments show that our algorithm outperforms more primitive methods for setting the gradient step size.
The performance of ICA algorithms significantly depends on the choice of the contrast function and the optimisation algorithm used in obtaining the demixing matrix. In this paper we focus on the standard linear nonpar...
详细信息
ISBN:
(纸本)3540464794
The performance of ICA algorithms significantly depends on the choice of the contrast function and the optimisation algorithm used in obtaining the demixing matrix. In this paper we focus on the standard linear nonparametric ICA problem from an optimisation point of view. It is well known that after a pre-whitening process, the problem can be solved via an optimisation approach on a suitable manifold. We propose an approximate Newton's method on the unit sphere to solve the one-unit linear nonparametric ICA problem. The local convergence properties are discussed. The performance of the proposed algorithms is investigated by numerical experiments.
Semi-supervised learning algorithms have been successfully applied in many applications with scarce labeled data, by utilizing the unlabeled data. One important category is graph based semi-supervised learning algorit...
Semi-supervised learning algorithms have been successfully applied in many applications with scarce labeled data, by utilizing the unlabeled data. One important category is graph based semi-supervised learning algorithms, for which the performance depends considerably on the quality of the graph, or its hyperparameters. In this paper, we deal with the less explored problem of learning the graphs. We propose a graph learning method for the harmonic energy minimization method; this is done by minimizing the leave-one-out prediction error on labeled data points. We use a gradient based method and designed an efficient algorithm which significantly accelerates the calculation of the gradient by applying the matrix inversion lemma and using careful pre-computation. Experimental results show that the graph learning method is effective in improving the performance of the classification algorithm.
In regression, the desired estimate of y vertical bar x is not always given by a conditional mean, although this is most common. Sometimes one wants to obtain a good estimate that satisfies the property that a proport...
详细信息
In regression, the desired estimate of y vertical bar x is not always given by a conditional mean, although this is most common. Sometimes one wants to obtain a good estimate that satisfies the property that a proportion, tau, of y vertical bar x, will be below the estimate. For tau = 0.5 this is an estimate of the median. What might be called median regression, is subsumed under the term quantile regression. We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem and provide uniform convergence statements and bounds on the quantile property of our estimator. Experimental results show the feasibility of the approach and competitiveness of our method with existing ones. We discuss several types of extensions including an approach to solve the quantile crossing problems, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints.
We propose a novel second order cone programming formulation for designing robust classifiers which can handle uncertainty in observations. Similar formulations are also derived for designing regression functions whic...
详细信息
We propose a novel second order cone programming formulation for designing robust classifiers which can handle uncertainty in observations. Similar formulations are also derived for designing regression functions which are robust to uncertainties in the regression setting. The proposed formulations are independent of the underlying distribution, requiring only the existence of second order moments. These formulations are then specialized to the case of missing values in observations for both classification and regression problems. Experiments show that the proposed formulations outperform imputation.
Efficient representations and solutions for large decision problems with continuous and discrete variables are among the most important challenges faced by the designers of automated decision support systems. In this ...
详细信息
Efficient representations and solutions for large decision problems with continuous and discrete variables are among the most important challenges faced by the designers of automated decision support systems. In this paper, we describe a novel hybrid factored Markov decision process (MDP) model that allows for a compact representation of these problems, and a new hybrid approximate linear programming (HALP) framework that permits their efficient solutions. The central idea of HALP is to approximate the optimal value function by a linear combination of basis functions and optimize its weights by linear programming. We analyze both theoretical and computational aspects of this approach, and demonstrate its scale-up potential on several hybrid optimization problems.
We propose a family of kernels based on the Binet-Cauchy theorem and its extension to Fredholm operators. This includes as special cases all currently known kernels derived from the behavioral framework, diffusion pro...
ISBN:
(纸本)0262195348
We propose a family of kernels based on the Binet-Cauchy theorem and its extension to Fredholm operators. This includes as special cases all currently known kernels derived from the behavioral framework, diffusion processes, marginalized kernels, kernels on graphs, and the kernels on sets arising from the subspace angle approach. Many of these kernels can be seen as the extrema of a new continuum of kernel functions, which leads to numerous new special cases. As an application, we apply the new class of kernels to the problem of clustering of video sequences with encouraging results.
Efficient learnability using the state merging algorithm is known for a subclass of probabilistic automata termed μ-distinguishable. In this paper, we prove that state merging algorithms can be extended to efficientl...
详细信息
We present a method for performing transductive inference on very large datasets. Our algorithm is based on multiclass Gaussian processes and is effective whenever the multiplication of the kernel matrix or its invers...
详细信息
ISBN:
(纸本)9780262232531
We present a method for performing transductive inference on very large datasets. Our algorithm is based on multiclass Gaussian processes and is effective whenever the multiplication of the kernel matrix or its inverse with a vector can be computed sufficiently fast. This holds, for instance, for certain graph and string kernels. Transduction is achieved by variational inference over the unlabeled data subject to a balancing constraint.
We present methods for dealing with missing variables in the context of Gaussian Processes and Support Vector machines. This solves an important problem which has largely been ignored by kernel methods: How to systema...
详细信息
ISBN:
(纸本)097273581X
We present methods for dealing with missing variables in the context of Gaussian Processes and Support Vector machines. This solves an important problem which has largely been ignored by kernel methods: How to systematically deal with incomplete data? Our method can also be applied to problems with partially observed labels as well as to the transductive setting where we view the labels as missing data. Our approach relies on casting kernel methods as an estimation problem in exponential families. Hence, estimation with missing variables becomes a problem of computing marginal distributions, and finding efficient optimization methods. To that extent we propose an optimization scheme which extends the Concave Convex Procedure (CCP) of Yuille and Rangarajan, and present a simplified and intuitive proof of its convergence. We show how our algorithm can be specialized to various cases in order to efficiently solve the optimization problems that arise. Encouraging preliminary experimental results on the USPS dataset are also presented.
暂无评论