ISBN (print): 3540265562
We discuss a simple sparse linear problem that is hard to learn with any algorithm that uses a linear combination of the training instances as its weight vector. The hardness holds even if we allow the learner to embed the instances into any higher dimensional feature space (and use a kernel function to define the dot product between the embedded instances). These algorithms are inherently limited by the fact that after seeing k instances only a weight space of dimension k can be spanned. Our hardness result is surprising because the same problem can be efficiently learned using the exponentiated gradient (EG) algorithm: now the component-wise logarithms of the weights are essentially a linear combination of the training instances. This algorithm enforces additional constraints on the weights (all must be non-negative and sum to one), and in some cases these constraints alone force the rank of the weight space after k instances to grow as fast as 2^k.
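For contrast, here is a minimal sketch (ours, not the paper's code; squared loss and a fixed learning rate are illustrative assumptions) of the two update rules the abstract compares:

```python
import numpy as np

def gd_update(w, x, y, eta=0.1):
    """Plain gradient descent on squared loss: after k updates the weight
    vector is w0 plus a linear combination of the k instances, i.e. it
    never leaves their span -- the limitation the hardness result exploits."""
    grad = 2.0 * (w @ x - y) * x
    return w - eta * grad

def eg_update(w, x, y, eta=0.1):
    """Exponentiated gradient: the update is multiplicative, so it is the
    component-wise *logarithms* of the weights that accumulate a linear
    combination of the instances; renormalizing keeps w on the simplex
    (non-negative, summing to one)."""
    grad = 2.0 * (w @ x - y) * x
    w = w * np.exp(-eta * grad)
    return w / w.sum()
```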
We present an online Support Vector Machine (SVM) that uses Stochastic Meta-Descent (SMD) to adapt its step size automatically. We formulate the online learning problem as a stochastic gradient descent in Reproducing ...
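For orientation, a simplified sketch of SMD-style per-parameter gain adaptation (ours; full SMD also propagates a Hessian-vector product through the trace v, which we omit here):

```python
import numpy as np

def smd_step(w, grad, p, v, mu=0.05, rho=0.5, lam=0.99):
    """One descent step with SMD-style multiplicative gain adaptation.
    `p` holds per-parameter step sizes, `v` a decayed trace of parameter
    changes with respect to the log-gains. Gains grow while successive
    gradients point the same way and shrink when they oscillate."""
    p = p * np.maximum(rho, 1.0 - mu * grad * v)  # multiplicative gain update
    w = w - p * grad                              # ordinary descent step
    v = lam * v - p * grad                        # update the gain trace
    return w, p, v
```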
ISBN (print): 1595931805
This paper presents an algorithm to estimate simultaneously both the mean and variance of a nonparametric regression problem. The key point is that we are able to estimate variance locally, unlike standard Gaussian Process regression or SVMs. This means that our estimator adapts to the local noise. The problem is cast in the setting of maximum a posteriori estimation in exponential families. Unlike previous work, we obtain a convex optimization problem, which can be solved via Newton's method.
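The exponential-family view can be illustrated on a single Gaussian: the negative log-likelihood is convex in the natural parameters, so Newton's method applies directly. A minimal sketch under that reading (the paper itself estimates such parameters locally, per input, via kernel expansions; all names below are ours):

```python
import numpy as np

def fit_gaussian_natural(y, iters=50):
    """Fit a Gaussian's natural parameters (t1, t2) = (mu/s2, -1/(2*s2))
    by guarded Newton on the negative log-likelihood, which is convex in
    this parameterization."""
    n = len(y)
    stats = np.array([y.sum(), (y ** 2).sum()])    # sufficient statistics
    theta = np.array([0.0, -0.5])                  # start at mu=0, s2=1
    for _ in range(iters):
        mu = -theta[0] / (2.0 * theta[1])          # implied mean
        s2 = -1.0 / (2.0 * theta[1])               # implied variance
        grad = n * np.array([mu, mu ** 2 + s2]) - stats
        # Hessian = n * covariance of the sufficient statistics (y, y^2)
        H = n * np.array([[s2, 2 * mu * s2],
                          [2 * mu * s2, 4 * mu ** 2 * s2 + 2 * s2 ** 2]])
        step = np.linalg.solve(H, grad)
        t = 1.0
        while theta[1] - t * step[1] >= 0:         # damp to keep t2 negative
            t *= 0.5                               # (i.e. variance positive)
        theta = theta - t * step
    return -theta[0] / (2 * theta[1]), -1.0 / (2 * theta[1])

# Example: recovers roughly mean 5 and variance 4.
rng = np.random.default_rng(0)
print(fit_gaussian_natural(5 + 2 * rng.standard_normal(1000)))
```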
We present a method for performing transductive inference on very large datasets. Our algorithm is based on multiclass Gaussian processes and is effective whenever the multiplication of the kernel matrix or its inverse with a vector can be computed sufficiently fast. This holds, for instance, for certain graph and string kernels. Transduction is achieved by variational inference over the unlabeled data subject to a balancing constraint.
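The computational requirement can be read as: only matrix-vector products with the kernel matrix are needed, so iterative solvers like conjugate gradient work without ever forming the matrix. A sketch of that pattern (ours, not the paper's code), with a toy O(n) matvec standing in for a fast graph-kernel product:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def solve_with_kernel(kernel_matvec, b):
    """Solve K x = b for a symmetric positive definite kernel matrix K,
    given only a routine computing v -> K @ v."""
    n = len(b)
    K = LinearOperator((n, n), matvec=kernel_matvec)
    x, info = cg(K, b)
    assert info == 0, "CG did not converge"
    return x

# Toy example: K = I + L for a chain-graph Laplacian L, whose matvec is
# O(n) -- the kind of fast product the method relies on.
def matvec(v):
    Lv = np.zeros_like(v)
    Lv[:-1] += v[:-1] - v[1:]
    Lv[1:] += v[1:] - v[:-1]
    return v + Lv

print(solve_with_kernel(matvec, np.ones(100))[:5])
```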
ISBN (print): 0262195348
We propose a convex optimization based strategy to deal with uncertainty in the observations of a classification problem. We assume that instead of a sample (x_i, y_i) a distribution over (x_i, y_i) is specified. In particular, we derive a robust formulation for the case where the distribution is a normal distribution. This leads to a Second Order Cone Programming formulation. Our method is applied to the problem of missing data, where it outperforms direct imputation.
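Under the normality assumption the robust requirement becomes a second-order cone constraint: each example's mean must be classified with margin 1 plus a kappa-scaled norm of Sigma_i^{1/2} w. A sketch of such a formulation in cvxpy (variable names and the soft-margin objective are our assumptions, not necessarily the paper's exact program):

```python
import cvxpy as cp
import numpy as np

def robust_svm(mu, sigma_sqrt, y, kappa=1.0, C=1.0):
    """Each example i is a distribution N(mu[i], Sigma_i), with sigma_sqrt[i]
    a square root of Sigma_i; kappa controls the required confidence."""
    n, d = mu.shape
    w, b = cp.Variable(d), cp.Variable()
    xi = cp.Variable(n, nonneg=True)               # slack variables
    cons = [y[i] * (mu[i] @ w + b)
            >= 1 - xi[i] + kappa * cp.norm(sigma_sqrt[i] @ w, 2)
            for i in range(n)]                      # second-order cone constraints
    obj = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
    cp.Problem(obj, cons).solve()
    return w.value, b.value
```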
We propose a family of kernels based on the Binet-Cauchy theorem and its extension to Fredholm operators. This includes as special cases all currently known kernels derived from the behavioral framework, diffusion processes, marginalized kernels, kernels on graphs, and the kernels on sets arising from the subspace angle approach. Many of these kernels can be seen as the extrema of a new continuum of kernel functions, which leads to numerous new special cases. As an application, we apply the new class of kernels to the problem of clustering video sequences, with encouraging results.
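The simplest member of this family is the determinant kernel k(A, B) = det(AᵀB); for orthonormal bases it equals, up to sign, the product of the cosines of the principal angles between the two subspaces, which is the subspace-angle special case the abstract mentions. A minimal sketch (ours, not the full Fredholm-operator construction):

```python
import numpy as np

def binet_cauchy_det_kernel(A, B):
    """k(A, B) = det(A^T B) for two n-by-k matrices. For orthonormal
    columns, the singular values of A^T B are the cosines of the
    principal angles, so the determinant is their product up to sign."""
    return np.linalg.det(A.T @ B)

# Example: two random 5-dimensional subspaces of R^20.
rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.standard_normal((20, 5)))
B, _ = np.linalg.qr(rng.standard_normal((20, 5)))
print(binet_cauchy_det_kernel(A, B))
```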
ISBN (print): 1577351894
We present a fast iterative support vector training algorithm for a large variety of different formulations. It works by incrementally changing a candidate support vector set using a greedy approach, until the supporting hyperplane is found within a finite number of iterations. It is derived from a simple active set method which sweeps through the set of Lagrange multipliers and keeps optimality in the unconstrained variables, while discarding large numbers of bound-constrained variables. The hard-margin version can be viewed as a simple (yet computationally crucial) modification of the incremental SVM training algorithms of Cauwenberghs and Poggio. Experimental results for various settings are reported. In all cases our algorithm is considerably faster than competing methods such as Sequential Minimal Optimization or the Nearest Point Algorithm.
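Schematically, the greedy candidate-set loop looks as follows (a sketch under our reading of the abstract; `solve_subproblem` stands in for the small optimal solve on the current candidate set and is not the paper's exact routine):

```python
import numpy as np

def greedy_active_set_svm(K, y, solve_subproblem, max_iter=1000, tol=1e-6):
    """Grow a candidate support vector set greedily: solve optimally on the
    current set, find the worst margin violator, add it, repeat until no
    point violates the (hard) margin conditions."""
    n = len(y)
    S = [int(np.argmax(y == 1)), int(np.argmax(y == -1))]  # seed: one per class
    alpha, b = np.zeros(n), 0.0
    for _ in range(max_iter):
        alpha_S, b = solve_subproblem(K[np.ix_(S, S)], y[S])  # optimal on S
        alpha[:] = 0.0
        alpha[S] = alpha_S
        margins = y * (K @ (alpha * y) + b)       # functional margins
        viol = int(np.argmin(margins))            # worst KKT violator
        if margins[viol] >= 1 - tol:              # none left: supporting
            break                                 # hyperplane found
        S.append(viol)                            # grow the candidate set
    return alpha, b
```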
The paper provides the results of a performance comparison study of two symbolic learning programs, both based on the AQ15c learning algorithm. The first program uses a single representation space, while the second one utilizes constructive induction, which changes the representation space. The performance of the compared systems was analyzed using three empirical error rates: the overall, commission, and omission error rates. These were determined by applying the hold-out, 10-fold, and leave-one-out sampling methods. Both systems' performance was calculated for individual stages in a multi-stage knowledge-acquisition process, and learning curves and their envelopes were prepared. The study was conducted using a set of 384 optimal designs of wind bracing in steel skeleton structures of tall buildings. The research methodology and the two learning systems used in the experiments are described, all numerical results are provided, and the conclusions of the research are given. Copyright (C) 1996 IJCAI Inc.
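For reference, the three error rates can be computed as below, under our reading of the AQ terminology (an example the rules leave unclassified is an omission error; one they classify wrongly is a commission error):

```python
def error_rates(y_true, y_pred):
    """Return (overall, commission, omission) error rates for a rule
    learner. Entries of None in y_pred mean 'no rule fired'."""
    n = len(y_true)
    omission = sum(p is None for p in y_pred) / n
    commission = sum(p is not None and p != t
                     for p, t in zip(y_pred, y_true)) / n
    return omission + commission, commission, omission

# Example: one wrong label and one unclassified example out of four.
print(error_rates(["a", "b", "a", "b"], ["a", "a", None, "b"]))
```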