The proximal gradient algorithm for minimizing the sum of a smooth and a nonsmooth convex function often converges linearly even without strong convexity. One common reason is that a multiple of the step length at each iteration may linearly bound the "error" (the distance to the solution set). We explain the observed linear convergence intuitively by proving the equivalence of such an error bound to a natural quadratic growth condition. Our approach generalizes to linear and quadratic convergence analysis for proximal methods (of Gauss-Newton type) for minimizing compositions of nonsmooth functions with smooth mappings. We observe incidentally that short step lengths in the algorithm indicate near-stationarity, suggesting a reliable termination criterion.
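To make the termination idea concrete, here is a minimal sketch (not the paper's analysis) of the proximal gradient method for ℓ1-regularized least squares, stopping when the proximal step is short; the data, step-size rule, and tolerance are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_gradient_lasso(A, b, lam, step, max_iter=5000, tol=1e-8):
    """Proximal gradient on 0.5*||Ax - b||^2 + lam*||x||_1, stopping when
    the proximal step is short; per the error bound / quadratic growth
    equivalence, a short step signals near-stationarity."""
    x = np.zeros(A.shape[1])
    for _ in range(max_iter):
        grad = A.T @ (A @ x - b)                    # gradient of smooth part
        x_new = soft_threshold(x - step * grad, step * lam)
        if np.linalg.norm(x_new - x) <= tol:        # short step: terminate
            return x_new
        x = x_new
    return x

# Illustrative data and step size (1/L with L = ||A||^2)
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
x_hat = prox_gradient_lasso(A, b, lam=0.1, step=1.0 / np.linalg.norm(A, 2) ** 2)
```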
We consider the tasks of feature selection and policy evaluation based on linear value function approximation in reinforcement learning problems. High-dimensional feature vectors and a limited number of samples can easily cause over-fitting and expensive computation. To prevent this, ℓ1 regularization yields sparse solutions and thus improves generalization performance. We propose an efficient ℓ1-regularized recursive least squares-based online algorithm with O(n²) complexity per time step, termed ℓ1-RC. With the help of nested optimization decomposition, ℓ1-RC solves a series of standard optimization problems and avoids directly minimizing the ℓ1-regularized mean squares projected Bellman error. In ℓ1-RC, we propose RC with iterative refinement to minimize the operator error, and an alternating direction method of multipliers with a proximal operator to minimize the fixed-point error. The convergence of ℓ1-RC is established via the ordinary differential equation method, and some extensions are also given. In empirical evaluations, several state-of-the-art ℓ1-regularized methods are chosen as baselines, and ℓ1-RC is tested on both policy evaluation and learning control benchmarks. The empirical results show the effectiveness and advantages of ℓ1-RC.
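The fixed-point step the abstract mentions relies on the proximal operator of the ℓ1 norm inside an ADMM loop. The sketch below is a generic ADMM for ℓ1-regularized least squares, not the authors' ℓ1-RC; it only illustrates where soft-thresholding (the ℓ1 proximal map) enters.

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, n_iter=200):
    """Generic ADMM for 0.5*||Ax - b||^2 + lam*||x||_1 (not the authors'
    l1-RC). The z-update is the proximal operator of the l1 norm, i.e.
    soft-thresholding, which is the ingredient the abstract highlights."""
    n = A.shape[1]
    AtA, Atb = A.T @ A, A.T @ b
    L = np.linalg.cholesky(AtA + rho * np.eye(n))   # factor once, reuse
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    for _ in range(n_iter):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)
        u = u + x - z                               # scaled dual update
    return z
```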
Author:
Bertsekas, Dimitri P. (MIT)
Dept. of Electrical Engineering & Computer Science, Laboratory for Information & Decision Systems, Cambridge, MA 02139, USA
We consider the minimization of a sum $\sum_{i=1}^{m} f_i(x)$ consisting of a large number of convex component functions $f_i$. For this problem, incremental methods consisting of gradient or subgradient iterations applied to single components have proved very effective. We propose new incremental methods, consisting of proximal iterations applied to single components, as well as combinations of gradient, subgradient, and proximal iterations. We provide a convergence and rate of convergence analysis of a variety of such methods, including some that involve randomization in the selection of components. We also discuss applications in a few contexts, including signal processing and inference/machine learning.
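For a sense of what a single proximal iteration on one component looks like, here is a hedged sketch for quadratic components $f_i(x) = \tfrac{1}{2}(a_i^T x - b_i)^2$, whose proximal map is available in closed form; the randomized order and diminishing step sizes echo, but do not reproduce, the variants analyzed in the paper.

```python
import numpy as np

def incremental_proximal(a_rows, b, alpha0=1.0, n_epochs=50, seed=0):
    """Incremental proximal method for min_x sum_i 0.5*(a_i^T x - b_i)^2:
    each inner step applies the exact proximal map of one component
    (closed form for a scalar quadratic), in randomized order with a
    diminishing step size."""
    rng = np.random.default_rng(seed)
    m, n = a_rows.shape
    x = np.zeros(n)
    for epoch in range(n_epochs):
        alpha = alpha0 / (epoch + 1)      # diminishing step size
        for i in rng.permutation(m):      # randomized component order
            a, bi = a_rows[i], b[i]
            # prox_{alpha f_i}(x), f_i(x) = 0.5*(a^T x - b_i)^2
            x = x - alpha * a * (a @ x - bi) / (1.0 + alpha * (a @ a))
    return x
```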
We consider a splitting proximal algorithm with penalization for minimizing a finite sum of proper, convex and lower semicontinuous functions subject to the set of minimizers of another proper, convex and lower semicontinuous function. We show convergence of the generated sequence of iterates to an optimal solution of the considered convex hierarchical minimization problem. Some numerical experiments on regularized least squares problems are given to show the effectiveness of the obtained theoretical results.
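As a rough illustration of such a penalization scheme (not the paper's exact method), one forward-backward penalty iteration alternates a gradient step on the penalized function with a proximal step on the objective; the step and penalty schedules below are placeholders, whereas the paper imposes precise conditions on them.

```python
import numpy as np

def penalized_proximal(prox_f, grad_h, x0, gstep, n_iter=500):
    """Sketch of a forward-backward penalty iteration for
        min f(x)  subject to  x in argmin h,
    namely  x+ = prox_{lam_k f}(x - lam_k * beta_k * grad_h(x)).
    The schedules below keep lam_k * beta_k equal to the fixed gradient
    step gstep while the penalty weight beta_k grows; they are
    placeholders for the paper's precise conditions."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, n_iter + 1):
        lam, beta = gstep / k, float(k)   # hypothetical schedules
        x = prox_f(x - lam * beta * grad_h(x), lam)
    return x

# Toy instance: minimize ||x||_1 over the minimizers of h(x) = 0.5*||Ax - b||^2
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 10))
b = A @ rng.standard_normal(10)           # consistent system, argmin h nonempty
prox_l1 = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
gstep = 1.0 / np.linalg.norm(A, 2) ** 2   # safe gradient step, 1/L
x_hat = penalized_proximal(prox_l1, lambda x: A.T @ (A @ x - b), np.zeros(10), gstep)
```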
We investigate the modeling and the numerical solution of machine learning problems with prediction functions which are linear combinations of elements of a possibly infinite dictionary of functions. We propose a novel flexible composite regularization model, which makes it possible to incorporate various priors on the coefficients of the prediction function, including sparsity and hard constraints. We show that the estimators obtained by minimizing the regularized empirical risk are consistent in a statistical sense, and we design an error-tolerant composite proximal thresholding algorithm for computing such estimators. New results on the asymptotic behavior of the proximal forward-backward splitting method are derived and exploited to establish the convergence properties of the proposed algorithm. In particular, our method features an o(1/m) convergence rate in objective values.
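One concrete instance of a composite prior combining sparsity with hard constraints is λ‖·‖₁ plus a box indicator, whose proximity operator factors coordinate-wise into soft-thresholding followed by clipping. The sketch below runs plain forward-backward with that operator; it is an illustrative stand-in, not the paper's error-tolerant algorithm.

```python
import numpy as np

def prox_l1_box(v, t, lam, c):
    """Prox of t * (lam*||.||_1 + indicator of [-c, c]^n): coordinate-wise
    soft-thresholding followed by clipping (exact here, since both terms
    are separable and the box is an interval)."""
    s = np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)
    return np.clip(s, -c, c)

def composite_fb(A, b, lam, c, n_iter=300):
    """Plain forward-backward on 0.5*||Ax - b||^2 + lam*||x||_1 + box
    constraint; an illustrative stand-in for the paper's scheme."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1/L for the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = prox_l1_box(x - step * A.T @ (A @ x - b), step, lam, c)
    return x
```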
Perspective functions arise explicitly or implicitly in various forms in applied mathematics and in statistical data analysis. To date, no systematic strategy is available to solve the associated, typically nonsmooth, optimization problems. In this paper, we fill this gap by showing that proximal methods provide an efficient framework to model and solve problems involving perspective functions. We study the construction of the proximity operator of a perspective function under general assumptions and present important instances in which the proximity operator can be computed explicitly or via straightforward numerical operations. These results constitute central building blocks in the design of proximal optimization algorithms. We showcase the versatility of the framework by designing novel proximal algorithms for state-of-the-art regression and variable selection schemes in high-dimensional statistics. (C) 2016 The Authors. Published by Elsevier Inc.
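For reference, the (closed) perspective of a proper lower semicontinuous convex function $f$ is the standard construction below; this is textbook background rather than a result of the paper:

```latex
% Closed perspective of a proper lsc convex function f on R^n
\widetilde{f}(\eta, x) =
\begin{cases}
  \eta \, f(x/\eta), & \eta > 0, \\
  (\operatorname{rec} f)(x), & \eta = 0, \\
  +\infty, & \eta < 0,
\end{cases}
```

where $\operatorname{rec} f$ denotes the recession function of $f$; joint convexity in $(\eta, x)$ is what makes the proximity-operator machinery of the paper applicable.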
Author:
Moudafi, Abdellatif (CEREGMIA)
Université des Antilles-Guyane, Département Scientifique Interfacultaire, Campus de Schoelcher, 97230 Cedex, Martinique (F.W.I.)
Two splitting procedures for solving equilibrium problems involving the sum of two bifunctions are proposed and their convergence is established under mild assumptions. (C) 2009 Elsevier Inc. All rights reserved.
In the past few years, robustness has been one problem that has received much attention in the statistical literature. While it is now clear that no single robust regression procedure is best (by mean square error or other adequate criteria), the LAV (least absolute value) and Huber-M estimators are currently attracting considerable attention when the errors have a contaminated Gaussian or long-tailed distribution. Finding efficient algorithms to produce such estimates for large data sets is still a field of active research. In this paper, we present algorithms based on Spingarn's partial inverse proximal approach, which takes into account both primal and dual aspects of the related optimization problems. They can be viewed as decomposition methods. Known to be always globally convergent, such an alternative iterative approach leads to simple computational steps and updating rules. The result is a highly parallel algorithm particularly attractive for large-scale problems. Its efficient implementation on a parallel computer architecture is described. Remedies are introduced to ensure efficiency in the case of models with less than full rank. Numerical simulations are considered and computational performance is reported. Finally, we show how the method allows for easy handling of general convex constraints on the primal variables. We discuss in detail a variety of linear and nonlinear restrictions. The cases of ridge LAV and Huber-M regression are specifically considered.
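The "simple computational steps" such decompositions reduce to are essentially the proximal maps of the component losses. Sketched below, under the standard Huber definition and with illustrative parameters, are the proximal maps of the absolute value and Huber losses that an LAV or Huber-M decomposition would invoke coordinate-wise.

```python
import numpy as np

def prox_abs(v, t):
    # Proximal map of t*|.|: soft-thresholding (the LAV building block)
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_huber(v, t, delta=1.0):
    """Proximal map of t * huber_delta, with huber(v) = 0.5*v^2 for
    |v| <= delta and delta*(|v| - 0.5*delta) otherwise."""
    quad = np.abs(v) <= delta * (1.0 + t)        # quadratic branch active
    return np.where(quad,
                    v / (1.0 + t),               # shrink in the quadratic region
                    v - t * delta * np.sign(v))  # soft step in the linear region
```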
Latent variable models have been playing a central role in psychometrics and related fields. In many modern applications, the inference based on latent variable models involves one or several of the following features: (1) the presence of many latent variables, (2) the observed and latent variables being continuous, discrete, or a combination of both, (3) constraints on parameters, and (4) penalties on parameters to impose model parsimony. The estimation often involves maximizing an objective function based on a marginal likelihood/pseudo-likelihood, possibly with constraints and/or penalties on parameters. Solving this optimization problem is highly non-trivial, due to the complexities brought by the features mentioned above. Although several efficient algorithms have been proposed, a unified computational framework that takes all these features into account has been lacking. In this paper, we fill this gap. Specifically, we provide a unified formulation for the optimization problem and then propose a quasi-Newton stochastic proximal algorithm. Theoretical properties of the proposed algorithm are established. Its computational efficiency and robustness are shown by simulation studies under various settings for latent variable model estimation.
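A single iteration of a generic scaled stochastic proximal scheme might look like the sketch below. This is a schematic stand-in, not the authors' quasi-Newton stochastic proximal algorithm, and every name in it (the curvature estimate, the sampled gradient, the penalty) is a placeholder.

```python
import numpy as np

def stochastic_prox_step(theta, grad_sample, prox_penalty, h_diag, lr):
    """One scaled stochastic proximal step (a generic sketch, not the
    authors' algorithm): precondition a stochastic gradient estimate by a
    diagonal curvature estimate h_diag, then apply the penalty's prox."""
    step = lr / h_diag                      # per-coordinate step sizes
    return prox_penalty(theta - step * grad_sample, step)

# Illustrative use with a lasso-type penalty (lam is hypothetical)
lam = 0.1
prox_l1 = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)
theta = np.zeros(5)
g = np.random.default_rng(2).standard_normal(5)  # stand-in for a sampled gradient
h = np.ones(5)                                   # stand-in curvature estimate
theta = stochastic_prox_step(theta, g, prox_l1, h, lr=0.01)
```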
We introduce a flexible optimization model for maximum likelihood-type estimation (M-estimation) that encompasses and generalizes a large class of existing statistical models, including Huber's concomitant M-estimator, Owen's Huber/Berhu concomitant estimator, the scaled lasso, support vector machine regression, and penalized estimation with structured sparsity. The model, termed perspective M-estimation, leverages the observation that convex M-estimators with concomitant scale as well as various regularizers are instances of perspective functions, a construction that extends a convex function to a jointly convex one in terms of an additional scale variable. These nonsmooth functions are shown to be amenable to proximal analysis, which leads to principled and provably convergent optimization algorithms via proximal splitting. We derive novel proximity operators for several perspective functions of interest via a geometrical approach based on duality. We then devise a new proximal splitting algorithm to solve the proposed M-estimation problem and establish the convergence of both the scale and regression iterates it produces to a solution. Numerical experiments on synthetic and real-world data illustrate the broad applicability of the proposed framework.
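As one concrete member of this family, the scaled (concomitant) lasso can be written with a joint scale variable. The naive alternating sketch below only illustrates the objective's structure; the paper's own algorithm solves such problems by joint proximal splitting, with convergence guarantees for both the scale and regression iterates.

```python
import numpy as np

def scaled_lasso_alternating(X, y, lam, outer=20, inner=200):
    """Naive alternating sketch of the concomitant (scaled) lasso
        min_{beta, sigma>0}  ||y - X beta||^2 / (2 n sigma) + sigma/2
                             + lam * ||beta||_1,
    one of the objectives the perspective M-estimation framework covers.
    The paper itself uses a joint proximal splitting, not this alternation."""
    n, p = X.shape
    beta = np.zeros(p)
    sigma = max(np.linalg.norm(y) / np.sqrt(n), 1e-12)
    inv_L2 = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / ||X||^2
    for _ in range(outer):
        # beta-step: proximal gradient on the smooth term + lam*||.||_1
        t = inv_L2 * n * sigma                 # safe step for this sigma
        for _ in range(inner):
            g = X.T @ (X @ beta - y) / (n * sigma)
            v = beta - t * g
            beta = np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)
        # sigma-step: closed-form minimizer, sigma = ||y - X beta|| / sqrt(n)
        sigma = max(np.linalg.norm(y - X @ beta) / np.sqrt(n), 1e-12)
    return beta, sigma
```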