In a Hilbert space $\mathcal{H}$, we introduce a new class of proximal-gradient algorithms with finite convergence properties. These algorithms naturally occur as discrete temporal versions of an inertial differential inclusion which is stabilized under the joint action of three dampings: dry friction, viscous friction, and a geometric damping driven by the Hessian. The function $f: \mathcal{H} \to \mathbb{R}$ to be minimized is supposed to be differentiable (not necessarily convex) and enters the algorithm via its gradient. The dry friction damping function $\phi: \mathcal{H} \to \mathbb{R}_+$ is convex with a sharp minimum at the origin (typically $\phi(x) = r\|x\|$ with $r > 0$). It enters the algorithm via its proximal mapping, which acts as a soft-thresholding operator on the velocities. It is the source of stabilization in a finite number of steps. The geometric damping driven by the Hessian intervenes in the dynamics in the form $\nabla^2 f(x(t))\,\dot{x}(t)$. By treating this term as the time derivative of $\nabla f(x(t))$, this gives, in discretized form, first-order algorithms. The Hessian-driven damping allows one to control and to attenuate the oscillations. The convergence results tolerate the presence of errors, under the sole assumption of their asymptotic convergence to zero. Replacing the potential function $f$ by its Moreau envelope, we extend the results to the case of a nonsmooth convex function $f$. In this case, the algorithm involves the proximal operators of $f$ and $\phi$ separately. Several variants of this algorithm are considered, including the case of the Nesterov accelerated gradient method. We then consider the extension to the case of additive composite optimization, thus leading to splitting methods. Numerical experiments are given for Lasso-type problems. The performance profiles, as a comparison tool, highlight the effectiveness of a variant of the Nesterov accelerated method with dry friction and Hessian-driven damping.
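A minimal numerical sketch of the kind of scheme described above: the dry friction enters through the proximal mapping of $\phi(x) = r\|x\|$ applied to the velocity, and the Hessian-driven damping is realized through the gradient difference $\nabla f(x_k) - \nabla f(x_{k-1})$, so only gradients are used. The update rule and all parameter values below are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def prox_dry_friction(z, thresh):
    """Proximal mapping of lam*phi for phi(x) = r*||x||: block soft-thresholding
    with threshold thresh = lam*r."""
    nz = np.linalg.norm(z)
    if nz <= thresh:
        return np.zeros_like(z)
    return (1.0 - thresh / nz) * z

def inertial_dry_friction_step(x, x_prev, grad, h=0.1, gamma=1.0, beta=0.5, r=0.05):
    """One step of a hypothetical discretization of
        x'' + gamma x' + d phi(x') + beta Hess f(x) x' + grad f(x) ∋ 0,
    where the Hessian term is replaced by the gradient difference
    (grad f(x_k) - grad f(x_{k-1})), so only first-order information is used."""
    v = (x - x_prev) / h                      # current velocity
    g, g_prev = grad(x), grad(x_prev)
    # candidate velocity before the dry friction is applied
    w = (1.0 - gamma * h) * v - h * g - beta * (g - g_prev)
    v_new = prox_dry_friction(w, h * r)       # dry friction: soft-threshold the velocity
    return x + h * v_new

# toy run on a quadratic: f(x) = 0.5*||x||^2, grad f(x) = x
grad = lambda x: x
x_prev = np.array([2.0, -1.0]); x = x_prev.copy()
for k in range(200):
    x, x_prev = inertial_dry_friction_step(x, x_prev, grad), x
# for this toy problem the velocity is thresholded to zero after finitely many
# steps, and the iterates freeze at a point with ||grad f(x)|| <= r
print(x)
```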
In a Hilbert space setting, we consider new first-order optimization algorithms which are obtained by temporal discretization of a damped inertial autonomous dynamic involving dry friction. The function $f$ to be minimized is assumed to be differentiable (not necessarily convex). The dry friction potential function $\phi$, which has a sharp minimum at the origin, enters the algorithm via its proximal mapping, which acts as a soft-thresholding operator on the sum of the velocity and the gradient terms. After a finite number of steps, the structure of the algorithm changes, losing its inertial character to become the steepest descent method. The geometric damping driven by the Hessian of $f$ makes it possible to control and attenuate the oscillations. The algorithm generates convergent sequences when $f$ is convex, and in the nonconvex case when $f$ satisfies the Kurdyka-Łojasiewicz property. The convergence results are robust with respect to numerical errors and perturbations. The study is then extended to the case of a nonsmooth convex function $f$, in which case the algorithm involves the proximal operators of $f$ and $\phi$ separately. Applications are given to the Lasso problem and nonsmooth d.c. programming.
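The structural change described above can be illustrated with a toy scheme in which the dry-friction proximal step is applied to the momentum term: once its argument falls below the threshold, the proximal map returns exactly zero and the iteration coincides with a plain gradient (steepest descent) step. The update rule below is a hypothetical variant chosen for illustration, not the paper's algorithm.

```python
import numpy as np

def prox_phi(z, thresh):
    """Proximal mapping of lam*phi for phi = r*||.||, i.e. block soft-thresholding
    with threshold thresh = lam*r."""
    nz = np.linalg.norm(z)
    return np.zeros_like(z) if nz <= thresh else (1.0 - thresh / nz) * z

def run(grad, x0, h=0.1, r=0.5, iters=100):
    x_prev, x = x0.copy(), x0.copy()
    last_active = -1                               # last iteration where dry friction acted
    for k in range(iters):
        inertial = prox_phi(x - x_prev, h * r)     # dry friction applied to the momentum term
        if inertial.any():
            last_active = k
        x, x_prev = x - h * grad(x) + inertial, x  # gradient step + thresholded inertia
    return x, last_active

# toy problem f(x) = 0.5*||x||^2; after iteration `last_active` the prox term was
# identically zero, so the method ran as plain steepest descent from then on
x, last_active = run(lambda x: x, np.array([2.0, -1.0]))
print(last_active, x)
```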
In a Hilbert space $\mathcal{H}$, we introduce a new class of first-order algorithms which naturally occur as discrete temporal versions of an inertial differential inclusion jointly involving viscous friction and dry friction. The function $f: \mathcal{H} \to \mathbb{R}$ to be minimized is supposed to be differentiable (not necessarily convex), and enters the algorithm via its gradient. The dry friction damping function $\phi: \mathcal{H} \to \mathbb{R}_+$ is convex with a sharp minimum at the origin (typically $\phi(x) = r\|x\|$ with $r > 0$). It enters the algorithm via its proximal mapping, which acts as a soft-thresholding operator on the velocities. As a result, we obtain a new class of splitting algorithms involving separately the proximal and gradient steps. The sequence of iterates has finite length, and therefore strongly converges towards an approximate critical point $x_\infty$ of $f$ (typically $\|\nabla f(x_\infty)\| = r$). Under a geometric property satisfied by the limit point $x_\infty$, we obtain geometric and finite rates of convergence. The convergence results tolerate the presence of errors, under the sole assumption of their asymptotic convergence towards zero. By replacing the function $f$ by its Moreau envelope, we extend the results to the case of nonsmooth convex functions. In this case, the algorithm involves the proximal operators of $f$ and $\phi$ separately. Several variants of this algorithm are considered, including the case of the Nesterov accelerated gradient method. We then consider the extension to the case of additive composite optimization, thus leading to new splitting methods. Numerical experiments are given for Lasso-type problems. The performance profiles, as a comparison tool, demonstrate the efficiency of the Nesterov accelerated method with asymptotically vanishing damping combined with dry friction.
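The Moreau-envelope device mentioned above can be sketched directly: for a nonsmooth convex $f$, the gradient of the envelope $f_\lambda$ is computed from the proximal mapping of $f$, so a gradient-based dry-friction scheme applied to $f_\lambda$ uses the proximal operators of $f$ and $\phi$ separately. Below is a minimal illustration with $f = \|\cdot\|_1$; the choice of $f$, $\lambda$, and the test point are arbitrary.

```python
import numpy as np

def prox_l1(x, lam):
    """prox_{lam*||.||_1}(x): componentwise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_gradient(x, lam, prox_f):
    """Gradient of the Moreau envelope f_lam at x: (x - prox_{lam f}(x)) / lam.
    It is (1/lam)-Lipschitz and, for convex f, has the same minimizers as f,
    so a gradient-based scheme can be applied to f_lam in place of f."""
    return (x - prox_f(x, lam)) / lam

# example with the nonsmooth convex function f = ||.||_1
lam = 0.5
x = np.array([1.0, -0.2, 0.05])
print(moreau_gradient(x, lam, prox_l1))   # saturated/clipped gradient of the l1 norm
```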
In a Hilbert space $\mathcal{H}$, we study a dynamic inertial Newton method which aims to solve additively structured monotone equations involving the sum of potential and nonpotential terms. Precisely, we are looking for the zeros of an operator $A = \nabla f + B$, where $\nabla f$ is the gradient of a continuously differentiable convex function $f$ and $B$ is a nonpotential monotone and cocoercive operator. Besides a viscous friction term, the dynamic involves geometric damping terms which are controlled respectively by the Hessian of the potential $f$ and by a Newton-type correction term attached to $B$. Based on a fixed point argument, we show the well-posedness of the Cauchy problem. Then we show the weak convergence, as $t \to +\infty$, of the generated trajectories towards the zeros of $\nabla f + B$. The convergence analysis is based on the appropriate setting of the viscous and geometric damping parameters. The introduction of these geometric dampings makes it possible to control and attenuate the oscillations.
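A hypothetical explicit discretization of such a dynamic, in which both geometric damping terms are replaced by finite differences of $\nabla f$ and of $B$ along the trajectory (so only first-order evaluations are needed), might look as follows; the toy operator $B$, the parameters, and the discretization itself are assumptions made for illustration only.

```python
import numpy as np

# a rough explicit discretization of
#   x'' + gamma x' + beta_f Hess f(x) x' + beta_b d/dt B(x(t)) + grad f(x) + B(x) = 0,
# with both geometric damping terms realized as finite differences of grad f and B
def step(x, x_prev, grad_f, B, h=0.05, gamma=2.0, beta_f=0.5, beta_b=0.5):
    v = (x - x_prev) / h
    g, g_prev = grad_f(x), grad_f(x_prev)
    b, b_prev = B(x), B(x_prev)
    v_new = v - h * (gamma * v + g + b) - beta_f * (g - g_prev) - beta_b * (b - b_prev)
    return x + h * v_new

# toy problem: f(x) = 0.5*||x||^2 and a nonpotential (rotation-like) operator B,
# which is monotone and cocoercive but not symmetric
grad_f = lambda x: x
M = np.array([[1.0, 0.5], [-0.5, 1.0]])
B = lambda x: M @ x
x_prev = np.array([3.0, -2.0]); x = x_prev.copy()
for _ in range(2000):
    x, x_prev = step(x, x_prev, grad_f, B), x
print(x)   # approaches the unique zero of grad f + B (here the origin)
```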
In this work, motivated by the challenging task of learning a deep neural network, we consider optimization problems that consist of minimizing a finite sum of non-convex and non-smooth functions, where the non-smoothness appears as the maximum of non-convex functions with Lipschitz continuous gradient. Due to the large size of the sum, in practice we focus on stochastic first-order methods and propose the Stochastic Proximal Linear Method (SPLM), which is based on minimizing an appropriate majorizer at each iteration and is guaranteed to converge almost surely to a critical point of the objective function; we also establish its rate of convergence to critical points.
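A rough sketch of the prox-linear/majorization idea behind such a method, on a toy max-structured problem $f_i(x) = |\langle a_i, x\rangle^2 - b_i|$: at each iteration one index is sampled, the two smooth pieces under the max are linearized, and the resulting convex majorizer (a max of affine functions plus a proximal term) is minimized. The problem instance, step size, and use of cvxpy for the subproblem are illustrative assumptions, not the authors' SPLM implementation.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d = 50, 5
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = (A @ x_true) ** 2                         # noiseless phase-retrieval-type measurements

def splm_step(x, i, t=0.1):
    """One stochastic prox-linear step on f_i(x) = |<a_i,x>^2 - b_i|
       = max( <a_i,x>^2 - b_i , b_i - <a_i,x>^2 ):
    linearize both smooth pieces at x, keep the max, add a proximal term,
    and solve the resulting convex subproblem."""
    a = A[i]
    r = a @ x
    g1, grad1 = r ** 2 - b[i],  2.0 * r * a   # first piece and its gradient
    g2, grad2 = b[i] - r ** 2, -2.0 * r * a   # second piece and its gradient
    y = cp.Variable(d)
    major = cp.maximum(g1 + grad1 @ (y - x), g2 + grad2 @ (y - x)) \
            + cp.sum_squares(y - x) / (2 * t)
    cp.Problem(cp.Minimize(major)).solve()
    return y.value

x = rng.standard_normal(d)
for k in range(200):
    x = splm_step(x, rng.integers(n))
print(np.mean(np.abs((A @ x) ** 2 - b)))      # objective value reached after the run
```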
ISBN (print): 9781538615713
Motivated by challenges in Computational Statistics such as Penalized Maximum Likelihood inference in statistical models with intractable likelihoods, we analyze the convergence of a stochastic perturbation of the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA), when the stochastic approximation relies on a biased Monte Carlo estimation as it happens when the points are drawn from a Markov chain Monte Carlo (MCMC) sampler. We first motivate this general framework and then show a convergence result for the perturbed FISTA algorithm. We discuss the convergence rate of this algorithm and the computational cost of the Monte Carlo approximation to reach a given precision. Finally, through a numerical example, we explore new directions for a better understanding of these proximal-gradient based stochastic optimization algorithms.
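A small sketch of a perturbed FISTA iteration of the kind analyzed above: the standard FISTA momentum and soft-thresholding steps are kept, but the gradient of the smooth part is replaced by a stand-in noisy oracle whose bias and variance decay along the iterations; a real MCMC estimator is abstracted away, and the problem data and decay rates are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 40, 10, 0.1
A = rng.standard_normal((n, d)); b = A @ rng.standard_normal(d)

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def noisy_grad(x, k):
    """Stand-in for a Monte Carlo / MCMC gradient estimate: exact gradient of the
    smooth part plus a bias and a noise term that both vanish as k grows."""
    exact = A.T @ (A @ x - b) / n
    return exact + (1.0 / (k + 1)) * (0.5 + 0.1 * rng.standard_normal(d))

# perturbed FISTA: usual momentum, but the gradient step uses the noisy oracle
L = np.linalg.norm(A, 2) ** 2 / n             # Lipschitz constant of the smooth part
x = y = np.zeros(d); t = 1.0
for k in range(500):
    x_new = soft_threshold(y - noisy_grad(y, k) / L, lam / L)
    t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    y = x_new + ((t - 1.0) / t_new) * (x_new - x)
    x, t = x_new, t_new
print(np.count_nonzero(x),
      0.5 * np.linalg.norm(A @ x - b) ** 2 / n + lam * np.abs(x).sum())
```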
The proximal gradient method and its variants are among the most attractive first-order algorithms for minimizing the sum of two convex functions, one of which is nonsmooth. However, these methods require the differentiable part of the objective to have a Lipschitz continuous gradient, thus precluding their use in many applications. In this paper we introduce a framework which allows one to circumvent the intricate question of Lipschitz continuity of gradients by using an elegant and easy-to-check convexity condition which captures the geometry of the constraints. This condition translates into a new descent lemma which in turn leads to a natural derivation of the proximal-gradient scheme with Bregman distances. We then identify a new notion of asymmetry measure for Bregman distances, which is central in determining the relevant step size. These novelties allow us to prove a global sublinear rate of convergence and, as a by-product, global pointwise convergence. This provides a new path to a broad spectrum of problems arising in key applications which were, until now, considered out of reach via proximal gradient methods. We illustrate this potential by showing how our results can be applied to build new and simple schemes for Poisson inverse problems.
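For the Poisson inverse problems mentioned at the end, the Bregman proximal-gradient step with Burg's entropy kernel $h(x) = -\sum_j \log x_j$ has a closed form in the unregularized case, since the step only requires inverting $\nabla h$. The sketch below uses a step size tied to $\sum_i b_i$, reflecting the relative-smoothness bound used in this line of work (stated here as an assumption), and synthetic nonnegative data.

```python
import numpy as np

rng = np.random.default_rng(2)
m, d = 30, 8
A = rng.uniform(0.1, 1.0, size=(m, d))          # nonnegative design (Poisson model)
x_true = rng.uniform(0.5, 2.0, size=d)
b = rng.poisson(A @ x_true) + 1e-3               # observed counts, kept strictly positive

def grad_f(x):
    """Gradient of the Poisson data term f(x) = sum_i ( <a_i,x> - b_i log<a_i,x> )."""
    return A.T @ (1.0 - b / (A @ x))

# Bregman proximal-gradient step with Burg entropy h(x) = -sum_j log x_j:
# solve grad h(u) = grad h(x) - lam * grad f(x), i.e. u_j = x_j / (1 + lam * x_j * g_j)
lam = 0.9 / b.sum()          # assumed safe step; 1/sum(b) is the relative-smoothness
                             # constant used in this literature (taken here as given)
x = np.ones(d)
for _ in range(500):
    x = x / (1.0 + lam * x * grad_f(x))
print(x)                     # iterates stay in the positive orthant by construction
```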