Authors: Zhang, Na; Li, Qia
Affiliations:
South China Agricultural University, College of Mathematics & Informatics, Department of Applied Mathematics, Guangzhou 510642, People's Republic of China
Sun Yat-sen University, School of Computer Science & Engineering, Guangdong Provincial Key Laboratory of Computational Science, Guangzhou 510275, People's Republic of China
In this paper, we consider a class of single-ratio fractional minimization problems, in which the numerator of the objective is the sum of a nonsmooth nonconvex function and a smooth nonconvex function while the denominator is a nonsmooth convex function. We first derive the problem's first-order necessary optimality condition, using the first-order operators of the three functions involved. Then we develop first-order algorithms, namely, the proximity-gradient-subgradient algorithm (PGSA), PGSA with monotone line search (PGSA ML), and PGSA with nonmonotone line search (PGSA NL). It is shown that any accumulation point of the sequence generated by these algorithms is a critical point of the problem under mild assumptions. Moreover, we establish global convergence of the sequence generated by PGSA or PGSA ML and analyze its convergence rate, by further assuming the local Lipschitz continuity of the nonsmooth function in the numerator, the smoothness of the denominator, and the Kurdyka-Łojasiewicz (KL) property of the objective. The proposed algorithms are applied to the sparse generalized eigenvalue problem associated with a pair of symmetric positive semidefinite matrices, and the corresponding convergence results follow from the general convergence theorems. We perform some preliminary numerical experiments to demonstrate the efficiency of the proposed algorithms.
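To make the proximity-gradient-subgradient update concrete, here is a minimal sketch on a toy single-ratio instance. The problem data, the specific choices of numerator (lam*||x||_1 + 0.5*||Ax-b||^2) and denominator (||x||_2 + 1), and the fixed step size are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t*||.||_1 (componentwise soft thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ratio(A, b, lam, x):
    # Ratio objective: (lam*||x||_1 + 0.5*||Ax - b||^2) / (||x||_2 + 1).
    num = lam * np.linalg.norm(x, 1) + 0.5 * np.linalg.norm(A @ x - b) ** 2
    return num / (np.linalg.norm(x) + 1.0)

def pgsa(A, b, lam, x0, step=0.1, iters=300):
    # Schematic PGSA-style iteration: a proximal step on the nonsmooth part of the
    # numerator, a gradient step on the smooth part, and a subgradient of the
    # denominator scaled by the current ratio value.
    x = x0.astype(float).copy()
    for _ in range(iters):
        c = ratio(A, b, lam, x)                     # current ratio value
        nx = np.linalg.norm(x)
        u = x / nx if nx > 0 else np.zeros_like(x)  # subgradient of g(x) = ||x||_2 + 1
        grad_h = A.T @ (A @ x - b)                  # gradient of the smooth numerator term
        x = soft_threshold(x - step * (grad_h - c * u), step * lam)
    return x
```

On a tiny instance this drives the ratio down from its initial value; a practical implementation would add the paper's line-search variants instead of a fixed step.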
In this paper, we consider a block-structured convex optimization model, where in the objective the block variables are nonseparable and they are further linearly coupled in the constraint. For the 2-block case, we propose a number of first-order algorithms to solve this problem. First, the alternating direction method of multipliers (ADMM) is extended, assuming that it is easy to optimize the augmented Lagrangian function with one block of variables at each time while fixing the other block. We prove that an O(1/t) iteration complexity bound holds under suitable conditions, where t is the number of iterations. If the subroutines of the ADMM cannot be implemented, then we propose new alternative algorithms, called the alternating proximal gradient method of multipliers, the alternating gradient projection method of multipliers, and their hybrids. Under suitable conditions, the O(1/t) iteration complexity bound is shown to hold for all the newly proposed algorithms. Finally, we extend the analysis for the ADMM to the general multi-block case.
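The 2-block ADMM template can be illustrated on a standard textbook instance rather than the paper's general coupled model: the lasso split as min 0.5*||Ax-b||^2 + lam*||z||_1 subject to x - z = 0, in scaled-dual form. The problem data and parameter choices below are illustrative assumptions:

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, iters=200):
    # 2-block ADMM for min 0.5*||Ax - b||^2 + lam*||z||_1  s.t.  x - z = 0.
    n = A.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)  # u: scaled dual variable
    Q = A.T @ A + rho * np.eye(n)                      # x-update matrix, fixed across iterations
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(Q, Atb + rho * (z - u))    # minimize augmented Lagrangian in x
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)  # prox of (lam/rho)*||.||_1
        u = u + x - z                                  # dual update on the coupling constraint
    return z
```

Each block update is cheap precisely because the augmented Lagrangian is easy to minimize in one block with the other fixed, which is the assumption highlighted in the abstract.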
In this article, we study momentum-based first-order optimization algorithms in which the iterations utilize information from the two previous steps and are subject to an additive white noise. This setup uses noise to account for uncertainty in either gradient evaluation or iteration updates, and it includes Polyak's heavy-ball and Nesterov's accelerated methods as special cases. For strongly convex quadratic problems, we use the steady-state variance of the error in the optimization variable to quantify noise amplification and identify fundamental stochastic performance tradeoffs. Our approach utilizes the Jury stability criterion to provide a novel geometric characterization of conditions for linear convergence, and it reveals the relation between the noise amplification and convergence rate as well as their dependence on the condition number and the constant algorithmic parameters. This geometric insight leads to simple alternative proofs of standard convergence results and allows us to establish an "uncertainty principle" of strongly convex optimization: for the two-step momentum method with linear convergence rate, the lower bound on the product between the settling time and noise amplification scales quadratically with the condition number. Our analysis also identifies a key difference between the gradient and iterate noise models: while the amplification of gradient noise can be made arbitrarily small by sufficiently decelerating the algorithm, the best achievable variance for the iterate noise model increases linearly with the settling time in the decelerating regime. Finally, we introduce two parameterized families of algorithms that strike a balance between noise amplification and settling time while preserving orderwise Pareto optimality for both noise models.
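A minimal sketch of the two-step momentum iteration with additive gradient noise, on a strongly convex quadratic with the classical heavy-ball tuning. The test problem and parameter formulas are standard textbook choices, not the paper's general parameterization:

```python
import numpy as np

def heavy_ball(grad, x0, alpha, beta, iters, noise_std=0.0, rng=None):
    # Two-step momentum iteration subject to additive white gradient noise w_k:
    #   x_{k+1} = x_k - alpha*(grad(x_k) + w_k) + beta*(x_k - x_{k-1}).
    # noise_std=0 recovers the deterministic heavy-ball method.
    rng = rng or np.random.default_rng(0)
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        w = noise_std * rng.standard_normal(x.shape)
        x, x_prev = x - alpha * (grad(x) + w) + beta * (x - x_prev), x
    return x
```

With f(x) = 0.5*x.T @ Q @ x, m and L the extreme eigenvalues of Q, and the classical tuning alpha = 4/(sqrt(L)+sqrt(m))^2, beta = ((sqrt(L)-sqrt(m))/(sqrt(L)+sqrt(m)))^2, the noiseless iteration converges linearly; rerunning with noise_std > 0 and averaging squared errors gives a Monte Carlo estimate of the steady-state variance the abstract analyzes in closed form.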
Many problems of substantial current interest in machine learning, statistics, and data science can be formulated as sparse and low-rank optimization problems. In this paper, we present the nonconvex exterior-point optimization solver (NExOS), a first-order algorithm tailored to sparse and low-rank optimization problems. We consider the problem of minimizing a convex function over a nonconvex constraint set, where the set can be decomposed as the intersection of a compact convex set and a nonconvex set involving sparse or low-rank constraints. Unlike the convex relaxation approaches, NExOS finds a locally optimal point of the original problem by solving a sequence of penalized problems with strictly decreasing penalty parameters by exploiting the nonconvex geometry. NExOS solves each penalized problem by applying a first-order algorithm, which converges linearly to a local minimum of the corresponding penalized formulation under regularity conditions. Furthermore, the local minima of the penalized problems converge to a local minimum of the original problem as the penalty parameter goes to zero. We then implement and test NExOS on many instances from a wide variety of sparse and low-rank optimization problems, empirically demonstrating that our algorithm outperforms specialized methods.
The performance estimation problem methodology makes it possible to determine the exact worst-case performance of an optimization method. In this work, we generalize this framework to first-order methods involving linear operators. This extension requires an explicit formulation of interpolation conditions for those linear operators. We consider the class of linear operators M : x ↦ Mx, where the matrix M has bounded singular values, and the class of linear operators where M is symmetric and has bounded eigenvalues. We describe interpolation conditions for these classes, i.e., necessary and sufficient conditions that, given a list of pairs {(x_i, y_i)}, characterize the existence of a linear operator mapping x_i to y_i for all i. Using these conditions, we first identify the exact worst-case behavior of the gradient method applied to the composed objective h ∘ M, and observe that it always corresponds to M being a scaling operator. We then investigate the Chambolle-Pock method applied to f + g ∘ M, and improve the existing analysis to obtain a proof of the exact convergence rate of the primal-dual gap. In addition, we study how this method behaves on Lipschitz convex functions, and obtain a numerical convergence rate for the primal accuracy of the last iterate. We also show numerically that averaging iterates is beneficial in this setting.
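For reference, the Chambolle-Pock iteration for f + g ∘ M can be sketched on a toy instance with f(x) = 0.5*||x - b||^2 and g = lam*||.||_1, where both proximal operators are in closed form. The instance, step sizes, and iteration count are illustrative assumptions, not the paper's worst-case setting:

```python
import numpy as np

def chambolle_pock(M, b, lam, iters=5000):
    # Primal-dual iteration for min_x 0.5*||x - b||^2 + lam*||M x||_1.
    # prox of tau*f has closed form (v + tau*b)/(1 + tau); the conjugate of
    # lam*||.||_1 is the indicator of the ell_inf ball of radius lam, so the
    # dual prox is a clip. Step sizes satisfy tau*sigma*||M||^2 < 1.
    Lop = np.linalg.norm(M, 2)
    tau = sigma = 0.99 / Lop
    x = np.zeros(M.shape[1]); xbar = x.copy()
    y = np.zeros(M.shape[0])
    for _ in range(iters):
        y = np.clip(y + sigma * (M @ xbar), -lam, lam)          # dual prox step
        x_new = (x - tau * (M.T @ y) + tau * b) / (1.0 + tau)   # primal prox step
        xbar = 2.0 * x_new - x                                  # extrapolation, theta = 1
        x = x_new
    return x
```

With M the identity the solution is just soft thresholding of b, which gives an easy sanity check for the iteration.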
Stochastic optimisation algorithms are the de facto standard for machine learning with large amounts of data. Handling only a subset of available data in each optimisation step dramatically reduces the per-iteration computational costs, while still ensuring significant progress towards the solution. Driven by the need to solve large-scale optimisation problems as efficiently as possible, the last decade has witnessed an explosion of research in this area. Leveraging the parallels between machine learning and inverse problems has allowed harnessing the power of this research wave for solving inverse problems. In this survey, we provide a comprehensive account of the state-of-the-art in stochastic optimisation from the viewpoint of variational regularisation for inverse problems where the solution is modelled as minimising an objective function. We cover topics such as variance reduction, acceleration and higher-order methods, and compare theoretical results with practical behaviour. We focus on the potential and the challenges for stochastic optimisation that are unique to variational regularisation for inverse imaging problems and are not commonly encountered in machine learning. We conclude the survey with illustrative examples on linear inverse problems in imaging to examine the advantages and disadvantages that this new generation of algorithms brings to the field of inverse problems.
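The basic workhorse the survey builds on is minibatch stochastic gradient descent, sketched here on a least-squares objective. The problem instance, batch size, and step size are illustrative assumptions:

```python
import numpy as np

def sgd_least_squares(A, b, batch=10, lr=0.05, epochs=300, seed=0):
    # Minibatch SGD on f(x) = (1/2m)*||Ax - b||^2: each step touches only a
    # random subset of rows, which is what makes the per-iteration cost low.
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(epochs):
        idx = rng.permutation(m)                       # reshuffle rows each epoch
        for s in range(0, m, batch):
            i = idx[s:s + batch]
            g = A[i].T @ (A[i] @ x - b[i]) / len(i)    # stochastic gradient estimate
            x -= lr * g
    return x
```

On a noiseless, consistent system the stochastic gradients all vanish at the solution, so even a constant step size converges; variance reduction and acceleration, as covered in the survey, matter precisely when this interpolation property fails.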
Proximal methods are an important class of algorithms for solving nonsmooth, constrained, large-scale or distributed optimization problems. Because of their flexibility and scalability, they are widely used in current applications in engineering, machine learning, and data science. The key idea of proximal algorithms is the decomposition of a large-scale optimization problem into several smaller, simpler problems, in which the basic operation is the evaluation of the proximal operator of a function. The proximal operator minimizes the function regularized by a squared Euclidean distance, and it generalizes the Euclidean projection onto a closed convex set. Since the cost of the evaluation of proximal operators often dominates the per-iteration complexity in a proximal algorithm, efficient evaluation of proximal operators is critical. To this end, generalized Bregman proximal operators based on non-Euclidean distances have been proposed and incorporated in many algorithms and applications. In the first part of this dissertation, we present primal–dual proximal splitting methods for convex optimization, in which generalized Bregman distances are used to define the primal and dual update steps. The proposed algorithms can be viewed as Bregman extensions of many well-known proximal methods. For these algorithms, we analyze the theoretical convergence and develop techniques to improve practical implementation. In the second part of the dissertation, we apply the Bregman proximal splitting algorithms to the centering problem in large-scale semidefinite programming with sparse coefficient matrices. The logarithmic barrier function for the cone of positive semidefinite completable sparse matrices is used as the distance-generating kernel. For this distance, the complexity of evaluating the Bregman proximal operator is shown to be roughly proportional to the cost of a sparse Cholesky factorization. This is much cheaper than the standard proximal operator with Euclidean distance.
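Three closed-form Euclidean proximal operators illustrate the basic operation described above: the l1 norm (soft thresholding), the indicator of a box (projection by clipping), and a simple quadratic. These are standard examples, not the Bregman operators developed in the dissertation:

```python
import numpy as np

def prox_l1(v, t):
    # prox_{t*||.||_1}(v) = argmin_x t*||x||_1 + 0.5*||x - v||^2: soft thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_box(v, lo, hi):
    # prox of the indicator of [lo, hi]^n is the Euclidean projection: clipping.
    return np.clip(v, lo, hi)

def prox_quadratic(v, t, b):
    # prox_{t*f}(v) for f(x) = 0.5*||x - b||^2 has closed form (v + t*b)/(1 + t),
    # from setting the gradient (x - b) + (x - v)/t to zero.
    return (v + t * b) / (1.0 + t)
```

The projection case shows concretely how the proximal operator generalizes Euclidean projection: the indicator function's prox ignores the regularization weight and just projects.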
This paper generalizes the optimized gradient method (OGM) [Y. Drori and M. Teboulle, Math. Program., 145 (2014), pp. 451-482], [D. Kim and J. A. Fessler, Math. Program., 159 (2016), pp. 81-107], [D. Kim and J. A. Fessler, J. Optim. Theory Appl., 172 (2017), pp. 187-205] that achieves the optimal worst-case cost function bound of first-order methods for smooth convex minimization [Y. Drori, J. Complexity, 39 (2017), pp. 1-16]. Specifically, this paper studies a generalized formulation of OGM and analyzes its worst-case rates in terms of both the function value and the norm of the function gradient. This paper also develops a new algorithm called OGM-OG that is in the generalized family of OGM and that has the best known analytical worst-case bound with rate O(1/N^1.5) on the decrease of the gradient norm among fixed-step first-order methods. This paper also proves that Nesterov's fast gradient method [Y. Nesterov, Dokl. Akad. Nauk. USSR, 269 (1983), pp. 543-547], [Y. Nesterov, Math. Program., 103 (2005), pp. 127-152] has an O(1/N^1.5) worst-case gradient norm rate but with constant larger than OGM-OG. The proof is based on the worst-case analysis called the Performance Estimation Problem in [Y. Drori and M. Teboulle, Math. Program., 145 (2014), pp. 451-482].
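For orientation, the baseline OGM iteration resembles Nesterov's method with an extra momentum term; the sketch below transcribes the update from memory of Kim and Fessler's formulation (theta recursion, extra (theta_i/theta_{i+1})*(y_{i+1} - x_i) term, modified final theta) and should be checked against the cited papers before use:

```python
import numpy as np

def ogm(grad, L, x0, N):
    # Optimized gradient method for L-smooth convex f (sketch, coefficients from memory):
    #   y_{i+1} = x_i - grad(x_i)/L
    #   x_{i+1} = y_{i+1} + ((theta_i - 1)/theta_{i+1})*(y_{i+1} - y_i)
    #                     + (theta_i/theta_{i+1})*(y_{i+1} - x_i)
    theta = 1.0
    x = x0.copy()
    y = x0.copy()
    for i in range(N):
        y_new = x - grad(x) / L                                    # gradient step
        if i < N - 1:
            theta_new = (1 + np.sqrt(1 + 4 * theta ** 2)) / 2
        else:
            theta_new = (1 + np.sqrt(1 + 8 * theta ** 2)) / 2      # last-iteration factor
        x = y_new + ((theta - 1) / theta_new) * (y_new - y) \
                  + (theta / theta_new) * (y_new - x)
        y, theta = y_new, theta_new
    return x
```

The doubled momentum relative to FISTA-style methods is what tightens the worst-case cost-function constant by a factor of two.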
The paper proposes a linesearch for a primal-dual method. Each iteration of the linesearch requires an update of only the dual (or primal) variable. For many problems, in particular for regularized least squares, the linesearch does not require any additional matrix-vector multiplications. We prove convergence of the proposed method under standard assumptions. We also show an ergodic O(1/N) rate of convergence for our method. In the case when one or both of the prox-functions are strongly convex, we modify our basic method to get a better convergence rate. Finally, we propose a linesearch for a saddle-point problem with an additional smooth term. Several numerical experiments confirm the efficiency of our proposed methods.
This paper provides a new way of developing the fast iterative shrinkage/thresholding algorithm (FISTA) [A. Beck and M. Teboulle, SIAM J. Imaging Sci., 2 (2009), pp. 183-202] that is widely used for minimizing composite convex functions with a nonsmooth term such as the l_1 regularizer. In particular, this paper shows that FISTA corresponds to an optimized approach to accelerating the proximal gradient method with respect to a worst-case bound of the cost function. This paper then proposes a new algorithm that is derived by instead optimizing the step coefficients of the proximal gradient method with respect to a worst-case bound of the composite gradient mapping. The proof is based on the worst-case analysis called the performance estimation problem in [Y. Drori and M. Teboulle, Math. Program., 145 (2014), pp. 451-482].
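The FISTA iteration discussed above is standard and short enough to sketch in full, here on the lasso composite objective 0.5*||Ax - b||^2 + lam*||x||_1 (the instance and step choice 1/L with L = ||A||^2 are illustrative):

```python
import numpy as np

def fista_lasso(A, b, lam, iters=200):
    # FISTA: proximal gradient step at an extrapolated point y, with the
    # classical momentum parameter recursion t_{k+1} = (1 + sqrt(1 + 4 t_k^2))/2.
    L = np.linalg.norm(A, 2) ** 2           # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(iters):
        g = A.T @ (A @ y - b)                # gradient of the smooth part at y
        v = y - g / L
        x_new = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)  # prox of (lam/L)*||.||_1
        t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        y = x_new + ((t - 1) / t_new) * (x_new - x)                # extrapolation
        x, t = x_new, t_new
    return x
```

The new algorithm the paper derives keeps this template but chooses the step coefficients to optimize a worst-case bound on the composite gradient mapping rather than on the cost function.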