The accelerated proximal methods (APM) have become one of the most important optimization tools for large-scale convex composite minimization problems, due to their wide range of applications and their optimal convergence rate among first-order algorithms. However, most existing theoretical results for APM are obtained by assuming that the gradient oracle is exact and that the proximal mapping is solved exactly, which may not hold in practice. This work presents a theoretical study of APM that allows both an inexact gradient oracle and an approximate proximal mapping. Specifically, we analyze inexact APM by improving the approximate duality gap technique (ADGT), which was originally designed for the convergence analysis of first-order methods with exact gradient oracle and exact proximal mapping. Our approach has several advantages: 1) we provide a unified convergence analysis that allows both an inexact gradient oracle and an approximate proximal mapping; 2) our proof is generic and naturally recovers the convergence rates of both accelerated and non-accelerated proximal methods, from which the advantages and disadvantages of acceleration can be easily derived; 3) we derive the same convergence bound as previous methods with respect to the inexact gradient oracle, but a tighter convergence bound with respect to the approximate proximal mapping.
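For concreteness, the following is a minimal sketch of a FISTA-style accelerated proximal gradient loop in which both oracles are user-supplied and may be inexact, in the spirit of the setting above. It is not the paper's exact scheme or its ADGT analysis; the function names, the noise model in the toy usage, and the step size 1/L are illustrative assumptions.

```python
import numpy as np

def inexact_apm(grad_oracle, prox_oracle, x0, L, n_iters=200):
    """FISTA-style accelerated proximal gradient with pluggable oracles.

    grad_oracle(y): possibly *inexact* gradient of the smooth part f at y.
    prox_oracle(v, step): possibly *approximate* prox of the non-smooth part g.
    L: Lipschitz constant of grad f (the step size is 1/L).
    """
    x_prev = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(n_iters):
        x = prox_oracle(y - grad_oracle(y) / L, 1.0 / L)   # forward-backward step
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0  # momentum schedule
        y = x + ((t - 1.0) / t_next) * (x - x_prev)        # extrapolation
        x_prev, t = x, t_next
    return x_prev

# Toy usage (hypothetical problem): lasso with a noisy gradient oracle,
# f(x) = 0.5*||Ax - b||^2, g(x) = lam*||x||_1.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((40, 100)), rng.standard_normal(40)
lam, sigma = 0.1, 1e-3  # sigma controls the simulated gradient error

grad = lambda y: A.T @ (A @ y - b) + sigma * rng.standard_normal(100)
prox = lambda v, s: np.sign(v) * np.maximum(np.abs(v) - lam * s, 0.0)

x_hat = inexact_apm(grad, prox, np.zeros(100), L=np.linalg.norm(A, 2) ** 2)
```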
We propose a new family of inexact sequential quadratic approximation (SQA) methods, which we call the inexact regularized proximal Newton (IRPN) method, for minimizing the sum of two closed proper convex functions, one of which is smooth and the other possibly non-smooth. Our proposed method features strong convergence guarantees even when applied to problems with degenerate solutions, while allowing the inner minimization to be solved inexactly. Specifically, we prove that when the problem possesses the so-called Luo-Tseng error bound (EB) property, IRPN converges globally to an optimal solution, and the local convergence rate of the sequence of iterates generated by IRPN is linear, superlinear, or even quadratic, depending on the choice of parameters of the algorithm. Prior to this work, the EB property had been extensively used to establish the linear convergence of various first-order methods. However, to the best of our knowledge, this work is the first to use the Luo-Tseng EB property to establish the superlinear convergence of SQA-type methods for non-smooth convex minimization. As a consequence of our result, IRPN is capable of solving regularized regression or classification problems under the high-dimensional setting with provable convergence guarantees. We compare our proposed IRPN with several empirically efficient algorithms by applying them to the ℓ1-regularized logistic regression problem. Experimental results show the competitiveness of our proposed method.
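The following is a hedged sketch of the generic regularized proximal Newton template that SQA-type methods instantiate, applied to ℓ1-regularized logistic regression. The fixed regularization mu, the unit step, and the crude inner solver standing in for inexact subproblem solves are illustrative assumptions, not the authors' IRPN rules.

```python
import numpy as np

def soft_threshold(v, s):
    return np.sign(v) * np.maximum(np.abs(v) - s, 0.0)

def proximal_newton_l1(A, y, lam, mu=1e-4, outer=20, inner=50):
    """Sketch of a regularized proximal Newton loop for
    min_x f(x) + lam*||x||_1, with f the logistic loss.

    Each outer step *inexactly* minimizes the model
        f(x) + g^T d + 0.5*d^T (H + mu*I) d + lam*||x + d||_1
    via a fixed number of proximal gradient steps (the inexactness).
    """
    n, p = A.shape
    x = np.zeros(p)
    for _ in range(outer):
        prob = 1.0 / (1.0 + np.exp(-np.clip(A @ x, -30, 30)))
        g = A.T @ (prob - y) / n                     # gradient of f
        w = prob * (1.0 - prob) / n                  # Hessian weights
        H = A.T @ (A * w[:, None]) + mu * np.eye(p)  # regularized Hessian
        step = 1.0 / np.linalg.norm(H, 2)
        d = np.zeros(p)
        for _ in range(inner):                       # inexact subproblem solve
            d = soft_threshold(x + d - step * (g + H @ d), lam * step) - x
        x = x + d                                    # unit step; no line search here
    return x

# Toy usage: sparse logistic regression with 0/1 labels.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
y = (A @ (rng.standard_normal(50) * (rng.random(50) < 0.1)) > 0).astype(float)
x_hat = proximal_newton_l1(A, y, lam=0.01)
```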
In this paper, we aim at unifying, simplifying, and improving the convergence rate analysis of Lagrangian-based methods for convex optimization problems. We first introduce the notion of a nice primal algorithmic map, which plays a central role in the unification and simplification of the analysis of most Lagrangian-based methods. Equipped with a nice primal algorithmic map, we then introduce a versatile generic scheme, which allows for the design and analysis of faster Lagrangian (FLAG) methods with a new provably sublinear rate of convergence expressed in terms of function values and feasibility violation of the original (nonergodic) generated sequence. To demonstrate the power and versatility of our approach and results, we show that most well-known iconic Lagrangian-based schemes admit a nice primal algorithmic map and hence share the new faster rate of convergence results within their corresponding FLAG.
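As background, here is a minimal sketch of the classical augmented Lagrangian (method of multipliers) template on which such Lagrangian-based schemes build, with plain gradient descent serving as the primal update; the faster FLAG variants add extrapolation and parameter schedules not shown here, and the solver and problem choices below are illustrative assumptions.

```python
import numpy as np

def method_of_multipliers(f_grad, A, b, x0, rho=1.0, outer=50, inner=100):
    """Sketch of the basic augmented Lagrangian method for
        min_x f(x)  subject to  Ax = b.

    The primal update (here: gradient descent on
    L_rho(x, y) = f(x) + y^T(Ax - b) + (rho/2)*||Ax - b||^2)
    plays the role of a primal algorithmic map.
    """
    x, y = x0.copy(), np.zeros(A.shape[0])
    # Step size assumes grad f is 1-Lipschitz (true for the toy usage below).
    step = 1.0 / (1.0 + rho * np.linalg.norm(A, 2) ** 2)
    for _ in range(outer):
        for _ in range(inner):  # inner loop: approximate primal minimization
            x = x - step * (f_grad(x) + A.T @ (y + rho * (A @ x - b)))
        y = y + rho * (A @ x - b)  # multiplier (dual) update
    return x, y

# Toy usage: f(x) = 0.5*||x - c||^2 subject to Ax = b.
rng = np.random.default_rng(1)
A, b, c = rng.standard_normal((3, 8)), rng.standard_normal(3), rng.standard_normal(8)
x_star, y_star = method_of_multipliers(lambda x: x - c, A, b, np.zeros(8))
print(np.linalg.norm(A @ x_star - b))  # feasibility violation shrinks over iterations
```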
In modern data-analysis applications, the abundance of data makes extracting meaningful information from it challenging in terms of computation, storage, and interpretability. In this setting, exploiting sparsity in the data has been essential to the development of scalable methods for problems in machine learning, statistics, and signal processing. However, in various applications, the input variables exhibit structure beyond simple sparsity. This has motivated the introduction of structured sparsity models, which capture such sophisticated structures, leading to significant performance gains and better interpretability. Structured sparsity approaches have been successfully applied in a variety of domains including computer vision, text and audio processing, medical imaging, and bioinformatics. The goal of this thesis is to improve on these methods and expand their success to a wider range of applications. We thus develop novel methods to incorporate general structure a priori in learning problems, which balance computational and statistical efficiency trade-offs. To achieve this, our results bring together tools from discrete and convex optimization. Applying structured sparsity approaches in general is challenging because the structures encountered in practice are naturally combinatorial. An effective approach to circumvent this computational challenge is to employ continuous convex relaxations. We thus start by introducing a new class of structured sparsity models, able to capture a large range of structures, which admit tight convex relaxations amenable to efficient optimization. We then present an in-depth study of the geometric and statistical properties of convex relaxations of general combinatorial structures. In particular, we characterize which structure is lost by imposing convexity and which is preserved. We then focus on the optimization of the convex composite problems that result from the convex relaxations of structured sparsity models. We develop efficient algorithms...
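As a small concrete instance of the convex-relaxation approach described above (chosen for illustration; the thesis treats far more general combinatorial structures), non-overlapping group sparsity is commonly relaxed to the group-lasso norm, whose proximal operator has the closed form sketched below.

```python
import numpy as np

def prox_group_lasso(v, groups, s):
    """Proximal operator of s * sum_g ||v_g||_2 over non-overlapping groups:
    block soft-thresholding, which zeroes out entire inactive groups."""
    out = v.copy()
    for g in groups:                     # g: index array of one group
        norm = np.linalg.norm(v[g])
        out[g] = 0.0 if norm <= s else (1.0 - s / norm) * v[g]
    return out

# Usage: six variables in two groups of three; the small first group is zeroed.
v = np.array([0.5, -0.2, 0.1, 3.0, -2.0, 1.0])
groups = [np.arange(0, 3), np.arange(3, 6)]
print(prox_group_lasso(v, groups, 1.0))
```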