ISBN: (print) 9781665405409
In this paper, we present a novel convex method for graph-structured sparse recovery. While various structured sparsities can be represented as graph-structured sparsity, graph-structured sparse recovery remains a challenging nonconvex problem. To overcome this difficulty, we propose a convex penalty function that automatically identifies the relevant subgraph of an underlying graph. We design a graph-structured recovery model using the proposed penalty, and develop a first-order iterative solver for it that consists only of simple operations such as closed-form proximity operators and the difference operator on the graph. Numerical experiments show that the proposed method has better estimation accuracy than existing convex regularizations that use fixed structures.
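To make the kind of solver described above concrete, here is a minimal sketch of a generic primal-dual (Chambolle-Pock-style) recovery with a plain graph-difference regularizer $\|Dx\|_1$; this is an illustrative assumption, not the paper's proposed penalty or algorithm. Every step is a matrix-vector product with the measurement matrix or the graph difference operator, or a closed-form proximity/projection step.

```python
# Illustrative sketch (not the paper's method): recover x from y = A x + noise
# with a graph-difference regularizer lam * ||D x||_1, using the Chambolle-Pock
# primal-dual algorithm. Only simple operations appear: products with A and D,
# and closed-form prox/projection steps.
import numpy as np

def graph_difference_operator(edges, n):
    """Difference operator D on the graph: (Dx)_e = x_i - x_j for edge e = (i, j)."""
    D = np.zeros((len(edges), n))
    for e, (i, j) in enumerate(edges):
        D[e, i], D[e, j] = 1.0, -1.0
    return D

def primal_dual_graph_recovery(A, y, edges, lam=0.1, n_iter=500):
    n = A.shape[1]
    D = graph_difference_operator(edges, n)
    L = np.linalg.norm(D, 2)                  # operator norm of D
    tau = sigma = 0.9 / L                     # step sizes with tau * sigma * L**2 < 1
    x = np.zeros(n); x_bar = x.copy(); u = np.zeros(D.shape[0])
    AtA, Aty = A.T @ A, A.T @ y
    M = np.linalg.inv(np.eye(n) + tau * AtA)  # closed-form prox of 0.5*||A.-y||^2
    for _ in range(n_iter):
        # Dual step: projection onto the l_inf ball of radius lam (prox of the conjugate of lam*||.||_1)
        u = np.clip(u + sigma * (D @ x_bar), -lam, lam)
        # Primal step: gradient on the dual variable, then closed-form prox of the data term
        x_new = M @ (x - tau * (D.T @ u) + tau * Aty)
        x_bar = 2 * x_new - x
        x = x_new
    return x
```

In this sketch the subgraph structure is fixed by the chosen edge set; the abstract's contribution is precisely a convex penalty that instead identifies the relevant subgraph automatically.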
Several recent works address the impact of inexact oracles in the convergence analysis of modern first-order optimization techniques, e.g. Bregman Proximal Gradient and Prox-Linear methods as well as their accelerated variants, extending their field of applicability. In this paper, we consider situations where the oracle's inexactness can be chosen on demand, with more precision coming at a higher computational cost. Our main motivations arise from oracles that require solving auxiliary subproblems or inexactly computing involved quantities, e.g. a mini-batch stochastic gradient as a full-gradient estimate. We propose optimal inexactness schedules according to presumed oracle cost models and patterns of worst-case guarantees, covering among others convergence results of the aforementioned methods in the presence of inexactness. Specifically, we detail how to choose the level of inexactness at each iteration to obtain the best trade-off between convergence and computational investment. Furthermore, we highlight the benefits one can expect from tuning the oracles' accuracy instead of keeping it constant throughout. Finally, we provide extensive numerical experiments that support the practical interest of our approach, both in offline and online settings, applied to the Fast Gradient algorithm.
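As a toy illustration of an on-demand inexactness schedule, the sketch below runs gradient descent with a mini-batch gradient oracle whose accuracy is tightened over iterations by growing the batch size. The schedule, the `grad_sample` oracle, and all parameters are hypothetical placeholders, not the optimal schedules derived in the paper.

```python
# Hypothetical sketch: gradient descent with an oracle whose inexactness is
# chosen per iteration. Smaller target error eps_k -> larger mini-batch ->
# higher per-iteration cost, mirroring the accuracy/cost trade-off above.
import numpy as np

def inexact_gradient_descent(grad_sample, n_samples, x0, step=0.1, n_iter=50, eps0=1.0):
    """grad_sample(x, idx) returns the gradient of the loss on samples `idx` at x."""
    rng = np.random.default_rng(0)
    x = x0.copy()
    for k in range(n_iter):
        eps_k = eps0 / (k + 1)                                 # target oracle inexactness at iteration k
        batch = min(n_samples, int(np.ceil(1.0 / eps_k**2)))   # mini-batch variance shrinks like 1/batch
        idx = rng.choice(n_samples, size=batch, replace=False)
        g = grad_sample(x, idx)                                # inexact (mini-batch) gradient estimate
        x = x - step * g                                       # standard gradient step
    return x
```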
We consider a class of statistical estimation problems in which we are given a random data matrix ${\boldsymbol{X}}\in{\mathbb{R}}<^>{n\times d}$ (and possibly some labels ${\boldsymbol{y}}\in{\mathbb{R}}<^>{n}$) and would like to estimate a coefficient vector ${\boldsymbol{\theta }}\in{\mathbb{R}}<^>{d}$ (or possibly a constant number of such vectors). Special cases include low-rank matrix estimation and regularized estimation in generalized linear models (e.g. sparse regression). firstorder methods proceed by iteratively multiplying current estimates by ${\boldsymbol{X}}$ or its transpose. Examples include gradient descent or its accelerated variants. Under the assumption that the data matrix ${\boldsymbol{X}}$ is standard Gaussian, Celentano, Montanari, Wu (2020, Conference on Learning Theory, pp. 1078-1141. PMLR) proved that for any constant number of iterations (matrix vector multiplications), the optimal firstorderalgorithm is a specific approximate message passing algorithm (known as 'Bayes AMP'). The error of this estimator can be characterized in the high-dimensional asymptotics $n,d\to \infty $, $n/d\to \delta $, and provides a lower bound to the estimation error of any firstorderalgorithm. Here we present a simpler proof of the same result, and generalize it to broader classes of data distributions and of firstorderalgorithms, including algorithms with non-separable nonlinearities. Most importantly, the new proof is simpler, does not require to construct an equivalent tree-structured estimation problem, and is therefore susceptible of a broader range of applications.
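For intuition, the sketch below shows what "first-order algorithm" means in this setting: each iteration accesses the data only through multiplications by ${\boldsymbol{X}}$ or ${\boldsymbol{X}}^{\top}$, interleaved with nonlinearities. The concrete choices here (an ISTA-like iteration with soft-thresholding for sparse regression) are placeholder assumptions, not the Bayes AMP algorithm analyzed in the paper.

```python
# Minimal sketch of a first-order estimator: the data matrix X enters only
# through X @ (.) and X.T @ (.), followed by a simple nonlinearity.
# Soft-thresholding is a placeholder denoiser, not the Bayes-AMP choice.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def first_order_estimator(X, y, n_iter=20, thresh=0.1):
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iter):
        r = y - X @ theta                                      # one multiplication by X
        theta = soft_threshold(theta + X.T @ r / n, thresh)    # one multiplication by X^T, then a nonlinearity
    return theta
```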