ISBN (print): 9781713899921
In this paper, we provide a rigorous proof of convergence of the Adaptive Moment Estimation (Adam) algorithm for a wide class of optimization objectives. Despite the popularity and efficiency of the Adam algorithm in training deep neural networks, its theoretical properties are not yet fully understood, and existing convergence proofs require unrealistically strong assumptions, such as globally bounded gradients, to show convergence to stationary points. In this paper, we show that Adam provably converges to $\epsilon$-stationary points with $O(\epsilon^{-4})$ gradient complexity under far more realistic conditions. The key to our analysis is a new proof of boundedness of gradients along the optimization trajectory of Adam, under a generalized smoothness assumption according to which the local smoothness (i.e., the Hessian norm when it exists) is bounded by a sub-quadratic function of the gradient norm. Moreover, we propose a variance-reduced version of Adam with an accelerated gradient complexity of $O(\epsilon^{-3})$.
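For context, a minimal NumPy sketch of the standard Adam update whose convergence is analyzed; the hyperparameter names and defaults (lr, beta1, beta2, eps) are the usual ones from Kingma and Ba, not the paper's notation:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam iteration: exponential moving averages of the gradient (m)
    and of the squared gradient (v), followed by bias correction."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)                  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                  # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

The generalized smoothness assumption mentioned above can be read as requiring the Hessian norm to grow at most sub-quadratically in the gradient norm, roughly $\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\|^{\rho}$ with $\rho < 2$ (one possible formalization, not the paper's exact statement).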
Chance-constrained optimization problems make it possible to model problems in which constraints involving stochastic components may be violated only with small probability. Evolutionary algorithms have been applied to this sc...
Competitive facility location problems involve two different firms competing for customers, each aiming to increase its market share. Typically, there exists a hierarchy between the firms based on factors such as size...
Hypervolume subset selection (HSS) has received significant attention since it has a strong connection with evolutionary multi-objective optimization (EMO), such as environmental selection and post-processing to identif...
We address constraint-coupled optimization for a system composed of multiple cooperative agents communicating over a time-varying network. We propose a distributed proximal minimization algorithm that is guaranteed to converge to an optimal solution of the optimization problem, under suitable convexity and connectivity assumptions. The performance of the introduced algorithm is shown on a numerical example of a charging scheduling problem for a fleet of plug-in electric vehicles.
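As background, the proximal step at the core of such schemes can be written (a generic formulation, not necessarily the paper's exact update) as

\[
\operatorname{prox}_{\gamma f_i}(x) = \arg\min_{z}\Big\{ f_i(z) + \tfrac{1}{2\gamma}\|z - x\|^{2} \Big\},
\]

where $f_i$ is agent $i$'s local objective and $\gamma > 0$ a step-size parameter; in a distributed setting, each agent applies such a step locally and exchanges the result with its neighbors over the time-varying network.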
Selecting an appropriate step size is critical in Gradient Descent algorithms used to train Neural Networks for Deep Learning tasks. A small value of the step size leads to slow convergence, while a large value can le...
This paper is devoted to the study of approximate solutions for a multiobjective interval-valued optimization problem based on an interval order. We establish new existence theorems of approximate solutions for such a...
Subset selection is a fundamental problem in combinatorial optimization, which has a wide range of applications such as influence maximization and sparse regression. The goal is to select a subset of limited size from...
We propose a linear programming method based on active-set changes and proximal-point iterations. The method solves a sequence of least-distance problems using a warm-started quadratic programming solver that can reuse internal matrix factorizations from the previously solved least-distance problem. We show that the proposed method terminates in a finite number of iterations and that it outperforms state-of-the-art LP solvers in scenarios where a large number of small/medium-scale LPs need to be solved rapidly, as occurs, for example, in multi-parametric programming algorithms. In particular, we show how the proposed method can accelerate operations such as redundancy removal, computation of Chebyshev centers, and the solution of linear feasibility problems.
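To illustrate the connection between proximal-point iterations and least-distance problems (a generic sketch under standard assumptions, not necessarily the paper's exact formulation): applied to the LP $\min\{c^{\top}x : Ax \le b\}$, the proximal-point iteration

\[
x^{k+1} = \arg\min_{Ax \le b}\Big\{ c^{\top}x + \tfrac{1}{2\lambda}\|x - x^{k}\|^{2} \Big\}
        = \arg\min_{Ax \le b}\ \|x - (x^{k} - \lambda c)\|^{2}
\]

is exactly a least-distance problem: the projection of the shifted point $x^{k} - \lambda c$ onto the feasible polyhedron, which is the kind of subproblem a warm-started QP solver can handle efficiently.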
The Local Model State Space Network (LMSSN) is a recently developed black-box algorithm in nonlinear system identification. It has proven to be an appropriate tool on benchmark problems as well as for real-world processes. A severe shortcoming, though, is the long computation time required for model training. Therefore, a different optimization strategy, the adaptive moment estimation (ADAM) method with mini-batches, is used for the LMSSN and compared to the current Quasi-Newton (QN) optimization method. It is shown on a numerical Hammerstein example and on a well-known Wiener-Hammerstein benchmark that the use of ADAM and mini-batches does not limit the performance of the LMSSN algorithm and speeds up the nonlinear optimization per investigated split by more than 30 times. The price to be paid, however, is higher parameter variance (less interpretability) and more tedious hyperparameter tuning.
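For illustration, a generic mini-batch training loop of the kind described (a sketch with hypothetical names — grad_fn, optimizer_step — not the LMSSN implementation):

```python
import numpy as np

def minibatch_train(theta, X, Y, grad_fn, optimizer_step, epochs=50, batch_size=64, seed=0):
    """Generic mini-batch loop: shuffle the data each epoch and apply one
    optimizer update (e.g., an Adam step) per mini-batch of the training set."""
    rng = np.random.default_rng(seed)
    state = None                                    # optimizer state (moment estimates, step count, ...)
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            grad = grad_fn(theta, X[idx], Y[idx])   # gradient on the current mini-batch only
            theta, state = optimizer_step(theta, grad, state)
    return theta
```

In contrast, a Quasi-Newton method such as BFGS typically evaluates the full-batch gradient at every iteration, which is what makes the mini-batch variant attractive when training time is the bottleneck.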