This article deals with nonconvex stochastic optimization problems in deep learning. Appropriate learning rates, based on theory, for adaptive-learning-rate optimization algorithms (e.g., Adam and AMSGrad) to approximate the stationary points of such problems are provided. These rates are shown to allow faster convergence than previously reported for these algorithms. Specifically, the algorithms are examined in numerical experiments on text and image classification and are shown to perform better with constant learning rates than with diminishing learning rates.
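A minimal sketch of the setting the abstract describes: Adam run on a toy nonconvex stochastic objective with a constant learning rate versus a diminishing (1/sqrt(t)) schedule, using the gradient norm as a stationarity proxy. The test function, noise level, and all hyperparameter values are illustrative assumptions, not the paper's experimental setup.

```python
# Sketch only: Adam with constant vs. diminishing learning rates on a
# noisy nonconvex objective. Hyperparameters and the objective are assumed.
import numpy as np

def noisy_grad(x, rng):
    # Gradient of the nonconvex function f(x) = sum(x^2 - cos(2*pi*x)),
    # perturbed with Gaussian noise to mimic stochastic mini-batch gradients.
    g = 2.0 * x + 2.0 * np.pi * np.sin(2.0 * np.pi * x)
    return g + rng.normal(scale=0.1, size=x.shape)

def adam(lr_schedule, steps=2000, beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, size=10)        # initial point
    m = np.zeros_like(x)                        # first-moment estimate
    v = np.zeros_like(x)                        # second-moment estimate
    for t in range(1, steps + 1):
        g = noisy_grad(x, rng)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)            # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr_schedule(t) * m_hat / (np.sqrt(v_hat) + eps)
    return np.linalg.norm(noisy_grad(x, rng))   # stationarity proxy

print("constant lr   :", adam(lambda t: 1e-2))
print("diminishing lr:", adam(lambda t: 1e-2 / np.sqrt(t)))
```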
This paper presents a unified algorithmic framework for nonconvex stochastic optimization, which is needed to train deep neural networks. The unified algorithm includes the existing adaptive-learning-rate optimization algorithms, such as Adaptive Moment Estimation (Adam), Adaptive Mean Square Gradient (AMSGrad), Adam with weighted gradient and dynamic bound of learning rate (GWDC), AMSGrad with weighted gradient and dynamic bound of learning rate (AMSGWDC), and adapting step sizes by the belief in observed gradients (AdaBelief). The paper also gives convergence analyses of the unified algorithm for constant and diminishing learning rates. When using a constant learning rate, the algorithm can approximate a stationary point of a nonconvex stochastic optimization problem; when using a diminishing learning rate, it converges to a stationary point of the problem. Hence, the analyses show that the existing adaptive-learning-rate optimization algorithms can, in theory, be applied to nonconvex stochastic optimization in deep neural networks. Additionally, the paper provides numerical results showing that the unified algorithm can train deep neural networks in practice. Moreover, it provides numerical comparisons, on unconstrained minimization of benchmark functions, between the unified algorithm and certain heuristic intelligent optimization algorithms. These comparisons show that a teaching-learning-based optimization algorithm and the unified algorithm perform well.
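A rough sketch of how a single adaptive-gradient update can specialize to several of the algorithms the abstract lists (Adam, AMSGrad, AdaBelief) by swapping the second-moment rule. This is not the paper's unified framework; in particular, the weighted-gradient and dynamic-bound mechanisms of GWDC and AMSGWDC are omitted, and all hyperparameter values are illustrative.

```python
# Sketch only: one update of a generic adaptive-learning-rate method whose
# "variant" argument selects Adam-, AMSGrad-, or AdaBelief-style behavior.
import numpy as np

def unified_step(x, g, state, variant="adam",
                 lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m, v, v_max, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * g
    if variant == "adabelief":
        # AdaBelief tracks the variance of (g - m), the "belief" in the gradient.
        v = beta2 * v + (1 - beta2) * (g - m) ** 2
    else:
        v = beta2 * v + (1 - beta2) * g ** 2
    if variant == "amsgrad":
        # AMSGrad keeps the running maximum of v so the effective step never grows.
        v_max = np.maximum(v_max, v)
        denom = np.sqrt(v_max) + eps
    else:
        denom = np.sqrt(v / (1 - beta2 ** t)) + eps
    m_hat = m / (1 - beta1 ** t)                 # bias-corrected first moment
    x = x - lr * m_hat / denom
    return x, (m, v, v_max, t)

# Usage: one step on a dummy gradient.
x = np.zeros(3)
state = (np.zeros(3), np.zeros(3), np.zeros(3), 0)
x, state = unified_step(x, np.array([0.5, -1.0, 2.0]), state, variant="amsgrad")
print(x)
```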