Details
ISBN: (print) 9781728137988
In the ICLR 2018 best paper "On the Convergence of Adam and Beyond", the authors point out shortcomings in Adam's convergence proof and propose the AMSGRAD algorithm, which guarantees convergence as the number of iterations increases. However, through comparative experiments, this paper finds two problems in the convergence process of AMSGRAD: first, AMSGRAD oscillates easily; second, AMSGRAD converges slowly. After analysis, these two problems can be addressed as follows. When g(t-1)·g(t) > 0, this paper adds the momentum term of the Momentum algorithm to AMSGRAD to accelerate convergence. When g(t-1)·g(t) <= 0, this paper uses the SGD algorithm instead of AMSGRAD to update the model weights. In order to eliminate the negative effect of the previous parameter gradient on the current parameter gradient and to reduce the oscillation amplitude of the objective function, the first-order and second-order moment estimates of the parameter gradient are also recalculated when g(t-1)·g(t) <= 0. This paper therefore proposes the ACADG algorithm, which not only improves the convergence speed and suppresses the oscillation amplitude of the objective function, but also improves accuracy on the training and test datasets.
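The switching rule described in the abstract can be illustrated with a small, single-parameter sketch. The Python snippet below is a minimal illustration under stated assumptions: scalar weights, hypothetical hyperparameter names (lr, beta1, beta2, mu, eps), and one particular reading of how the momentum term is combined with the AMSGRAD step and how the moment estimates are reset; it is not the authors' reference implementation of ACADG.

```python
import numpy as np

def acadg_like_step(w, g, g_prev, state, lr=1e-3, beta1=0.9, beta2=0.999,
                    mu=0.9, eps=1e-8):
    """One illustrative parameter update following the switching rule in the abstract.

    w, g, g_prev : current weight, current gradient g(t), previous gradient g(t-1)
    state        : dict with 'm' (1st moment), 'v' (2nd moment),
                   'v_hat' (running max of v, as in AMSGRAD), 'vel' (momentum velocity)
    All hyperparameter names and defaults are illustrative assumptions.
    """
    if g_prev * g > 0:
        # Gradient signs agree: AMSGRAD-style adaptive step, accelerated
        # by accumulating it into a momentum velocity (assumed combination).
        state['m'] = beta1 * state['m'] + (1 - beta1) * g
        state['v'] = beta2 * state['v'] + (1 - beta2) * g * g
        state['v_hat'] = max(state['v_hat'], state['v'])   # AMSGRAD max rule
        adaptive_step = lr * state['m'] / (np.sqrt(state['v_hat']) + eps)
        state['vel'] = mu * state['vel'] + adaptive_step
        w = w - state['vel']
    else:
        # Gradient sign flipped (or vanished): fall back to a plain SGD step and
        # recompute the moment estimates from the current gradient only, so stale
        # history does not amplify oscillation.
        w = w - lr * g
        state['m'] = (1 - beta1) * g
        state['v'] = (1 - beta2) * g * g
        state['v_hat'] = state['v']
        state['vel'] = 0.0
    return w, state

if __name__ == "__main__":
    # Toy usage: minimise f(w) = w**2 starting from w = 5.
    w, g_prev = 5.0, 0.0
    state = {'m': 0.0, 'v': 0.0, 'v_hat': 0.0, 'vel': 0.0}
    for _ in range(200):
        g = 2.0 * w                                  # gradient of w**2
        w, state = acadg_like_step(w, g, g_prev, state, lr=0.1)
        g_prev = g
    print(w)   # w has moved from 5.0 toward the minimum at 0
```

The sign test g(t-1)·g(t) acts as the switch: while consecutive gradients agree, the momentum-augmented AMSGRAD branch accelerates progress; when the sign flips, the SGD branch with reset moments damps the overshoot, which is the mechanism the abstract credits for the reduced oscillation amplitude.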