Authors:
Zhang, Hao; Lu, Shuxia
Hebei Univ, Coll Math & Informat Sci, Key Lab Machine Learning & Computat Intelligence Hebei, Baoding 071002, Peoples R China
Details
This paper focuses on the non-convex, non-smooth composite optimization problem, which consists of a non-convex loss function and a non-smooth regularizer function that admits a proximal mapping. Existing methods, however, remain limited in handling objective functions that involve a non-smooth regularizer, and determining the step size for solving composite optimization problems can be a challenge. To address this gap, we propose a recursive gradient descent algorithm using generalized hyper-gradient descent, named ProxSarah-GHD, which utilizes variance reduction techniques and provides update rules for adaptive step sizes. To improve its generalization in proximal gradient descent, a generalized variant of hyper-gradient descent, named Generalized Hyper-gradient Descent (GHD), is proposed in this paper. We prove that ProxSarah-GHD attains a linear convergence rate. Moreover, we establish the oracle complexity of ProxSarah-GHD as O(ε⁻³) in the online setting and O(√n ε⁻² + n) in the finite-sum setting. In addition, to avoid the trouble of manually adjusting the batch size, we develop a novel Exponentially Increasing Mini-batch scheme for ProxSarah-GHD, named ProxSarah-GHD-EIM. We also prove that ProxSarah-GHD-EIM achieves a linear convergence rate, with total complexity O(ε⁻⁴ + ε⁻²) in the online setting and O(n + ε⁻⁴ + ε⁻²) in the finite-sum setting. Numerical experiments on standard datasets verify the superiority of ProxSarah-GHD over other methods. We further analyze the sensitivity of ProxSarah-GHD-EIM to its hyperparameters through experiments on standard datasets.
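The abstract describes a proximal recursive-gradient (SARAH-type) loop whose step size is adapted online by hyper-gradient descent. The Python sketch below illustrates that combination only; it is an assumption, not the paper's method. `grad_fn`, `prox`, and `beta` are hypothetical names, and the step-size rule shown is the classic hyper-gradient update of Baydin et al., used here as a stand-in for the paper's GHD rule, which is not reproduced.

```python
import numpy as np

def prox_sarah_hgd(grad_fn, prox, x0, n, n_epochs=10, eta0=0.05,
                   beta=1e-4, batch_size=32, rng=None):
    """Illustrative proximal SARAH loop with a hyper-gradient step size.

    grad_fn(x, idx) -- stochastic gradient of the smooth loss on mini-batch idx
    prox(x, eta)    -- proximal mapping of the non-smooth regularizer
    n               -- number of component functions (finite-sum setting)
    """
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    eta = eta0
    for _ in range(n_epochs):
        # Outer step: a full-batch gradient anchors the recursive estimator.
        v = grad_fn(x, np.arange(n))
        g_prev = v
        x_prev = x.copy()
        x = prox(x - eta * v, eta)
        for _ in range(n // batch_size):
            idx = rng.choice(n, batch_size, replace=False)
            # SARAH-style recursive estimator:
            # v_t = grad_B(x_t) - grad_B(x_{t-1}) + v_{t-1}
            v = grad_fn(x, idx) - grad_fn(x_prev, idx) + v
            # Hyper-gradient step-size update (stand-in for GHD): grow eta
            # when successive estimators align, shrink it when they oppose.
            eta = max(eta + beta * float(v @ g_prev), 1e-8)
            g_prev = v
            x_prev = x.copy()
            x = prox(x - eta * v, eta)  # proximal step handles the regularizer
    return x
```

For example, with an ℓ1 regularizer λ‖x‖₁ the proximal mapping is soft-thresholding, `prox = lambda x, eta: np.sign(x) * np.maximum(np.abs(x) - eta * lam, 0.0)` for a fixed `lam`.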
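The EIM scheme is described only as growing the mini-batch so that a fixed batch size need not be tuned by hand. A plausible reading is a geometric growth schedule with a cap; the sketch below is an assumption under that reading, and `b0`, `rho`, and `b_max` are hypothetical parameters rather than the paper's exact schedule.

```python
def eim_batch_sizes(b0=8, rho=2.0, b_max=4096, n_iters=12):
    """Hypothetical Exponentially Increasing Mini-batch (EIM) schedule:
    grow the batch size geometrically by a factor rho until the cap b_max,
    so no single fixed batch size has to be chosen in advance."""
    b = float(b0)
    sizes = []
    for _ in range(n_iters):
        sizes.append(min(int(b), b_max))
        b *= rho
    return sizes

# e.g. eim_batch_sizes() -> [8, 16, 32, ..., 2048, 4096, 4096, 4096]
```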