In the training of neural networks with low-precision computation and fixed-point arithmetic, rounding errors often cause stagnation or are detrimental to the convergence of the optimizers. This study provides insights into the choice of appropriate stochastic rounding strategies to mitigate the adverse impact of roundoff errors on the convergence of the gradient descent method, for problems satisfying the Polyak-Łojasiewicz inequality. Within this context, we show that a biased stochastic rounding strategy may even be beneficial insofar as it eliminates the vanishing gradient problem and forces the expected roundoff error in a descent direction. Furthermore, we obtain a bound on the convergence rate that is stricter than the one achieved by unbiased stochastic rounding. The theoretical analysis is validated by comparing the performance of various rounding strategies when optimizing several examples using low-precision fixed-point arithmetic.
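A minimal sketch of the rounding strategies compared above, assuming a uniform fixed-point grid of spacing delta: stochastic_round is the standard unbiased scheme (the expected rounded value equals the input), while biased_stochastic_round shifts the round-up probability toward a supplied direction so that the expected roundoff error is aligned with it (a descent direction when the negative gradient is supplied). The function names, the bias parameter, and the toy quadratic objective are illustrative choices, not the paper's exact constructions.

```python
import numpy as np

def stochastic_round(x, delta, rng):
    """Unbiased stochastic rounding to the fixed-point grid of spacing delta:
    rounds up with probability equal to the fractional part, so E[result] = x."""
    scaled = x / delta
    floor = np.floor(scaled)
    frac = scaled - floor
    return (floor + (rng.random(np.shape(x)) < frac)) * delta

def biased_stochastic_round(x, direction, delta, rng, bias=0.25):
    """Biased stochastic rounding (illustrative): shifts the round-up probability
    by `bias` toward the sign of `direction`, so the expected roundoff error
    points along `direction` (a descent direction when direction = -gradient)."""
    scaled = x / delta
    floor = np.floor(scaled)
    frac = scaled - floor
    p_up = np.clip(frac + bias * np.sign(direction), 0.0, 1.0)
    return (floor + (rng.random(np.shape(x)) < p_up)) * delta

# Toy gradient descent on f(w) = 0.5 * ||w||^2, which satisfies the PL inequality.
rng = np.random.default_rng(0)
delta, lr = 2.0 ** -8, 0.1            # fixed-point grid spacing and step size
w = np.full(4, 1.0)
for _ in range(200):
    grad = w                           # gradient of 0.5 * ||w||^2
    update = w - lr * grad
    w = stochastic_round(update, delta, rng)
    # or: w = biased_stochastic_round(update, -grad, delta, rng)
print(w)                               # iterates settle near the origin on the grid
```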
In this paper, we study the distributed nonconvex optimization problem, aiming to minimize the average value of the local nonconvex cost functions using local information exchange. To reduce the communication overhead, we introduce three general classes of compressors, i.e., compressors with bounded relative compression error, compressors with globally bounded absolute compression error, and compressors with locally bounded absolute compression error. By integrating them, respectively, with the distributed gradient tracking algorithm, we then propose three corresponding compressed distributed nonconvex optimization algorithms. Motivated by the state-of-the-art BEER algorithm proposed in Zhao et al. (2022), an efficient compressed algorithm integrating gradient tracking with biased and contractive compressors, our first proposed algorithm extends BEER to accommodate both biased and non-contractive compressors. For each algorithm, we design a novel Lyapunov function to demonstrate its sublinear convergence to a stationary point if the local cost functions are smooth. Furthermore, when the global cost function satisfies the Polyak-Łojasiewicz (P-Ł) condition, we show that our proposed algorithms linearly converge to a global optimal point. It is worth noting that, for compressors with bounded relative compression error and globally bounded absolute compression error, our proposed algorithms' parameters do not require prior knowledge of the P-Ł constant. (c) 2025 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
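As an illustration of the compressor classes listed above, the sketch below implements two standard examples and checks their error bounds numerically, then runs a minimal uncompressed gradient tracking loop to show where compression would enter: top-k sparsification has bounded relative compression error, while uniform quantization has a globally bounded absolute error. This is only a stand-in under these assumptions; the proposed algorithms compress carefully chosen correction terms rather than the raw transmitted states.

```python
import numpy as np

rng = np.random.default_rng(0)

def topk(x, k):
    """Top-k sparsification: keep the k largest-magnitude entries.
    Bounded *relative* error: ||x - topk(x)||^2 <= (1 - k/d) * ||x||^2."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def quantize(x, delta):
    """Uniform deterministic quantization to a grid of spacing delta.
    Globally bounded *absolute* error: ||x - quantize(x)||_inf <= delta / 2,
    independent of ||x||."""
    return np.round(x / delta) * delta

# Numerically check the two error bounds on random vectors.
d, k, delta = 50, 5, 0.1
x = rng.standard_normal(d)
rel = np.linalg.norm(x - topk(x, k)) ** 2 / np.linalg.norm(x) ** 2
absolute = np.max(np.abs(x - quantize(x, delta)))
print(f"relative error {rel:.3f} <= {1 - k / d:.3f}, "
      f"absolute error {absolute:.3f} <= {delta / 2:.3f}")

# Minimal (uncompressed) gradient tracking on n agents with quadratic costs
# f_i(x) = 0.5 * (x - b_i)^2; the minimizer of the average cost is mean(b).
n, eta, T = 4, 0.2, 200
W = np.full((n, n), 1.0 / n)           # doubly stochastic mixing matrix (complete graph)
b = rng.standard_normal(n)
x = np.zeros(n)
g = x - b                              # local gradients at x
y = g.copy()                           # gradient trackers
for _ in range(T):
    x_new = W @ x - eta * y            # in the compressed algorithms, the quantities
    g_new = x_new - b                  # exchanged here would pass through a compressor
    y = W @ y + g_new - g
    x, g = x_new, g_new
print(x, b.mean())                     # all agents approach mean(b)
```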
Solving linear inverse problems plays a crucial role in numerous applications. Algorithm-unfolding-based, model-aware data-driven approaches have gained significant attention for effectively addressing these problems. The learned iterative soft-thresholding algorithm (LISTA) and the alternating direction method of multipliers compressive sensing network (ADMM-CSNet) are two widely used such approaches, based on the ISTA and ADMM algorithms, respectively. In this work, we study optimization guarantees, i.e., achieving near-zero training loss with the increase in the number of learning epochs, for finite-layer unfolded networks such as LISTA and ADMM-CSNet with smooth soft-thresholding in an over-parameterized (OP) regime. We achieve this by leveraging a modified version of the Polyak-Łojasiewicz condition, denoted PL*. Satisfying the PL* condition within a specific region of the loss landscape ensures the existence of a global minimum and exponential convergence from initialization using gradient-descent-based methods. Hence, by deriving the Hessian spectral norm, we provide conditions, in terms of the network width and the number of training samples, under which the PL* condition holds for these unfolded networks. Additionally, we show that the threshold on the number of training samples increases with the network width. Furthermore, we compare the threshold on training samples of unfolded networks with that of a standard fully connected feed-forward network (FFNN) with smooth soft-thresholding non-linearity. We prove that unfolded networks have a higher threshold value than the FFNN. Consequently, one can expect a better expected error for unfolded networks than for the FFNN.
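The sketch below shows a smooth soft-thresholding activation (a softplus-based surrogate, which is an assumption about the paper's exact choice) and the forward pass of a finite-layer unfolded LISTA network with per-layer weights initialized from the ISTA update x ← soft(x − Aᵀ(Ax − y)/L, λ/L). The training loop whose loss the PL* analysis concerns is omitted; all names and parameter values are illustrative.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)        # numerically stable log(1 + exp(z))

def smooth_soft_threshold(z, theta, beta=10.0):
    """Smooth surrogate of soft-thresholding, built from softplus:
    (softplus(beta*(z - theta)) - softplus(beta*(-z - theta))) / beta.
    As beta -> inf it approaches sign(z) * max(|z| - theta, 0)."""
    return (softplus(beta * (z - theta)) - softplus(beta * (-z - theta))) / beta

class LISTA:
    """Minimal unfolded LISTA forward pass: K layers, each applying
    x <- sigma(W1 @ y + W2 @ x, theta) with per-layer learnable (W1, W2, theta).
    Weights are initialized from the ISTA iteration; training them by gradient
    descent on a reconstruction loss is omitted here."""
    def __init__(self, A, K=5, lam=0.1):
        m, n = A.shape
        L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the data-fit gradient
        self.W1 = [A.T / L for _ in range(K)]
        self.W2 = [np.eye(n) - A.T @ A / L for _ in range(K)]
        self.theta = [lam / L for _ in range(K)]

    def forward(self, y):
        x = np.zeros(self.W2[0].shape[0])
        for W1, W2, theta in zip(self.W1, self.W2, self.theta):
            x = smooth_soft_threshold(W1 @ y + W2 @ x, theta)
        return x

# Toy usage: recover a sparse vector from compressed measurements y = A @ x_true.
rng = np.random.default_rng(0)
m, n = 20, 50
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, 3, replace=False)] = 1.0
y = A @ x_true
print(LISTA(A, K=10).forward(y).round(2))
```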