This paper focuses on large-scale optimization, which is ubiquitous in the big data era. Gradient sketching is an important technique in large-scale optimization; in particular, the random coordinate descent algorithm is a gradient sketching method that uses a random sampling matrix as the sketching matrix. In this paper, we propose a novel gradient sketching method called GSGD (Gaussian Sketched Gradient Descent). Compared with classical gradient sketching methods such as random coordinate descent and SEGA (Hanzely et al., 2018), GSGD does not require importance sampling yet achieves a fast convergence rate matching that of these methods with importance sampling. Furthermore, if the objective function has a non-smooth regularization term, GSGD can also exploit the implicit structural information of the regularization term to achieve a fast convergence rate. Finally, our experimental results substantiate the effectiveness and efficiency of the algorithm.
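As an illustration of the gradient-sketching template this abstract refers to, the sketch below replaces the full gradient with its projection onto a single Gaussian sketch vector at each step. The step size, the rank-one projection form, and the toy quadratic are illustrative assumptions, not the exact GSGD update analyzed in the paper.

```python
import numpy as np

def sketched_gradient_descent(grad, x0, eta=0.02, iters=800, seed=0):
    """Gradient descent with a Gaussian-sketched gradient.

    At each step the full gradient is replaced by its projection onto a
    random Gaussian direction s, i.e. (s s^T / ||s||^2) grad(x). This is a
    generic illustration of gradient sketching, not the GSGD method itself.
    """
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    d = x.size
    for _ in range(iters):
        g = grad(x)
        s = rng.standard_normal(d)          # Gaussian sketch vector
        g_sketch = s * (s @ g) / (s @ s)    # rank-one sketched gradient
        x -= eta * g_sketch
    return x

# Toy quadratic f(x) = 0.5 * ||A x - b||^2.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
grad = lambda x: A.T @ (A @ x - b)
x_hat = sketched_gradient_descent(grad, np.zeros(5))
print(np.linalg.norm(grad(x_hat)))          # small residual gradient norm
```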
In decentralized optimization, m agents form a network and communicate only with their neighbors, which gives advantages in data ownership, privacy, and scalability. At the same time, decentralized stochastic gradient descent (SGD) methods, as popular decentralized algorithms for training large-scale machine learning models, have shown their superiority over centralized counterparts. Distributed stochastic gradient tracking (DSGT) (Pu & Nedić, 2021) has been recognized as a popular and state-of-the-art decentralized SGD method due to its strong theoretical guarantees. However, the theoretical analysis of DSGT (Koloskova et al., 2021) shows that its iteration complexity is (equation presented), where the doubly stochastic matrix W represents the network topology and C_W is a parameter that depends on W. This indicates that the convergence of DSGT is heavily affected by the topology of the communication network. To overcome this weakness of DSGT, we resort to the snapshot gradient tracking technique and propose two novel algorithms, snapshot DSGT (SS DSGT) and accelerated snapshot DSGT (ASS DSGT). We further show that SS DSGT exhibits a lower iteration complexity than DSGT on general communication network topologies. Additionally, ASS DSGT matches DSGT's iteration complexity (equation presented) under the same conditions as DSGT. Numerical experiments validate SS DSGT's superior performance on general communication network topologies and show that ASS DSGT achieves better practical performance than DSGT on the specified W.
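For context, the sketch below implements the standard DSGT recursion of Pu & Nedić (2021) that the snapshot variants build on: x_{k+1} = W(x_k - α y_k) and y_{k+1} = W y_k + g(x_{k+1}) - g(x_k). The ring topology, step size, and deterministic local gradients are illustrative assumptions; SS DSGT and ASS DSGT themselves are not reproduced here.

```python
import numpy as np

def dsgt(grads, W, x0, alpha=0.05, iters=500):
    """Standard distributed (stochastic) gradient tracking recursion:
        x_{k+1} = W (x_k - alpha * y_k)
        y_{k+1} = W y_k + g(x_{k+1}) - g(x_k)
    `grads[i]` returns agent i's local gradient; deterministic gradients are
    used here for clarity."""
    m, d = x0.shape
    x = x0.copy()
    g = np.stack([grads[i](x[i]) for i in range(m)])
    y = g.copy()                          # gradient tracker, y_0 = g(x_0)
    for _ in range(iters):
        x_new = W @ (x - alpha * y)
        g_new = np.stack([grads[i](x_new[i]) for i in range(m)])
        y = W @ y + g_new - g
        x, g = x_new, g_new
    return x.mean(axis=0)

# Toy setup: m agents on a ring, each with local objective 0.5*||x - t_i||^2.
m, d = 5, 3
W = np.zeros((m, m))
for i in range(m):                        # doubly stochastic ring matrix
    W[i, i] = 0.5
    W[i, (i - 1) % m] = 0.25
    W[i, (i + 1) % m] = 0.25
rng = np.random.default_rng(0)
targets = rng.standard_normal((m, d))
grads = [lambda x, t=targets[i]: x - t for i in range(m)]
print(dsgt(grads, W, np.zeros((m, d))))   # approx. the mean of the targets
```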
Parameter selection without communicating local data is quite challenging in distributed learning, exhibiting an inconsistency between its theoretical analysis and practical application in tackling distributively stor...
Spherical radial-basis-based kernel interpolation abounds in image sciences including geophysical image reconstruction, climate trends description and image rendering due to its excellent spatial localization property...
This paper focuses on scattered data fitting problems on spheres. We study the approximation performance of a class of weighted spectral filter algorithms (WSFA), including Tikhonov regularization, Landweber iteration...
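Although this abstract is truncated, the spectral-filter family it names is classical. Below is a minimal sketch of how Tikhonov regularization and Landweber iteration act as filter functions on the eigenvalues of a kernel matrix; the Gaussian kernel, bandwidth, regularization parameter, and iteration count are illustrative assumptions, not the spherical setting of the paper.

```python
import numpy as np

def spectral_filter_fit(K, y, filter_fn):
    """Generic spectral-filter estimator: coefficients c = g(K) y, where the
    filter g acts on the eigenvalues of the (PSD) kernel matrix K."""
    evals, evecs = np.linalg.eigh(K)
    filtered = filter_fn(np.maximum(evals, 0.0))
    return evecs @ (filtered * (evecs.T @ y))

# Two classical filters from the weighted-spectral-filter family.
tikhonov = lambda lam: (lambda s: 1.0 / (s + lam))     # g(s) = 1 / (s + lam)
def landweber(n_iter, step):
    # g(s) = step * sum_{j<n_iter} (1 - step*s)^j, i.e. n_iter Landweber steps
    return lambda s: step * sum((1.0 - step * s) ** j for j in range(n_iter))

# Toy fit with a Gaussian kernel on scattered 1-D points.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, 30))
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(30)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.1)
c_tik = spectral_filter_fit(K, y, tikhonov(1e-2))
c_lw = spectral_filter_fit(K, y, landweber(50, 0.5 / np.linalg.eigvalsh(K).max()))
print(np.linalg.norm(K @ c_tik - y), np.linalg.norm(K @ c_lw - y))
```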
In this paper, we focus on decentralized composite optimization for convex functions. Because of advantages such as robustness to the network and the absence of a communication bottleneck at a central server, the decentralized o...
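To make the composite setting concrete, the sketch below shows a plain proximal-gradient step for min_x f(x) + λ‖x‖₁. It is a single-machine illustration of the composite (smooth plus non-smooth) structure only, not the decentralized algorithm studied in the paper; the step size, regularization weight, and toy data are assumptions.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient(grad_f, x0, lam, eta=0.1, iters=500):
    """Proximal gradient method for the composite problem
        min_x  f(x) + lam * ||x||_1,
    shown on a single machine to illustrate the composite structure."""
    x = x0.copy()
    for _ in range(iters):
        x = soft_threshold(x - eta * grad_f(x), eta * lam)
    return x

# Toy sparse least-squares instance.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 10))
x_true = np.zeros(10)
x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true
grad_f = lambda x: A.T @ (A @ x - b) / len(b)
print(proximal_gradient(grad_f, np.zeros(10), lam=0.01))  # recovers a sparse x
```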
In recent years, large amounts of electronic health records (EHRs) concerning chronic diseases have been collected to facilitate medical diagnosis. Modeling the dynamic properties of EHRs related to chronic diseases c...
This paper focuses on approximation and learning performance analysis for deep convolutional neural networks with zero-padding and max-pooling. We prove that, to approximate r-smooth functions, the approximation rates ...
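For readers unfamiliar with the two operations named in the title, here is a minimal numpy illustration of a 1-D convolution with zero-padding followed by max-pooling; the filter values, padding width, and pooling window are illustrative assumptions, unrelated to the specific networks analyzed in the paper.

```python
import numpy as np

def conv1d_zero_pad(x, w):
    """1-D convolution with zero-padding so the output keeps the input length."""
    pad = len(w) - 1
    x_pad = np.pad(x, (pad // 2, pad - pad // 2))
    return np.array([x_pad[i:i + len(w)] @ w for i in range(len(x))])

def max_pool1d(x, size):
    """Non-overlapping max-pooling with window `size` (length assumed divisible)."""
    return x.reshape(-1, size).max(axis=1)

x = np.array([1.0, -2.0, 3.0, 0.5, -1.0, 2.0, 4.0, -3.0])
w = np.array([0.25, 0.5, 0.25])           # illustrative filter
h = conv1d_zero_pad(x, w)                 # zero-padded convolution, length 8
print(max_pool1d(h, 2))                   # max-pooling, length 4
```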
Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating the convergence of first-order (FO) and zeroth-order (ZO) optimization methods. However, in composite optimization problems, ZO methods encounter an additional variance, the coordinate-wise variance, which stems from random gradient estimation. To reduce this variance, prior works require estimating all partial derivatives, essentially approximating FO information. This approach demands O(d) function evaluations (where d is the dimension), which incurs substantial computational cost and is prohibitive in high-dimensional scenarios. This paper proposes the Zeroth-order Proximal Double Variance Reduction (ZPDVR) method, which utilizes the averaging trick to reduce both the sampling and coordinate-wise variances. Compared to prior methods, ZPDVR relies solely on random gradient estimates, calls the stochastic zeroth-order oracle (SZO) O(1) times per iteration in expectation, and achieves the optimal O(d(n + κ) log(1/ϵ)) SZO query complexity in the strongly convex and smooth setting, where κ is the condition number and ϵ is the desired accuracy. Empirical results validate ZPDVR's linear convergence and demonstrate its superior performance over related methods.
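To make the coordinate-wise variance concrete, the sketch below shows a standard two-point random-coordinate zeroth-order gradient estimator of the kind such methods build on, plugged into plain ZO descent. The smoothing parameter, step size, and toy objective are illustrative assumptions; the actual ZPDVR update with its double variance reduction is not reproduced here.

```python
import numpy as np

def zo_coordinate_grad(f, x, mu=1e-4, rng=None):
    """Two-point random-coordinate zeroth-order gradient estimator:
        g = d * (f(x + mu * e_i) - f(x)) / mu * e_i,   i ~ Uniform{1..d}.
    In expectation over i this approximates the full gradient (up to O(mu)
    bias), but it carries the coordinate-wise variance discussed above."""
    rng = rng or np.random.default_rng()
    d = x.size
    i = rng.integers(d)
    e = np.zeros(d)
    e[i] = 1.0
    return d * (f(x + mu * e) - f(x)) / mu * e

# Plain stochastic ZO descent on a smooth toy quadratic.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 8))
b = rng.standard_normal(30)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2) / len(b)
x = np.zeros(8)
for _ in range(5000):
    x -= 0.05 * zo_coordinate_grad(f, x, rng=rng)
print(f(x))                               # close to the least-squares optimum
```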
This paper focuses on parameter selection issues of kernel ridge regression (KRR). Due to special spectral properties of KRR, we find that delicate subdivision of the parameter interval shrinks the difference between ...
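For reference, the sketch below computes the standard kernel ridge regression estimator whose regularization parameter the abstract is concerned with; the Gaussian kernel, its bandwidth, and the small candidate grid are illustrative assumptions, and real parameter selection would compare held-out rather than training error.

```python
import numpy as np

def krr_fit(K, y, lam):
    """Kernel ridge regression coefficients: alpha = (K + lam * n * I)^{-1} y."""
    n = len(y)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

# Toy 1-D regression with a Gaussian kernel and a small grid of parameters.
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 50))
y_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.standard_normal(50)
K = np.exp(-(x_train[:, None] - x_train[None, :]) ** 2 / 0.05)
for lam in [1e-1, 1e-2, 1e-3]:            # candidate regularization parameters
    alpha = krr_fit(K, y_train, lam)
    print(lam, np.mean((K @ alpha - y_train) ** 2))   # training error per lambda
```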