This paper focuses on large-scale optimization, which is central in the big data era. Gradient sketching is an important technique in large-scale optimization. Specifically, the random coordinate descent algorithm is a gradient sketching method whose sketching matrix is a random sampling matrix. In this paper, we propose a novel gradient sketching method called GSGD (Gaussian Sketched Gradient Descent). Compared with classical gradient sketching methods such as random coordinate descent and SEGA (Hanzely et al., 2018), GSGD does not require importance sampling, yet achieves a fast convergence rate matching that of these methods with importance sampling. Furthermore, if the objective function has a non-smooth regularization term, GSGD can also exploit the implicit structure of the regularization term to achieve a fast convergence rate. Finally, our experimental results substantiate the effectiveness and efficiency of the algorithm. Copyright 2024 by the author(s)
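The core idea of Gaussian gradient sketching can be illustrated with a minimal sketch (not the paper's actual GSGD algorithm, whose update and step sizes are not given in the abstract): replace the full gradient by a rank-one Gaussian sketch s(sᵀ∇f(x)), which is unbiased since E[ssᵀ] = I for s ~ N(0, I). The quadratic objective, step size, and iteration count below are illustrative assumptions.

```python
import numpy as np

# Toy least-squares objective f(x) = 0.5 * ||A x - b||^2 (an assumption
# for illustration; the paper's setting is more general).
rng = np.random.default_rng(0)
d = 10
A = rng.standard_normal((50, d))
b = rng.standard_normal(50)

def grad(x):
    return A.T @ (A @ x - b)

x = np.zeros(d)
eta = 5e-4          # small step to absorb the (d+2)-fold sketch variance
for _ in range(30000):
    s = rng.standard_normal(d)
    g = s * (s @ grad(x))   # only the scalar s^T grad f(x) is "observed"
    x -= eta * g            # unbiased: E[s s^T] = I

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x - x_star))  # small: iterates approach the minimizer
```

Because the noise is multiplicative (it vanishes at the minimizer of a least-squares problem), a constant step size suffices here; general objectives need the care the paper provides.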
In decentralized optimization, m agents form a network and communicate only with their neighbors, which offers advantages in data ownership, privacy, and scalability. Decentralized stochastic gradient descent (SGD) methods, popular decentralized algorithms for training large-scale machine learning models, have shown their superiority over centralized counterparts. Distributed stochastic gradient tracking (DSGT) (Pu & Nedić, 2021) has been recognized as a popular, state-of-the-art decentralized SGD method due to its strong theoretical guarantees. However, the theoretical analysis of DSGT (Koloskova et al., 2021) shows that its iteration complexity is (equation presented), where the doubly stochastic matrix W represents the network topology and C_W is a parameter that depends on W. This indicates that the convergence of DSGT is heavily affected by the topology of the communication network. To overcome this weakness, we resort to the snapshot gradient tracking technique and propose two novel algorithms, snapshot DSGT (SS DSGT) and accelerated snapshot DSGT (ASS DSGT). We further show that SS DSGT exhibits a lower iteration complexity than DSGT under general communication network topologies. Additionally, ASS DSGT matches DSGT's iteration complexity (equation presented) under the same conditions as DSGT. Numerical experiments validate SS DSGT's superior performance under general communication network topologies and show better practical performance of ASS DSGT on the specified W compared to DSGT.
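The gradient tracking mechanism that DSGT builds on can be sketched in its deterministic form (a simplification: DSGT uses stochastic gradients, and the snapshot variants proposed here modify the tracker further). The scalar local objectives, ring topology, and step size below are illustrative assumptions.

```python
import numpy as np

# Each agent i holds f_i(x) = 0.5 * (x - c_i)^2; the network-wide
# minimizer is mean(c). W is a doubly stochastic ring-mixing matrix.
m = 5
c = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

W = np.zeros((m, m))
for i in range(m):
    W[i, i] = 0.5
    W[i, (i - 1) % m] = 0.25
    W[i, (i + 1) % m] = 0.25

grad = lambda x: x - c          # stacked local gradients
x = np.zeros(m)                 # one iterate per agent
y = grad(x)                     # tracker initialized at local gradients
eta = 0.1
for _ in range(500):
    x_new = W @ x - eta * y
    y = W @ y + grad(x_new) - grad(x)   # y tracks the average gradient
    x = x_new

print(x)  # every agent's iterate approaches mean(c) = 3.0
```

The key invariant is that the average of the y variables always equals the average of the local gradients, so each agent descends along an estimate of the global gradient despite only talking to its neighbors.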
Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating convergence rates of first-order (FO) and zeroth-order (ZO) optimization methods. However, in composite optimization problems, ZO methods encounter an additional variance called the coordinate-wise variance, which stems from the random gradient estimation. To reduce this variance, prior works require estimating all partial derivatives, essentially approximating FO information. This approach demands O(d) function evaluations (d is the dimension size), which incurs substantial computational costs and is prohibitive in high-dimensional scenarios. This paper proposes the Zeroth-order Proximal Double Variance Reduction (ZPDVR) method, which utilizes the averaging trick to reduce both sampling and coordinate-wise variances. Compared to prior methods, ZPDVR relies solely on random gradient estimates, calls the stochastic zeroth-order oracle (SZO) in expectation O(1) times per iteration, and achieves the optimal O(d(n+κ) log(1/ϵ)) SZO query complexity in the strongly convex and smooth setting, where κ represents the condition number and ϵ is the desired accuracy. Empirical results validate ZPDVR's linear convergence and demonstrate its superior performance over other related methods.
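The random coordinate-wise gradient estimate that such ZO methods build on (and whose variance ZPDVR targets) can be sketched as follows; this is a generic construction, not the paper's algorithm, and the toy objective and smoothing radius are assumptions. Two oracle calls yield an unbiased estimate (up to an O(μ) smoothing bias) of the gradient by scaling one random partial derivative by d.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
f = lambda x: 0.5 * np.sum(x ** 2)   # toy smooth objective: grad f(x) = x

def zo_grad(x, mu=1e-5):
    """Random coordinate-wise ZO estimate: 2 function evals, not O(d)."""
    j = rng.integers(d)
    e = np.zeros(d)
    e[j] = 1.0
    # forward difference along coordinate j, scaled by d for unbiasedness
    return d * (f(x + mu * e) - f(x)) / mu * e

x = np.array([1.0, -2.0, 3.0, -4.0])
est = np.mean([zo_grad(x) for _ in range(20000)], axis=0)
print(est)  # close to the true gradient x, but each sample is very noisy
```

A single sample has variance on the order of (d-1) per squared gradient coordinate: this is exactly the coordinate-wise variance the abstract describes, which ZPDVR reduces with averaging rather than by evaluating all d partial derivatives.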
There have been frequent incidents of water intake blockage due to marine organisms, which pose a serious threat to the normal operation of nuclear power plants across the world. In order to avoid biological hazards f...
ISBN (digital): 9798350349399
ISBN (print): 9798350349405
Video action segmentation aims to identify and localize actions. Existing models have achieved impressive performance with pre-extracted frame-level features, but this may limit zero-shot learning and cross-dataset inference, especially for new actions or scenes. To overcome this problem, we propose a novel end-to-end network designed for robust performance across both familiar and novel action segmentation scenarios. Our approach combines a plug-and-play visual prompt module that enhances the temporal understanding of CLIP features with a learnable text prompt that enriches label semantics and refines the model's focus, significantly boosting performance. Our results demonstrate that CLIP features can assist in action segmentation tasks and that prompts can improve task effectiveness. Furthermore, our findings show that CLIP features contain information that I3D features do not. We evaluate the proposed method on several video datasets, including Georgia Tech Egocentric Activities (GTEA), 50Salads, and Breakfast, and the results show that the proposed model outperforms existing SOTA models.
Dense retrieval has achieved impressive advances in first-stage retrieval from a large-scale document collection, which is built on bi-encoder architecture to produce single vector representation of query and document...
People usually believe that network pruning not only reduces the computational cost of deep networks, but also prevents overfitting by decreasing model capacity. However, our work surprisingly discovers that network p...
With the deployment of large-scale antenna arrays, the already limited time-frequency resources are becoming increasingly scarce. In this study, we propose a novel Laplacian Pyramid Channel Completion Network (LPCCNet...
Visual Dialog requires an agent to engage in a conversation with humans grounded in an image. Many studies on Visual Dialog focus on the understanding of the dialog history or the content of an image, while a consider...