This paper focuses on large-scale optimization, which is central in the big data era. Gradient sketching is an important technique in large-scale optimization. Specifically, the random coordinate descent algorithm is a gradient sketching method whose sketching matrix is a random sampling matrix. In this paper, we propose a novel gradient sketching method called GSGD (Gaussian Sketched Gradient Descent). Compared with classical gradient sketching methods such as random coordinate descent and SEGA (Hanzely et al., 2018), GSGD does not require importance sampling, yet achieves a fast convergence rate matching that of these methods with importance sampling. Furthermore, if the objective function has a non-smooth regularization term, GSGD can also exploit the implicit structure of the regularization term to achieve a fast convergence rate. Finally, our experimental results substantiate the effectiveness and efficiency of the algorithm. Copyright 2024 by the author(s)
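The core idea of Gaussian gradient sketching can be illustrated with a minimal sketch (not the paper's actual GSGD algorithm, whose update and step sizes are not given in the abstract): replace the full gradient by a rank-one Gaussian sketch s(sᵀ∇f(x)), which is unbiased since E[ssᵀ] = I for s ~ N(0, I). The quadratic objective, step size, and iteration count below are illustrative assumptions.

```python
import numpy as np

# Toy least-squares objective f(x) = 0.5 * ||A x - b||^2 (an assumption
# for illustration; the paper's setting is more general).
rng = np.random.default_rng(0)
d = 10
A = rng.standard_normal((50, d))
b = rng.standard_normal(50)

def grad(x):
    return A.T @ (A @ x - b)

x = np.zeros(d)
eta = 5e-4          # small step to absorb the (d+2)-fold sketch variance
for _ in range(30000):
    s = rng.standard_normal(d)
    g = s * (s @ grad(x))   # only the scalar s^T grad f(x) is "observed"
    x -= eta * g            # unbiased: E[s s^T] = I

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x - x_star))  # small: iterates approach the minimizer
```

Because the noise is multiplicative (it vanishes at the minimizer of a least-squares problem), a constant step size suffices here; general objectives need the care the paper provides.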
In decentralized optimization, m agents form a network and communicate only with their neighbors, which offers advantages in data ownership, privacy, and scalability. Decentralized stochastic gradient descent (SGD) methods, popular decentralized algorithms for training large-scale machine learning models, have shown their superiority over centralized counterparts. Distributed stochastic gradient tracking (DSGT) (Pu & Nedić, 2021) has been recognized as a popular, state-of-the-art decentralized SGD method due to its strong theoretical guarantees. However, the theoretical analysis of DSGT (Koloskova et al., 2021) shows that its iteration complexity is (equation presented), where the doubly stochastic matrix W represents the network topology and C_W is a parameter that depends on W. This indicates that the convergence of DSGT is heavily affected by the topology of the communication network. To overcome this weakness, we resort to the snapshot gradient tracking technique and propose two novel algorithms, snapshot DSGT (SS DSGT) and accelerated snapshot DSGT (ASS DSGT). We further show that SS DSGT exhibits a lower iteration complexity than DSGT under general communication network topologies. Additionally, ASS DSGT matches DSGT's iteration complexity (equation presented) under the same conditions as DSGT. Numerical experiments validate SS DSGT's superior performance under general communication network topologies and show better practical performance of ASS DSGT on the specified W compared to DSGT.
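The gradient tracking mechanism that DSGT builds on can be sketched in its deterministic form (a simplification: DSGT uses stochastic gradients, and the snapshot variants proposed here modify the tracker further). The scalar local objectives, ring topology, and step size below are illustrative assumptions.

```python
import numpy as np

# Each agent i holds f_i(x) = 0.5 * (x - c_i)^2; the network-wide
# minimizer is mean(c). W is a doubly stochastic ring-mixing matrix.
m = 5
c = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

W = np.zeros((m, m))
for i in range(m):
    W[i, i] = 0.5
    W[i, (i - 1) % m] = 0.25
    W[i, (i + 1) % m] = 0.25

grad = lambda x: x - c          # stacked local gradients
x = np.zeros(m)                 # one iterate per agent
y = grad(x)                     # tracker initialized at local gradients
eta = 0.1
for _ in range(500):
    x_new = W @ x - eta * y
    y = W @ y + grad(x_new) - grad(x)   # y tracks the average gradient
    x = x_new

print(x)  # every agent's iterate approaches mean(c) = 3.0
```

The key invariant is that the average of the y variables always equals the average of the local gradients, so each agent descends along an estimate of the global gradient despite only talking to its neighbors.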
Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating convergence rates of first-order (FO) and zeroth-order (ZO) optimization methods. However, in composite optimization problems, ZO methods encounter an additional variance called the coordinate-wise variance, which stems from the random gradient estimation. To reduce this variance, prior works require estimating all partial derivatives, essentially approximating FO information. This approach demands O(d) function evaluations (d is the dimension size), which incurs substantial computational costs and is prohibitive in high-dimensional scenarios. This paper proposes the Zeroth-order Proximal Double Variance Reduction (ZPDVR) method, which utilizes the averaging trick to reduce both sampling and coordinate-wise variances. Compared to prior methods, ZPDVR relies solely on random gradient estimates, calls the stochastic zeroth-order oracle (SZO) in expectation O(1) times per iteration, and achieves the optimal O(d(n+κ) log(1/ϵ)) SZO query complexity in the strongly convex and smooth setting, where κ represents the condition number and ϵ is the desired accuracy. Empirical results validate ZPDVR's linear convergence and demonstrate its superior performance over other related methods.
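The random coordinate-wise gradient estimate that such ZO methods build on (and whose variance ZPDVR targets) can be sketched as follows; this is a generic construction, not the paper's algorithm, and the toy objective and smoothing radius are assumptions. Two oracle calls yield an unbiased estimate (up to an O(μ) smoothing bias) of the gradient by scaling one random partial derivative by d.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
f = lambda x: 0.5 * np.sum(x ** 2)   # toy smooth objective: grad f(x) = x

def zo_grad(x, mu=1e-5):
    """Random coordinate-wise ZO estimate: 2 function evals, not O(d)."""
    j = rng.integers(d)
    e = np.zeros(d)
    e[j] = 1.0
    # forward difference along coordinate j, scaled by d for unbiasedness
    return d * (f(x + mu * e) - f(x)) / mu * e

x = np.array([1.0, -2.0, 3.0, -4.0])
est = np.mean([zo_grad(x) for _ in range(20000)], axis=0)
print(est)  # close to the true gradient x, but each sample is very noisy
```

A single sample has variance on the order of (d-1) per squared gradient coordinate: this is exactly the coordinate-wise variance the abstract describes, which ZPDVR reduces with averaging rather than by evaluating all d partial derivatives.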
There have been frequent incidents of water intake blockage due to marine organisms, which pose a serious threat to the normal operation of nuclear power plants across the world. In order to avoid biological hazards f...
ISBN (digital): 9798350349399
ISBN (print): 9798350349405
Video action segmentation aims to identify and localize actions. Existing models have achieved impressive performance with pre-extracted frame-level features, but this may limit zero-shot learning and cross-dataset inference, especially for new actions or scenes. To overcome this problem, we propose a novel end-to-end network designed for robust performance across both familiar and novel action segmentation scenarios. Our approach combines a plug-and-play visual prompt module that enhances the temporal understanding of CLIP features with a learnable text prompt that enriches label semantics and refines the model's focus, significantly boosting performance. Our results demonstrate that CLIP features can assist in action segmentation tasks and that prompts can improve task effectiveness. Furthermore, our findings show that CLIP features contain information that I3D features do not. We evaluate the proposed method on several video datasets, including Georgia Tech Egocentric Activities (GTEA), 50Salads, and Breakfast, and the results show that the proposed model outperforms existing SOTA models.
Dense retrieval has achieved impressive advances in first-stage retrieval from a large-scale document collection, which is built on bi-encoder architecture to produce single vector representation of query and document...
People usually believe that network pruning not only reduces the computational cost of deep networks, but also prevents overfitting by decreasing model capacity. However, our work surprisingly discovers that network p...
With the deployment of large-scale antenna arrays, the already limited time-frequency resources are becoming increasingly scarce. In this study, we propose a novel Laplacian Pyramid Channel Completion Network (LPCCNet...
Visual Dialog requires an agent to engage in a conversation with humans grounded in an image. Many studies on Visual Dialog focus on the understanding of the dialog history or the content of an image, while a consider...