Search Results - Inner Mongolia University Library

36th Annual Conference on Learning Theory (COLT)

Authors: Agarwal, Alekh; Jin, Yujia; Zhang, Tong (Google Res, Mountain View, CA 94043, USA; Stanford Univ, Stanford, CA 94305, USA)

We study time-inhomogeneous episodic reinforcement learning (RL) under general function approximation and sparse rewards. We design a new algorithm, Variance-weighted Optimistic Q-Learning (VOQL), based on Q-learning and bound its regret assuming closure under Bellman backups, and bounded Eluder dimension for the regression function class. As a special case, VOQL achieves $\tilde{O}(d\sqrt{TH} + d^6 H^5)$ regret over T episodes for a horizon H MDP under (d-dimensional) linear function approximation, which is asymptotically optimal. Our algorithm incorporates weighted regression-based upper and lower bounds on the optimal value function to obtain this improved regret. The algorithm is computationally efficient given a regression oracle over the function class, making this the first computationally tractable and statistically optimal approach for linear MDPs.

Keywords: Reinforcement learning; nonlinear function approximation; model-free algorithms; eluder dimension
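The abstract above refers to weighted regression-based upper bounds on the value function. The following is only a rough sketch of that general idea in the linear special case, not the VOQL algorithm itself: a ridge regression in which each sample is weighted by the inverse of an assumed variance estimate, plus an elliptical exploration bonus. The function names, the `beta` bonus scale, and the variance inputs are all assumptions made for illustration.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def weighted_ridge(features, targets, variances, lam=1.0):
    """Minimise sum_i (x_i . theta - y_i)^2 / var_i + lam * |theta|^2.

    Returns the estimate theta and the weighted Gram matrix.
    The variances are assumed to be supplied by the caller.
    """
    d = len(features[0])
    gram = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    rhs = [0.0] * d
    for x, y, var in zip(features, targets, variances):
        w = 1.0 / var  # noisy samples get a smaller weight
        for i in range(d):
            rhs[i] += w * x[i] * y
            for j in range(d):
                gram[i][j] += w * x[i] * x[j]
    return solve(gram, rhs), gram


def optimistic_value(x, theta, gram, beta=1.0):
    """Point estimate plus the bonus beta * sqrt(x^T gram^{-1} x)."""
    z = solve(gram, x)
    bonus = beta * math.sqrt(sum(xi * zi for xi, zi in zip(x, z)))
    return sum(xi * ti for xi, ti in zip(x, theta)) + bonus
```

Down-weighting high-variance samples tightens the confidence ellipsoid in directions where the data are reliable, which is the intuition usually given for why variance weighting can improve regret bounds.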


Forward Actor-Critic for nonlinear function approximation in Reinforcement Learning


16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)

Authors: Veeriah, Vivek; van Seijen, Harm; Sutton, Richard S. (Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada; Univ Alberta, Edmonton, AB, Canada)

ISBN: 9781510855076 (print)

Multi-step methods are important in reinforcement learning (RL). Eligibility traces, the usual way of handling them, work well with linear function approximators. Recently, van Seijen (2016) introduced a delayed learning approach, without eligibility traces, for handling the multi-step lambda-return with nonlinear function approximators. However, this was limited to action-value methods. In this paper, we extend this approach to handle n-step returns, generalize it to policy gradient methods, and empirically study the effect of such delayed updates in control tasks. Specifically, we introduce two novel forward actor-critic methods and empirically compare them with the conventional actor-critic method on mountain car and pole-balancing tasks. From our experiments, we observe that forward actor-critic dramatically outperforms the conventional actor-critic in these standard control tasks. Notably, this forward actor-critic method has produced a new class of multi-step RL algorithms without eligibility traces.

Keywords: Reinforcement Learning; Actor-Critic; Policy Gradient; nonlinear function approximation; Incremental Learning
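The n-step return and the lambda-return named in the abstract above have standard forward-view definitions, which can be sketched as follows. This is only the textbook computation of the targets for one finished episode, not the forward actor-critic update itself; the argument layout (`rewards[k]` is the reward received after step k, `values[k]` is the estimated value of state k, with a final entry for the terminal state) is an assumption chosen for illustration.

```python
def n_step_return(rewards, values, t, n, gamma):
    """n-step return from time t: n discounted rewards plus a bootstrapped value.

    The sum is truncated at the end of the episode, where values[-1] is
    expected to be the terminal value (normally 0).
    """
    end = min(t + n, len(rewards))
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    return g + gamma ** (end - t) * values[end]


def lambda_return(rewards, values, t, gamma, lam):
    """Forward-view lambda-return: a geometric mixture of n-step returns."""
    horizon = len(rewards) - t  # number of steps left in the episode
    g = 0.0
    for n in range(1, horizon):
        g += (1 - lam) * lam ** (n - 1) * n_step_return(rewards, values, t, n, gamma)
    # All remaining weight goes to the full (Monte Carlo) return.
    g += lam ** (horizon - 1) * n_step_return(rewards, values, t, horizon, gamma)
    return g
```

Setting `lam=0` recovers the one-step TD target and `lam=1` recovers the Monte Carlo return, so the parameter interpolates between the two.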


Forward Actor-Critic for nonlinear function approximation in Reinforcement Learning


International Conference on Autonomous Agents and Multiagent Systems

Authors: Vivek Veeriah; Harm van Seijen; Richard S. Sutton (Dept. of Computing Science, University of Alberta)

ISBN: 9781510855076 (print)

Multi-step methods are important in reinforcement learning (RL). Eligibility traces, the usual way of handling them, work well with linear function approximators. Recently, van Seijen (2016) introduced a delayed learning approach, without eligibility traces, for handling the multi-step λ-return with nonlinear function approximators. However, this was limited to action-value methods. In this paper, we extend this approach to handle n-step returns, generalize it to policy gradient methods, and empirically study the effect of such delayed updates in control tasks. Specifically, we introduce two novel forward actor-critic methods and empirically compare them with the conventional actor-critic method on mountain car and pole-balancing tasks. From our experiments, we observe that forward actor-critic dramatically outperforms the conventional actor-critic in these standard control tasks. Notably, this forward actor-critic method has produced a new class of multi-step RL algorithms without eligibility traces.

Keywords: Reinforcement Learning; Actor-Critic; Policy Gradient; nonlinear function approximation; Incremental Learning; learning (artificial intelligence); Learning; eligibility; nonlinear functions; control task; New classes; function approximation


Service placement strategies in mobile edge computing based on an improved genetic algorithm


PERVASIVE AND MOBILE COMPUTING, 2024, vol. 105

Authors: Zheng, Ruijuan; Xu, Junwei; Wang, Xueqi; Liu, Muhua; Zhu, Junlong (Henan Univ Sci & Technol, Sch Informat Engn, Luoyang 471023, Henan, Peoples R China)

In mobile edge computing (MEC), quality of service (QoS) is closely related to optimizing service placement strategies, which is crucial to providing efficient services that meet user needs. However, due to the mobility of users and the energy consumption limit of edge servers, the existing policies make it difficult to ensure the QoS level of users. In this paper, a novel genetic algorithm based on a simulated annealing algorithm is proposed to balance the QoS of users and the energy consumption of edge servers. Finally, the effectiveness of the algorithm is verified by experiments. The results show that the QoS value obtained by the proposed algorithm is closer to the maximum value, which has significant advantages in improving QoS value and resource utilization. In addition, in software development related to mobile edge computing, our algorithm helps improve the program's running speed.

Keywords: Energy consumption; Genetic algorithm; nonlinear function approximation; Mobile edge computing; Service placement
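The abstract above describes a genetic algorithm combined with simulated annealing but gives no details of the hybrid. One common way to combine the two, sketched here purely as an assumption, is to let a child replace its parent according to the annealing acceptance rule. The toy objective (a QoS table minus a weighted energy cost), the function names, and all parameter defaults are hypothetical and are not taken from the paper.

```python
import math
import random


def fitness(placement, qos, energy, energy_weight=0.1):
    """Toy objective: QoS of each service on its server minus weighted energy."""
    return sum(qos[s][srv] - energy_weight * energy[srv]
               for s, srv in enumerate(placement))


def evolve(qos, energy, pop_size=20, generations=100,
           temp=1.0, cooling=0.95, seed=0):
    """Return (best_placement, best_fitness); placement[s] is the server of service s."""
    rng = random.Random(seed)
    n_services, n_servers = len(qos), len(energy)

    def score(p):
        return fitness(p, qos, energy)

    pop = [[rng.randrange(n_servers) for _ in range(n_services)]
           for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        next_pop = []
        for parent in pop:
            mate = max(rng.sample(pop, 2), key=score)      # tournament selection
            cut = rng.randrange(1, n_services) if n_services > 1 else 0
            child = parent[:cut] + mate[cut:]              # one-point crossover
            child[rng.randrange(n_services)] = rng.randrange(n_servers)  # mutation
            delta = score(child) - score(parent)
            # Annealing rule: keep improvements, sometimes keep worse children.
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                next_pop.append(child)
            else:
                next_pop.append(parent)
        pop = next_pop
        best = max(pop + [best], key=score)
        temp *= cooling  # accept fewer bad moves as the search proceeds
    return best, score(best)
```

Early on, the high temperature lets the population escape poor local optima; as it cools, the search behaves more like a greedy genetic algorithm.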


Adaptive temporal-difference learning via deep neural network function approximation: a non-asymptotic analysis


COMPLEX & INTELLIGENT SYSTEMS, 2025, vol. 11, no. 2, pp. 1-19

Authors: Wang, Guoyong; Fu, Tiange; Zheng, Ruijuan; Zhao, Xuhui; Zhu, Junlong; Zhang, Mingchuan (Luoyang Inst Sci & Technol, Sch Informat Engn, Luoyang 471023, Peoples R China; Longmen Lab, Luoyang 471023, Peoples R China; Henan Univ Sci & Technol, Sch Informat Engn, Luoyang 471023, Peoples R China)

Although deep reinforcement learning has achieved notable practical successes, its theoretical foundations were scarcely explored until recently. Moreover, the rate of convergence of current neural temporal-difference (TD) learning algorithms is constrained, largely due to their high sensitivity to stepsize choices. To mitigate this issue, we propose an adaptive neural TD algorithm (AdaBNTD) inspired by the superior performance of adaptive gradient techniques in training deep neural networks. We also derive non-asymptotic bounds for AdaBNTD within the Markovian observation framework. In particular, AdaBNTD is capable of converging to the global optimum of the mean square projected Bellman error (MSPBE) with a convergence rate of $\mathcal{O}(1/\sqrt{K})$, where K denotes the iteration count. In addition, the effectiveness of AdaBNTD is verified on several reinforcement learning benchmark domains.

Keywords: Adaptive methods; Non-asymptotic convergence; nonlinear function approximation; Reinforcement learning; Temporal-difference learning
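To illustrate why adaptive stepsizes reduce sensitivity to the stepsize choice in TD learning, here is a minimal sketch of semi-gradient TD(0) with AdaGrad-style per-coordinate stepsizes. It is not AdaBNTD: it substitutes a linear value function for the neural network and a plain AdaGrad accumulator for whatever adaptive rule the paper uses, and the function name and parameters are assumptions.

```python
import math


def adaptive_td0(transitions, dim, gamma=0.9, alpha=0.5, eps=1e-8):
    """Semi-gradient TD(0) for a linear value function V(s) = theta . phi(s).

    transitions: iterable of (phi, reward, phi_next) with feature lists of
    length dim; use an all-zero phi_next for terminal transitions.
    """
    theta = [0.0] * dim
    accum = [0.0] * dim  # running sum of squared update directions
    for phi, reward, phi_next in transitions:
        v = sum(t * p for t, p in zip(theta, phi))
        v_next = sum(t * p for t, p in zip(theta, phi_next))
        td_error = reward + gamma * v_next - v
        for i in range(dim):
            g = td_error * phi[i]
            accum[i] += g * g
            # The effective stepsize shrinks where past updates were large.
            theta[i] += alpha * g / (math.sqrt(accum[i]) + eps)
    return theta
```

As a usage example, a single state with a self-loop, reward 1, and discount 0.5 has true value 2, and `adaptive_td0([([1.0], 1.0, [1.0])] * 500, dim=1, gamma=0.5)` approaches that value without any hand-tuned stepsize schedule.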


A data-driven implicit deep adaptive neuro-fuzzy inference system capable of manifold learning for function approximation


APPLIED SOFT COMPUTING, 2024, vol. 155

Authors: Salimi-Badr, Armin (Shahid Beheshti Univ, Fac Comp Sci & Engn, Tehran, Iran)

Fuzzy Neural Networks (FNNs) make decisions by constructing semi-ellipsoidal clusters in the input space as the antecedent parts of their fuzzy rules. To determine the output value for each input instance, FNNs consider its membership degree to different sub-regions of the input space. However, forming such meaningful sub-regions is not possible in all applications due to the nonlinear interactions among input variables and their low information gain. Indeed, the samples could be distributed on a manifold in the input space. Therefore, to cover the input space, we need many rules, each representing a small region of the input space. This issue decreases the generalization ability of the model along with its explainability. Consequently, to efficiently form fuzzy rules, it is first necessary to unfold the manifold by mapping the samples to an appropriate embedding space. Next, the fuzzy rules in the form of semi-ellipsoidal regions should be constructed in this extracted feature space. Deep Fuzzy Neural Networks address this problem by representation learning through stacking multiple cascade mapping layers. In this paper, we propose a novel approach for nonlinear function approximation and time-series prediction problems, based on using the kernel trick to implicitly learn the mapping function to the new feature space. Moreover, to initialize the fuzzy rules, a KNN-based method using the kernel trick is proposed. A hierarchical Levenberg-Marquardt approach is applied to learn the model's parameters. The performance and structure of the proposed method are studied and compared with some other relevant methods in synthetic and real-world benchmarks. Based on these experiments, the proposed method has the best performance with the most parsimonious architecture. According to these experiments, the test RMSE of the proposed method is 0.002 for Mackey-Glass chaotic time-series prediction, 0.015 for a nonlinear dynamic system identification, 0.0345 f

Keywords: Kernel trick; Manifold learning; Fuzzy neural networks; Representation learning; nonlinear function approximation


A novel learning algorithm based on computing the rules' desired outputs of a TSK fuzzy neural network with non-separable fuzzy rules


NEUROCOMPUTING, 2022, vol. 470, pp. 139-153

Authors: Salimi-Badr, Armin; Ebadzadeh, Mohammad Mehdi (Shahid Beheshti Univ, Fac Comp Sci & Engn, Tehran, Iran; Amirkabir Univ Technol, Dept Comp Engn, Tehran, Iran)

In this paper, a novel learning approach that trains the parameters of fuzzy neural networks by calculating the desired outputs of their rules is proposed. We describe the desired outputs of fuzzy rules as the values that minimize the output error. To find these desired outputs, a new constrained convex optimization problem is introduced and solved. Afterward, the parameters of fuzzy rules are trained to reduce the error between the current rules' outputs and the estimated desired ones. Therefore, the proposed learning method avoids direct output error backpropagation, which leads to vanishing gradients and consequently to getting stuck in a local optimum. As a result, the proposed method does not need any sophisticated initialization method. This learning method is successfully utilized to train a new Takagi-Sugeno-Kang (TSK) Fuzzy Neural Network with correlated fuzzy rules. The proposed paradigm, including the proposed TSK correlation-aware architecture along with the learning method, is successfully applied to six real-world time-series predictions, regression problems, and nonlinear system identification. According to the experimental results, our proposed method outperforms other methods with a more parsimonious structure.

Keywords: Takagi-Sugeno-Kang (TSK) fuzzy neural networks; Non-separable fuzzy rules; Constrained convex optimization problem; Correlation-aware architecture; Gradient descent; nonlinear function approximation; Time-series prediction


Stacked Broad Learning System: From Incremental Flatted Structure to Deep Model


IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2021, vol. 51, no. 1, pp. 209-222

Authors: Liu, Zhulin; Chen, C. L. Philip; Feng, Shuang; Feng, Qiying; Zhang, Tong (South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510641, Guangdong, Peoples R China; Univ Macau, Fac Sci & Technol, Macau, Peoples R China; Beijing Normal Univ, Sch Appl Math, Zhuhai 519087, Peoples R China)

The broad learning system (BLS) has recently been shown to be effective and efficient. In this article, several deep variants of BLS are reviewed, and a new adaptive incremental structure, Stacked BLS, is proposed. The proposed model is a novel incremental stacking of BLS. This variant inherits the efficiency and effectiveness of BLS in that the structure and weights of the lower layers of BLS are fixed when new blocks are added. The incremental stacking algorithm computes not only the connection weights between the newly stacked blocks but also the connection weights of the enhancement nodes within the BLS block. The Stacked BLS can be regarded as dynamically incrementing the "layers" and "neurons" of a multilayer neural network during training. The proposed architecture, along with training algorithms that utilize the residual characteristic, is very versatile in comparison with traditional fixed architectures. Finally, experimental results on UCI datasets, the MNIST dataset, the NORB dataset, the CIFAR-10 dataset, the SVHN dataset, and the CIFAR-100 dataset indicate that the proposed method outperforms the selected state-of-the-art methods, such as deep residual networks, in both accuracy and training speed. The results also imply that the proposed structure could greatly reduce the number of nodes and the training time of the original BLS in the classification task on some datasets.

Keywords: Broad learning system (BLS); deep learning; functional link neural networks; nonlinear function approximation; universal approximation


Rollout algorithm for light-weight physical-layer authentication in cognitive radio networks


IET COMMUNICATIONS, 2020, vol. 14, no. 18, pp. 3128-3134

Authors: Yan, Shengnan; Wang, Xiaoding; Xu, Li (Fujian Normal Univ, Coll Math & Informat, Fuzhou 350117, Fujian, Peoples R China; Fujian Normal Univ, Key Lab Network Secur & Cryptol, Fuzhou 350117, Fujian, Peoples R China)

Cognitive radio networks (CRNs) are vulnerable to spoofing attacks due to their wireless and cognitive nature. Since traditional cryptographic authentication can hardly prevent such attacks in CRNs, physical-layer authentication has been investigated in recent years. To achieve a light-weight physical-layer authentication, a rollout partially observable Markov decision process-based algorithm, named RoPOMDP, is proposed in this study. In general, RoPOMDP formulates the physical-layer authentication as a zero-sum game, based on which a hypothesis test upon channel vectors is developed. That allows us to design the gains for both spoofers and receivers based on Bayesian risks for the game, in which the spoofing attack probability is predicted by a non-linear function approximation utilising ν-support vector regression. Then, RoPOMDP is employed to estimate the optimal threshold for the test statistic such that spoofing attacks can be detected. The theoretical analysis and simulations indicate that: (i) RoPOMDP improves the spoofing detection accuracy; (ii) as a light-weight algorithm, the complexity of RoPOMDP is lower than that of contemporary ones.

Keywords: Markov processes; regression analysis; cognitive radio; telecommunication security; function approximation; authorisation; Bayes methods; rollout algorithm; light-weight physical-layer authentication; cognitive radio networks; CRNs; wireless nature; cognitive nature; cryptographic authentication; RoPOMDP; spoofing attack probability; nonlinear function approximation; Markov decision process-based algorithm; ν-support vector regression


The True Online Continuous Learning Automation (TOCLA) in a continuous control benchmarking of actor-critic algorithms


IEEE Symposium Series on Computational Intelligence (IEEE SSCI)

Authors: Frost, Gordon; Vallejo, Marta (Heriot Watt Univ, Sch Engn & Phys Sci, Edinburgh, Midlothian, Scotland)

ISBN: 9781728125473 (print)

Reinforcement learning problems are often discretised, use linear function approximation, or perform batch updates. However, many applications that can benefit from reinforcement learning contain continuous variables and are inherently non-linear, for example, the control of aerospace or maritime robotic vehicles. Recent work has brought focus onto online temporal difference methods, specifically for using non-linear function approximation. In this paper, we evaluate the Forward Actor-Critic against the regular Actor-Critic, and Continuous Actor-Critic Learning Automation. We also propose and evaluate a new algorithm called True Online Continuous Learning Automation (TOCLA) which combines these two approaches. The chosen benchmark problem was the MountainCarContinuous-v0 environment from OpenAI Gym, which represents a further step in complexity over the benchmark used to test the Forward Actor Critic in previous works. Our results demonstrate the superiority of TOCLA in terms of its sensitivity to hyper-parameter selection compared with the Forward Actor Critic, Continuous Actor-Critic Learning Automation, and Actor Critic algorithms.

Keywords: Reinforcement Learning; Actor-Critic; TOCLA; CACLA; Forward Actor-Critic; nonlinear function approximation

