
Refine Search Results

Document Type

  • 745 conference papers
  • 266 journal articles
  • 4 books

Holdings

  • 1,015 electronic documents
  • 1 print holding

Date Distribution

Subject Classification

  • 708 Engineering
    • 521 Computer Science and Technology...
    • 377 Electrical Engineering
    • 277 Control Science and Engineering
    • 155 Software Engineering
    • 79 Information and Communication Engineering
    • 39 Transportation Engineering
    • 23 Instrument Science and Technology
    • 20 Mechanical Engineering
    • 9 Bioengineering
    • 8 Electronic Science and Technology (...
    • 7 Mechanics (...
    • 6 Power Engineering and Engineering Therm...
    • 6 Petroleum and Natural Gas Engineering
    • 5 Civil Engineering
    • 4 Aerospace Science and Tech...
    • 4 Biomedical Engineering (...
    • 3 Materials Science and Engineering (...
    • 3 Chemical Engineering and Technology
    • 3 Safety Science and Engineering
  • 119 Science
    • 99 Mathematics
    • 33 Systems Science
    • 22 Statistics (...
    • 10 Biology
    • 8 Physics
    • 4 Chemistry
  • 67 Management
    • 64 Management Science and Engineering (...
    • 15 Business Administration
    • 5 Library, Information and Archives Manag...
  • 5 Economics
    • 4 Applied Economics
  • 3 Law
    • 3 Sociology
  • 2 Education
  • 2 Medicine

Topics

  • 309 reinforcement le...
  • 214 dynamic programm...
  • 203 optimal control
  • 105 adaptive dynamic...
  • 104 adaptive dynamic...
  • 97 learning
  • 87 neural networks
  • 74 heuristic algori...
  • 68 reinforcement le...
  • 58 learning (artifi...
  • 54 nonlinear system...
  • 53 convergence
  • 51 control systems
  • 51 mathematical mod...
  • 48 approximate dyna...
  • 44 approximation al...
  • 43 equations
  • 42 adaptive control
  • 41 artificial neura...
  • 40 cost function

Institutions

  • 41 chinese acad sci...
  • 27 univ rhode isl d...
  • 17 tianjin univ sch...
  • 16 univ sci & techn...
  • 16 univ illinois de...
  • 15 northeastern uni...
  • 14 beijing normal u...
  • 13 northeastern uni...
  • 12 northeastern uni...
  • 12 guangdong univ t...
  • 9 natl univ def te...
  • 8 ieee
  • 8 univ chinese aca...
  • 7 univ chinese aca...
  • 7 cent south univ ...
  • 7 southern univ sc...
  • 6 chinese acad sci...
  • 6 missouri univ sc...
  • 6 beijing univ tec...
  • 5 nanjing univ pos...

Authors

  • 54 liu derong
  • 37 wei qinglai
  • 29 he haibo
  • 21 xu xin
  • 21 wang ding
  • 19 jiang zhong-ping
  • 17 yang xiong
  • 17 zhang huaguang
  • 17 ni zhen
  • 16 lewis frank l.
  • 16 zhao bo
  • 15 gao weinan
  • 14 zhao dongbin
  • 13 zhong xiangnan
  • 12 si jennie
  • 11 derong liu
  • 10 jagannathan s.
  • 10 dongbin zhao
  • 9 song ruizhuo
  • 9 abouheaf mohamme...

Language

  • 989 English
  • 20 Other
  • 6 Chinese

Search query: Any field = "IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning"
1,015 records; showing 321-330
A recurrent control neural network for data efficient reinforcement learning
IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
Authors: Schaefer, Anton Maximilian; Udluft, Steffen; Zimmermann, Hans-Georg. Affiliations: Univ Ulm, Dept Optimisat & Operat Res, D-89069 Ulm, Germany; Corp Technol, Siemens AG, Dept Learning Syst Informat & Commun, D-81739 Munich, Germany
In this paper we introduce a new model-based approach for data-efficient modelling and control of reinforcement learning problems in discrete time. Our architecture is based on a recurrent neural network (RNN) with ...
ADP-Based Spacecraft Attitude Control Under Actuator Misalignment and Pointing Constraints
IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2022, vol. 69, no. 9, pp. 9342-9352
Authors: Yang, Haoyang; Hu, Qinglei; Dong, Hongyang; Zhao, Xiaowei. Affiliations: Beihang Univ, Sch Automat Sci & Elect Engn, Beijing 100191, Peoples R China; Univ Warwick, Sch Engn, Intelligent Control & Smart Energy (ICSE) Res Grp, Coventry CV4 7AL, W Midlands, England
This article is devoted to real-time optimal attitude reorientation control of rigid spacecraft. In particular, two typical practical problems, actuator misalignment and forbidden pointing constraints, are consid...
Reinforcement control via action dependent heuristic dynamic programming
1997 IEEE International Conference on Neural Networks (ICNN 97)
Authors: Tang, KW; Srikant, G. Affiliation: Department of Electrical Engineering, SUNY Stony Brook, NY 11794-2350, United States
Heuristic dynamic programming (HDP) is the simplest kind of adaptive critic, which is a powerful form of reinforcement control [1]. It can be used to maximize or minimize any utility function, such as total energy or t...
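The action-dependent variant of HDP described in this abstract trains a critic over state-action pairs, so the greedy action can be read off the critic without a plant model. Below is a minimal tabular sketch on a hypothetical two-state, two-action MDP; the utility table and transitions are invented for illustration, and the update shown is the standard temporal-difference rule rather than the authors' exact scheme.

```python
import numpy as np

n_states, n_actions, gamma, alpha = 2, 2, 0.9, 0.1
rng = np.random.default_rng(0)

# hypothetical utility U(s, a) that the critic learns to minimize
U = np.array([[1.0, 0.0],
              [0.0, 1.0]])
# hypothetical deterministic transitions: next state = nxt[s, a]
nxt = np.array([[0, 1],
                [1, 0]])

Q = np.zeros((n_states, n_actions))   # action-dependent critic
s = 0
for _ in range(5000):
    a = rng.integers(n_actions)                 # uniform exploration
    s2 = nxt[s, a]
    target = U[s, a] + gamma * Q[s2].min()      # cost-to-go estimate
    Q[s, a] += alpha * (target - Q[s, a])       # critic TD update
    s = s2

greedy = Q.argmin(axis=1)   # the policy is read directly from the critic
```

Here the critic minimizes total discounted utility, matching the abstract's point that HDP can either maximize or minimize a utility function; swapping `min` for `max` flips the objective.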
Hamiltonian-Driven Adaptive Dynamic Programming Based on Extreme Learning Machine
14th International Symposium on Neural Networks (ISNN)
Authors: Yang, Yongliang; Wunsch, Donald; Guo, Zhishan; Yin, Yixin. Affiliations: Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China; Missouri Univ Sci & Technol, Dept Elect & Comp Engn, Rolla, MO 65409, USA; Missouri Univ Sci & Technol, Dept Comp Sci, Rolla, MO 65409, USA
In this paper, a novel framework of reinforcement learning for continuous-time dynamical systems is presented, based on the Hamiltonian functional and extreme learning machine. The idea of solution search in the optimi...
Offline and Online Adaptive Critic Control Designs With Stability Guarantee Through Value Iteration
IEEE TRANSACTIONS ON CYBERNETICS, 2022, vol. 52, no. 12, pp. 13262-13274
Authors: Ha, Mingming; Wang, Ding; Liu, Derong. Affiliations: Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China; Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China; Beijing Univ Technol, Beijing Key Lab Computat Intelligence & Intellige, Beijing 100124, Peoples R China; Univ Illinois, Dept Elect & Comp Engn, Chicago, IL 60607, USA
This article is concerned with the stability of the closed-loop system using various control policies generated by value iteration. Some stability properties involving admissibility criteria, the attraction domain, an...
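Stability analyses of value iteration like the one abstracted above typically lean on the monotone convergence of the value-function iterates. As a toy illustration only (the two-state MDP below is hypothetical, not from the article), value iteration started from V_0 = 0 with positive stage costs produces a nondecreasing sequence that converges to the optimal cost-to-go:

```python
import numpy as np

gamma = 0.95
# hypothetical stage cost c(s, a) and deterministic transitions s' = nxt[s, a]
cost = np.array([[2.0, 0.5],
                 [1.0, 3.0]])
nxt = np.array([[0, 1],
                [1, 0]])

V = np.zeros(2)
history = [V.copy()]
for _ in range(200):
    V = np.min(cost + gamma * V[nxt], axis=1)   # Bellman backup
    history.append(V.copy())

# with V_0 = 0 and positive costs, the iterates never decrease
monotone = all((b >= a - 1e-12).all() for a, b in zip(history, history[1:]))
```

The monotone, bounded sequence is what lets one argue admissibility of the intermediate policies; here the limit can be checked by hand against the Bellman fixed point.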
Coordinated reinforcement learning for decentralized optimal control
IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
Authors: Yagan, Daniel; Tham, Chen-Khong. Affiliation: Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117548, Singapore
We consider a multi-agent system where the overall performance is affected by the joint actions or policies of the agents. However, each agent only observes a partial view of the global state condition. This model is know...
Continuous-Time Reinforcement Learning Control: A Review of Theoretical Results, Insights on Performance, and Needs for New Designs
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, vol. 35, no. 8, pp. 10199-10219
Authors: Wallace, Brent A.; Si, Jennie. Affiliation: Arizona State Univ, Dept Elect Comp & Energy Engn, Tempe, AZ 85287, USA
This exposition discusses continuous-time reinforcement learning (CT-RL) for the control of affine nonlinear systems. We review four seminal methods that are the centerpieces of the most recent results on CT-RL contro...
A3DQN: Adaptive Anderson Acceleration for Deep Q-Networks
IEEE Symposium Series on Computational Intelligence (IEEE SSCI)
Authors: Ermis, Melike; Yang, Insoon. Affiliation: Seoul Natl Univ, Automat & Syst Res Inst, Dept Elect & Comp Engn, Seoul 08826, South Korea
Reinforcement learning (RL) has been used for an agent to learn efficient decision-making strategies through its interactions with an environment. However, the slow convergence and sample inefficiency of RL algorithms mak...
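Anderson acceleration, the ingredient behind A3DQN, speeds up a fixed-point iteration x = g(x) by extrapolating over the last few iterates instead of taking the plain step. A self-contained sketch on a hypothetical linear Bellman-style contraction follows; this is plain type-II Anderson mixing with a small memory, not the paper's adaptive deep-RL variant.

```python
import numpy as np

def anderson_step(X, G, reg=1e-12):
    """One Anderson (type-II) extrapolation from iterate history X, images G."""
    F = np.array([gx - x for x, gx in zip(X, G)])   # residuals f_i = g(x_i) - x_i
    A = F @ F.T + reg * np.eye(len(F))              # regularized Gram matrix
    w = np.linalg.solve(A, np.ones(len(F)))         # min ||sum a_i f_i||, sum a_i = 1
    alpha = w / w.sum()
    return alpha @ np.array(G)

# hypothetical linear contraction g(x) = 0.9 M x + b with row-stochastic M
rng = np.random.default_rng(1)
M = rng.random((4, 4))
M /= M.sum(axis=1, keepdims=True)
b = rng.random(4)

def g(x):
    return 0.9 * (M @ x) + b

x = np.zeros(4)
X, G = [], []
for _ in range(30):
    X.append(x)
    G.append(g(x))
    X, G = X[-3:], G[-3:]          # keep memory m = 2
    x = anderson_step(X, G)

x_star = np.linalg.solve(np.eye(4) - 0.9 * M, b)   # exact fixed point
```

With the first history entry the step reduces to the plain iteration, so the scheme needs no special warm-up; the acceleration comes from combining residuals once the history fills.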
Adaptive Configuration with Deep Reinforcement Learning in Software-Defined Time-Sensitive Networking
IEEE/IFIP Network Operations and Management Symposium (NOMS)
Authors: Guo, Mengjie; Shou, Guochu; Liu, Yaqiong; Hu, Yihong. Affiliation: Beijing Univ Posts & Telecommun, Sch Informat & Commun Engn, Beijing, Peoples R China
Time-sensitive networking (TSN) is very appealing for industrial networks due to its support for deterministic transmission over Ethernet. The implementation of determinism typically demands precise configurati...
Robust dynamic programming for discounted infinite-horizon Markov decision processes with uncertain stationary transition matrices
IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
Authors: Li, Baohua; Si, Jennie. Affiliation: Arizona State Univ, Dept Elect Engn, Tempe, AZ 85287, USA
In this paper, finite-state, finite-action, discounted infinite-horizon-cost Markov decision processes (MDPs) with uncertain stationary transition matrices are discussed in the deterministic policy space. Uncertain sta...
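Robust dynamic programming of this kind replaces the ordinary Bellman backup with a worst case over an uncertainty set of transition matrices. A minimal sketch with a finite, invented two-model uncertainty set (the paper treats more general uncertain stationary matrices; all numbers below are illustrative):

```python
import numpy as np

gamma = 0.9
# hypothetical stage cost c(s, a)
cost = np.array([[1.0, 2.0],
                 [0.5, 0.2]])
# two candidate transition tensors P[a, s, s'] forming the uncertainty set
P1 = np.array([[[0.8, 0.2], [0.3, 0.7]],
               [[0.5, 0.5], [0.9, 0.1]]])
P2 = np.array([[[0.6, 0.4], [0.4, 0.6]],
               [[0.7, 0.3], [0.8, 0.2]]])
models = [P1, P2]

V = np.zeros(2)
for _ in range(300):
    # adversary picks the worst model per (action, state); agent then minimizes
    Qworst = np.max([cost.T + gamma * (P @ V) for P in models], axis=0)
    V = Qworst.min(axis=0)

policy = Qworst.argmin(axis=0)   # robust-optimal deterministic policy
```

The min-max backup remains a gamma-contraction, so the iteration converges to the unique robust value function just as in the nominal case.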