Search results - Inner Mongolia University Library

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Author: Vishnuteja Nanduri, Department of Industrial & Manufacturing Engineering, University of Wisconsin-Milwaukee, Milwaukee, WI, USA

Climate change is one of the most important challenges facing the world this century. In the U.S., the electric power industry is the largest emitter of CO2, contributing to the climate crisis. Federal emissions-control bills in the form of cap-and-trade programs are currently idling in the U.S. Congress. In the meantime, ten states in the northeastern U.S. have adopted a regional cap-and-trade program to reduce CO2 levels and to increase investment in cleaner technologies. Many of the states in which cap-and-trade programs are active operate under a restructured market paradigm, in which generators compete to supply power. This research presents a bi-level game-theoretic model that captures competition between generators in both cap-and-trade markets and restructured electricity markets. The game-theoretic model is solved using a reinforcement-learning-based algorithm.

Keywords: Generators; Electricity supply industry; Games; Electricity; Companies; Meteorology; Power systems

Bias-corrected Q-learning to control max-operator bias in Q-learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Donghun Lee, Boris Defourny, Warren B. Powell. Department of Computer Science, Princeton University, Princeton, NJ, USA; Operations Research and Financial Engineering, Princeton University, Princeton, NJ, USA

We identify a class of stochastic control problems with highly random rewards and high discount factor which induce high levels of statistical error in the estimated action-value function. This produces significant levels of max-operator bias in Q-learning, which can induce the algorithm to diverge for millions of iterations. We present a bias-corrected Q-learning algorithm with asymptotically unbiased resistance against the max-operator bias, and show that the algorithm asymptotically converges to the optimal policy, as Q-learning does. We show experimentally that bias-corrected Q-learning performs well in a domain with highly random rewards where Q-learning and other related algorithms suffer from the max-operator bias.
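The max-operator bias named in this abstract can be seen without any MDP at all. Suppose every action has the same true value, and each estimate is unbiased but noisy. The maximum of the estimates is then biased upward, and the bias grows with the noise level and the number of actions. This is why highly random rewards hurt Q-learning. The sketch below is only an illustration of that bias under these assumptions. It is not the authors' bias-corrected Q-learning algorithm, and the function name and parameters are hypothetical.

```python
import random
import statistics


def max_operator_bias(num_actions, noise_std, trials=20000, seed=0):
    """Monte Carlo estimate of E[max_a Qhat(a)] - max_a Q(a).

    Illustrative assumption: every action has true value 0, and each
    estimate Qhat(a) is that value plus zero-mean Gaussian noise, so each
    estimate is individually unbiased. Because the true maximum is 0, the
    average of the sampled maxima equals the bias of the max operator.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(trials):
        estimates = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        maxima.append(max(estimates))
    return statistics.fmean(maxima)


if __name__ == "__main__":
    for n in (1, 2, 10):
        for sigma in (1.0, 5.0):
            bias = max_operator_bias(n, sigma)
            print(f"actions={n:2d} noise_std={sigma:.1f} bias={bias:+.3f}")
```

With a single action the bias stays near zero. With several actions it is clearly positive and scales with the noise standard deviation. Q-learning feeds exactly this kind of maximum back into its own targets.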

Keywords: Convergence; Reactive power; Random variables; Standards; dynamic programming; learning (artificial intelligence); Educational institutions

A novel approach for constructing basis functions in approximate dynamic programming for feedback control

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Jian Wang, Zhenhua Huang, Xin Xu. College of Mechatronics and Automation, National University of Defense Technology, Changsha, P. R. China

This paper presents a novel approach for constructing basis functions in approximate dynamic programming (ADP) through the locally linear embedding (LLE) process. It treats the experience (sample) data as a high-dimensional space and the basis functions to be computed as a low-dimensional space. The high-dimensional data are mapped into a single global coordinate system of lower dimensionality. As a result, experience data that are nearby in the high-dimensional space remain nearby, and similarly co-located with respect to one another, in the low-dimensional space. Thus, the obtained basis functions can accurately approximate the true value/action-value function. The simulation results show that the basis functions obtained by LLE can represent the final policy with higher precision.
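The abstract does not give the authors' exact construction. As a hedged illustration of the LLE step it relies on, the sketch below is textbook locally linear embedding written with NumPy. Each sample is reconstructed from its nearest neighbours with weights that sum to one. The low-dimensional coordinates then come from the bottom eigenvectors of (I - W)^T (I - W). The returned columns are what one might use as basis-function values at the samples. The function name `lle_features` and its parameters are hypothetical.

```python
import numpy as np


def lle_features(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Embed samples X (n_samples x n_dims) with locally linear embedding.

    Returns an (n_samples x n_components) array. Its columns are candidate
    low-dimensional features (basis-function values) at each sample; this
    is a sketch, not the paper's method.
    """
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        # Nearest neighbours of sample i, excluding i itself.
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf
        nbrs = np.argsort(dists)[:n_neighbors]
        # Local Gram matrix of the neighbours centred on X[i]; regularised
        # because it is singular when n_neighbors exceeds the data dimension.
        Z = X[nbrs] - X[i]
        C = Z @ Z.T
        C += np.eye(n_neighbors) * reg * max(np.trace(C), 1e-12)
        # Reconstruction weights, normalised to sum to one.
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs] = w / w.sum()
    # Coordinates minimising sum_i ||y_i - sum_j W_ij y_j||^2 are the bottom
    # eigenvectors of M = (I - W)^T (I - W); the first (constant) one is skipped.
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    _, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return eigvecs[:, 1:n_components + 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = rng.uniform(0.0, 3.0, 300)
    h = rng.uniform(0.0, 1.0, 300)
    X = np.column_stack([np.cos(t), np.sin(t), h])  # curved 2-D sheet in 3-D
    Phi = lle_features(X)
    print(Phi.shape)  # (300, 2)
```

Using eigenvectors of M keeps the feature columns orthonormal. That is a convenient property for a linear value-function approximator, though how the paper actually uses the embedding is not stated in the abstract.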

Keywords: learning (artificial intelligence); Function approximation; dynamic programming; Equations; Linear approximation; Vectors

Online adaptive learning of optimal control solutions using integral reinforcement learning

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Kyriakos G. Vamvoudakis, Draguna Vrabie, Frank L. Lewis. Automation and Robotics Research Institute, University of Texas at Arlington, Fort Worth, TX, USA

In this paper we introduce an online algorithm that uses integral reinforcement knowledge to learn the continuous-time optimal control solution for nonlinear systems with infinite-horizon costs and partial knowledge of the system dynamics. The algorithm is a data-based approach to solving the Hamilton-Jacobi-Bellman equation and does not require explicit knowledge of the system's drift dynamics. The adaptive algorithm is based on policy iteration and is implemented on an actor/critic structure. The actor and critic neural networks are adapted simultaneously, and a persistence of excitation condition is required to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for both the critic and actor networks, with extra terms in the actor tuning law required to guarantee closed-loop dynamical stability. Convergence to the optimal controller is proven, and stability of the system is also guaranteed. Simulation examples support the theoretical results.
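The claim that the drift dynamics need not be known can be made concrete with the integral reinforcement learning equations as they commonly appear in this line of work. The block below assumes input-affine dynamics and a cost quadratic in the control. Those assumptions are not stated in the abstract, and the paper's exact formulation may differ.

```latex
% Assumed setting: \dot{x} = f(x) + g(x)u, running cost r(x,u) = Q(x) + u^\top R u, with R > 0.
% Policy evaluation over an interval of length T (integral reinforcement):
V(x(t)) = \int_{t}^{t+T} \big( Q(x(\tau)) + u^\top(\tau) R\, u(\tau) \big)\, d\tau + V(x(t+T))
% Policy improvement:
u(x) = -\tfrac{1}{2} R^{-1} g^\top(x)\, \nabla V(x)
```

The drift term f(x) appears in neither equation. The evaluation step uses only measured trajectory data over [t, t+T], and the improvement step needs only g(x). This matches the "partial knowledge of the system dynamics" described above.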

Scalarized multi-objective reinforcement learning: Novel design techniques

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Kristof Van Moffaert, Madalina M. Drugan, Ann Nowé. Department of Computer Science, Vrije Universiteit Brussel, Brussels, Belgium

In multi-objective problems, the key is to find compromise solutions that balance different objectives. The linear scalarization function is often used to translate the multi-objective nature of a problem into a standard, single-objective problem. It is generally noted, however, that such a linear combination can find only solutions in convex areas of the Pareto front. This makes the method inapplicable when the shape of the front is not known beforehand, as is often the case. We propose a non-linear scalarization function, called the Chebyshev scalarization function, as a basis for action-selection strategies in multi-objective reinforcement learning. The Chebyshev scalarization method overcomes the flaws of the linear scalarization function because it (i) can discover Pareto optimal solutions regardless of the shape of the front, i.e., convex as well as non-convex, (ii) obtains a better spread among the set of Pareto optimal solutions, and (iii) is not particularly dependent on the actual weights used.
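Point (i) can be checked on a toy example. The sketch uses the standard form of Chebyshev scalarization: a weighted L-infinity distance from an action's Q-vector to a utopian point z*, with the greedy action chosen by minimising that distance. The three Q-vectors, the utopian point and all function names are hypothetical values chosen for illustration. They are not the authors' benchmark.

```python
def linear_scalarize(q_vector, weights):
    """Weighted sum of the objective values (larger is better)."""
    return sum(w * q for w, q in zip(weights, q_vector))


def chebyshev_scalarize(q_vector, weights, utopia):
    """Weighted L-infinity distance to the utopian point (smaller is better)."""
    return max(w * abs(q - z) for w, q, z in zip(weights, q_vector, utopia))


def greedy_linear(q_vectors, weights):
    """Action with the largest linearly scalarized value."""
    return max(q_vectors, key=lambda a: linear_scalarize(q_vectors[a], weights))


def greedy_chebyshev(q_vectors, weights, utopia):
    """Action whose Q-vector is closest to the utopian point."""
    return min(q_vectors, key=lambda a: chebyshev_scalarize(q_vectors[a], weights, utopia))


# Hypothetical Q-vectors (two objectives, both maximized) for three actions.
# "c" is Pareto optimal but lies in a non-convex dent of the front,
# because 0.4 + 0.4 < 1.0.
Q_VECTORS = {"a": (1.0, 0.0), "b": (0.0, 1.0), "c": (0.4, 0.4)}
UTOPIA = (1.1, 1.1)  # assumed: slightly better than the best value per objective

if __name__ == "__main__":
    for w in (0.1, 0.3, 0.5, 0.7, 0.9):
        weights = (w, 1.0 - w)
        print(w, greedy_linear(Q_VECTORS, weights),
              greedy_chebyshev(Q_VECTORS, weights, UTOPIA))
```

For every weight vector, the linear score of "c" is 0.4, which is always below max(w, 1 - w) >= 0.5. Linear scalarization therefore never selects "c". With equal weights, the Chebyshev distance of "c" (0.35) is the smallest of the three, so Chebyshev scalarization selects it.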

Keywords: Pareto optimization; learning (artificial intelligence); Chebyshev approximation; Benchmark testing; Measurement; Shape

A combined hierarchical reinforcement learning based approach for multi-robot cooperative target searching in complex unknown environments

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Yifan Cai, Simon X. Yang, Xin Xu. School of Engineering, University of Guelph, Guelph, Ontario, Canada; College of Mechatronics and Automation, National University of Defense Technology, Changsha, Hunan Province, China

Effective cooperation among multiple robots in unknown environments is essential in many robotic applications, such as environment exploration and target searching. In this paper, a combined hierarchical reinforcement learning approach, together with a designed cooperation strategy, is proposed for real-time cooperation of multiple robots in completely unknown environments. Other algorithms need an explicit environment model or select parameters by trial and error. In contrast, the proposed cooperation method obtains all required parameters automatically through learning. The cooperation hierarchy is built by integrating segmental options with the traditional MAXQ algorithm. In new tasks, the designed cooperation method can control the multi-robot system to complete the task effectively. The simulation results demonstrate that the proposed scheme can effectively and efficiently lead a team of robots to cooperatively accomplish target-searching tasks in completely unknown environments.

Keywords: Robot kinematics; learning (artificial intelligence); Real-time systems; Algorithm design and analysis; dynamic programming; Robot sensing systems

Adaptive optimal control for nonlinear discrete-time systems

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Chunbin Qin, Huaguang Zhang, Yanhong Luo. School of Information Science and Engineering, Northeastern University, Shenyang, China; Basic Experiment Teaching Center, Henan University, Kaifeng, China

This paper proposes an online near-optimal control scheme, based on the function-approximation capabilities of neural networks (NNs), for solving the optimal control problem of nonlinear discrete-time systems online. First, the Hamilton-Jacobi-Bellman (HJB) equation that arises in the optimal control problem is solved forward in time using two neural networks: one approximates the cost function and the other computes the optimal control policy. Then, based on Bellman's optimality principle and adaptive techniques, online weight-update laws are derived for the critic network and the action network. Further, taking the NN approximation errors into account, the stability of the closed-loop system is established using Lyapunov theory. Finally, a numerical example demonstrates the effectiveness of the proposed method.
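For orientation, the discrete-time HJB equation and the stationarity condition for the control are shown below. They take the form commonly used in this literature. The block assumes input-affine dynamics and a quadratic utility, which the abstract does not state, so the paper's exact setting may differ.

```latex
% Assumed setting: x_{k+1} = f(x_k) + g(x_k) u_k, utility U(x_k, u_k) = x_k^\top Q x_k + u_k^\top R u_k, with R > 0.
% Discrete-time HJB (Bellman optimality) equation:
V^*(x_k) = \min_{u_k} \big\{ x_k^\top Q x_k + u_k^\top R\, u_k + V^*(x_{k+1}) \big\}
% Setting the derivative with respect to u_k to zero, 2 R u_k + g^\top(x_k)\, \partial V^*(x_{k+1}) / \partial x_{k+1} = 0, gives:
u^*(x_k) = -\tfrac{1}{2} R^{-1} g^\top(x_k)\, \frac{\partial V^*(x_{k+1})}{\partial x_{k+1}}
```

Because x_{k+1} itself depends on u_k, the control law is implicit. This is one reason such schemes approximate V* with a critic network and u* with an action network, and tune both online.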

Keywords: Artificial neural networks; Equations; Optimal control; Mathematical model; dynamic programming; Approximation methods; Discrete-time systems

Impact of signal transmission delays on power system damping control using heuristic dynamic programming

2014 IEEE Symposium Series on Computational Intelligence (IEEE SSCI 2014): 2014 IEEE Symposium on Computational Intelligence Applications in Smart Grid (CIASG 2014)

Authors: Tang, Yufei; Zhong, Xiangnan; Ni, Zhen; Yan, Jun; He, Haibo. Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, Kingston, RI 02881, United States

ISBN: (print) 9781479945474

In this paper, the impact of signal transmission delays on power system damping control based on a static VAR compensator (SVC) using reinforcement learning is investigated. The SVC is used to damp low-frequency oscillations between interconnected power systems under fault conditions. Signals measured in remote areas are first collected and then transmitted to the controller as its inputs. This design inevitably introduces signal transmission delays, which degrade the dynamic performance of the SVC and, in the worst case, can cause system instability. The SVC controller is designed with a reinforcement learning algorithm called goal representation heuristic dynamic programming (GrHDP). The impact of signal transmission delays on this controller is investigated through time-domain simulation with a full transient model in the Matlab/Simulink environment. Simulation results on a four-machine, two-area benchmark system with an SVC demonstrate the effectiveness of the adopted algorithm for damping control and illustrate the impact of signal transmission delays. © 2014 IEEE.

Keywords: Controllers

Decentralized Stabilization for a Class of Continuous-Time Nonlinear Interconnected Systems Using Online Learning Optimal Control Approach

IEEE Transactions on Neural Networks and Learning Systems, 2014, Vol. 25, No. 2, pp. 418-428

Authors: Liu, Derong; Wang, Ding; Li, Hongliang. State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, People's Republic of China

In this paper, a novel decentralized control strategy based on a neural-network-based online learning optimal control approach is developed to stabilize a class of continuous-time nonlinear interconnected large-scale systems. First, optimal controllers of the isolated subsystems are designed with cost functions that reflect the bounds of the interconnections. Then, it is proven that the decentralized control strategy of the overall system can be established by adding appropriate feedback gains to the optimal control policies of the isolated subsystems. Next, an online policy iteration algorithm is presented to solve the Hamilton-Jacobi-Bellman equations related to the optimal control problem. By constructing a set of critic neural networks, the cost functions are approximated, and the control policies are then obtained from them. Furthermore, the estimation errors of the critic networks are shown to be uniformly ultimately bounded. Finally, a simulation example is provided to illustrate the effectiveness of the proposed decentralized control scheme.

Keywords: adaptive dynamic programming; decentralized control; large-scale systems; neural networks; nonlinear interconnected systems; optimal control; policy iteration; reinforcement learning

Approximate reinforcement learning: An overview

IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Lucian Buşoniu, Damien Ernst, Bart De Schutter, Robert Babuška. Delft Center for Systems and Control, Delft University of Technology, Netherlands; FRS-FNRS, Systems and Modeling Unit, University of Liège, Belgium

Reinforcement learning (RL) allows agents to learn how to interact optimally with complex environments. Fueled by recent advances in approximation-based algorithms, RL has achieved impressive successes in robotics, artificial intelligence, control, operations research, and other fields. However, the scarcity of survey papers on approximate RL makes it difficult for newcomers to grasp this intricate field. With the present overview, we take a step toward alleviating this situation. We review methods for approximate RL, starting from their dynamic programming roots and organizing them into three major classes: approximate value iteration, policy iteration, and policy search. Each class is subdivided into representative categories, highlighting, among others, offline and online algorithms, policy gradient methods, and simulation-based techniques. We also compare the different categories of methods and outline possible ways to enhance the reviewed algorithms.

Keywords: Approximation algorithms; Equations; Function approximation; Trajectory; Markov processes; Mathematical model
