Ant colony optimization (ACO) was originally inspired by studies of the collective behavior of real ant colonies; it is robust and combines easily with other optimization methods. Although ACO enjoys rapidly growing popularity as a heuristic for hard combinatorial optimization problems, little research has addressed strategies for configuring its adjustable parameters, and its performance depends on appropriate parameter settings, which to some extent require both human experience and luck. The memetic algorithm is a population-based heuristic search approach, grounded in cultural evolution, that can be used to solve combinatorial optimization problems. Building on these two meta-heuristics, this paper develops a novel adjustable-parameter configuration strategy based on a memetic algorithm, and verifies its feasibility and effectiveness on the well-known traveling salesman problem (TSP). The hybrid approach is also valid for other types of combinatorial optimization problems.
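As one concrete illustration of the idea, the sketch below tunes ACO's adjustable parameters (alpha, beta, rho) with a small memetic loop (selection, crossover, and a local-search refinement) on a toy TSP instance. The names `run_aco` and `memetic_tune` and the particular operators are our own illustrative assumptions, not the authors' implementation.

```python
import random

# Hypothetical sketch: a memetic algorithm tunes the ACO parameters
# (alpha, beta, rho); the fitness of a parameter vector is the best tour
# length a short, fixed-seed ACO run achieves on a small TSP instance.

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def run_aco(dist, alpha, beta, rho, n_ants=8, n_iters=20, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]          # pheromone matrix
    best_tour, best_len = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i, cand = tour[-1], list(unvisited)
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in cand]
                j = rng.choices(cand, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
        for i in range(n):                       # evaporation ...
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        for k in range(n):                       # ... then deposit on best tour
            a, b = best_tour[k], best_tour[(k + 1) % n]
            tau[a][b] += 1.0 / best_len
    return best_len

def memetic_tune(dist, pop_size=6, gens=5, seed=1):
    rng = random.Random(seed)
    fitness = lambda p: run_aco(dist, *p)        # shorter best tour = fitter
    pop = [(rng.uniform(0.5, 2.0), rng.uniform(1.0, 5.0), rng.uniform(0.1, 0.9))
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness)
        parents = pop[:pop_size // 2]
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = rng.sample(parents, 2)
            child = tuple((x + y) / 2.0 for x, y in zip(a, b))   # crossover
            # local search: the "memetic" refinement of each offspring
            tweaked = tuple(max(0.05, g + rng.gauss(0.0, 0.1)) for g in child)
            children.append(min(child, tweaked, key=fitness))
        pop = parents + children
    return min(pop, key=fitness)
```

The point is only the division of labor (the memetic layer evolves parameters, the ACO layer scores them), not a competitive solver; on a small instance such as a regular hexagon of cities the tuned parameters should recover a near-optimal tour.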
In adaptive dynamic programming, neurocontrol, and reinforcement learning, the objective is for an agent to learn to choose actions so as to minimize a total cost function. In this paper, we show that when discretized time is used to model the motion of the agent, it can be very important to clip the motion of the agent in the final time step of the trajectory. By clipping, we mean that the final time step of the trajectory is truncated so that the agent stops exactly at the first terminal state reached, and no distance further. We demonstrate that when clipping is omitted, learning performance can fail to reach the optimum, and when clipping is done properly, learning performance can improve significantly. The clipping problem we describe affects algorithms that use explicit derivatives of the model functions of the environment to calculate a learning gradient. These include backpropagation through time for control and methods based on dual heuristic programming. However, the clipping problem does not significantly affect methods based on heuristic dynamic programming, temporal-difference learning, or policy-gradient learning algorithms.
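The effect of the final-step truncation can be seen in a one-dimensional toy rollout (our own illustration, not the paper's experiments): an agent moves right at constant speed, the cost is elapsed time, and the terminal set is x ≥ goal.

```python
# One-dimensional toy rollout illustrating final-step clipping: the agent
# moves right at constant speed v, cost is elapsed time, and the terminal
# set is x >= goal.

def rollout_cost(x0, v, goal, dt, clip):
    x, t = x0, 0.0
    while x < goal:
        step = v * dt
        if clip and x + step > goal:
            # truncate the final step so the agent stops exactly at the
            # boundary, and charge only the matching fraction of dt
            frac = (goal - x) / step
            x, t = goal, t + frac * dt
        else:
            x, t = x + step, t + dt
    return t

# exact continuous-time cost is (goal - x0) / v = 0.95
clipped   = rollout_cost(0.0, 1.0, 0.95, dt=0.1, clip=True)    # ~0.95
unclipped = rollout_cost(0.0, 1.0, 0.95, dt=0.1, clip=False)   # ~1.0, overshoot
```

The 0.05 overshoot in the unclipped rollout is the kind of discretization bias that, per the abstract, matters for methods that differentiate through the model functions.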
ISBN (print): 1424407060
In this work, we design a policy-iteration-based Q-learning approach for on-line optimal control of ionized hypersonic flow at the inlet of a scramjet engine. Magnetohydrodynamics (MHD) has recently been proposed as a means of flow control in various aerospace problems. The mechanism applies external magnetic fields to ionized flows to achieve desired flow behavior. Applications range from external flow control, for producing forces and moments on the air vehicle, to internal flow control designs, which compress the flow and extract electrical energy from it. The current work addresses the latter problem of internal flow control. The baseline controller and Q-function parameterizations are derived from an off-line design based on mixed predictive control and dynamic programming. The nominal optimal neural-network Q-function and controller are updated on-line to handle modeling errors in the off-line design. The on-line implementation investigates key concerns regarding the conservativeness of the update methods. Value-iteration-based update methods have been shown to converge in a probabilistic sense; however, simulation results illustrate that realistic implementations of these methods face significant training difficulties, often failing to learn the optimal controller on-line. The present approach therefore uses a policy-iteration-based update, which has time-based convergence guarantees. Given the special finite-horizon nature of the problem, three novel on-line update algorithms are proposed. These algorithms incorporate different mixes of concepts, including bootstrapping and forward and backward dynamic-programming update rules. Simulation results illustrate the success of the proposed update algorithms in re-optimizing the performance of the MHD generator during system operation.
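The evaluate/improve structure of policy iteration on a Q-function can be shown in a deliberately tiny tabular setting. This toy MDP is our own; the paper's setting uses neural-network Q-functions for the MHD flow problem.

```python
import itertools

# Tiny tabular toy (our own, not the paper's MHD problem): states 0..3,
# state 3 terminal; action 0 = stay, action 1 = advance one state.
N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9

def step(s, a):
    ns = min(s + a, 3)
    cost = 0.0 if s == 3 else 1.0        # unit cost until the terminal state
    return ns, cost

def evaluate(policy, sweeps=200):
    """Policy evaluation: iterate the Bellman equation for Q^pi to a fixed point."""
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s, a in itertools.product(range(N_STATES), range(N_ACTIONS)):
            ns, cost = step(s, a)
            Q[s][a] = cost + GAMMA * Q[ns][policy[ns]]
    return Q

def policy_iteration():
    policy = [0] * N_STATES              # start from the all-"stay" policy
    while True:
        Q = evaluate(policy)             # evaluate ...
        improved = [min(range(N_ACTIONS), key=lambda a, s=s: Q[s][a])
                    for s in range(N_STATES)]
        if improved == policy:           # ... then greedily improve
            return policy, Q
        policy = improved

policy, Q = policy_iteration()
# the converged policy advances toward the terminal state everywhere
```

Each round fully evaluates the current policy before improving it, which is the source of the stronger, time-based convergence behavior the abstract contrasts with value iteration.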
In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for infinite-horizon discrete-time nonlinear systems with finite approximation errors. First, a new generalized value iteration algorithm of ADP is developed to make the iterative performance index function converge to the solution of the Hamilton-Jacobi-Bellman equation. The generalized value iteration algorithm can be initialized with an arbitrary positive semi-definite function, which overcomes a disadvantage of traditional value iteration algorithms. When the iterative control law and iterative performance index function cannot be obtained accurately in each iteration, a new "design method of the convergence criteria" for the finite-approximation-error-based generalized value iteration algorithm is established for the first time. A suitable approximation error can be designed adaptively to make the iterative performance index function converge to a finite neighborhood of the optimal performance index function. Neural networks are used to implement the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the developed method.
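The arbitrary positive semi-definite initialization property has a transparent scalar analogue (our own illustration, not from the paper): for x_{k+1} = a x_k + b u_k with stage cost x² + u², quadratic value functions V_i(x) = p_i x² turn the Bellman backup into a Riccati-type recursion on p_i, and the iteration reaches the same fixed point from any p_0 ≥ 0.

```python
# Scalar toy (our own, not from the paper): for x_{k+1} = a*x_k + b*u_k
# with stage cost x^2 + u^2 and quadratic values V_i(x) = p_i * x^2, one
# Bellman backup V_{i+1}(x) = min_u [x^2 + u^2 + V_i(a*x + b*u)] reduces,
# after minimizing the quadratic in u, to a Riccati-type recursion on p_i.

def value_iteration(p0, a=0.9, b=0.5, iters=200):
    p = p0                               # any p0 >= 0 is a valid start
    for _ in range(iters):
        p = 1.0 + p * a ** 2 / (1.0 + p * b ** 2)
    return p

p_from_zero = value_iteration(0.0)       # traditional V_0 = 0 start
p_from_psd  = value_iteration(50.0)      # arbitrary positive semi-definite start
```

Both runs land on the same fixed point p* ≈ 2.12 of p = 1 + p a² / (1 + p b²), matching the claim that the initialization need not be zero.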
ISBN (print): 9781467360890
Adaptive dynamic programming is applied to control-affine nonlinear systems with uncertain drift dynamics to obtain a near-optimal solution to a finite-horizon optimal control problem with hard terminal constraints. A reinforcement-learning-based actor-critic framework is used to approximately solve the Hamilton-Jacobi-Bellman equation, wherein critic and actor neural networks (NNs) are used for approximate learning of the optimal value function and control policy, while enforcing the optimality condition resulting from the hard terminal constraint. Concurrent-learning-based update laws relax the restrictive persistence-of-excitation requirement. A Lyapunov-based stability analysis guarantees uniformly ultimately bounded convergence of the enacted control policy to the optimal control policy.
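The concurrent-learning idea (reusing a stored history stack so that parameter convergence needs rich recorded data rather than a persistently exciting input signal) can be sketched for a scalar identification problem. This toy is our own; the paper applies the idea inside the actor-critic update laws.

```python
# Toy sketch of concurrent learning: identify theta in y = theta * x.
# The update is driven by the current sample AND by a small recorded
# history stack, so the estimate converges even though no single input
# stream needs to be persistently exciting.

def concurrent_learning(samples, history_size=5, lr=0.05, epochs=300):
    theta, stack = 0.0, []
    for _ in range(epochs):
        for x, y in samples:
            if len(stack) < history_size:
                stack.append((x, y))      # record data once, reuse forever
            err = theta * x - y           # instantaneous estimation error
            # gradient from the current point plus every stored point
            grad = err * x + sum((theta * xs - ys) * xs for xs, ys in stack)
            theta -= lr * grad
    return theta

true_theta = 2.0
data = [(k / 10.0, true_theta * k / 10.0) for k in range(1, 6)]
theta_hat = concurrent_learning(data)
```

The stored terms keep the update informative even when the instantaneous regressor x is momentarily uninformative, which is the mechanism behind relaxing the persistence-of-excitation condition.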
ISBN (print): 9781479945450
In this paper, the impact of signal transmission delays on static VAR compensator (SVC) based power system damping control using reinforcement learning is investigated. The SVC is used to damp low-frequency oscillations between interconnected power systems under fault conditions, where measured signals from remote areas are first collected and then transmitted to the controller as inputs. Such a design inevitably introduces signal transmission delays, which degrade the dynamic performance of the SVC and, in the worst case, cause system instability. The adopted reinforcement learning algorithm, called goal representation heuristic dynamic programming (GrHDP), is employed to design the SVC controller. The impact of signal transmission delays on the adopted controller is investigated through time-domain simulation with a fully transient model in the Matlab/Simulink environment. Simulation results on a four-machine two-area benchmark system with an SVC demonstrate the effectiveness of the adopted algorithm for damping control and the impact of signal transmission delays.
In this brief, a novel adaptive-critic-based neural network (NN) controller is investigated for nonlinear pure-feedback systems. The controller design is based on the transformed predictor form, and the actor-critic NN control architecture includes two NNs: the critic NN approximates the strategic utility function, and the action NN is employed to minimize both the strategic utility function and the tracking error. A deterministic learning technique is employed to guarantee that the partial persistent excitation condition of the internal states is satisfied during tracking control of a periodic reference orbit. The uniform ultimate boundedness of the closed-loop signals is shown via Lyapunov stability analysis. Simulation results are presented to demonstrate the effectiveness of the proposed control.