The convergence properties of reinforcement learning approaches, such as temporal differences and Q-learning, have been established under moderate assumptions for discrete state and action spaces. In practice, however, many systems have either continuous action spaces or a large number of discrete elements. This paper presents an approximate dynamic programming approach to reinforcement learning for continuous-action set-point regulator problems, which learns near-optimal control policies based on scalar performance measures. The continuous-action-space (CAS) algorithm uses derivative-free line search methods to obtain the optimal action in the continuous space. The theoretical convergence properties of the algorithm are presented. Several heuristic stopping criteria are investigated, and practical application is illustrated by two example problems: the inverted pendulum balancing problem and the power system stabilization problem.
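
The action-selection step described above can be illustrated with a minimal sketch. It is not the paper's CAS algorithm: it assumes a scalar action bounded by `u_min` and `u_max`, a cost-to-go estimate that is unimodal in the action, and golden-section search as one possible derivative-free line search. The names `golden_section_argmin`, `greedy_action`, and `toy_q_cost` are hypothetical and chosen only for illustration.

```python
import math


def golden_section_argmin(cost, lo, hi, tol=1e-6, max_iter=200):
    """Derivative-free line search: approximate minimizer of a unimodal
    scalar function `cost` on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio, about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = cost(c), cost(d)
    for _ in range(max_iter):
        if b - a <= tol:
            break
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = cost(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = cost(d)
    return 0.5 * (a + b)


def greedy_action(q_cost, state, u_min, u_max):
    """Pick the action minimizing an estimated cost-to-go q_cost(state, action).
    Assumption: q_cost is unimodal in the action over [u_min, u_max]."""
    return golden_section_argmin(lambda u: q_cost(state, u), u_min, u_max)


def toy_q_cost(state, action):
    """Hypothetical quadratic cost-to-go, minimized at action = -0.5 * state."""
    return (action + 0.5 * state) ** 2 + state ** 2


best_action = greedy_action(toy_q_cost, 1.0, -2.0, 2.0)
```

For the toy cost, the search should return an action close to -0.5 for state 1.0. Each iteration needs only one new evaluation of the cost estimate and no gradient, which is the property that makes this family of methods usable when the value estimate is only available through function evaluations.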