Testing inverter-based resources (IBRs) is of utmost importance. This paper proposes a novel power hardware-in-the-loop (PHIL) interface control (PHIL-IC) employing a reinforcement-learning approach based on adaptive dynamic programming (ADP, also known as approximate dynamic programming) to enhance the PHIL-simulation-based testing of IBRs. It deploys output feedback control because the dynamics of the entire system (states and disturbances), involving the IBRs, the power amplifiers, all the other components of the PHIL-simulation-based testing setup, and their delays, are unavailable or uncertain; it optimally designs the PHIL-IC while considering all uncertainties and unavailable information about the systems involved. To this end, the proposed ADP-based PHIL-IC utilizes a new hybrid iteration (HI) method, which differs from traditional ADP strategies: unlike the policy iteration method, the HI algorithm does not require prior knowledge of an admissible control policy, and, with a quadratic rate of convergence, it converges much faster than the value iteration method, saving significant learning time and iterations. Comparing the results of PHIL-simulation-based testing using the proposed method with those of the proportional-resonant controller (the conventional PHIL-IC) and of the robust PHIL-IC based on mu synthesis (the current state-of-the-art PHIL-IC) reveals the effectiveness and practicality of the proposed method. These comparative results are generated with the ideal transformer model (also known as the voltage-type interface) commonly used in PHIL-simulation-based testing and with practical cases of the Thevenin equivalent impedance (resistive, resistive-inductive, and inductive) of the power-network model of interest.
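As a rough illustration of the hybrid iteration idea, the sketch below runs a value-iteration-like phase that needs no admissible initial policy and then switches to a policy-iteration-like phase with quadratic convergence, on a small discrete-time LQR problem. The plant matrices A, B and weights Q, R are illustrative assumptions; the paper's method is data-driven with output feedback, so this model-based sketch only conveys the structure of the iteration, not the proposed PHIL-IC itself.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Toy discrete-time LQR problem (matrices are assumptions chosen for illustration).
A = np.array([[1.1, 0.2], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

def gain(P):
    """Greedy feedback gain for the current value matrix P."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def stabilizing(K):
    return np.max(np.abs(np.linalg.eigvals(A - B @ K))) < 1.0

# Phase 1 (value-iteration-like): no admissible initial policy is needed.
P = np.zeros((2, 2))
for _ in range(500):
    K = gain(P)
    if stabilizing(K):
        break
    P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Phase 2 (policy-iteration-like): quadratic convergence once a stabilizing gain exists.
for _ in range(20):
    Acl = A - B @ K
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)   # policy evaluation
    K_new = gain(P)                                        # policy improvement
    if np.linalg.norm(K_new - K) < 1e-10:
        break
    K = K_new

print("hybrid-iteration gain:\n", K)
```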
To fully integrate robots into household settings, they must be capable of autonomously planning and executing diverse tasks. However, task and motion planning for multistep manipulation tasks remains an open challeng...
ISBN:
(Print) 1424407060
The proceedings contain 49 papers. The topics discussed include: fitted Q iteration with CMACs; reinforcement-learning-based magneto-hydrodynamic control of hypersonic flows; a novel fuzzy reinforcement-learning approach in two-level intelligent control of 3-DOF robot manipulators; knowledge transfer using local features; particle swarm optimization adaptive dynamic programming; discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof; dual representations for dynamic programming and reinforcement learning; an optimal ADP algorithm for a high-dimensional stochastic control problem; convergence of model-based temporal difference learning for control; the effect of bootstrapping in multi-automata reinforcement learning; and a theoretical analysis of cooperative behavior in multi-agent Q-learning.
ISBN:
(Print) 9781424407064
This paper describes backpropagation through an LSTM recurrent neural network model/critic for reinforcement learning tasks in partially observable domains. This combines LSTM's strength at learning long-term temporal dependencies, used to infer states in partially observable tasks, with backpropagation's focused credit-assignment mechanism, which makes it possible to learn high-dimensional and/or continuous actions.
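A minimal sketch of the idea, assuming a PyTorch implementation: an LSTM critic reads an observation/action history to infer a latent state and predict the return, and backpropagation through the trained critic yields a gradient on a continuous action. The dimensions, data, and simple regression target are placeholders; the paper's model/critic architecture is not reproduced here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Recurrent critic: the LSTM infers a latent state from the observation/action history
# in a partially observable task; a linear head predicts the return.
class LSTMCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim + act_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, obs_seq, act_seq):
        x = torch.cat([obs_seq, act_seq], dim=-1)   # (batch, time, obs+act)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])                # predicted return after the sequence

obs_dim, act_dim, T = 3, 2, 10
critic = LSTMCritic(obs_dim, act_dim)
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

# 1) Train the critic on (history, return) pairs by backpropagation through time.
obs = torch.randn(16, T, obs_dim)
act = torch.randn(16, T, act_dim)
ret = torch.randn(16, 1)                            # stand-in for observed returns
loss = nn.functional.mse_loss(critic(obs, act), ret)
opt.zero_grad(); loss.backward(); opt.step()

# 2) Backpropagate *through* the trained critic to get a gradient on the continuous
#    action, giving a focused improvement direction for a high-dimensional action.
act = act.clone().requires_grad_(True)
critic(obs, act).sum().backward()
improved_last_action = act[:, -1] + 0.1 * act.grad[:, -1]
print(improved_last_action.shape)
```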
ISBN:
(Print) 9781424407064
This paper proposes an approximate dynamic programming strategy for responsive traffic signal control. It is the first attempt to optimize the signal control objective dynamically through adaptive approximation of the value function. The proposed value function approximation is separable and independent of exogenous factors. The algorithm updates the approximated value function progressively during operation, while preserving the structural properties of the control problem. The convergence and performance of the algorithm have been tested in a range of experiments. It is concluded that the new strategy is as good as the best existing control strategies while being efficient and simple in computation. It also has the potential to be extended to multi-phase signal control at isolated junctions and to decentralized network operation.
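The following sketch illustrates the flavor of a separable, exogenous-factor-independent value function approximation for a two-phase junction, updated progressively while the controller operates. The queue dynamics, arrival rates, and the simple averaging update are assumptions made for the example, not the paper's actual model or update rule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-phase junction: queues[0] is served in phase 0, queues[1] in phase 1.
ARRIVAL = np.array([0.3, 0.4])   # assumed vehicles/step per approach
SERVICE = 1.0                    # vehicles discharged per step on the green approach
SWITCH_LOSS = 2                  # lost-time steps when the phase changes
GAMMA, ALPHA, MAX_Q = 0.95, 0.05, 50

# Separable value approximation: V(q0, q1) ~ v[0][q0] + v[1][q1]
v = [np.zeros(MAX_Q + 1), np.zeros(MAX_Q + 1)]

def step(q, phase, action):
    """One simulation step; action=1 means switch phase (with lost time)."""
    q = q.copy()
    if action == 1:
        phase = 1 - phase
        q += ARRIVAL * SWITCH_LOSS             # arrivals during lost time
    q[phase] = max(q[phase] - SERVICE, 0.0)    # discharge on green
    q += rng.poisson(ARRIVAL)                  # random arrivals
    q = np.minimum(q, MAX_Q)
    return q, phase, q.sum()                   # stage cost = total queue (delay proxy)

def value(q):
    return sum(v[i][int(q[i])] for i in range(2))

q, phase = np.zeros(2), 0
for t in range(20000):
    # One-step lookahead over {keep green, switch} using the separable V.
    best_a, best_est = 0, np.inf
    for a in (0, 1):
        qn, pn, c = step(q, phase, a)
        est = c + GAMMA * value(qn)
        if est < best_est:
            best_a, best_est = a, est
    # Progressive update of the separable components toward the lookahead estimate.
    err = best_est - value(q)
    for i in range(2):
        v[i][int(q[i])] += ALPHA * err / 2
    q, phase, _ = step(q, phase, best_a)

print("sample of learned per-queue values:", v[0][:5].round(2), v[1][:5].round(2))
```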
ISBN:
(Print) 9781424407064
We are interested in finding the most effective combination of off-line and on-line/real-time training in approximate dynamic programming. We introduce our approach of combining proven off-line methods of training for robustness with a group of on-line methods. Training for robustness is carried out on reasonably accurate models with the multi-stream Kalman filter method [1], whereas on-line adaptation is performed either with the help of a critic or by methods resembling reinforcement learning. We also illustrate the importance of using recurrent neural networks for both the controller/actor and the critic.
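A toy sketch of the two-phase workflow on a scalar plant: an off-line phase trains a single gain to be robust across several model "streams", and an on-line phase adapts that gain on the true plant using the measured cost as a critic-like signal. The finite-difference gradient here stands in for the multi-stream Kalman filter training and the recurrent actor/critic used in the paper; the plant, step sizes, and parameter spread are all assumptions for illustration.

```python
import numpy as np

# Toy scalar plant x+ = a*x + u with uncertain parameter a and feedback u = -k*x.
def rollout_cost(k, a, x0=1.0, steps=30):
    x, cost = x0, 0.0
    for _ in range(steps):
        u = -k * x
        cost += x**2 + 0.1 * u**2
        x = a * x + u
    return cost

EPS = 1e-3

# Off-line phase: train one gain to be robust across several model "streams".
streams = [0.8, 1.0, 1.2]            # assumed spread of the uncertain parameter a
k = 0.0
for _ in range(200):
    g = (max(rollout_cost(k + EPS, a) for a in streams)
         - max(rollout_cost(k - EPS, a) for a in streams)) / (2 * EPS)
    k -= 0.05 * np.sign(g)           # worst-case finite-difference descent
print("off-line robust gain:", round(k, 3))

# On-line phase: adapt the same gain on the true plant, using measured cost as the
# critic-like training signal.
a_true = 1.1
for _ in range(150):
    g = (rollout_cost(k + EPS, a_true) - rollout_cost(k - EPS, a_true)) / (2 * EPS)
    k -= 0.02 * np.sign(g)
print("gain after on-line adaptation:", round(k, 3))
```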
ISBN:
(Print) 9781424407064
We describe an approach to reducing the curse of dimensionality for deterministic dynamic programming with continuous actions by randomly sampling actions while computing a steady-state value function and policy. This approach results in globally optimized actions, without searching over a discretized multidimensional grid. We present results on finding time-invariant control laws for two-, four-, and six-dimensional deterministic swing-up problems with up to 480 million discretized states.
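A minimal sketch of the action-sampling idea on a one-dimensional problem: each Bellman backup draws random continuous actions instead of searching a discretized action grid, and the converged value function defines a time-invariant policy. The dynamics, costs, and sample counts are assumptions chosen only to keep the example small; the paper's experiments use multidimensional swing-up tasks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D deterministic problem: drive x to 0 with bounded continuous control.
DT, GAMMA = 0.1, 0.98
X = np.linspace(-2.0, 2.0, 201)          # discretized state grid
V = np.zeros_like(X)                     # value function on the grid
N_SAMPLES = 32                           # random continuous actions per backup

def dynamics(x, u):
    return np.clip(x + DT * (np.sin(x) + u), X[0], X[-1])

def stage_cost(x, u):
    return DT * (x**2 + 0.1 * u**2)

def interp_V(xq):                        # linear interpolation of V off the grid
    return np.interp(xq, X, V)

for sweep in range(300):
    V_new = np.empty_like(V)
    for i, x in enumerate(X):
        # Instead of a fixed action grid, draw random continuous actions, keep the best.
        u = rng.uniform(-1.0, 1.0, N_SAMPLES)
        q = stage_cost(x, u) + GAMMA * interp_V(dynamics(x, u))
        V_new[i] = q.min()
    if np.max(np.abs(V_new - V)) < 1e-6:
        break
    V = V_new

# Greedy (time-invariant) policy read off the converged value function.
def policy(x, n=256):
    u = rng.uniform(-1.0, 1.0, n)
    q = stage_cost(x, u) + GAMMA * interp_V(dynamics(x, u))
    return u[np.argmin(q)]

print("u(1.5) ~", round(policy(1.5), 3))
```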
ISBN:
(Print) 9781424407064
We define a new type of policy, the knowledge gradient policy, in the context of an offline learning problem. We show how to compute the knowledge gradient policy efficiently and demonstrate through Monte Carlo simulations that it performs as well as or better than a number of existing learning policies.
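A small sketch of a knowledge gradient policy for an offline learning (ranking-and-selection) problem with independent normal beliefs and known measurement noise: each round measures the alternative whose one-step knowledge gradient factor is largest, then performs a conjugate normal belief update. The prior parameters, noise level, and true means are assumptions for the example.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

true_means = np.array([0.0, 0.3, 0.5, 0.45, 0.2])   # unknown to the learner (assumed)
noise_sd = 1.0                                       # known measurement noise
mu = np.zeros(5)                                     # prior means
var = np.full(5, 4.0)                                # prior variances

def kg_factors(mu, var, noise_sd):
    """Knowledge-gradient value of one more measurement of each alternative."""
    sigma_tilde = var / np.sqrt(var + noise_sd**2)   # change in the posterior mean
    best_other = np.array([np.max(np.delete(mu, i)) for i in range(len(mu))])
    zeta = -np.abs(mu - best_other) / np.maximum(sigma_tilde, 1e-12)
    f = zeta * norm.cdf(zeta) + norm.pdf(zeta)
    return sigma_tilde * f

for n in range(40):
    x = int(np.argmax(kg_factors(mu, var, noise_sd)))     # measure the highest-KG alternative
    y = true_means[x] + noise_sd * rng.standard_normal()  # noisy observation
    # Conjugate normal update of the belief about alternative x.
    new_var = 1.0 / (1.0 / var[x] + 1.0 / noise_sd**2)
    mu[x] = new_var * (mu[x] / var[x] + y / noise_sd**2)
    var[x] = new_var

print("believed best alternative:", int(np.argmax(mu)))
```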
ISBN:
(Print) 9781424407064
We investigate the dual approach to dynamic programming and reinforcement learning, based on maintaining an explicit representation of stationary distributions as opposed to value functions. A significant advantage of the dual approach is that it allows one to exploit well-developed techniques for representing, approximating, and estimating probability distributions, without running the risks associated with divergent value function estimation. A second advantage is that some distinct algorithms for the average-reward and discounted-reward cases in the primal become unified under the dual. In this paper, we present a modified dual of the standard linear program that guarantees a globally normalized state visit distribution is obtained. With this reformulation, we then derive novel dual forms of dynamic programming, including policy evaluation, policy iteration, and value iteration. Moreover, we derive dual formulations of temporal difference learning to obtain new forms of Sarsa and Q-learning. Finally, we scale these techniques up to large domains by introducing approximation, and develop new approximate off-policy learning algorithms that avoid the divergence problems associated with the primal approach. We show that the dual view yields a viable alternative to standard value-function-based techniques and opens new avenues for solving dynamic programming and reinforcement learning problems.
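As a small illustration of the dual view, the sketch below evaluates a fixed policy on a random MDP by computing the normalized discounted state-visit distribution directly, and checks that it recovers the same return as primal (value-function) policy evaluation. The MDP is randomly generated for illustration; the dual forms of policy iteration, Sarsa, and Q-learning developed in the paper are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small random MDP under a fixed policy: P is the induced state-transition matrix.
n, gamma = 6, 0.9
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)   # row-stochastic
r = rng.random(n)                                           # per-state reward under the policy
mu0 = np.full(n, 1.0 / n)                                   # start-state distribution

# Primal policy evaluation: value function V = (I - gamma P)^{-1} r.
V = np.linalg.solve(np.eye(n) - gamma * P, r)
J_primal = mu0 @ V

# Dual policy evaluation: normalized discounted state-visit distribution
#   d = (1 - gamma) (I - gamma P^T)^{-1} mu0,  which sums to 1,
# with the policy's return recovered as J = d . r / (1 - gamma).
d = (1 - gamma) * np.linalg.solve(np.eye(n) - gamma * P.T, mu0)
J_dual = d @ r / (1 - gamma)

print("visit distribution sums to", round(d.sum(), 6))
print("primal return", round(J_primal, 6), "dual return", round(J_dual, 6))
```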