In this paper, we present two solutions for achieving optimal control of plug-in hybrid electric vehicles (PHEVs) on short trips. We prove mathematically that a greedy control policy is optimal for short trips on which the battery State-of-Charge (SoC) does not drop below its minimum threshold level. A closed-form greedy control solution is derived from the PHEV powertrain model. Furthermore, we provide a Q-learning-based approach that is model-free and capable of in-vehicle learning. Our algorithm, which combines neuro-dynamic programming (NDP) with estimated future trip information, converges robustly to the optimal policy on both fixed and randomly selected drive cycles.
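The Q-learning idea in this abstract can be illustrated with a minimal tabular sketch. It is not the authors' algorithm: the trip length, the SoC levels, the two actions, and the unit fuel cost are all hypothetical, and the toy trip is chosen so that the SoC never reaches its minimum. Under those assumptions the learned policy should match the greedy "use the battery first" policy described above.

```python
import random

TRIP_STEPS = 5                     # hypothetical number of decision steps in a short trip
START_SOC = 6                      # hypothetical initial SoC level; cannot reach 0 on this trip
ACTIONS = ("engine", "battery")    # hypothetical discrete power-split choices


def step(t, soc, action):
    """Toy model: the engine burns one unit of fuel, the battery drains one SoC level."""
    if action == "battery" and soc > 0:
        return t + 1, soc - 1, 0.0     # electric drive, no fuel used
    return t + 1, soc, -1.0            # engine drive, reward is negative fuel use


def train(episodes=500, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Model-free tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}                             # (t, soc, action) -> estimated return
    for _ in range(episodes):
        t, soc = 0, START_SOC
        while t < TRIP_STEPS:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q.get((t, soc, a), 0.0))
            nt, nsoc, reward = step(t, soc, action)
            if nt == TRIP_STEPS:
                future = 0.0           # the trip has ended
            else:
                future = max(q.get((nt, nsoc, a), 0.0) for a in ACTIONS)
            old = q.get((t, soc, action), 0.0)
            q[(t, soc, action)] = old + alpha * (reward + gamma * future - old)
            t, soc = nt, nsoc
    return q


def greedy_rollout(q):
    """Follow the learned greedy policy once and report the plan and fuel used."""
    t, soc, plan, fuel = 0, START_SOC, [], 0.0
    while t < TRIP_STEPS:
        action = max(ACTIONS, key=lambda a: q.get((t, soc, a), 0.0))
        plan.append(action)
        t, soc, reward = step(t, soc, action)
        fuel -= reward
    return plan, fuel


q_table = train()
plan, fuel = greedy_rollout(q_table)
```

Ties in the Q-table resolve to "engine" because it is listed first, so the agent has to experience the fuel penalty before it prefers the battery.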
This paper presents the development of an intelligent dynamic energy management system (I-DEMS) for a smart microgrid. An evolutionary adaptive dynamic programming and reinforcement learning framework is introduced for evolving the I-DEMS online. The I-DEMS is an optimal or near-optimal DEMS capable of performing grid-connected and islanded microgrid operations. The primary sources of energy are sustainable, green, and environmentally friendly renewable energy systems (RESs), e.g., wind and solar; however, these forms of energy are uncertain and nondispatchable. Backup battery energy storage and thermal generation are used to overcome these challenges. Using the I-DEMS to schedule dispatches allows the RESs and energy storage devices to be utilized to their maximum in order to supply the critical load at all times. Based on the microgrid's system states, the I-DEMS generates energy dispatch control signals, while a forward-looking network evaluates the dispatched control signals over time. Typical results are presented for varying generation and load profiles, and the performance of the I-DEMS is compared with that of a decision tree approach-based DEMS (D-DEMS). The robust performance of the I-DEMS is illustrated by examining microgrid operations under different battery energy storage conditions.
ISBN: 9781509038473 (print)
A method for hybridizing supervised learning with adaptive dynamic programming was developed to increase the speed, quality, and robustness of online neural network learning from an imperfect teacher. Reinforcement learning is used to modify and enhance the original supervisory signal before learning occurs. This paper describes the method of hybridization and presents a model problem in which a human supervisor teaches a simulated car to drive around a race track. Simulation results show successful learning and improvements in convergence time, error rate, and stability over either component method alone.
A reinforcement learning-based adaptive energy management (RLAEM) strategy is proposed for a hybrid electric tracked vehicle (HETV) in this paper. A control-oriented model of the HETV is first established, in which the battery state-of-charge (SOC) and the generator speed are the state variables, and the engine torque is the control variable. Subsequently, a transition probability matrix is learned from a specific driving schedule of the HETV. The proposed RLAEM decides the appropriate power split between the battery and the engine-generator set (EGS) to minimize fuel consumption over different driving schedules. With the RLAEM, the driver's power requirement is guaranteed and the fuel economy is improved. Finally, the RLAEM is compared with stochastic dynamic programming (SDP)-based energy management for different driving schedules. The simulation results demonstrate the adaptability, optimality, and learning ability of the RLAEM and its ability to reduce the computation time.
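One common way to learn a transition probability matrix from a driving schedule is to count transitions between discretized demand levels and normalize each row. The sketch below assumes that approach; the abstract does not state which estimator the authors use. The function name, the three demand levels, and the uniform fallback for unvisited levels are hypothetical.

```python
def estimate_transition_matrix(sequence, n_states):
    """Maximum-likelihood estimate of P[i][j] = Pr(next = j | current = i).

    `sequence` is a list of discretized power-demand levels (integers in
    range(n_states)) sampled along one driving schedule.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Assumption: a level never observed as a starting point gets a uniform row.
            matrix.append([1.0 / n_states] * n_states)
        else:
            matrix.append([count / total for count in row])
    return matrix


# Hypothetical schedule with demand levels 0 (low), 1 (medium), 2 (high).
schedule = [0, 1, 1, 2, 1, 0, 0, 1]
P = estimate_transition_matrix(schedule, n_states=3)
```

In this toy schedule, level 2 is always followed by level 1, so the third row of `P` puts all its probability on the middle column.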
This paper is concerned with a novel generalized policy iteration algorithm for solving optimal control problems for discrete-time nonlinear systems. The idea is to use an iterative adaptive dynamic programming algorithm to obtain iterative control laws that make the iterative value functions converge to the optimum. When the algorithm is initialized by an admissible control law, it is shown that the iterative value functions are monotonically nonincreasing and converge to the optimal solution of the Hamilton-Jacobi-Bellman equation, under the assumption that a perfect function approximation is employed. The admissibility property is analyzed, which shows that any of the iterative control laws can stabilize the nonlinear system. Neural networks are utilized to implement the generalized policy iteration algorithm by approximating the iterative value function and computing the iterative control law, respectively, to achieve approximate optimal control. Finally, numerical examples are presented to verify the effectiveness of the proposed generalized policy iteration algorithm.
Model-based dual heuristic dynamic programming (MB-DHP) is a popular approach for approximating optimal solutions in control problems. Yet, it usually requires offline training of the model network, which results in extra computational cost. In this brief, we propose a model-free DHP (MF-DHP) design based on a finite-difference technique. In particular, we adopt a multilayer perceptron with one hidden layer for both the action and the critic networks, and use delayed objective functions to train both networks online over time. We test both the MF-DHP and MB-DHP approaches on a discrete-time example and a continuous-time example under the same parameter settings. Our simulation results demonstrate that the MF-DHP approach can obtain a control performance competitive with that of the traditional MB-DHP approach while requiring fewer computational resources.
This paper presents an adaptive and intelligent power control approach for microgrid systems in the grid-connected operation mode. The proposed critic-based adaptive control system contains a neuro-fuzzy controller and a fuzzy critic agent. The fuzzy critic agent employs a reinforcement learning algorithm based on neuro-dynamic programming. The system feedback is made available to the critic agent's input as the controller's action in the previous state. The evaluation or reinforcement signal produced by the critic agent, together with the back-propagation of error, is then used for online tuning of the output layer weights of the neuro-fuzzy controller. The proposed controller shows superior results compared with traditional PI control. The transient response time is significantly reduced, power oscillations are eliminated, and fast convergence is achieved. The simple design and improved dynamic behavior of the proposed controller make it a promising candidate for power control of microgrid systems.
ISBN: 9781467397155 (print)
In this paper, a novel partially model-free adaptive dynamic programming (ADP) algorithm is presented to solve online the nonzero-sum differential games of continuous-time linear systems with unknown drift ***, by using the integral reinforcement learning technique, the partially model-free ADP algorithm is developed to solve online the set of coupled algebraic Riccati equations (AREs) underlying the game problem without requiring complete knowledge of the system *** then, the convergence of the partially model-free ADP algorithm is proved by demonstrating that it is mathematically equivalent to the extended Kleinman's algorithm, previously proposed in the literature, which solves offline the set of coupled algebraic Riccati equations using complete knowledge of the system ***, one example is given to demonstrate the efficiency of the proposed algorithm.
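For orientation, the classical offline Kleinman iteration can be sketched for the simplest possible case: a single controller and a scalar system dx/dt = a x + b u with cost integrand q x^2 + r u^2. This is only an illustration of the model-based baseline named in the abstract, not the paper's coupled-ARE game algorithm or its partially model-free version. The function names and the numerical values are hypothetical.

```python
import math


def kleinman_scalar(a, b, q, r, k0, iterations=20):
    """Kleinman's policy iteration for the scalar ARE 2*a*p - (b*b/r)*p*p + q = 0.

    Starts from a stabilizing feedback gain k0 (u = -k0 * x) and alternates
    policy evaluation (a scalar Lyapunov equation) with policy improvement.
    """
    if a - b * k0 >= 0:
        raise ValueError("k0 must stabilize the system: a - b*k0 < 0")
    k = k0
    p = None
    for _ in range(iterations):
        closed_loop = a - b * k
        # Policy evaluation: solve 2*(a - b*k)*p + q + r*k*k = 0 for p.
        p = -(q + r * k * k) / (2.0 * closed_loop)
        # Policy improvement: k = b*p/r.
        k = b * p / r
    return p, k


def are_solution_scalar(a, b, q, r):
    """Closed-form positive root of the scalar ARE, used here only as a check."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


# Hypothetical unstable plant (a > 0) with an initial stabilizing gain.
p_iter, k_iter = kleinman_scalar(a=1.0, b=1.0, q=1.0, r=1.0, k0=3.0)
p_exact = are_solution_scalar(a=1.0, b=1.0, q=1.0, r=1.0)
```

For these values the iteration produces p = 2.5 after the first step and then converges to 1 + sqrt(2), the positive root of the scalar ARE. Each step needs the full model (a and b), which is the requirement the integral reinforcement learning approach in the abstract relaxes.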