Ant colony optimization (ACO) was originally inspired by studies of the collective behavior of real ant colonies; it is robust and combines easily with other optimization methods. Although ACO enjoys rapidly growing popularity as a heuristic for hard combinatorial optimization problems, little research has addressed strategies for configuring its adjustable parameters, and its performance depends on appropriate parameter settings, which to some extent require both human experience and luck. The memetic algorithm is a population-based heuristic search approach, grounded in cultural evolution, that can be used to solve combinatorial optimization problems. Building on these two meta-heuristics, this paper develops a novel adjustable-parameter configuration strategy based on a memetic algorithm, and verifies its feasibility and effectiveness on the well-known traveling salesman problem (TSP). The hybrid approach is also valid for other types of combinatorial optimization problems.
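As one concrete illustration of the idea, the sketch below tunes ACO's adjustable parameters (alpha, beta, rho) with a small memetic loop (selection, crossover, and a local-search refinement) on a toy TSP instance. The names `run_aco` and `memetic_tune` and the particular operators are our own illustrative assumptions, not the authors' implementation.

```python
import random

# Hypothetical sketch: a memetic algorithm tunes the ACO parameters
# (alpha, beta, rho); the fitness of a parameter vector is the best tour
# length a short, fixed-seed ACO run achieves on a small TSP instance.

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def run_aco(dist, alpha, beta, rho, n_ants=8, n_iters=20, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]          # pheromone matrix
    best_tour, best_len = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i, cand = tour[-1], list(unvisited)
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in cand]
                j = rng.choices(cand, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
        for i in range(n):                       # evaporation ...
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        for k in range(n):                       # ... then deposit on best tour
            a, b = best_tour[k], best_tour[(k + 1) % n]
            tau[a][b] += 1.0 / best_len
    return best_len

def memetic_tune(dist, pop_size=6, gens=5, seed=1):
    rng = random.Random(seed)
    fitness = lambda p: run_aco(dist, *p)        # shorter best tour = fitter
    pop = [(rng.uniform(0.5, 2.0), rng.uniform(1.0, 5.0), rng.uniform(0.1, 0.9))
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness)
        parents = pop[:pop_size // 2]
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = rng.sample(parents, 2)
            child = tuple((x + y) / 2.0 for x, y in zip(a, b))   # crossover
            # local search: the "memetic" refinement of each offspring
            tweaked = tuple(max(0.05, g + rng.gauss(0.0, 0.1)) for g in child)
            children.append(min(child, tweaked, key=fitness))
        pop = parents + children
    return min(pop, key=fitness)
```

The point is only the division of labor (the memetic layer evolves parameters, the ACO layer scores them), not a competitive solver; on a small instance such as a regular hexagon of cities the tuned parameters should recover a near-optimal tour.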
In adaptive dynamic programming, neurocontrol, and reinforcement learning, the objective is for an agent to learn to choose actions so as to minimize a total cost function. In this paper, we show that when discretized time is used to model the motion of the agent, it can be very important to clip the motion of the agent in the final time step of the trajectory. By clipping, we mean that the final time step of the trajectory is truncated so that the agent stops exactly at the first terminal state reached, and no distance further. We demonstrate that when clipping is omitted, learning performance can fail to reach the optimum, and when clipping is done properly, learning performance can improve significantly. The clipping problem we describe affects algorithms that use explicit derivatives of the model functions of the environment to calculate a learning gradient. These include backpropagation through time for control and methods based on dual heuristic programming. However, the clipping problem does not significantly affect methods based on heuristic dynamic programming, temporal-difference learning, or policy-gradient learning algorithms.
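The effect of the final-step truncation can be seen in a one-dimensional toy rollout (our own illustration, not the paper's experiments): an agent moves right at constant speed, the cost is elapsed time, and the terminal set is x ≥ goal.

```python
# One-dimensional toy rollout illustrating final-step clipping: the agent
# moves right at constant speed v, cost is elapsed time, and the terminal
# set is x >= goal.

def rollout_cost(x0, v, goal, dt, clip):
    x, t = x0, 0.0
    while x < goal:
        step = v * dt
        if clip and x + step > goal:
            # truncate the final step so the agent stops exactly at the
            # boundary, and charge only the matching fraction of dt
            frac = (goal - x) / step
            x, t = goal, t + frac * dt
        else:
            x, t = x + step, t + dt
    return t

# exact continuous-time cost is (goal - x0) / v = 0.95
clipped   = rollout_cost(0.0, 1.0, 0.95, dt=0.1, clip=True)    # ~0.95
unclipped = rollout_cost(0.0, 1.0, 0.95, dt=0.1, clip=False)   # ~1.0, overshoot
```

The 0.05 overshoot in the unclipped rollout is the kind of discretization bias that, per the abstract, matters for methods that differentiate through the model functions.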
ISBN (print): 1424407060
In this work, we design a policy-iteration-based Q-learning approach for on-line optimal control of ionized hypersonic flow at the inlet of a scramjet engine. Magnetohydrodynamics (MHD) has recently been proposed as a means of flow control in various aerospace problems. The mechanism applies external magnetic fields to ionized flows to achieve desired flow behavior. Applications range from external flow control, for producing forces and moments on the air vehicle, to internal flow control designs, which compress the flow and extract electrical energy from it. The current work addresses the latter problem of internal flow control. The baseline controller and Q-function parameterizations are derived from an off-line design based on mixed predictive control and dynamic programming. The nominal optimal neural-network Q-function and controller are updated on-line to handle modeling errors in the off-line design. The on-line implementation investigates key concerns regarding the conservativeness of the update methods. Value-iteration-based update methods have been shown to converge in a probabilistic sense; however, simulation results illustrate that realistic implementations of these methods face significant training difficulties, often failing to learn the optimal controller on-line. The present approach therefore uses a policy-iteration-based update, which has time-based convergence guarantees. Given the special finite-horizon nature of the problem, three novel on-line update algorithms are proposed. These algorithms incorporate different mixes of concepts, including bootstrapping and forward and backward dynamic-programming update rules. Simulation results illustrate the success of the proposed update algorithms in re-optimizing the performance of the MHD generator during system operation.
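The evaluate/improve structure of policy iteration on a Q-function can be shown in a deliberately tiny tabular setting. This toy MDP is our own; the paper's setting uses neural-network Q-functions for the MHD flow problem.

```python
import itertools

# Tiny tabular toy (our own, not the paper's MHD problem): states 0..3,
# state 3 terminal; action 0 = stay, action 1 = advance one state.
N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9

def step(s, a):
    ns = min(s + a, 3)
    cost = 0.0 if s == 3 else 1.0        # unit cost until the terminal state
    return ns, cost

def evaluate(policy, sweeps=200):
    """Policy evaluation: iterate the Bellman equation for Q^pi to a fixed point."""
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s, a in itertools.product(range(N_STATES), range(N_ACTIONS)):
            ns, cost = step(s, a)
            Q[s][a] = cost + GAMMA * Q[ns][policy[ns]]
    return Q

def policy_iteration():
    policy = [0] * N_STATES              # start from the all-"stay" policy
    while True:
        Q = evaluate(policy)             # evaluate ...
        improved = [min(range(N_ACTIONS), key=lambda a, s=s: Q[s][a])
                    for s in range(N_STATES)]
        if improved == policy:           # ... then greedily improve
            return policy, Q
        policy = improved

policy, Q = policy_iteration()
# the converged policy advances toward the terminal state everywhere
```

Each round fully evaluates the current policy before improving it, which is the source of the stronger, time-based convergence behavior the abstract contrasts with value iteration.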
In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for infinite-horizon discrete-time nonlinear systems with finite approximation errors. First, a new generalized value iteration algorithm of ADP is developed to make the iterative performance index function converge to the solution of the Hamilton-Jacobi-Bellman equation. The generalized value iteration algorithm can be initialized with an arbitrary positive semi-definite function, which overcomes a disadvantage of traditional value iteration algorithms. When the iterative control law and iterative performance index function cannot be obtained accurately in each iteration, a new "design method of the convergence criteria" for the finite-approximation-error-based generalized value iteration algorithm is established for the first time. A suitable approximation error can be designed adaptively to make the iterative performance index function converge to a finite neighborhood of the optimal performance index function. Neural networks are used to implement the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the developed method.
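The arbitrary positive semi-definite initialization property has a transparent scalar analogue (our own illustration, not from the paper): for x_{k+1} = a x_k + b u_k with stage cost x² + u², quadratic value functions V_i(x) = p_i x² turn the Bellman backup into a Riccati-type recursion on p_i, and the iteration reaches the same fixed point from any p_0 ≥ 0.

```python
# Scalar toy (our own, not from the paper): for x_{k+1} = a*x_k + b*u_k
# with stage cost x^2 + u^2 and quadratic values V_i(x) = p_i * x^2, one
# Bellman backup V_{i+1}(x) = min_u [x^2 + u^2 + V_i(a*x + b*u)] reduces,
# after minimizing the quadratic in u, to a Riccati-type recursion on p_i.

def value_iteration(p0, a=0.9, b=0.5, iters=200):
    p = p0                               # any p0 >= 0 is a valid start
    for _ in range(iters):
        p = 1.0 + p * a ** 2 / (1.0 + p * b ** 2)
    return p

p_from_zero = value_iteration(0.0)       # traditional V_0 = 0 start
p_from_psd  = value_iteration(50.0)      # arbitrary positive semi-definite start
```

Both runs land on the same fixed point p* ≈ 2.12 of p = 1 + p a² / (1 + p b²), matching the claim that the initialization need not be zero.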
ISBN (print): 9781467360890
Adaptive dynamic programming is applied to control-affine nonlinear systems with uncertain drift dynamics to obtain a near-optimal solution to a finite-horizon optimal control problem with hard terminal constraints. A reinforcement-learning-based actor-critic framework is used to approximately solve the Hamilton-Jacobi-Bellman equation, wherein critic and actor neural networks (NNs) are used for approximate learning of the optimal value function and control policy, while enforcing the optimality condition resulting from the hard terminal constraint. Concurrent-learning-based update laws relax the restrictive persistence-of-excitation requirement. A Lyapunov-based stability analysis guarantees uniformly ultimately bounded convergence of the enacted control policy to the optimal control policy.
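The concurrent-learning idea (reusing a stored history stack so that parameter convergence needs rich recorded data rather than a persistently exciting input signal) can be sketched for a scalar identification problem. This toy is our own; the paper applies the idea inside the actor-critic update laws.

```python
# Toy sketch of concurrent learning: identify theta in y = theta * x.
# The update is driven by the current sample AND by a small recorded
# history stack, so the estimate converges even though no single input
# stream needs to be persistently exciting.

def concurrent_learning(samples, history_size=5, lr=0.05, epochs=300):
    theta, stack = 0.0, []
    for _ in range(epochs):
        for x, y in samples:
            if len(stack) < history_size:
                stack.append((x, y))      # record data once, reuse forever
            err = theta * x - y           # instantaneous estimation error
            # gradient from the current point plus every stored point
            grad = err * x + sum((theta * xs - ys) * xs for xs, ys in stack)
            theta -= lr * grad
    return theta

true_theta = 2.0
data = [(k / 10.0, true_theta * k / 10.0) for k in range(1, 6)]
theta_hat = concurrent_learning(data)
```

The stored terms keep the update informative even when the instantaneous regressor x is momentarily uninformative, which is the mechanism behind relaxing the persistence-of-excitation condition.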
ISBN (print): 9781479945450
In this paper, the impact of signal transmission delays on static VAR compensator (SVC) based power system damping control using reinforcement learning is investigated. The SVC is used to damp low-frequency oscillations between interconnected power systems under fault conditions, where measured signals from remote areas are first collected and then transmitted to the controller as inputs. Such a design inevitably introduces signal transmission delays, which degrade the dynamic performance of the SVC and, in the worst case, cause system instability. The adopted reinforcement learning algorithm, called goal representation heuristic dynamic programming (GrHDP), is employed to design the SVC controller. The impact of signal transmission delays on the adopted controller is investigated through time-domain simulation with a fully transient model in the Matlab/Simulink environment. Simulation results on a four-machine two-area benchmark system with an SVC demonstrate the effectiveness of the adopted algorithm for damping control and the impact of signal transmission delays.
In this brief, a novel adaptive-critic-based neural network (NN) controller is investigated for nonlinear pure-feedback systems. The controller design is based on the transformed predictor form, and the actor-critic NN control architecture includes two NNs: the critic NN approximates the strategic utility function, and the action NN is employed to minimize both the strategic utility function and the tracking error. A deterministic learning technique is employed to guarantee that the partial persistent excitation condition of the internal states is satisfied during tracking control of a periodic reference orbit. The uniform ultimate boundedness of the closed-loop signals is shown via Lyapunov stability analysis. Simulation results are presented to demonstrate the effectiveness of the proposed control.