ISBN (Print): 9781467359252
In this paper, the finite-horizon stochastic optimal control problem is studied for linear networked control systems (LNCS) in the presence of network imperfections such as network-induced delays and packet losses, using an adaptive dynamic programming (ADP) approach. Due to the uncertainty in system dynamics resulting from network imperfections, the stochastic optimal control design uses a novel adaptive estimator (AE) to solve the optimal regulation of the uncertain LNCS in a forward-in-time manner, in contrast with backward-in-time Riccati-equation-based optimal control, which requires known system dynamics. A tuning law for the unknown parameters of the AE is derived. Lyapunov theory is used to show that all signals are uniformly ultimately bounded (UUB), with ultimate bounds that are a function of the initial values and the final time. In addition, the estimated control input converges to the optimal control input within the finite horizon. Simulation results are included to show the effectiveness of the proposed scheme.
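For reference, the backward-in-time baseline that this abstract contrasts against is the standard finite-horizon discrete-time Riccati recursion for known dynamics. A minimal sketch follows; the matrices A, B, Q, R and the horizon N are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Finite-horizon discrete-time LQR via the backward Riccati recursion.
# This is the known-dynamics baseline, not the paper's forward-in-time
# adaptive estimator; A, B, Q, R, N are illustrative.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)          # state cost
R = np.array([[1.0]])  # control cost
N = 50                 # horizon length

P = Q.copy()           # terminal cost P_N = Q
gains = []
for k in range(N - 1, -1, -1):                         # sweep backward in time
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # feedback gain K_k
    P = Q + A.T @ P @ (A - B @ K)                      # Riccati update P_k
    gains.append(K)
gains.reverse()                                        # gains[k] is the time-varying K_k

x = np.array([[1.0], [0.0]])
for k in range(N):
    u = -gains[k] @ x                                  # u_k = -K_k x_k
    x = A @ x + B @ u
```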
ISBN (Print): 9781467360890
Adaptive dynamic programming is applied to control-affine nonlinear systems with uncertain drift dynamics to obtain a near-optimal solution to a finite-horizon optimal control problem with hard terminal constraints. A reinforcement-learning-based actor-critic framework is used to approximately solve the Hamilton-Jacobi-Bellman equation, wherein critic and actor neural networks (NNs) are used for approximate learning of the optimal value function and control policy, while enforcing the optimality condition resulting from the hard terminal constraint. Concurrent-learning-based update laws relax the restrictive persistence-of-excitation requirement. A Lyapunov-based stability analysis guarantees uniformly ultimately bounded convergence of the enacted control policy to the optimal control policy.
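As a point of reference, the simpler stationary (infinite-horizon) continuous-time actor-critic update that this line of work builds on can be sketched as below. The scalar system, the polynomial critic basis, and all gains are illustrative assumptions; the paper's finite-horizon, terminal-constrained, concurrent-learning formulation is not reproduced here.

```python
import numpy as np

# Hedged sketch of a continuous-time actor-critic ADP step for a scalar
# control-affine system xdot = f(x) + g(x) u with running cost Q(x) + R u^2.
f = lambda x: -x + 0.5 * x**3
g = lambda x: 1.0
Qx = lambda x: x**2
R = 1.0

phi  = lambda x: np.array([x**2, x**4])        # critic basis, V ≈ w·phi(x)
dphi = lambda x: np.array([2*x, 4*x**3])       # gradient of the basis

w = np.array([0.5, 0.0])   # critic weights
alpha = 0.05               # critic learning rate
dt = 0.01
x = 1.0

for _ in range(20000):
    gradV = w @ dphi(x)
    u = -0.5 / R * g(x) * gradV                # actor from the HJB optimality condition
    # Hamilton-Jacobi-Bellman residual for the current critic estimate
    delta = gradV * (f(x) + g(x) * u) + Qx(x) + R * u**2
    sigma = dphi(x) * (f(x) + g(x) * u)
    w -= alpha * delta * sigma / (1.0 + sigma @ sigma)   # normalized gradient step
    x += (f(x) + g(x) * u) * dt                # simulate the closed loop
```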
ISBN (Print): 9781467359252
Despite the plethora of reinforcement learning algorithms in machine learning and control, the majority of the work in this area relies on discrete-time formulations of stochastic dynamics. In this work we present a new policy gradient algorithm for reinforcement learning in continuous state-action spaces and continuous time for free-energy-like cost functions. The derivation is based on successive application of Girsanov's theorem and the use of the Radon-Nikodym derivative as formulated for Markov diffusion processes. The resulting policy gradient is reward-weighted. The use of the Radon-Nikodym derivative extends the analysis and results to more general models of stochasticity in which jump diffusion processes are considered. We apply the resulting algorithm in two simple examples for learning attractor landscapes in rhythmic and discrete movements.
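A hedged, purely discrete-time illustration of a reward-weighted (exponentiated-cost) parameter update is sketched below. The 1-D point-mass task, the Gaussian exploration, and the temperature lam are illustrative assumptions; the sketch does not reproduce the Girsanov-based continuous-time derivation described above.

```python
import numpy as np

# Reward-weighted parameter update: sample perturbed rollouts, weight each
# perturbation by the exponentiated negative cost, and average.
rng = np.random.default_rng(0)

def rollout(theta, noise, T=50, dt=0.05):
    """Simulate a 1-D point mass driven toward the origin; return total cost."""
    x, cost = 1.0, 0.0
    for t in range(T):
        u = theta + noise[t]            # perturbed policy parameter
        x += (-u * x) * dt              # simple controlled dynamics
        cost += (x**2 + 0.01 * u**2) * dt
    return cost

theta, lam, K, T = 0.1, 0.1, 20, 50
for it in range(200):
    noise = [rng.normal(0.0, 0.3, size=T) for _ in range(K)]
    costs = np.array([rollout(theta, n) for n in noise])
    w = np.exp(-(costs - costs.min()) / lam)          # exponentiated-cost weights
    w /= w.sum()
    # reward-weighted average of the exploration noise (averaged over time)
    theta += sum(wk * nk.mean() for wk, nk in zip(w, noise))
```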
ISBN (Print): 9781467359252
This paper proposes an on-line near-optimal control scheme based on the function-approximation capabilities of neural networks (NNs) to attain the on-line solution of the optimal control problem for nonlinear discrete-time systems. First, to solve forward in time the Hamilton-Jacobi-Bellman (HJB) equation appearing in the optimal control problem, two neural networks are used to approximate the cost function and to compute the optimal control policy, respectively. Then, according to Bellman's optimality principle and adaptive techniques, the on-line weight-updating laws for the critic network and the action network are derived. Further, taking the NN approximation errors into account, the stability of the closed-loop system is demonstrated by Lyapunov theory. Finally, a numerical example is provided to demonstrate the effectiveness of the proposed method.
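The two-network idea (a critic approximating the cost-to-go and an action network producing the control, both tuned on-line from the Bellman error) can be sketched roughly as follows. Linear-in-parameter "networks", the scalar system, the discount factor, and all step sizes are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Critic/action-network sketch for a scalar nonlinear discrete-time system.
rng = np.random.default_rng(0)

f = lambda x: 0.8 * np.sin(x)            # nonlinear drift (illustrative)
g = 0.5                                  # input gain
cost = lambda x, u: x**2 + u**2          # one-step utility
gamma = 0.95                             # discount factor

phi  = lambda x: np.array([x**2, x**4])  # critic features, J(x) ≈ wc·phi(x)
dphi = lambda x: np.array([2*x, 4*x**3])
psi  = lambda x: np.array([x])           # linear action "network", u ≈ wa·psi(x)

wc = np.zeros(2)
wa = np.array([-0.1])
ac, aa = 0.1, 0.02                       # critic / actor step sizes
x = 1.0

for k in range(5000):
    u = float(wa @ psi(x)) + 0.05 * rng.normal()   # control plus exploration noise
    x_next = f(x) + g * u
    # critic: normalized semi-gradient step on the Bellman residual
    e_c = wc @ phi(x) - (cost(x, u) + gamma * wc @ phi(x_next))
    wc -= ac * e_c * phi(x) / (1.0 + phi(x) @ phi(x))
    # action network: descend the gradient of u^2 + gamma*J(x_next) w.r.t. u
    grad_u = 2.0 * u + gamma * g * (wc @ dphi(x_next))
    wa -= aa * grad_u * psi(x) / (1.0 + psi(x) @ psi(x))
    x = x_next
```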
ISBN (Print): 9781467359252
The electricity market provides a complex economic environment and has consequently increased the need for more advanced learning methods. In the agent-based modeling and simulation framework of this economic system, the generation company's decision-making is modeled using reinforcement learning. Existing learning methods that model the generation company's strategic bidding behavior are not adapted to the non-stationary and non-Markovian environment involving multidimensional and continuous state and action spaces. This paper proposes a reinforcement learning method to overcome these limitations. The proposed method discovers the structure of the input space through a self-organizing map, exploits learned experience through Roth-Erev reinforcement learning, and explores through an actor-critic map. Simulation results from experiments show that the proposed method outperforms Simulated Annealing Q-learning and Variant Roth-Erev reinforcement learning. The proposed method is a step towards more realistic agent learning in Agent-based Computational Economics.
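For reference, the classic Roth-Erev propensity update that the exploitation component builds on can be sketched as follows. The self-organizing-map state discretization and the actor-critic map are not reproduced, and the parameter values are illustrative.

```python
import random

# Roth-Erev reinforcement learning over a small set of bidding actions.
N = 4            # number of bidding actions
phi = 0.1        # recency (forgetting) parameter
eps = 0.2        # experimentation parameter
q = [1.0] * N    # initial propensities

def choose():
    """Sample an action with probability proportional to its propensity."""
    total = sum(q)
    return random.choices(range(N), weights=[qj / total for qj in q])[0]

def update(k, reward):
    """Roth-Erev update after playing action k and receiving reward."""
    for j in range(N):
        spillover = reward * (1 - eps) if j == k else reward * eps / (N - 1)
        q[j] = (1 - phi) * q[j] + spillover

a = choose()
update(a, reward=5.0)   # the reward would come from the market-clearing outcome
```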
ISBN (Print): 9781467359252
We identify a class of stochastic control problems with highly random rewards and a high discount factor, which induce high levels of statistical error in the estimated action-value function. This produces significant levels of max-operator bias in Q-learning, which can cause the algorithm to diverge for millions of iterations. We present a bias-corrected Q-learning algorithm with asymptotically unbiased resistance to the max-operator bias, and show that the algorithm asymptotically converges to the optimal policy, as Q-learning does. We show experimentally that bias-corrected Q-learning performs well in a domain with highly random rewards where Q-learning and other related algorithms suffer from the max-operator bias.
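A minimal Monte Carlo sketch of the max-operator bias itself (not of the paper's correction) is shown below: even when every true action value is zero, the maximum over noisy sample-mean estimates is positive in expectation. The number of actions, samples, and the noise level are illustrative.

```python
import numpy as np

# Demonstrate the max-operator bias in noisy action-value estimates.
rng = np.random.default_rng(0)
n_actions, n_samples, noise_std = 10, 20, 1.0

biases = []
for _ in range(10000):
    # sample-mean estimate of each action's value (the true value is 0 for all)
    q_hat = rng.normal(0.0, noise_std, size=(n_actions, n_samples)).mean(axis=1)
    biases.append(q_hat.max())          # what the max operator reports

print(f"mean of max(Q_hat) over trials: {np.mean(biases):.3f}  (true max is 0)")
```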
ISBN (Print): 9781467359252
When a robot controller is trained by means of evolutionary computation, the robot is able to behave adequately in the environment in which it has been trained. However, if the robot is placed in an environment that is more complex than the training environment, it cannot behave adequately and must be trained again so that it fits the more complex environment. Based on this observation, we build a training environment for a robot controller from partial components of a more complex environment, and aim to obtain a controller that enables the robot to act in the complex environment by training the controller only in the simpler environment. We clarify how to build a training environment that functions effectively for training a robot controller, and discuss how much training in that environment is necessary for the robot to behave well in a more complex environment.
ISBN (Print): 9781467359252
In multi-objective problems, it is key to find compromise solutions that balance different objectives. The linear scalarization function is often used to translate the multi-objective nature of a problem into a standard, single-objective problem. However, such a linear combination can only find solutions in convex regions of the Pareto front, making the method inapplicable in situations where the shape of the front is not known beforehand, as is often the case. We propose a non-linear scalarization function, the Chebyshev scalarization function, as a basis for action-selection strategies in multi-objective reinforcement learning. The Chebyshev scalarization method overcomes the flaws of the linear scalarization function, as it can (i) discover Pareto-optimal solutions regardless of the shape of the front, i.e. convex as well as non-convex, (ii) obtain a better spread over the set of Pareto-optimal solutions, and (iii) is not particularly dependent on the actual weights used.
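A minimal sketch of Chebyshev-scalarized action selection over multi-objective Q-values, contrasted with linear scalarization, is shown below. The Q-values, weights, and utopian reference point are illustrative; the middle action sits in a non-convex (concave) region of the front, which linear scalarization cannot prefer but the Chebyshev criterion does.

```python
import numpy as np

# Chebyshev vs. linear scalarization for multi-objective action selection.
Q = np.array([[0.90, 0.10],    # action 0: good on objective 0, poor on 1
              [0.45, 0.45],    # action 1: balanced, lies in a concave front region
              [0.10, 0.90]])   # action 2: poor on objective 0, good on 1
w = np.array([0.5, 0.5])       # objective weights
z = Q.max(axis=0) + 0.1        # utopian reference point, slightly above the best values

def linear_action(Q, w):
    return int(np.argmax(Q @ w))                # maximize the weighted sum

def chebyshev_action(Q, w, z):
    sq = np.max(w * np.abs(Q - z), axis=1)      # weighted Chebyshev distance to utopia
    return int(np.argmin(sq))                   # minimize that distance

print(linear_action(Q, w), chebyshev_action(Q, w, z))   # -> 0 2? No: prints 0 and 1
```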
ISBN (Print): 9781467359252
This paper compares three strategies for using reinforcement learning algorithms to let an artificial agent learn to play the game of Othello. The three strategies compared are: learning by self-play, learning from playing against a fixed opponent, and learning from playing against a fixed opponent while also learning from the opponent's moves. These strategies are considered for the algorithms Q-learning, Sarsa, and TD-learning. The three reinforcement learning algorithms are combined with multi-layer perceptrons and trained and tested against three fixed opponents. It is found that the best learning strategy differs per algorithm. Q-learning and Sarsa perform best when trained against the fixed opponent they are also tested against, whereas TD-learning performs best when trained through self-play. Surprisingly, Q-learning and Sarsa outperform TD-learning against the stronger fixed opponents when all methods use their best strategy. Learning from the opponent's moves as well leads to worse results compared to learning only from the learning agent's own moves.
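For reference, the three update rules compared in the paper can be sketched in tabular form as follows, omitting the multi-layer-perceptron function approximation; the states, actions, and parameters are placeholders.

```python
from collections import defaultdict

# Tabular versions of the three update rules: Q-learning, Sarsa, TD(0).
alpha, gamma = 0.1, 0.9
Q = defaultdict(float)     # state-action values for Q-learning / Sarsa
V = defaultdict(float)     # state values for TD-learning

def q_learning_update(s, a, r, s2, legal_actions):
    target = r + gamma * max((Q[(s2, a2)] for a2 in legal_actions), default=0.0)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(s, a, r, s2, a2):
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])

def td_update(s, r, s2):
    V[s] += alpha * (r + gamma * V[s2] - V[s])

# e.g. after one move: q_learning_update('board1', 'c4', 0.0, 'board2', ['d3', 'e6'])
```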
ISBN (Print): 9781467359252
There has been growing interest in the study of adaptive/approximate dynamic programming (ADP) in recent years. The ADP technique provides a powerful tool for understanding and improving the principled technologies of machine-intelligence systems. As one of the ADP algorithms based on adaptive critic neural networks (NNs), direct heuristic dynamic programming (direct HDP) has demonstrated several successful applications in solving realistic engineering control problems. In this study, based on a three-network architecture in which the reinforcement signal is approximated by an additional NN, a novel integrated design method for intensified direct HDP is developed. The new design approach is implemented using multiple PID neural networks (PIDNNs), which effectively take into account the structural knowledge of system states and controls that is usually present in a physical system. Using a Lyapunov stability approach, a uniform ultimate boundedness (UUB) result is proved for the PIDNN-based intensified direct HDP learning controller. Furthermore, the learning and control performance of the proposed design is tested on the popular cart-pole example to illustrate the key ideas of this paper.
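A hedged sketch of a single PID neural network (PIDNN) sub-network, i.e. a hidden layer of proportional, integral, and derivative neurons acting on a tracking error, is shown below. The weights and the toy plant are illustrative, and the direct-HDP actor-critic design in which the paper embeds such sub-networks is not reproduced here.

```python
import numpy as np

# One PIDNN sub-network: P, I, D hidden neurons plus a linear output layer.
class PIDNN:
    def __init__(self, w_hidden=(1.0, 1.0, 1.0), w_out=(0.8, 0.2, 0.05)):
        self.w_hidden = np.array(w_hidden)   # input weights of the P, I, D neurons
        self.w_out = np.array(w_out)         # output-layer weights
        self.integ = 0.0                     # integral neuron state
        self.prev = 0.0                      # previous error for the derivative neuron

    def step(self, error, dt=0.01):
        p = self.w_hidden[0] * error                     # proportional neuron
        self.integ += self.w_hidden[1] * error * dt      # integral neuron
        d = self.w_hidden[2] * (error - self.prev) / dt  # derivative neuron
        self.prev = error
        return float(self.w_out @ np.array([p, self.integ, d]))

ctrl = PIDNN()
x, target = 0.0, 1.0
for _ in range(500):                      # regulate a toy first-order plant
    u = ctrl.step(target - x)
    x += (-x + u) * 0.01
```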