ISBN:
(Print) 1424407060
The proceedings contain 49 papers. The topics discussed include: fitted Q iteration with CMACs; reinforcement-learning-based magneto-hydrodynamic control of hypersonic flows; a novel fuzzy reinforcement learning approach in two-level intelligent control of 3-DOF robot manipulators; knowledge transfer using local features; particle swarm optimization adaptive dynamic programming; discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof; dual representations for dynamic programming and reinforcement learning; an optimal ADP algorithm for a high-dimensional stochastic control problem; convergence of model-based temporal difference learning for control; the effect of bootstrapping in multi-automata reinforcement learning; and a theoretical analysis of cooperative behavior in multi-agent Q-learning.
ISBN:
(Print) 9781424407064
This paper describes backpropagation through an LSTM recurrent neural network model/critic for reinforcement learning tasks in partially observable domains. This combines LSTM's strength at learning long-term temporal dependencies, used to infer state in partially observable tasks, with the ability to learn high-dimensional and/or continuous actions through backpropagation's focused credit-assignment mechanism.
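The action-selection idea can be sketched as follows. This is a minimal illustration that substitutes a hypothetical closed-form quadratic critic for the paper's LSTM network, so the gradient-ascent action search stays self-contained; the critic shape, learning rate, and step count are assumptions, not the paper's settings:

```python
# Sketch: selecting actions by following the critic's action gradient,
# the role backpropagation plays through the LSTM critic in the paper.
def critic(s, a):
    # Hypothetical critic Q(s, a) = -(a - 2s)^2, maximized at a = 2s.
    return -(a - 2.0 * s) ** 2

def critic_grad_a(s, a):
    # dQ/da: the quantity backpropagation would deliver through a network.
    return -2.0 * (a - 2.0 * s)

def select_action(s, a0=0.0, lr=0.1, steps=100):
    """Gradient-ascent action search starting from an initial action a0."""
    a = a0
    for _ in range(steps):
        a += lr * critic_grad_a(s, a)
    return a

print(select_action(s=1.5))  # approaches the critic's maximizer 2*s = 3.0
```

With a neural critic, `critic_grad_a` would be replaced by a backward pass with respect to the action input, which is what makes continuous, high-dimensional actions tractable.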
ISBN:
(Print) 9781424407064
This paper proposes an approximate dynamic programming strategy for responsive traffic signal control. It is the first attempt to optimize the signal control objective dynamically through adaptive approximation of the value function. The proposed value function approximation is separable and independent of exogenous factors. The algorithm updates the approximate value function progressively during operation while preserving the structural properties of the control problem. The convergence and performance of the algorithm have been tested in a range of experiments. The results show that the new strategy matches the best existing control strategies while remaining computationally efficient and simple. It also has the potential to be extended to multi-phase signal control at isolated junctions and to decentralized network operation.
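A separable value function of the form V(s) ≈ Σᵢ vᵢ·sᵢ, with one slope per queue updated progressively from observations, might be sketched as follows; the slope-update rule, the noise model, and all constants here are illustrative assumptions rather than the paper's exact scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
n_queues = 4
slopes = np.zeros(n_queues)   # estimated marginal delay per queued vehicle
alpha = 0.1                   # stochastic-approximation step size

for _ in range(1000):
    # Each update observes a noisy marginal delay around a true value of 2.0
    # (a stand-in for measurements gathered during signal operation).
    observed = 2.0 + rng.normal(0.0, 0.5, size=n_queues)
    slopes += alpha * (observed - slopes)   # progressive, in-operation update

state = np.array([3, 1, 0, 2])   # current queue lengths per movement
print(float(slopes @ state))     # separable value, roughly 2.0 * 6 = 12
```

Separability keeps each update cheap: one slope per queue, with no coupling terms to refit when a single measurement arrives.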
ISBN:
(Print) 9781424407064
We are interested in finding the most effective combination of off-line and on-line/real-time training in approximate dynamic programming. We introduce our approach of combining proven off-line methods of training for robustness with a group of on-line methods. Training for robustness is carried out on reasonably accurate models with the multi-stream Kalman filter method [1], whereas on-line adaptation is performed either with the help of a critic or by methods resembling reinforcement learning. We also illustrate the importance of using recurrent neural networks for both the controller/actor and the critic.
ISBN:
(Print) 9781424407064
We describe an approach to reducing the curse of dimensionality in deterministic dynamic programming with continuous actions by randomly sampling actions while computing a steady-state value function and policy. This approach yields globally optimized actions without searching over a discretized multidimensional grid. We present results on finding time-invariant control laws for two-, four-, and six-dimensional deterministic swing-up problems with up to 480 million discretized states.
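The random-action-sampling idea can be sketched on a toy one-dimensional problem rather than the swing-up tasks; the dynamics, cost, discount, and grid sizes below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
xs = np.linspace(-1.0, 1.0, 41)   # discretized state grid
V = np.zeros_like(xs)             # value function over the grid
gamma = 0.9

def interp_V(x):
    # Linear interpolation of the value function between grid points.
    return np.interp(x, xs, V)

for sweep in range(200):          # value-iteration sweeps to steady state
    new_V = np.empty_like(V)
    for i, x in enumerate(xs):
        # Sample continuous actions instead of enumerating an action grid.
        actions = rng.uniform(-0.2, 0.2, size=32)
        nx = np.clip(x + actions, -1.0, 1.0)        # deterministic dynamics
        costs = x**2 + 0.1 * actions**2 + gamma * interp_V(nx)
        new_V[i] = costs.min()                      # best sampled action
    V = new_V

print(float(interp_V(0.0)), float(interp_V(1.0)))   # near zero at the goal
```

Because fresh actions are drawn at every backup, good actions are eventually found at every state without ever forming a multidimensional action grid, which is the source of the dimensionality savings.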
ISBN:
(Print) 9781424407064
We propose a provably optimal approximate dynamic programming algorithm for a class of multistage stochastic problems, taking into account that the probability distribution of the underlying stochastic process is not known and that the state space is too large to be explored entirely. The algorithm and its proof of convergence rely on the fact that the optimal value functions of the problems within this class are concave and piecewise linear. The algorithm combines Monte Carlo simulation, pure exploitation, stochastic approximation, and a projection operation. Several applications, in areas such as energy, control, inventory, and finance, fall under this framework.
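The structural property the proof relies on, a concave piecewise-linear value function maintained as the lower envelope of tangent cuts, can be sketched as follows; the target function and sample points are illustrative stand-ins, not a problem from the paper:

```python
# Sketch: represent a concave value function as min over affine cuts.
def f(x):
    # True concave value function (known here only for illustration).
    return -(x - 2.0) ** 2

def f_grad(x):
    return -2.0 * (x - 2.0)

cuts = []   # list of (intercept, slope) pairs defining affine cuts
for p in [0.0, 1.0, 2.0, 3.0, 4.0]:   # Monte Carlo sampling would pick these
    slope = f_grad(p)
    # Tangent cut at p: f(p) + slope * (x - p), an upper bound on f by concavity.
    cuts.append((f(p) - slope * p, slope))

def V_hat(x):
    # Piecewise-linear approximation: lower envelope of the collected cuts.
    return min(a + b * x for a, b in cuts)

print(V_hat(2.0), V_hat(0.5))   # exact at sampled points, >= f elsewhere
```

Each simulated trajectory adds a cut, and the envelope tightens monotonically toward the true concave value function, which is the mechanism behind this style of convergence proof.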
ISBN:
(Print) 9781424407064
In this paper, we suggest and analyze the use of approximate reinforcement learning techniques for a new category of challenging benchmark problems from the field of Operations Research. We demonstrate that interpreting and solving the task of job-shop scheduling as a multi-agent learning problem is beneficial for obtaining near-optimal solutions and competes well with alternative solution approaches. The evaluation of our algorithms focuses on numerous established Operations Research benchmark problems.
ISBN:
(Print) 9781424407064
Considerable research has been done on reinforcement learning in continuous environments, but research on problems where the actions can also be chosen from a continuous space is much more limited. We present a new class of algorithms, named Continuous Actor Critic Learning Automaton (CACLA), that can handle continuous states and actions. The resulting algorithm is straightforward to implement. An experimental comparison is made between this algorithm and other algorithms that can handle continuous action spaces. These experiments show that CACLA performs much better than the other algorithms, especially when combined with a Gaussian exploration method.
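The core CACLA update, moving the actor toward an executed action only when the temporal-difference error is positive, can be sketched on a single-state continuous-action problem; the reward function, constants, and episodic simplification are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
actor = 0.0                   # actor output: mean action (single-state task)
V = 0.0                       # critic: value estimate for that state
alpha, beta, sigma = 0.1, 0.1, 0.3

for _ in range(2000):
    a = actor + rng.normal(0.0, sigma)   # Gaussian exploration around actor
    r = -(a - 0.7) ** 2                  # reward, maximal at action a = 0.7
    delta = r - V                        # TD error (episodic, no next state)
    V += beta * delta                    # the critic always learns
    if delta > 0:                        # CACLA rule: move the actor only
        actor += alpha * (a - actor)     # toward better-than-expected actions

print(round(actor, 2))                   # drifts toward the optimum 0.7
```

Updating only on positive TD errors is what distinguishes CACLA from gradient-style actor-critic methods: the magnitude of the error is ignored, and only the sign decides whether the explored action pulls the actor.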
There are fundamental difficulties in using only a supervised learning philosophy to predict short-term movements of financial stocks. We present a reinforcement-oriented forecasting framework in which the solution is c...