Reinforcement learning for control in stochastic processes has received significant attention in the last few years. Several data-efficient methods, even for continuous state spaces, have been proposed; however, most o...
This paper presents a theoretical and empirical analysis of Expected Sarsa, a variation on Sarsa, the classic on-policy temporal-difference method for model-free reinforcement learning. Expected Sarsa exploits knowledg...
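For illustration, a minimal sketch of the tabular Expected Sarsa update under an epsilon-greedy policy (the array layout and hyperparameters here are our assumptions, not drawn from the paper):

import numpy as np

def expected_sarsa_update(Q, s, a, r, s_next, alpha, gamma, epsilon):
    """One tabular Expected Sarsa update.

    Unlike Sarsa, which bootstraps on the sampled next action,
    Expected Sarsa bootstraps on the expectation of Q over the
    epsilon-greedy policy, removing that source of sampling variance.
    """
    n_actions = Q.shape[1]
    # Probability of each action under the epsilon-greedy policy.
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(Q[s_next])] += 1.0 - epsilon
    expected_q = np.dot(probs, Q[s_next])
    Q[s, a] += alpha * (r + gamma * expected_q - Q[s, a])

Compared with Sarsa's sampled target r + gamma * Q[s_next, a_next], the expectation over the policy is computed exactly, at the cost of one extra pass over the action values.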
Ramp metering has been developed as a traffic management strategy to alleviate congestion on freeways. Most ramp metering control algorithms do not take queuing into consideration, because it is still a tough ...
ISBN (Print): 9781424435494
Some three decades ago, certain computational intelligence methods of reinforcement learning were recognized as implementing an approximation of Bellman's dynamic programming method, which is known in the controls community as an important tool for designing optimal control policies for nonlinear plants and sequential decision making. Significant theoretical and practical developments have occurred within this arena, mostly in the past decade, with the methodology now usually referred to as adaptive dynamic programming (ADP). The objective of this paper is to provide a retrospective of selected threads of such developments. In addition, a commentary is offered concerning the present status of ADP, and threads for future research and development within the controls field are suggested.
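To make the dynamic programming connection concrete, a minimal value iteration sketch for a finite MDP (the arrays P and R are illustrative assumptions; ADP methods approximate this recursion when the model is unknown or the state space is too large to enumerate):

import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Bellman's value iteration for a finite MDP.

    P: transition probabilities, shape (n_states, n_actions, n_states)
    R: expected rewards, shape (n_states, n_actions)
    Returns the optimal value function and a greedy policy.
    """
    V = np.zeros(R.shape[0])
    while True:
        # Q(s,a) = R(s,a) + gamma * sum_s' P(s,a,s') V(s')
        Q = R + gamma * P @ V
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new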
ISBN (Print): 9781424447947
Production scheduling is critical for manufacturing systems. Dispatching rules are usually applied dynamically to schedule jobs in a dynamic job shop. The paper presents an adaptive iterative scheduling algorithm that operates dynamically to schedule jobs in the dynamic job shop. To obtain adaptive behavior, the reinforcement learning system is built on phased Q-learning by defining intermediate state patterns. We convert the scheduling problem into a reinforcement learning problem by constructing a multi-phase dynamic programming process, including the definition of the state representation, the actions, and the reward function. We use five heuristic rules, CNP-CR, CNP-FCFS, CNP-EFT, CNP-EDD, and CNP-SPT, as actions, with the scheduling objective of minimizing the maximum completion time. A complex dynamic scheduling problem can thus be divided into a sequence of sub-problems that are easier to solve. We also analyze the running time and the solutions, and present some experimental results.
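A minimal sketch of the rule-selection idea, assuming a tabular Q over (state pattern, rule) pairs; the state encoding, reward, and rule names here are placeholders rather than the paper's exact setup:

import random

# Heuristic dispatching rules used as the action set (the paper's
# CNP-* variants pair the contract net protocol with such rules).
ACTIONS = ["CR", "FCFS", "EFT", "EDD", "SPT"]

def choose_rule(Q, state, epsilon=0.1):
    """Epsilon-greedy selection of a dispatching rule for the
    current intermediate state pattern of the job shop."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))

def q_update(Q, state, rule, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning update; the reward would be derived from
    the makespan objective, e.g. negative phase completion time."""
    best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
    old = Q.get((state, rule), 0.0)
    Q[(state, rule)] = old + alpha * (reward + gamma * best_next - old)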
ISBN (Print): 9781424454402
Living organisms learn by acting on their environment, observing the resulting reward stimulus, and adjusting their actions accordingly to improve the reward. This action-based or reinforcement learning can capture notions of optimal behavior occurring in natural systems. We describe mathematical formulations for reinforcement learning and a practical implementation method known as adaptive dynamic programming (ADP). These give us insight into the design of controllers for man-made engineered systems that both learn and exhibit optimal behavior. Relations are shown between ADP and adaptive control.
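In the discrete-time setting, the mathematical formulation underlying ADP is the Bellman optimality equation (standard form, not quoted from the paper):

V^*(s) = \max_{a}\left[ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \right]

ADP approximates the value function (the critic) and the maximizing policy (the actor) with trainable function approximators instead of solving this recursion exactly.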
ISBN (Print): 9781424427673
In this paper we propose a novel strategy for converging dynamic policies generated by adaptive agents, which receive and accumulate rewards for their actions. The goal of the proposed strategy is to speed up the convergence of such agents to a good policy in dynamic environments. Since it is difficult to maintain a good value estimate for a state when the environment is continuously changing, previous policies are kept in memory for reuse in future policies, avoiding delays or unexpected speedups in the agent's learning. Experimental results on dynamic environments with different policies have shown that the proposed strategy is able to speed up the convergence of the agent while achieving good action policies.
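A minimal sketch of the policy-memory idea described above; the signature/distance mechanism for matching environments is our illustrative assumption, not the paper's mechanism:

class PolicyLibrary:
    """Keeps previously learned Q-tables so an agent in a dynamic
    environment can warm-start from the closest stored policy
    instead of relearning from scratch."""

    def __init__(self):
        self.entries = []  # list of (environment signature, Q-table)

    def store(self, signature, Q):
        self.entries.append((signature, dict(Q)))

    def closest(self, signature, distance):
        """Return a copy of the stored Q-table whose environment
        signature is nearest to the current one, or None."""
        if not self.entries:
            return None
        _, Q = min(self.entries, key=lambda e: distance(signature, e[0]))
        return dict(Q)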
This paper deals with the computation of optimal nonrandomized nonstationary policies and mixed stationary policies for average-reward Markov decision processes with multiple criteria and constraints. We consider problems with finite state and action sets satisfying the unichain condition. The described procedure for computing optimal nonrandomized policies can also be used for adaptive control problems.
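For reference, the standard linear program over stationary state-action frequencies for constrained average-reward unichain MDPs (a common computational tool for this problem class; the paper's specific procedure may differ in detail):

\begin{aligned}
\max_{x \ge 0} \quad & \sum_{s,a} r(s,a)\, x(s,a) \\
\text{s.t.} \quad & \sum_{a} x(s',a) = \sum_{s,a} P(s' \mid s,a)\, x(s,a) \quad \forall s', \\
& \sum_{s,a} x(s,a) = 1, \\
& \sum_{s,a} c_k(s,a)\, x(s,a) \le d_k \quad k = 1,\dots,K,
\end{aligned}

where x(s,a) are stationary state-action frequencies; an optimal mixed stationary policy is then recovered as \pi(a \mid s) = x(s,a) / \sum_{a'} x(s,a').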
Feature discovery aims at finding the best representation of data. This is a very important topic in machine learning, and in reinforcement learning in particular. Based on our recent work on feature discovery in the context of reinforcement learning, aimed at discovering a good, if not the best, representation of states, we report here on the use of the same kind of approach in the context of approximate dynamic programming. The striking difference from the usual approach is that we use a nonparametric function approximator to represent the value function, instead of a parametric one. We also argue that the problem of discovering the best state representation and the problem of value function approximation are two faces of the same coin, and that using a nonparametric approach provides an elegant solution to both problems at once.
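A minimal sketch of what a nonparametric value-function approximator can look like; k-nearest-neighbor averaging is chosen purely for illustration and is not necessarily the estimator used in the paper:

import numpy as np

class KNNValueFunction:
    """Nonparametric value-function approximator: the value of a
    query state is the mean value of its k nearest stored states,
    so the representation grows with the data instead of being
    fixed in advance by a chosen feature vector."""

    def __init__(self, k=5):
        self.k = k
        self.states = []   # visited states as feature vectors
        self.values = []   # bootstrapped value targets

    def add(self, state, value):
        self.states.append(np.asarray(state, dtype=float))
        self.values.append(float(value))

    def __call__(self, state):
        if not self.states:
            return 0.0
        q = np.asarray(state, dtype=float)
        dists = [np.linalg.norm(s - q) for s in self.states]
        nearest = np.argsort(dists)[: self.k]
        return float(np.mean([self.values[i] for i in nearest]))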
Reinforcement learning is an essential ability for robots to learn new motor skills. Nevertheless, few methods scale to the domain of anthropomorphic robotics. To improve efficiency, the problem is reduced to reward-weighted imitation. By doing so, we are able to generate a framework for policy learning which both unifies previous reinforcement learning approaches and allows the derivation of novel algorithms. We show our two most relevant applications: motor primitive learning (e.g., a complex Ball-in-a-Cup task using a real Barrett WAM robot arm) and learning task-space control.
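A minimal sketch of the reward-weighted update idea: policy parameters are re-estimated as a reward-weighted average of sampled perturbations. This is illustrative only; the paper's exact update for motor primitives differs in detail.

import numpy as np

def reward_weighted_update(theta, n_samples, rollout, sigma=0.1):
    """One iteration of reward-weighted policy search.

    theta: current policy parameters (e.g., motor-primitive weights)
    rollout: function mapping a parameter vector to a scalar return
    Perturbations with higher returns receive proportionally more
    weight, turning policy search into weighted imitation of the
    agent's own successful trials.
    """
    eps = sigma * np.random.randn(n_samples, theta.size)
    returns = np.array([rollout(theta + e) for e in eps])
    w = np.exp(returns - returns.max())  # softmax-style weights
    return theta + (w[:, None] * eps).sum(axis=0) / w.sum()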