ISBN (Print): 9781424427611
The proceedings contain 34 papers. The topics discussed include: a unified framework for temporal difference methods; efficient data reuse in value function approximation; constrained optimal control of affine nonlinear discrete-time systems using the GHJB method; algorithm and stability of ATC receding horizon control; online policy iteration based algorithms to solve the continuous-time infinite horizon optimal control problem; real-time motor control using recurrent neural networks; hierarchical optimal control of a 7-DOF arm model; coupling perception and action using minimax optimal control; a convergent recursive least squares policy iteration algorithm for multi-dimensional Markov decision processes with continuous state and action spaces; basis function adaptation methods for cost approximation in MDPs; and executing concurrent actions with multiple Markov decision processes.
ISBN (Print): 9781479945535
The proceedings contain 42 papers. The topics discussed include: approximate real-time optimal control based on sparse Gaussian process models; subspace identification for predictive state representation by nuclear norm minimization; active learning for classification: an optimistic approach; convergent reinforcement learning control with neural networks and continuous action search; theoretical analysis of a reinforcement learning based switching scheme; an analysis of optimistic, best-first search for minimax sequential decision making; information-theoretic stochastic optimal control via incremental sampling-based algorithms; policy gradient approaches for multi-objective sequential decision making: a comparison; and cognitive control in cognitive dynamic systems: a new way of thinking inspired by the brain.
ISBN (Print): 9781424498888
The proceedings contain 45 papers. The topics discussed include: active learning for personalizing treatment; active exploration by searching for experiments that falsify the computed control policy; optimistic planning for sparsely stochastic systems; adaptive sample collection using active learning for kernel-based approximate policy iteration; tree-based variable selection for dimensionality reduction of large-scale control systems; high-order local dynamic programming; safe reinforcement learning in high-risk tasks through policy improvement; agent self-assessment: determining policy quality without execution; reinforcement learning algorithms for solving classification problems; reinforcement learning in multidimensional continuous action spaces; grounding subgoals in information transitions; and directed exploration of policy space using support vector classifiers.
ISBN (Print): 9781467359252
The proceedings contain 28 papers. The topics discussed include: local stability analysis of high-order recurrent neural networks with multi-step piecewise linear activation functions; finite-horizon optimal control design for uncertain linear discrete-time systems; adaptive optimal control for nonlinear discrete-time systems; optimal control for a class of nonlinear systems with controller constraints based on a finite-approximation-errors ADP algorithm; finite horizon stochastic optimal control of uncertain linear networked control systems; real-time tracking on adaptive critic design with uniformly ultimately bounded condition; a novel approach for constructing basis functions in approximate dynamic programming for feedback control; and a combined hierarchical reinforcement learning based approach for multi-robot cooperative target searching in complex unknown environments.
ADPRL 2011 is the third IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning. The area of approximate dynamic programming and reinforcement learning is a fusion of a number of research areas in engineering, mathematics, artificial intelligence, operations research, and systems and control theory. This symposium brings together researchers from different disciplines and provides a remarkable opportunity for the academic and industrial community to address new challenges, share innovative yet practical solutions, and define promising future research directions.
ISBN (Print): 1424407060
The proceedings contain 49 papers. The topics discussed include: fitted Q iteration with CMACs; reinforcement-learning-based magneto-hydrodynamic control of hypersonic flows; a novel fuzzy reinforcement learning approach in two-level intelligent control of 3-DOF robot manipulators; knowledge transfer using local features; particle swarm optimization adaptive dynamic programming; discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof; dual representations for dynamic programming and reinforcement learning; an optimal ADP algorithm for a high-dimensional stochastic control problem; convergence of model-based temporal difference learning for control; the effect of bootstrapping in multi-automata reinforcement learning; and a theoretical analysis of cooperative behavior in multi-agent Q-learning.
ISBN (Print): 9781424427611
Feature discovery aims at finding the best representation of data. This is a very important topic in machine learning, and in reinforcement learning in particular. Building on our recent work on feature discovery in the context of reinforcement learning, where the goal is to discover a good, if not the best, representation of states, we report here on the use of the same kind of approach in the context of approximate dynamic programming. The striking difference from the usual approach is that we use a nonparametric function approximator to represent the value function, instead of a parametric one. We also argue that the problem of discovering the best state representation and the problem of value function approximation are two faces of the same coin, and that a nonparametric approach provides an elegant solution to both problems at once.
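To make the idea concrete, here is a minimal sketch in the spirit of the abstract, not the authors' algorithm: a k-nearest-neighbor value function, i.e. a nonparametric approximator whose "representation" is simply the stored samples, so state representation and value approximation coincide. The class name, the choice of k-NN, and the fitted-value-iteration usage are all illustrative assumptions.

```python
import numpy as np

class KNNValueFunction:
    """k-nearest-neighbor value function: a nonparametric approximator
    whose representation is the stored samples themselves."""

    def __init__(self, k=5):
        self.k = k
        self.states = None   # stored sample states, shape (n, d)
        self.values = None   # fitted target values, shape (n,)

    def fit(self, states, values):
        self.states = np.asarray(states, dtype=float)
        self.values = np.asarray(values, dtype=float)

    def __call__(self, state):
        if self.states is None:        # before the first fit, V = 0
            return 0.0
        dists = np.linalg.norm(self.states - np.asarray(state), axis=1)
        nearest = np.argsort(dists)[: self.k]
        return self.values[nearest].mean()

def fitted_vi_step(vf, transitions, gamma=0.95):
    """One hypothetical step of fitted value iteration: regress V(s)
    on the one-step targets r + gamma * V_old(s')."""
    states = [s for s, _, _ in transitions]
    targets = [r + gamma * vf(s_next) for _, r, s_next in transitions]
    new_vf = KNNValueFunction(k=vf.k)
    new_vf.fit(states, targets)
    return new_vf
```

Because nothing is parameterized, refining the representation is the same operation as refining the value estimate: both amount to changing which samples are stored.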
ISBN (Print): 9781424427611
Although the combination of reinforcement learning and imitation has already been considered in recent research, it has always revolved around fixed settings where the demonstrator and the imitator are fixed and the imitation process is a well-defined period of time. What is missing is the investigation of approaches that also work in scenarios where imitation is only sporadically possible. This means that in a multi-robot scenario a robot is not allowed to interrupt another robot by asking it to repeat certain actions, but can only observe and integrate information bits delivered occasionally. In this paper we present how that can be done in a continuous and noisy environment within an SMDP context.
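As a rough illustration of the setting, the sketch below shows an SMDP-style Q-update that an imitator could apply both to its own experience and to sporadically observed fragments of another robot's behavior. This is a hedged sketch under assumed discretized states and actions; the class and method names are invented for illustration and do not come from the paper.

```python
from collections import defaultdict

class SporadicImitationQLearner:
    """Tabular SMDP Q-learning that can also digest occasional
    observations of a demonstrator, without interrupting it."""

    def __init__(self, actions, alpha=0.1, gamma=0.99):
        self.q = defaultdict(float)   # (state, action) -> value
        self.actions = actions
        self.alpha = alpha
        self.gamma = gamma

    def smdp_update(self, s, a, reward, tau, s_next):
        # SMDP rule: discount by gamma**tau, where tau is the duration
        # of the temporally extended action (in SMDP time steps).
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        target = reward + (self.gamma ** tau) * best_next
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])

    def observe_demonstration(self, s, a, reward, tau, s_next):
        # Called only when an information bit from the demonstrator
        # happens to arrive; it is the same update, different source.
        self.smdp_update(s, a, reward, tau, s_next)
```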
ISBN (Print): 9781467359252
We present a reinforcement learning algorithm based on Dyna-Sarsa that utilizes separate representations of reward and punishment when guiding state-action value learning and action selection. The adoption of policy meta-learning optimized by a genetic algorithm is explored, and results in the context of a two-armed bandit goal-navigation task in a simple grid world are presented. The findings argue for an important role for a genetic algorithm approach in constructing the foundations of autonomous reinforcement learning agents.
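The core idea of keeping reward and punishment in separate value stores can be sketched as follows. This is an assumption-laden illustration, not the paper's implementation: the Sarsa-style tables, the combination weight w_punish (a natural candidate gene for a genetic algorithm to evolve), and all parameter names are hypothetical.

```python
import random
from collections import defaultdict

class SplitValueAgent:
    """Sarsa-style agent with separate reward and punishment tables,
    combined only at action-selection time."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, w_punish=1.0):
        self.qr = defaultdict(float)  # learned from positive rewards
        self.qp = defaultdict(float)  # learned from negative rewards
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.w_punish = w_punish      # candidate gene for a GA to tune

    def value(self, s, a):
        return self.qr[(s, a)] - self.w_punish * self.qp[(s, a)]

    def act(self, s, epsilon=0.1):
        if random.random() < epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.value(s, a))

    def update(self, s, a, r, s_next, a_next):
        # Route the Sarsa update to the matching table; the punishment
        # table stores the magnitude of negative rewards.
        if r >= 0:
            target = r + self.gamma * self.qr[(s_next, a_next)]
            self.qr[(s, a)] += self.alpha * (target - self.qr[(s, a)])
        else:
            target = -r + self.gamma * self.qp[(s_next, a_next)]
            self.qp[(s, a)] += self.alpha * (target - self.qp[(s, a)])
```

Keeping the two tables separate means a meta-learner can reshape the trade-off between approach and avoidance (via w_punish) without relearning the underlying values.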