This paper proposes a novel approach for coupling perception and action through minimax dynamic programming. We tackle domains where the agent has some control over the observation process (e.g. via the manipulation of some sensors), and show how to transform the system so that an optimal control solution can be sought with standard algorithms. We demonstrate our method in a toy domain, where an agent guides two point masses ("hands") to a target in a 2D scene with obstacles. The agent can direct the gaze of a virtual "eye" to different parts of the scene, thereby reducing the observation noise for elements of the scene in that vicinity and improving the quality of feedback control. In this manner, motor control of the eye allots attentional resources. We propose a unified framework that treats both perception and action as interdependent components of the same optimal control task. The implications of uncertainty on task performance are uncovered by deploying an adversary whose strength to do harm is proportional to the instantaneous level of state uncertainty. We transform the partially observable system into a fully observable one by coupling the state dynamics with a state-estimation filter, thereby augmenting the state space to include an explicit representation of the instantaneous state uncertainty. The augmented system is high-dimensional, but through minimax differential dynamic programming, a local method that is less susceptible to the curse of dimensionality, we are able to solve for the optimal control of the hands and the eye at the same time, allowing for the emergence of interesting phenomena such as hand-eye coordination, saccades and smooth pursuit.
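The core transformation the abstract describes — folding the estimator's uncertainty into a fully observable augmented state whose dynamics depend on where the eye looks — can be sketched with a scalar Kalman-filter variance update. Everything below (the scalar dynamics, the `obs_noise` model, all constants) is an illustrative assumption, not the paper's actual system:

```python
import math

def kalman_variance_update(sigma2, q, r):
    """One scalar Kalman-filter step for the estimation variance:
    predict (add process noise q), then correct with a measurement
    of noise variance r. This variance becomes an explicit extra
    coordinate of the augmented, fully observable state."""
    pred = sigma2 + q            # prediction step inflates uncertainty
    gain = pred / (pred + r)     # Kalman gain
    return (1.0 - gain) * pred   # posterior variance after the measurement

def obs_noise(dist_from_gaze, r_min=0.01, slope=1.0):
    # Assumed gaze model: fixating a scene element (distance 0 from the
    # gaze point) gives low observation noise; looking away gives high noise.
    return r_min + slope * dist_from_gaze ** 2

# Fixating the target shrinks its uncertainty far faster than looking away.
s_fix, s_away = 1.0, 1.0
for _ in range(10):
    s_fix = kalman_variance_update(s_fix, 0.05, obs_noise(0.0))
    s_away = kalman_variance_update(s_away, 0.05, obs_noise(2.0))
assert s_fix < s_away
```

Because the variance update is itself a deterministic function of the gaze action, a standard (minimax) dynamic-programming solver can trade off motor progress against uncertainty reduction in one optimization.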
In cooperative retransmissions, nodes with better channel qualities help other nodes by retransmitting a failed packet to its intended destination. In this paper, we propose a cooperative retransmission scheme in which each node makes a local decision on whether to cooperate, and at what transmission power, using a Markov decision process with reinforcement learning. With reinforcement learning, the proposed scheme avoids solving a Markov decision process with a large number of states. Through simulations, we show that the proposed scheme is robust to collisions, is scalable with regard to the network size, and can provide significant cooperative diversity.
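The local cooperate/power decision can be illustrated with a bandit-style Q-update per discretized channel state. This is a minimal sketch, not the authors' scheme: the state discretization, the reward model (success probability minus a power cost) and all constants are assumptions.

```python
import random

random.seed(0)

# Hypothetical state: discretized channel quality to the destination (0..2).
# Actions: 0 = stay silent, 1..3 = retransmit at increasing power levels.
ACTIONS = [0, 1, 2, 3]
Q = {(s, a): 0.0 for s in range(3) for a in ACTIONS}

def reward(state, action):
    # Toy model: success probability grows with channel quality and power,
    # while power itself has a cost. Numbers are illustrative only.
    if action == 0:
        return 0.0
    p_success = min(1.0, 0.2 * state + 0.2 * action)
    return (1.0 if random.random() < p_success else 0.0) - 0.1 * action

alpha, eps = 0.1, 0.2
for _ in range(5000):
    s = random.randrange(3)
    a = random.choice(ACTIONS) if random.random() < eps else \
        max(ACTIONS, key=lambda x: Q[(s, x)])
    # Single-shot decision: no successor state, so the target is the reward.
    Q[(s, a)] += alpha * (reward(s, a) - Q[(s, a)])

# With a good channel, some transmit power should beat staying silent.
best = max(ACTIONS, key=lambda a: Q[(2, a)])
assert best != 0
```

Learning the per-state action values directly is what lets each node avoid enumerating the full network-wide MDP state space.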
ISBN:
(print) 1424407060
The proceedings contain 49 papers. The topics discussed include: fitted Q iteration with CMACs; reinforcement-learning-based magneto-hydrodynamic control of hypersonic flows; a novel fuzzy reinforcement learning approach in two-level intelligent control of 3-DOF robot manipulators; knowledge transfer using local features; particle swarm optimization adaptive dynamic programming; discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof; dual representations for dynamic programming and reinforcement learning; an optimal ADP algorithm for a high-dimensional stochastic control problem; convergence of model-based temporal difference learning for control; the effect of bootstrapping in multi-automata reinforcement learning; and a theoretical analysis of cooperative behavior in multi-agent Q-learning.
In this paper, we present a novel approach to controlling a robotic system online, from scratch, based on the reinforcement learning principle. In contrast to other approaches, our method learns the system dynamics and the value function separately, which makes it possible to identify their individual characteristics and is therefore easily adaptable to changing conditions. The major problem in learning control policies lies in the high-dimensional state and action spaces that need to be explored in order to identify the optimal policy. In this paper, we propose an approach that learns the system dynamics and the value function in an alternating fashion based on Gaussian process models. Additionally, to reduce computation time and to make the system applicable to online learning, we present an efficient sparsification method. In experiments carried out with a real miniature blimp, we demonstrate that our approach can learn height control online. Further results obtained with an inverted pendulum show that our method requires less data to achieve the same performance as an offline learning approach.
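The dynamics-model half of such an approach can be sketched with a bare-bones Gaussian process regressor (RBF kernel, exact solve on a handful of transitions). The sparsification and the alternating value-function update are beyond this fragment, and the 1D toy target below is purely illustrative:

```python
import math

def rbf(x1, x2, ell=1.0):
    # Squared-exponential kernel with lengthscale ell.
    return math.exp(-0.5 * ((x1 - x2) / ell) ** 2)

def solve(A, b):
    # Naive Gaussian elimination with partial pivoting
    # (fine for the tiny systems used here).
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def gp_predict(X, y, x_star, noise=1e-6):
    # Zero-mean GP posterior mean: k_*^T (K + noise*I)^{-1} y.
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    alpha = solve(K, y)
    return sum(a * rbf(x, x_star) for a, x in zip(alpha, X))

# Learn a toy 1D dynamics map x' = f(x) from three observed transitions,
# then query the model.
X = [0.0, 1.0, 2.0]
y = [math.sin(x) for x in X]     # stand-in "next state" targets
pred = gp_predict(X, y, 1.0)
```

Keeping the dynamics model separate from the value function, as the abstract argues, means only this regressor must be refitted when the plant changes.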
ISBN:
(print) 9781424425686
Dynamic collaborative driving involves the motion coordination of multiple vehicles, using information shared by vehicles instrumented to perceive their surroundings, in order to improve road usage and safety. A basic requirement of any vehicle participating in dynamic collaborative driving is longitudinal control; without this capability, higher-level coordination is not possible. This paper focuses on the problem of longitudinal motion control. A detailed nonlinear longitudinal vehicle model, which serves as the control system design platform, is used to develop a longitudinal adaptive control system based on Monte Carlo reinforcement learning. The results of the reinforcement learning phase and the performance of the adaptive control system for a single automobile, as well as in a multi-vehicle platoon, are presented.
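Monte Carlo reinforcement learning for longitudinal control can be sketched on a toy speed-tracking task: every-visit Monte Carlo control with an epsilon-greedy policy. The discrete speed-error states, the three actions and the cost model below are assumptions for illustration, not the paper's vehicle model:

```python
import random

random.seed(1)

# Toy longitudinal control: discrete speed-error states -2..2,
# actions brake (-1) / coast (0) / accelerate (+1).
STATES = range(-2, 3)
ACTIONS = (-1, 0, 1)
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
N = {(s, a): 0 for s in STATES for a in ACTIONS}

def step(s, a):
    s2 = max(-2, min(2, s + a))
    return s2, -abs(s2)                      # penalize remaining speed error

def run_episode(eps=0.1, horizon=5):
    s, traj = random.choice(list(STATES)), []
    for _ in range(horizon):
        a = random.choice(ACTIONS) if random.random() < eps else \
            max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r = step(s, a)
        traj.append((s, a, r))
        s = s2
    return traj

for _ in range(3000):
    G = 0.0
    for s, a, r in reversed(run_episode()):
        G += r                               # undiscounted return-to-go
        N[(s, a)] += 1
        Q[(s, a)] += (G - Q[(s, a)]) / N[(s, a)]   # sample-average update

# Greedy policy after learning: brake when too fast, accelerate when slow.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
```

Unlike temporal-difference methods, the Monte Carlo update above uses only complete episode returns, which is what makes it model-free with respect to the (here unknown) vehicle dynamics.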
In this paper, an adaptive critic-based neurofuzzy controller is presented for water level regulation of nuclear steam generators. The problem has been of great concern for many years, as the steam generator is a highly nonlinear system showing inverse-response dynamics, especially at low operating power levels. Fuzzy critic-based learning is a reinforcement learning method based on dynamic programming. The only information available to the critic agent is the system feedback, which it interprets as an evaluation of the last action the controller performed in the previous state. The signal produced by the critic agent is used alongside the backpropagation-of-error algorithm to tune online the conclusion parts of the fuzzy inference rules. The critic agent here has a proportional-derivative structure, and the fuzzy rule base has nine rules. The proposed controller shows satisfactory transient responses, disturbance rejection and robustness to model uncertainty. Its simple design procedure and structure nominate it as a suitable controller design for steam generator water level control in the nuclear power plant industry.
Two distinguishing features of humanlike control vis-a-vis current technological control are the ability to make use of experience while selecting a control policy for distinct situations and the ability to do so faster and faster as more experience is gained (in contrast to current technological implementations, which slow down as more knowledge is stored). The notions of context and context discernment are important to understanding this human ability. Whereas methods known as adaptive control and learning control focus on modifying the design of a controller as changes in context occur, experience-based (EB) control entails selecting a previously designed controller that is appropriate to the current situation. Developing the EB approach entails a shift of the technologist's focus "up a level": away from designing individual (optimal) controllers and toward developing online algorithms that efficiently and effectively select designs from a repository of existing controller solutions. A key component of the notions presented here is that of a higher-level learning algorithm. This is a new application of reinforcement learning and, in particular, approximate dynamic programming, with its focus shifted to the posited higher level, and it is employed with very promising results. The author's hope is that this paper will inspire and guide future work in this promising area.
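The higher-level learning idea — reinforcement learning over *which controller to deploy* rather than over low-level actions — can be sketched as a contextual selection problem. The repository, the context discretization and the performance model below are hypothetical stand-ins, not the paper's formulation:

```python
import random

random.seed(2)

# Hypothetical repository of pre-designed controllers, each best suited
# to one plant "context" (e.g. a payload, wear, or operating condition).
CONTEXTS = range(3)
CONTROLLERS = range(3)

def performance(context, controller):
    # Illustrative score: high when the selected design matches the
    # current context, degraded otherwise, plus measurement noise.
    return (1.0 if controller == context else 0.2) + random.gauss(0, 0.05)

# Higher-level learner: Q-values over (context, controller) pairs.
Q = {(c, k): 0.0 for c in CONTEXTS for k in CONTROLLERS}
for _ in range(2000):
    c = random.choice(list(CONTEXTS))          # context presented by the plant
    k = random.choice(list(CONTROLLERS)) if random.random() < 0.2 else \
        max(CONTROLLERS, key=lambda x: Q[(c, x)])
    Q[(c, k)] += 0.1 * (performance(c, k) - Q[(c, k)])

# After learning, the selector discerns context: it picks the matched design.
assert all(max(CONTROLLERS, key=lambda k: Q[(c, k)]) == c for c in CONTEXTS)
```

Note that selection gets *cheaper* as the table fills in, consistent with the abstract's observation that experience-based control should speed up, not slow down, with accumulated knowledge.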
A nonaffine discrete-time system represented by the nonlinear autoregressive moving average with eXogenous input (NARMAX) representation with unknown nonlinear system dynamics is considered. An equivalent affinelike representation in terms of the tracking error dynamics is first obtained from the original nonaffine nonlinear discrete-time system so that a reinforcement-learning-based near-optimal neural network (NN) controller can be developed. The control scheme consists of two linearly parameterized NNs. One NN is designated as the critic NN, which approximates a predefined long-term cost function, and an action NN is employed to derive a near-optimal control signal for the system to track a desired trajectory while simultaneously minimizing the cost function. The NN weights are tuned online. By using the standard Lyapunov approach, the stability of the closed-loop system is shown. The net result is a supervised actor-critic NN controller scheme which can be applied to a general nonaffine nonlinear discrete-time system without needing the affinelike representation. Simulation results demonstrate satisfactory performance of the controller.
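The two-network structure — a critic estimating long-term cost and an actor producing the control, both linearly parameterized — can be caricatured on a scalar tracking task. This is a rough sketch under assumed dynamics x' = x + u and quadratic stage cost; the weights, learning rates and update rules below are illustrative, not the paper's tuning laws or its Lyapunov-based design:

```python
import random

random.seed(3)

# Critic: V(x) = wc * x^2 approximates the cost-to-go.
# Actor:  u    = wa * x is the linear feedback control.
wc, wa = 0.0, 0.0
gamma, ac, aa = 0.9, 0.05, 0.01

for _ in range(4000):
    x = random.uniform(-1, 1)
    u = wa * x + random.gauss(0, 0.1)      # exploration noise on the action
    x2 = x + u                             # assumed plant: x' = x + u
    cost = x * x + 0.1 * u * u             # quadratic stage cost
    # Temporal-difference error on the cost-to-go estimate.
    delta = cost + gamma * wc * x2 * x2 - wc * x * x
    wc += ac * delta * x * x               # critic: reduce the TD error
    wa -= aa * delta * (u - wa * x) * x    # actor: policy-gradient-style step
```

After training, the learned gain `wa` should be negative, i.e. the actor pushes the state toward the origin; the critic's `wc` grows positive as it accounts for accumulated future cost.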