ISBN (Digital): 9781728166674
ISBN (Print): 9781728166681
In this paper, we present a model-free deep reinforcement learning based approach to the motion planning problem of a quadruped moving from a flat to an inclined plane. In our implementation, we provide no prior information about the location of the inclined plane, nor do we pass any vision data during the training process. With this approach, we train a 12-degree-of-freedom quadruped robot to traverse up and down a variety of simulated sloped environments, demonstrating in the process that deep reinforcement learning can generate highly dynamic and adaptable solutions.
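To make the training setup concrete, the sketch below builds the kind of proprioception-only observation the abstract implies (no terrain location, no vision); the exact signals and dimensions are our assumptions, not the authors' specification.

```python
import numpy as np

# Hypothetical proprioceptive observation for a 12-DoF quadruped.
# Consistent with the abstract, no incline location or camera data is
# included; the exact signal set is an assumption for illustration.
def build_observation(joint_pos, joint_vel, base_quat, base_ang_vel):
    """Concatenate onboard signals into a flat observation vector."""
    assert joint_pos.shape == (12,) and joint_vel.shape == (12,)
    return np.concatenate([
        joint_pos,       # 12 joint angles (rad)
        joint_vel,       # 12 joint velocities (rad/s)
        base_quat,       # 4-element body orientation quaternion
        base_ang_vel,    # 3-element body angular velocity (rad/s)
    ])

obs = build_observation(np.zeros(12), np.zeros(12),
                        np.array([1.0, 0.0, 0.0, 0.0]), np.zeros(3))
print(obs.shape)  # (31,) observation fed to the policy network
```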
ISBN (Print): 9781538670248
The majority of current studies on autonomous vehicle control via deep reinforcement learning (DRL) utilize point-mass kinematic models, neglecting vehicle dynamics, which include acceleration delay and acceleration command dynamics. The acceleration delay, which results from sensing and actuation delays, causes delayed execution of the control inputs. The acceleration command dynamics dictate that the actual vehicle acceleration does not rise to the commanded acceleration instantaneously. In this work, we investigate the feasibility of applying DRL controllers trained with vehicle kinematic models to more realistic driving control with vehicle dynamics. We consider a particular longitudinal car-following control problem, Adaptive Cruise Control (ACC), solved via DRL using a point-mass kinematic model. When such a controller is applied to car following with vehicle dynamics, we observe significantly degraded car-following performance. We therefore redesign the DRL framework to accommodate the acceleration delay and the acceleration command dynamics by adding, respectively, the delayed control inputs and the actual vehicle acceleration to the reinforcement learning environment state. The training results show that the redesigned DRL controller achieves near-optimal car-following performance with vehicle dynamics considered, when compared with dynamic programming solutions.
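The state redesign described above is straightforward to sketch: buffer the commands that have been issued but not yet executed, and append them, together with the measured acceleration, to the RL state. The sketch below is our reading of that idea, with placeholder names and dimensions.

```python
from collections import deque
import numpy as np

# Sketch of the augmented ACC state: the pending-command buffer exposes
# the actuation delay, and the measured acceleration exposes the
# command dynamics. delay_steps and the feature set are placeholders.
class AugmentedACCState:
    def __init__(self, delay_steps):
        self.pending = deque([0.0] * delay_steps, maxlen=delay_steps)

    def observe(self, gap, rel_speed, actual_accel, new_command):
        self.pending.append(new_command)  # enqueue the latest command
        # Kinematic features plus delayed inputs and actual acceleration.
        return np.array([gap, rel_speed, actual_accel, *self.pending])

s = AugmentedACCState(delay_steps=3)
print(s.observe(gap=30.0, rel_speed=-1.2, actual_accel=0.4,
                new_command=0.6))
```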
ISBN (Print): 9789881563972
Although the adaptive dynamic programming (ADP) scheme has been widely studied for optimal control problems in recent years, it has not been applied to servo systems. In this paper, a simplified reinforcement learning (RL) based ADP scheme is developed to obtain the optimal tracking control of a servo system, where the unknown system dynamics are approximated with a three-layer neural network (NN) identifier. First, the servo system model is constructed, and a three-layer NN identifier is used to approximate the unknown servo system; the NN weights of both the hidden layer and the output layer are tuned synchronously with an adaptive gradient law. An RL-based critic NN is then used to learn the optimal cost function, with its weights updated by minimizing the squared Hamilton-Jacobi-Bellman (HJB) error. The optimal tracking control of the servomechanism is obtained based on the three-layer NN identifier and the RL scheme, making the motor speed track a predefined command. Moreover, the convergence of the identifier and the NN weights is proved. Finally, a servomechanism model is provided to illustrate the proposed methods.
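As a toy illustration of the critic update, the snippet below performs gradient descent on the squared HJB residual for a linear-in-features value function; the basis, cost, and step sizes are placeholders rather than the paper's NN structure.

```python
import numpy as np

# Critic V(x) ~ W . phi(x); W is driven down the gradient of the
# squared discrete-time HJB residual  e = r*dt + V(x') - V(x).
def phi(x):                       # simple quadratic feature basis
    return np.array([x[0]**2, x[0]*x[1], x[1]**2])

def critic_step(W, x, x_next, r, dt, lr=1e-2):
    e = r * dt + W @ phi(x_next) - W @ phi(x)   # HJB residual
    grad = e * (phi(x_next) - phi(x))           # d(e^2/2)/dW
    return W - lr * grad

W = np.zeros(3)
W = critic_step(W, x=np.array([1.0, 0.0]),
                x_next=np.array([0.9, -0.1]), r=1.0, dt=0.01)
print(W)
```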
ISBN (Print): 9781728111643
Lateral control design is one of the fundamental components of self-driving cars. In this paper, we propose a learning-based control strategy that enables a mobile car equipped with a camera to perform lane keeping on a road. Using the method of adaptive dynamic programming, the proposed control algorithm exploits structural knowledge of the car kinematics as well as the collected data (images) containing lane information. An adaptive optimal lateral controller is obtained through a data-driven learning algorithm. The effectiveness of the proposed method is demonstrated by theoretical stability proofs and experimental evaluations.
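For orientation, the control law amounts to state feedback on the lane-tracking error. The sketch below shows only that structure with a fixed placeholder gain; the data-driven ADP iteration that actually learns the gain is not reproduced.

```python
import numpy as np

# Lateral control as feedback on the camera-derived error state
# z = [lateral offset, heading error]; K is a placeholder here,
# whereas the paper learns an adaptive optimal gain from data.
def steering_command(K, lateral_offset, heading_error, limit=0.5):
    z = np.array([lateral_offset, heading_error])
    return float(np.clip(-K @ z, -limit, limit))  # saturated steering

K = np.array([0.8, 1.5])                          # invented gain
print(steering_command(K, lateral_offset=0.3, heading_error=-0.05))
```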
ISBN (Digital): 9781728125473
ISBN (Print): 9781728125480
In practice, it is quite common to face combinatorial optimization problems that combine uncertainty with non-determinism and dynamicity. These three properties call for appropriate algorithms; reinforcement learning (RL) deals with them in a very natural way. Today, despite some efforts, most real-life combinatorial optimization problems remain out of reach of reinforcement learning algorithms. In this paper, we propose a reinforcement learning approach to a realistic scheduling problem and apply it to an algorithm commonly executed in the high-performance computing community, the Cholesky factorization. In contrast to static scheduling, where tasks are assigned to processors in a predetermined ordering before the parallel execution begins, our method is dynamic: task allocations and their execution ordering are decided at runtime based on the system state and unexpected events, which allows much more flexibility. To do so, our algorithm uses graph neural networks in combination with an actor-critic algorithm (A2C) to build an adaptive representation of the problem on the fly. We show that this approach is competitive with state-of-the-art heuristics used in high-performance computing runtime systems. Moreover, our algorithm does not require an explicit model of the environment, but we demonstrate that extra knowledge can easily be incorporated and improves performance. We also exhibit key properties provided by this RL approach and study its transfer abilities to other instances.
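The object the scheduler works on is the task DAG of the tiled Cholesky factorization. The sketch below generates the read dependencies of the standard tiled algorithm (POTRF/TRSM/SYRK/GEMM); the task naming is ours, accumulation (write-after-write) orderings are omitted for brevity, and the GNN encoding and A2C policy are not shown.

```python
# Task ("GEMM", i, j, k) updates tile (i, j) at step k, and so on;
# each entry maps a task to the set of tasks whose outputs it reads.
def cholesky_task_graph(n_tiles):
    deps = {}
    for k in range(n_tiles):
        deps[("POTRF", k)] = {("SYRK", k, j) for j in range(k)}
        for i in range(k + 1, n_tiles):
            deps[("TRSM", i, k)] = ({("POTRF", k)}
                                    | {("GEMM", i, k, j) for j in range(k)})
        for i in range(k + 1, n_tiles):
            deps[("SYRK", i, k)] = {("TRSM", i, k)}
            for j in range(k + 1, i):
                deps[("GEMM", i, j, k)] = {("TRSM", i, k), ("TRSM", j, k)}
    return deps

g = cholesky_task_graph(3)
print(len(g), "tasks")   # 10 tasks for a 3x3 tiling
```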
ISBN (Print): 9781728140049
In robot-assisted rehabilitation, assist-as-needed (AAN) controllers have been proposed to promote subjects' active participation, which is thought to lead to better training outcomes. Most of these AAN controllers require patient-specific manual tuning of the parameters defining the underlying force field, which typically results in a tedious and time-consuming process. In this paper, we propose a reinforcement-learning-based impedance controller that actively reshapes the stiffness of the force field according to the subject's performance, while providing assistance only when needed. This adaptability is made possible by correlating the subject's most recent performance with the ultimate control objective in real time. In addition, the proposed controller is built upon action-dependent heuristic dynamic programming with an actor-critic structure, and therefore does not require prior knowledge of the system model. The controller is experimentally validated with healthy subjects in a simulated ankle mobilization training session using a powered ankle-foot orthosis.
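A schematic of the adaptive force field helps fix ideas: the orthosis renders a virtual spring toward the target trajectory, and the stiffness is raised or relaxed based on recent tracking performance. The rule below is a crude stand-in for the ADHDP actor-critic update, with invented names and constants.

```python
# Assist-as-needed torque: a virtual spring toward the target posture.
def assist_torque(k_stiff, q_target, q_actual):
    return k_stiff * (q_target - q_actual)

# Placeholder adaptation rule standing in for the actor-critic update:
# stiffen when recent error exceeds tolerance, relax otherwise.
def update_stiffness(k_stiff, recent_err, tolerance, rate=5.0):
    return max(0.0, k_stiff + rate * (recent_err - tolerance))

k = 10.0
k = update_stiffness(k, recent_err=0.15, tolerance=0.05)
print(k, assist_torque(k, q_target=0.3, q_actual=0.1))
```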
ISBN (Print): 9781450366694
As technology scales, Networks-on-Chip (NoCs), currently used for on-chip communication in manycore architectures, face several problems including high network latency, excessive power consumption, and low reliability. Simultaneously addressing these problems is proving difficult due to the explosion of the design space and the complexity of handling many trade-offs. In this paper, we propose IntelliNoC, an intelligent NoC design framework that introduces architectural innovations and uses reinforcement learning to manage the design complexity and simultaneously optimize performance, energy efficiency, and reliability in a holistic manner. IntelliNoC integrates three NoC architectural techniques: (1) multi-function adaptive channels (MFACs) to improve energy efficiency; (2) adaptive error detection/correction and retransmission control to enhance reliability; and (3) a stress-relaxing bypass feature that dynamically powers off NoC components to prevent overheating and fatigue. To handle the complex dynamic interactions induced by these techniques, we train a dynamic control policy using Q-learning, with the goal of providing improved fault tolerance and performance while reducing power consumption and area overhead. Simulation using PARSEC benchmarks shows that our proposed IntelliNoC design improves energy efficiency by 67% and mean time to failure (MTTF) by 77%, and decreases end-to-end packet latency by 32% and area requirements by 25% over a baseline NoC architecture.
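The control policy is tabular Q-learning over NoC configuration actions. A bare-bones version of that loop is sketched below; the state encoding, action set, and reward are illustrative placeholders, not IntelliNoC's actual design.

```python
import random
from collections import defaultdict

# Q-learning over hypothetical NoC control actions (channel mode,
# error-correction strength, bypass); all names here are placeholders.
ACTIONS = ["mfac_mode", "stronger_ecc", "bypass_component", "noop"]
Q = defaultdict(float)

def choose(state, eps=0.1):
    if random.random() < eps:
        return random.choice(ACTIONS)              # explore
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, alpha=0.1, gamma=0.9):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next
                                   - Q[(state, action)])

a = choose("hot_and_congested")
update("hot_and_congested", a, reward=-1.0, next_state="cool_and_idle")
```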
ISBN (Print): 9783903176201
Packet routing is one of the fundamental problems in computer networks, in which a router determines the next hop of each packet in its queue so as to deliver it to its destination as quickly as possible. Reinforcement learning has been introduced to design autonomous packet routing policies, namely Q-routing, using only local information available to each router. However, the curse of dimensionality in Q-routing prohibits a more comprehensive representation of dynamic network states, limiting the potential benefit of reinforcement learning. Inspired by the recent success of deep reinforcement learning (DRL), we embed deep neural networks in multi-agent Q-routing. Each router possesses an independent neural network that is trained without communicating with its neighbors and makes decisions locally. Two multi-agent DRL-enabled routing algorithms are proposed: one simply replaces the Q-table of vanilla Q-routing with a deep neural network, and the other further employs extra information, including the past actions and the destinations of non-head-of-line packets. Our simulation shows that directly substituting a deep neural network for the Q-table may not yield minimal delivery delays, because the neural network cannot learn more from the same input. When more information is utilized, the adaptive routing policy can converge and significantly reduce packet delivery time.
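The first variant, swapping the Q-table for a network, can be sketched with a tiny per-router value network that maps packet and queue features to one Q-value per neighbor; the architecture and feature set below are guesses for illustration only.

```python
import numpy as np

# Per-router network: features (e.g., encoded destination and queue
# occupancy) in, one Q-value per candidate next hop out. A single
# hidden ReLU layer stands in for the paper's actual architecture.
class RouterDQN:
    def __init__(self, n_features, n_neighbors, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (hidden, n_features))
        self.W2 = rng.normal(0.0, 0.1, (n_neighbors, hidden))

    def q_values(self, features):
        h = np.maximum(0.0, self.W1 @ features)   # ReLU hidden layer
        return self.W2 @ h                        # one Q per next hop

router = RouterDQN(n_features=8, n_neighbors=4)
print(np.argmax(router.q_values(np.ones(8))))     # chosen next-hop index
```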
ISBN (Digital): 9781728144900
ISBN (Print): 9781728144917
This paper addresses Virtualized Network Function Forwarding Graph (VNF-FG) embedding with the objective of realizing long-term reward, in contrast to placement algorithms that aim at instantaneously optimal placement. The long-term reward is obtained using reinforcement learning (RL), following a Markov Decision Process (MDP) model enhanced through the injection of expert knowledge into the learning process. A comparison with an Integer Linear Programming (ILP) approach, a reduced-candidate-set variant (R-ILP), and an algorithm that treats requests in batches reveals the potential improvements of the RL approach. The instantaneous and short-term reward solutions are efficient only at finding instant solutions, as they make decisions based solely on the current infrastructure status for a given request (or batch of requests) at a time; they perform well for present conditions but do not anticipate future requests. RL instead possesses the learning and anticipation capabilities lacking in instantaneous and snapshot optimizations. A reinforcement learning based approach, called EQL (Enhanced Q-learning), aiming at balancing the load on hosting infrastructures, is proposed to achieve the desired longer-term reward. EQL employs RL to learn the network and control it based on the usage patterns of the physical resources. Results from extensive simulations, based on realistic and large-scale topologies, report the superior performance of EQL in terms of acceptance rate, quality, scalability, and achieved gains.
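One plausible reading of the expert-knowledge injection is to seed the Q-values from a load-balancing heuristic instead of zeros, so that early exploration is biased toward lightly loaded hosts. The snippet below illustrates that seeding; hosts, loads, and scores are invented.

```python
# Seed Q-values for placing a VNF: hosts with lower utilization start
# with higher values, encoding the load-balancing expert prior.
HOSTS = ["h1", "h2", "h3"]
LOAD = {"h1": 0.2, "h2": 0.9, "h3": 0.5}   # fictitious utilizations

def seeded_q(requests):
    return {(r, h): 1.0 - LOAD[h] for r in requests for h in HOSTS}

Q = seeded_q(requests=["vnf_request_0"])
best = max(HOSTS, key=lambda h: Q[("vnf_request_0", h)])
print(best)   # "h1": least loaded, so highest seeded Q-value
```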
ISBN (Digital): 9781728171401
ISBN (Print): 9781728171418
We present a model-free optimal control design for electric power systems with unknown transmission network and load models to improve their dynamic performance using techniques from reinforcement learning (RL) and adaptive dynamic programming (ADP). We consider different persistent disturbances in the grid, including ambient oscillations resulting from load fluctuations and their effects on exciter voltage regulation loops. We also consider forced-oscillation scenarios that frequently occur due to malfunctioning governor valves. Our proposed RL algorithm recovers the optimal feedback response in spite of all these disturbances in a completely model-free way, using online measurements of the states, inputs, and disturbances. The design is validated using the IEEE benchmark 39-bus, 10-generator New England power system model perturbed with different ambient and forced oscillations.
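Structurally, the learned controller is a state-feedback law applied to measured deviations, while disturbances are logged for the learning updates. The fragment below shows only that feedback structure in our own notation; the off-policy ADP iteration that produces the gain is not reproduced.

```python
import numpy as np

# Feedback on measured state deviations; K is a placeholder constant
# here, whereas the paper learns it online from measured states,
# inputs, and disturbances without a grid model.
def control_input(K, x_deviation):
    return -K @ x_deviation

K = np.array([[1.0, 0.5]])               # invented gain for illustration
x = np.array([0.02, -0.01])              # e.g. angle/speed deviations
print(control_input(K, x))
```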