ISBN (Print): 9789811328534; 9789811328527
In this paper, the problem of path planning for quadrotor unmanned aerial vehicles (UAVs) is investigated in the framework of reinforcement learning. With the environment abstracted as a 2D grid world, the design procedure is presented using the Dyna-Q algorithm, a reinforcement learning method that combines model-based and model-free frameworks. In this process, an optimal or suboptimal safe flight trajectory is obtained by learning continually and planning with simulated experience, so that the cumulative reward can be maximized efficiently. MATLAB is used for maze construction and computation, and the effectiveness of the proposed method is illustrated by two typical examples.
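To make the procedure concrete, here is a minimal Dyna-Q sketch for a deterministic 2D grid-world maze, written in Python rather than the paper's MATLAB; the GridMaze layout, reward values, and hyper-parameters are illustrative assumptions, not taken from the paper.

```python
import random

class GridMaze:
    """Tiny deterministic maze: 0 = free cell, 1 = obstacle; goal at (4, 4)."""
    def __init__(self):
        self.grid = [[0, 0, 0, 1, 0],
                     [0, 1, 0, 1, 0],
                     [0, 1, 0, 0, 0],
                     [0, 0, 0, 1, 0],
                     [1, 1, 0, 0, 0]]
        self.goal = (4, 4)

    def reset(self):
        self.s = (0, 0)
        return self.s

    def step(self, a):
        dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][a]   # up, down, left, right
        r, c = self.s[0] + dr, self.s[1] + dc
        if 0 <= r < 5 and 0 <= c < 5 and self.grid[r][c] == 0:
            self.s = (r, c)                               # blocked moves stay put
        reward = 1.0 if self.s == self.goal else -0.01
        return self.s, reward, self.s == self.goal

def dyna_q(env, episodes=200, n_planning=20, alpha=0.1, gamma=0.95, eps=0.1):
    Q, model, actions = {}, {}, range(4)
    q = lambda s, a: Q.get((s, a), 0.0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            a = random.choice(actions) if random.random() < eps \
                else max(actions, key=lambda x: q(s, x))
            s2, r, done = env.step(a)
            # (a) direct RL: one-step Q-learning update from real experience
            Q[(s, a)] = q(s, a) + alpha * (r + gamma * max(q(s2, x) for x in actions) - q(s, a))
            # (b) model learning: remember the observed deterministic transition
            model[(s, a)] = (r, s2)
            # (c) planning: n extra updates from simulated experience
            for _ in range(n_planning):
                (ps, pa), (pr, ps2) = random.choice(list(model.items()))
                Q[(ps, pa)] = q(ps, pa) + alpha * (pr + gamma * max(q(ps2, x) for x in actions) - q(ps, pa))
            s = s2
    return Q

Q = dyna_q(GridMaze())
```

The planning loop in step (c) is what distinguishes Dyna-Q from plain Q-learning: each real interaction is amplified into many cheap simulated updates drawn from the learned model.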
This paper proposes a demand response method to reduce the long-term charging cost of a single plug-in electric vehicle (PEV) while overcoming obstacles such as the stochastic nature of the user's driving behaviour, traffic conditions, energy usage, and energy prices. The problem is formulated as a Markov Decision Process (MDP) with an unknown transition probability matrix and solved using deep reinforcement learning (RL) techniques. The proposed method does not require any initial data on the PEV driver's behaviour and shows improved learning speed compared to a purely model-free reinforcement learning method. A combination of model-based and model-free learning called Dyna-Q reinforcement learning is utilized in our strategy. Every time a real experience is obtained, the model is updated, and the RL agent learns from both the real experience and "imagined" experiences drawn from the model. Because of the vast state space, a table-lookup method is impractical, so a value approximation method using deep neural networks is employed to estimate the long-term expected reward of all state-action pairs. An average of historical prices and a long short-term memory (LSTM) network are used to predict future prices. Simulation results demonstrate the effectiveness of this approach and its ability to reach an optimal policy more quickly, while avoiding state-of-charge (SOC) depletion during trips, compared to existing PEV charging schemes.
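As a hedged sketch of how a Dyna-Q loop can be paired with a neural value approximator, the fragment below trains a small PyTorch network on each real transition, records the transition in a learned deterministic model, and then replays "imagined" transitions from that model. The state layout, action count, and network size are assumptions for illustration, not the paper's architecture.

```python
import random
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 6, 5   # e.g. [SOC, hour, price, ...]; discrete charge rates (assumed)

qnet = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
model = {}                    # (state tuple, action) -> (reward, next state, done)

def td_update(s, a, r, s2, done, gamma=0.99):
    """One gradient step toward the one-step TD target."""
    q_sa = qnet(torch.as_tensor(s, dtype=torch.float32))[a]
    with torch.no_grad():
        bootstrap = 0.0 if done else gamma * qnet(torch.as_tensor(s2, dtype=torch.float32)).max().item()
    loss = (q_sa - (r + bootstrap)) ** 2
    opt.zero_grad(); loss.backward(); opt.step()

def dyna_step(s, a, r, s2, done, n_planning=10):
    td_update(s, a, r, s2, done)                 # learn from the real experience
    model[(tuple(s), a)] = (r, tuple(s2), done)  # update the learned model
    for _ in range(n_planning):                  # learn from "imagined" experience
        (ms, ma), (mr, ms2, mdone) = random.choice(list(model.items()))
        td_update(list(ms), ma, mr, list(ms2), mdone)
```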
Reinforcement learning (RL) is a popular method for solving the path planning problem of autonomous mobile robots in unknown environments. However, the primary difficulty faced by learning robots using RL is that they learn too slowly in obstacle-dense environments. To solve the path planning problem more efficiently in such environments, this paper presents a novel approach in which the robot's learning process is divided into two phases. The first phase accelerates learning of an optimal policy by extending the well-known Dyna-Q algorithm, training the robot to avoid obstacles while following the vector direction. In this phase, the robot's position is represented on a uniform grid; at each time step, the robot moves to one of its eight adjacent cells, so the path obtained from the optimal policy may be longer than the true shortest path. The second phase trains the robot to learn a collision-free smooth path that decreases the number of heading changes. The simulation results show that the proposed approach is efficient for path planning of autonomous mobile robots in unknown environments with dense obstacles.
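The eight-neighbour action set from the first phase, and a simple smoothing pass in the spirit of the second phase, might look like the following; note that the greedy line-of-sight shortcutting shown here is an illustrative stand-in for the paper's learned smoothing phase, not a reproduction of it.

```python
# Phase one moves to one of the eight adjacent cells.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1),
           (-1, -1), (-1, 1), (1, -1), (1, 1)]

def line_of_sight(a, b, blocked):
    """Check the straight segment a->b against blocked cells (coarse sampling)."""
    (r0, c0), (r1, c1) = a, b
    steps = max(abs(r1 - r0), abs(c1 - c0)) * 2
    for i in range(steps + 1):
        t = i / max(steps, 1)
        cell = (round(r0 + t * (r1 - r0)), round(c0 + t * (c1 - c0)))
        if cell in blocked:
            return False
    return True

def smooth(path, blocked):
    """Drop intermediate waypoints wherever a direct segment is collision-free,
    reducing the number of heading changes along the grid path."""
    out, i = [path[0]], 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and not line_of_sight(path[i], path[j], blocked):
            j -= 1
        out.append(path[j])
        i = j
    return out
```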
ISBN (Print): 9781665440899
Reinforcement learning (RL) has been successfully applied to solve path planning problems, but learning is generally slow. The main reason is that information collected during interaction with the environment is not fully exploited. This paper proposes a novel method, based on RL and heuristic search, to solve the discrete-space path planning problem in an obstacle-dense environment without prior knowledge. First, we apply the Dyna-Q algorithm to explore the map and search for the target point, optimizing its policy with the upper confidence bound (UCB). Then, once the target point is found, we use heuristic search to plan a path from the starting point to the target point and narrow the search to a small range around that path. Finally, we combine the Dyna-Q algorithm with the path recommended by the heuristic search for path planning. We evaluate our algorithm on maze navigation problems. The results verify that heuristic search accelerates Dyna-Q convergence.
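A minimal sketch of a UCB action rule inside a tabular Dyna-Q agent could look as follows, assuming visit counts N(s, a) are kept alongside the Q-table; the exploration constant c is an assumption.

```python
import math

def ucb_action(Q, N, s, actions, c=1.0):
    """Pick argmax over a of Q(s, a) + c * sqrt(ln N(s) / N(s, a))."""
    total = sum(N.get((s, a), 0) for a in actions) + 1   # visits to state s
    def score(a):
        n = N.get((s, a), 0)
        if n == 0:
            return float("inf")                          # try unvisited actions first
        return Q.get((s, a), 0.0) + c * math.sqrt(math.log(total) / n)
    return max(actions, key=score)
```

Compared with epsilon-greedy, the bonus term steers exploration toward rarely tried state-action pairs instead of choosing uniformly at random, which is what lets the agent find the target point faster in a large maze.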
ISBN (Print): 9787811240559
It is quite difficult to achieve satisfactory results by applying traditional modeling and control methods to the urban traffic signal control system because of the non-linearity, fuzziness, self-organization, and uncertainty in the system. Artificial intelligence technologies may offer a new way to solve this problem. Given the characteristics of the traffic signal control system, this paper proposes an on-line control algorithm based on Dyna-Q reinforcement learning: the experiential knowledge gained by the traffic signal control agent in the trial-and-error process is used to estimate a model, and actions are then planned in the estimated model, thereby accelerating the iterative process of Q-learning. The simulation is implemented with TSIS (a microscopic traffic analysis tool) on two arterial roads comprising 10 intersections. Compared with fixed-time control, a genetic algorithm, and the Q-learning control algorithm, simulation results indicate that the Dyna-Q reinforcement learning algorithm has an obvious superiority.
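Since the paper evaluates against the TSIS simulator, only the agent side can be sketched here; the queue-length binning and the green-split action set below are assumptions chosen for illustration, not the paper's design.

```python
import random

GREEN_SPLITS = [(20, 40), (30, 30), (40, 20)]    # (main, side) green seconds, assumed

def encode_state(main_queue, side_queue, bin_size=5):
    """Discretize queue lengths so the tabular Q and model stay small."""
    return (main_queue // bin_size, side_queue // bin_size)

def plan(Q, model, n=30, alpha=0.1, gamma=0.9):
    """Dyna-Q planning step: replay transitions from the estimated model."""
    if not model:
        return
    for _ in range(n):
        (s, a), (r, s2) = random.choice(list(model.items()))
        best = max(Q.get((s2, b), 0.0) for b in range(len(GREEN_SPLITS)))
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best - Q.get((s, a), 0.0))
```

The planning replay is what lets the signal-control agent converge with far fewer simulated traffic cycles than plain Q-learning, which matches the speed-up the paper reports.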