ISBN (Print): 9789811328534; 9789811328527
In this paper, the problem of path planning for quadrotor unmanned aerial vehicles (UAVs) is investigated in the framework of reinforcement learning. With the environment abstracted as a 2D grid world, the design procedure is presented using the Dyna-Q algorithm, a reinforcement learning method that combines model-based and model-free frameworks. In this process, an optimal or suboptimal safe flight trajectory is obtained by learning continually and planning with simulated experience, so that the cumulative reward can be maximized efficiently. MATLAB is used for maze construction and computation, and the effectiveness of the proposed method is illustrated by two typical examples.
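To make the procedure concrete, here is a minimal Dyna-Q sketch for a deterministic 2D grid-world maze, written in Python rather than the paper's MATLAB; the GridMaze layout, reward values, and hyper-parameters are illustrative assumptions, not taken from the paper.

```python
import random

class GridMaze:
    """Tiny deterministic maze: 0 = free cell, 1 = obstacle; goal at (4, 4)."""
    def __init__(self):
        self.grid = [[0, 0, 0, 1, 0],
                     [0, 1, 0, 1, 0],
                     [0, 1, 0, 0, 0],
                     [0, 0, 0, 1, 0],
                     [1, 1, 0, 0, 0]]
        self.goal = (4, 4)

    def reset(self):
        self.s = (0, 0)
        return self.s

    def step(self, a):
        dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][a]   # up, down, left, right
        r, c = self.s[0] + dr, self.s[1] + dc
        if 0 <= r < 5 and 0 <= c < 5 and self.grid[r][c] == 0:
            self.s = (r, c)                               # blocked moves stay put
        reward = 1.0 if self.s == self.goal else -0.01
        return self.s, reward, self.s == self.goal

def dyna_q(env, episodes=200, n_planning=20, alpha=0.1, gamma=0.95, eps=0.1):
    Q, model, actions = {}, {}, range(4)
    q = lambda s, a: Q.get((s, a), 0.0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            a = random.choice(actions) if random.random() < eps \
                else max(actions, key=lambda x: q(s, x))
            s2, r, done = env.step(a)
            # (a) direct RL: one-step Q-learning update from real experience
            Q[(s, a)] = q(s, a) + alpha * (r + gamma * max(q(s2, x) for x in actions) - q(s, a))
            # (b) model learning: remember the observed deterministic transition
            model[(s, a)] = (r, s2)
            # (c) planning: n extra updates from simulated experience
            for _ in range(n_planning):
                (ps, pa), (pr, ps2) = random.choice(list(model.items()))
                Q[(ps, pa)] = q(ps, pa) + alpha * (pr + gamma * max(q(ps2, x) for x in actions) - q(ps, pa))
            s = s2
    return Q

Q = dyna_q(GridMaze())
```

The planning loop in step (c) is what distinguishes Dyna-Q from plain Q-learning: each real interaction is amplified into many cheap simulated updates drawn from the learned model.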
This paper proposes a demand response method to reduce the long-term charging cost of a single plug-in electric vehicle (PEV) while overcoming obstacles such as the stochastic nature of the user's driving behaviour, traffic conditions, energy usage, and energy prices. The problem is formulated as a Markov Decision Process (MDP) with an unknown transition probability matrix and solved using deep reinforcement learning (RL) techniques. The proposed method does not require any initial data on the PEV driver's behaviour and shows improved learning speed compared to a purely model-free reinforcement learning method. A combination of model-based and model-free learning called Dyna-Q reinforcement learning is utilized in our strategy. Every time a real experience is obtained, the model is updated, and the RL agent learns from both the real experience and "imagined" experiences drawn from the model. Because of the vast state space, a table-lookup method is impractical, so a value approximation method using deep neural networks is employed to estimate the long-term expected reward of all state-action pairs. An average of historical prices and a long short-term memory (LSTM) network are used to predict future prices. Simulation results demonstrate the effectiveness of this approach and its ability to reach an optimal policy more quickly, while avoiding state-of-charge (SOC) depletion during trips, compared to existing PEV charging schemes.
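As a hedged sketch of how a Dyna-Q loop can be paired with a neural value approximator, the fragment below trains a small PyTorch network on each real transition, records the transition in a learned deterministic model, and then replays "imagined" transitions from that model. The state layout, action count, and network size are assumptions for illustration, not the paper's architecture.

```python
import random
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 6, 5   # e.g. [SOC, hour, price, ...]; discrete charge rates (assumed)

qnet = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
model = {}                    # (state tuple, action) -> (reward, next state, done)

def td_update(s, a, r, s2, done, gamma=0.99):
    """One gradient step toward the one-step TD target."""
    q_sa = qnet(torch.as_tensor(s, dtype=torch.float32))[a]
    with torch.no_grad():
        bootstrap = 0.0 if done else gamma * qnet(torch.as_tensor(s2, dtype=torch.float32)).max().item()
    loss = (q_sa - (r + bootstrap)) ** 2
    opt.zero_grad(); loss.backward(); opt.step()

def dyna_step(s, a, r, s2, done, n_planning=10):
    td_update(s, a, r, s2, done)                 # learn from the real experience
    model[(tuple(s), a)] = (r, tuple(s2), done)  # update the learned model
    for _ in range(n_planning):                  # learn from "imagined" experience
        (ms, ma), (mr, ms2, mdone) = random.choice(list(model.items()))
        td_update(list(ms), ma, mr, list(ms2), mdone)
```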
Reinforcement learning (RL) is a popular method for solving the path planning problem of autonomous mobile robots in unknown environments. However, the primary difficulty faced by learning robots using RL is that they learn too slowly in obstacle-dense environments. To solve the path planning problem more efficiently in such environments, this paper presents a novel approach in which the robot's learning process is divided into two phases. The first phase accelerates learning of an optimal policy by extending the well-known Dyna-Q algorithm, training the robot to avoid obstacles while following the vector direction. In this phase, the robot's position is represented on a uniform grid; at each time step, the robot moves to one of its eight adjacent cells, so the path obtained from the optimal policy may be longer than the true shortest path. The second phase trains the robot to learn a collision-free smooth path that decreases the number of heading changes. The simulation results show that the proposed approach is efficient for path planning of autonomous mobile robots in unknown environments with dense obstacles.
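The eight-neighbour action set from the first phase, and a simple smoothing pass in the spirit of the second phase, might look like the following; note that the greedy line-of-sight shortcutting shown here is an illustrative stand-in for the paper's learned smoothing phase, not a reproduction of it.

```python
# Phase one moves to one of the eight adjacent cells.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1),
           (-1, -1), (-1, 1), (1, -1), (1, 1)]

def line_of_sight(a, b, blocked):
    """Check the straight segment a->b against blocked cells (coarse sampling)."""
    (r0, c0), (r1, c1) = a, b
    steps = max(abs(r1 - r0), abs(c1 - c0)) * 2
    for i in range(steps + 1):
        t = i / max(steps, 1)
        cell = (round(r0 + t * (r1 - r0)), round(c0 + t * (c1 - c0)))
        if cell in blocked:
            return False
    return True

def smooth(path, blocked):
    """Drop intermediate waypoints wherever a direct segment is collision-free,
    reducing the number of heading changes along the grid path."""
    out, i = [path[0]], 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and not line_of_sight(path[i], path[j], blocked):
            j -= 1
        out.append(path[j])
        i = j
    return out
```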
ISBN (Print): 9781665440899
Reinforcement learning (RL) has been successfully applied to solve path planning problems, but learning is generally slow. The main reason is that information collected during interaction with the environment is not fully exploited. This paper proposes a novel method, based on RL and heuristic search, to solve the discrete-space path planning problem in an obstacle-dense environment without prior knowledge. First, we apply the Dyna-Q algorithm to explore the map and search for the target point, optimizing its policy with the upper confidence bound (UCB). Then, once the target point is found, we use heuristic search to plan a path from the starting point to the target point and narrow the search to a small range around that path. Finally, we combine the Dyna-Q algorithm with the path recommended by the heuristic search for path planning. We evaluate our algorithm on maze navigation problems. The results verify that heuristic search accelerates Dyna-Q convergence.
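A minimal sketch of a UCB action rule inside a tabular Dyna-Q agent could look as follows, assuming visit counts N(s, a) are kept alongside the Q-table; the exploration constant c is an assumption.

```python
import math

def ucb_action(Q, N, s, actions, c=1.0):
    """Pick argmax over a of Q(s, a) + c * sqrt(ln N(s) / N(s, a))."""
    total = sum(N.get((s, a), 0) for a in actions) + 1   # visits to state s
    def score(a):
        n = N.get((s, a), 0)
        if n == 0:
            return float("inf")                          # try unvisited actions first
        return Q.get((s, a), 0.0) + c * math.sqrt(math.log(total) / n)
    return max(actions, key=score)
```

Compared with epsilon-greedy, the bonus term steers exploration toward rarely tried state-action pairs instead of choosing uniformly at random, which is what lets the agent find the target point faster in a large maze.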
ISBN (Print): 9787811240559
It is quite difficult to achieve satisfactory results by applying traditional modeling and control methods to the urban traffic signal control system because of the non-linearity, fuzziness, self-organization, and uncertainty in the system. Artificial intelligence technologies may offer a new way to solve this problem. Given the characteristics of the traffic signal control system, this paper proposes an on-line control algorithm based on Dyna-Q reinforcement learning: the experiential knowledge gained by the traffic signal control agent in the trial-and-error process is used to estimate a model, and actions are then planned in the estimated model, thereby accelerating the iterative process of Q-learning. The simulation is implemented with TSIS (a microscopic traffic analysis tool) on two arterial roads comprising 10 intersections. Compared with fixed-time control, a genetic algorithm, and the Q-learning control algorithm, simulation results indicate that the Dyna-Q reinforcement learning algorithm has an obvious superiority.
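Since the paper evaluates against the TSIS simulator, only the agent side can be sketched here; the queue-length binning and the green-split action set below are assumptions chosen for illustration, not the paper's design.

```python
import random

GREEN_SPLITS = [(20, 40), (30, 30), (40, 20)]    # (main, side) green seconds, assumed

def encode_state(main_queue, side_queue, bin_size=5):
    """Discretize queue lengths so the tabular Q and model stay small."""
    return (main_queue // bin_size, side_queue // bin_size)

def plan(Q, model, n=30, alpha=0.1, gamma=0.9):
    """Dyna-Q planning step: replay transitions from the estimated model."""
    if not model:
        return
    for _ in range(n):
        (s, a), (r, s2) = random.choice(list(model.items()))
        best = max(Q.get((s2, b), 0.0) for b in range(len(GREEN_SPLITS)))
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best - Q.get((s, a), 0.0))
```

The planning replay is what lets the signal-control agent converge with far fewer simulated traffic cycles than plain Q-learning, which matches the speed-up the paper reports.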