Search Results - Inner Mongolia University Library

IEEE 34th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC)

Authors: Li, Yuting; Liu, Yitong; Liu, Xingcheng; Tu, Qiang; Xie, Yi. Affiliations: Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou, Peoples R China; Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China; Jiangsu Viscore Technol Co Ltd, Suzhou, Peoples R China

ISBN: (print) 9781665464833

Mobile Edge Computing (MEC) is one of the key enabling technologies for future 6G wireless networks that can provide lower latency service and more efficient resource utilization for future intelligent applications and the Internet of Things (IoT), while also reducing the energy consumption of end devices. In the intricate dynamic edge environment, the task offloading problem is entangled with several factors, such as the uncertainty of online tasks, the heterogeneity of edge servers, and the mobility of devices. In this paper, considering the randomness of online task arrivals, time-varying channels, and mobility of devices, a deep reinforcement learning-based online task offloading (DRL-OTO) algorithm is designed to minimize the energy consumption of all mobile devices. Specifically, by formulating a system model consisting of a communication model, an energy consumption model, and a node mobility model, the task offloading optimization problem is modeled as a mixed integer nonlinear programming (MINLP) problem. By decomposing this problem, each mobile device first selects the edge server to offload to, and then the DRL-OTO algorithm, built on the deep deterministic policy gradient (DDPG) method, lets each mobile device determine its offloading rate. Simulation results show that the proposed DRL-OTO algorithm can achieve fast convergence and is able to reduce energy consumption, thus increasing the utility of all devices in the dynamic edge environment.

Keywords: Mobile edge computing; task offloading; deep reinforcement learning; energy consumption


Development of a real-time learning scheduler using reinforcement learning concepts


Proceedings of the 1994 IEEE International Symposium on Intelligent Control

Authors: Rabelo, Luis C.; Jones, Albert; Yih, Yuehwern. Affiliation: Ohio Univ, Athens, United States

A scheme for the scheduling of Flexible Manufacturing Systems (FMS) has been developed which divides the scheduling function (built upon a generic controller architecture) into four different steps: candidate rule selection, transient phenomena analysis, multicriteria compromise analysis, and learning. This scheme is based on a hybrid architecture which utilizes neural networks, simulation, genetic algorithms, and an induction mechanism. This paper investigates the candidate rule selection process, which selects a small list of scheduling rules from a larger list of such rules. This candidate rule selector is developed by integrating dynamic programming and neural networks. The system achieves real-time learning using this approach. In addition, since an expert scheduler is not available, it utilizes reinforcement signals from the environment (a measure of how desirable the achieved state is as measured by the resulting performance criteria). The approach is discussed and further research issues are presented.

Keywords: learning systems


Adaptive critic-based neurofuzzy controller for the steam generator water level


IEEE Transactions on Nuclear Science, 2008, vol. 55, no. 3, pp. 1678-1685

Authors: Fakhrazari, Amin; Boroushaki, Mehrdad. Affiliation: Sharif Univ Technol, Dept Mech Engn, Tehran, Iran

In this paper, an adaptive critic-based neurofuzzy controller is presented for water level regulation of nuclear steam generators. The problem has been of great concern for many years, as the steam generator is a highly nonlinear system showing inverse response dynamics, especially at low operating power levels. Fuzzy critic-based learning is a reinforcement learning method based on dynamic programming. The only information available for the critic agent is the system feedback, which is interpreted as the last action the controller has performed in the previous state. The signal produced by the critic agent is used alongside the error backpropagation algorithm to tune the conclusion parts of the fuzzy inference rules online. The critic agent here has a proportional-derivative structure and the fuzzy rule base has nine rules. The proposed controller shows satisfactory transient responses, disturbance rejection and robustness to model uncertainty. Its simple design procedure and structure make it a suitable controller design for steam generator water level control in the nuclear power plant industry.

Keywords: adaptive critic-based design; fuzzy logic; reinforcement learning; vertical U-tube steam generator


Bridging Hamilton-Jacobi Safety Analysis and Reinforcement Learning


IEEE International Conference on Robotics and Automation (ICRA)

Authors: Fisac, Jaime E.; Lugovoy, Neil E.; Rubies-Royo, Vicenc; Ghosh, Shromona; Tomlin, Claire J. Affiliation: Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720, USA

ISBN: (print) 9781538660263

Safety analysis is a necessary component in the design and deployment of autonomous robotic systems. Techniques from robust optimal control theory, such as Hamilton-Jacobi reachability analysis, allow a rigorous formalization of safety as guaranteed constraint satisfaction. Unfortunately, the computational complexity of these tools for general dynamical systems scales poorly with state dimension, making existing tools impractical beyond small problems. Modern reinforcement learning methods have shown promising ability to find approximate yet proficient solutions to optimal control problems in complex and high-dimensional systems; however, their application has in practice been restricted to problems with an additive payoff over time, unsuitable for reasoning about safety. In recent work, we introduced a time-discounted modification of the problem of maximizing the minimum payoff over time, central to safety analysis, through a modified dynamic programming equation that induces a contraction mapping. Here, we show how a similar contraction mapping can render reinforcement learning techniques amenable to quantitative safety analysis as tools to approximate the safe set and optimal safety policy. This opens a new avenue of research connecting control-theoretic safety analysis and the reinforcement learning domain. We validate the correctness of our formulation by comparing safety results computed through Q-learning to analytic and numerical solutions, and demonstrate its scalability by learning safe sets and control policies for simulated systems of up to 18 state dimensions using value learning and policy gradient techniques.

Keywords: Safety; Automation; reinforcement learning; Robots; Optimal control; Jacobian matrices; Reachability analysis
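
The time-discounted "maximize the minimum payoff" backup described in the abstract above can be illustrated on a tiny tabular problem. The sketch below assumes the update takes the form V(x) = (1 - gamma) * l(x) + gamma * min(l(x), max_u V(f(x, u))), where l(x) is a safety margin that is positive inside the constraint set. The seven-state chain, the `margin` array and the `next_state` table are hypothetical and are not taken from the paper.

```python
import numpy as np

def safety_value_iteration(margin, next_state, gamma=0.99, iters=200):
    """Iterate V(x) <- (1-gamma)*l(x) + gamma*min(l(x), max_u V(f(x,u)))."""
    l = np.asarray(margin, dtype=float)
    V = l.copy()
    for _ in range(iters):
        # Best achievable value over the available actions at each state.
        best_next = V[next_state].max(axis=1)
        V = (1.0 - gamma) * l + gamma * np.minimum(l, best_next)
    return V

# Hypothetical chain: cell 6 violates the constraint (negative margin).
# In cells 0-3 the agent can move left or right; cells 4 and 5 are a slope
# on which every action slides towards the failure cell.
margin = np.array([1, 1, 1, 1, 1, 1, -1])
next_state = np.array([
    [0, 1], [0, 2], [1, 3], [2, 4],   # controllable cells
    [5, 5], [6, 6],                   # unavoidable slide to the right
    [6, 6],                           # failure cell is absorbing
])

V = safety_value_iteration(margin, next_state)
safe_set = V > 0   # states from which the constraint can be kept forever
print(safe_set)
```

Cells 4 and 5 satisfy the constraint momentarily (positive margin) but receive a negative value, which is how the sign of V separates the safe set from states that merely look safe.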


RLS Algorithms and Convergence Analysis Method for Online DLQR Control Design via Heuristic Dynamic Programming


16th UKSim-AMSS International Conference on Computer Modelling and Simulation (UKSim)

Authors: Santos, Watson R. M.; Queiroz, Jonathan A.; Neto, Joao Viana da F.; Rego, Patricia H. M.; Santana, Ewaldo; Andrade, Gustavo. Affiliations: Univ Estadual Maranhao; Fed Univ Maranhao; Fed Inst Maranhao; Embedded Syst & Intelligent Control Lab, Sao Luis, Maranhao, Brazil

ISBN: (print) 9781479949236

In this paper, a method to design online optimal policies that encompasses Hamilton-Jacobi-Bellman (HJB) equation solution approximation and heuristic dynamic programming (HDP) approach is proposed. Recursive least squares (RLS) algorithms are developed to approximate the HJB equation solution that is supported by a sequence of greedy policies. The proposal investigates the convergence properties of a family of RLS algorithms and its numerical complexity in the context of reinforcement learning and optimal control. The algorithms are computationally evaluated in an electric circuit model that represents an MIMO dynamic system. The results presented herein emphasize the convergence behaviour of the RLS, projection and Kaczmarz algorithms that are developed for online applications.

Keywords: Recursive Least Squares; Heuristic dynamic programming; RLS Convergence; MIMO dynamic Systems; Optimal Control; Adaptive dynamic programming
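
As a rough illustration of the recursive least squares step that the abstract above builds on, the sketch below fits a quadratic value function V(x) = x^T P x from sampled values. It is the textbook RLS recursion, not the specific family of algorithms (RLS, projection, Kaczmarz) studied in the paper. The matrix `P_true`, the feature map `quad_features` and the sampling scheme are assumptions made for the example.

```python
import numpy as np

def rls_update(theta, cov, phi, y, lam=1.0):
    """One RLS step for the linear model y ~ phi @ theta (forgetting factor lam)."""
    cov_phi = cov @ phi
    gain = cov_phi / (lam + phi @ cov_phi)
    theta = theta + gain * (y - phi @ theta)      # correct by the prediction error
    cov = (cov - np.outer(gain, cov_phi)) / lam   # shrink the covariance estimate
    return theta, cov

def quad_features(x):
    """Basis such that x^T P x = [p11, p12, p22] @ quad_features(x) for symmetric P."""
    return np.array([x[0] ** 2, 2.0 * x[0] * x[1], x[1] ** 2])

# Hypothetical "true" value matrix that the estimator should recover.
P_true = np.array([[2.0, 0.5], [0.5, 1.0]])

rng = np.random.default_rng(0)
theta = np.zeros(3)
cov = 1e6 * np.eye(3)   # large initial covariance: weak prior on theta
for _ in range(200):
    x = rng.normal(size=2)
    y = x @ P_true @ x  # noiseless value sample
    theta, cov = rls_update(theta, cov, quad_features(x), y)

print(theta)  # estimates of p11, p12, p22
```

In an HDP setting the target `y` would come from a one-step Bellman backup rather than from a known matrix; the known `P_true` is used here only so the estimate can be checked.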


Optimal Control of a Wind Generator System Using Non-Squares Estimators


24th IEEE International Symposium on Industrial Electronics (ISIE)

Authors: Queiroz, Jonathan Araujo; Barros, Allan Kardec; Neto, Joao Viana da F.; Santana, Ewaldo. Affiliation: Univ Fed Maranhao, Biol Informat Proc Lab, Sao Luis, Brazil

ISBN: (print) 9781467375542

The control of wind (eolic) and solar energy systems demands methods and techniques adapted to the high degree of environmental non-stationarity, with adjustments carried out via adaptive filters. Among the best known are the least mean square (LMS) and recursive least squares (RLS) algorithms [1] and [2]. However, those algorithms still fail to respond quickly enough for optimal control of the doubly fed induction generator (DFIG), as required in online learning [3]. Here we propose a methodology based on approximate solutions to the linear quadratic regulator (LQR) by using a family of non-squares approximations [4], [5]. We show experimentally that the RLNS provides more accurate estimates for the DLQR than the RLS, while converging to the actual solution in less than 50% of the iterations required by the standard RLS estimator for approximating the Riccati equation solution via heuristic dynamic programming (HDP) [6].

Keywords: Heuristic dynamic programming; Discrete Linear Quadratic Regulator; Doubly Fed Induction Generator


On a Successful Application of Multi-Agent Reinforcement Learning to Operations Research Benchmarks


IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

Authors: Thomas Gabel; Martin Riedmiller. Affiliation: Department of Mathematics and Computer Science, Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany

In this paper, we suggest and analyze the use of approximate reinforcement learning techniques for a new category of challenging benchmark problems from the field of operations research. We demonstrate that interpreting and solving the task of job-shop scheduling as a multi-agent learning problem is beneficial for obtaining near-optimal solutions and can very well compete with alternative solution approaches. The evaluation of our algorithms focuses on numerous established operations research benchmark problems.

Keywords: learning; Operations research; Scheduling algorithm; dynamic programming; Application software; Mathematics; Computer science; Cognitive science; Bicycles; Bridges


Solving PBQP-Based Register Allocation using Deep Reinforcement Learning


20th IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

Authors: Kim, Minsu; Park, Jeong-Keun; Moon, Soo-Mook. Affiliation: Seoul Natl Univ, Dept Elect & Comp Engn, Seoul, South Korea

ISBN: (print) 9781665405843

Irregularly structured registers are hard to abstract and allocate. Partitioned Boolean quadratic programming (PBQP) is a useful abstraction to represent complex register constraints, even those in highly irregular processors of automated test equipment (ATE) of DRAM memory chips. The PBQP problem is NP-hard, requiring a heuristic solution. If no spill is allowed as in ATE, however, we have to enumerate more to find a solution rather than to approximate, since a spill means a total compilation failure. We propose solving the PBQP problem with deep reinforcement learning (Deep-RL), more specifically, a model-based approach using Monte Carlo tree search and a deep neural network as used in AlphaZero, a proven Deep-RL technology. Through elaborate training with random PBQP graphs, our Deep-RL solver could cut the search space sharply, making an enumeration-based solution more affordable. Furthermore, by employing backtracking with a proper coloring order, Deep-RL can find a solution with modestly-trained neural networks with even less search space. Our experiments show that Deep-RL can successfully find a solution for 10 product-level ATE programs while searching far fewer (e.g., 1/3,500) states than the previous PBQP enumeration solver. Also, when applied to C programs in Byrn-test-suite for regular CPUs, it achieves a competitive performance to the existing PBQP register allocator in LLVM.

Keywords: Training; Program processors; Neural networks; reinforcement learning; Search problems; Registers; Test equipment
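
To make the PBQP formulation in the abstract above concrete, the sketch below states a tiny instance and solves it by plain exhaustive enumeration. This is the kind of baseline whose search space the paper aims to shrink, not the Deep-RL solver itself. The three virtual registers, the two physical registers and all cost values are hypothetical.

```python
import itertools
import math

INF = math.inf

def pbqp_cost(assign, node_costs, edge_costs):
    """Total cost: per-node cost vectors plus per-edge cost matrices."""
    total = sum(node_costs[n][assign[n]] for n in node_costs)
    for (u, v), matrix in edge_costs.items():
        total += matrix[assign[u]][assign[v]]
    return total

def solve_by_enumeration(node_costs, edge_costs):
    """Try every combination of choices and keep the cheapest one."""
    nodes = sorted(node_costs)
    best_cost, best_assign = INF, None
    choices = (range(len(node_costs[n])) for n in nodes)
    for combo in itertools.product(*choices):
        assign = dict(zip(nodes, combo))
        cost = pbqp_cost(assign, node_costs, edge_costs)
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign

# Hypothetical instance: virtual registers a, b, c and physical registers 0, 1.
node_costs = {"a": [0, 2], "b": [0, 0], "c": [1, 0]}
# Interfering registers must not share a physical register (infinite cost).
interfere = [[INF, 0], [0, INF]]
edge_costs = {("a", "b"): interfere, ("b", "c"): interfere}

best_cost, best_assign = solve_by_enumeration(node_costs, edge_costs)
print(best_cost, best_assign)
```

The enumeration visits every one of the 2^3 assignments here; the number of combinations grows exponentially with the number of nodes, which is why guided search matters when no spill is allowed.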


A performance gradient perspective on approximate dynamic programming and its application to partially observable Markov decision processes


2006 IEEE International Symposium on Intelligent Control, ISIC 2006

Authors: Dankert, James; Lei, Yang; Si, Jennie. Affiliation: Department of Electrical Engineering, Arizona State University, Tempe, AZ 85287-5706

ISBN: (print) 0780397983

This paper shows an approach to integrating common approximate dynamic programming (ADP) algorithms into a theoretical framework to address both analytical characteristics and algorithmic features. Several important insights are gained from this analysis, including new approaches to the creation of algorithms. Built on this paradigm, ADP learning algorithms are further developed to address a broader class of problems: optimization with partial observability. This framework is based on an average cost formulation which makes use of the concepts of differential costs and performance gradients to describe learning and optimization algorithms. Numerical simulations are conducted including a queueing problem and a maze problem to illustrate and verify features of the proposed algorithms. Pathways for applying this analysis to adaptive critics are also shown. ©2006 IEEE.

Keywords: learning algorithms


A New Discrete-Time Iterative Adaptive Dynamic Programming Algorithm Based on Q-Learning


12th International Symposium on Neural Networks (ISNN)

Authors: Wei, Qinglai; Liu, Derong. Affiliations: Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China; Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China

ISBN: (print) 9783319253930; 9783319253923

In this paper, a novel Q-learning based policy iteration adaptive dynamic programming (ADP) algorithm is developed to solve the optimal control problems for discrete-time nonlinear systems. The idea is to use a policy iteration ADP technique to construct the iterative control law which stabilizes the system and simultaneously minimizes the iterative Q function. Convergence property is analyzed to show that the iterative Q function is monotonically non-increasing and converges to the solution of the optimality equation. Finally, simulation results are presented to show the performance of the developed algorithm.

Keywords: Adaptive critic designs; adaptive dynamic programming; approximate dynamic programming; Q-learning; policy iteration; neural networks; nonlinear systems; optimal control
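
The policy iteration scheme on a Q function described in the abstract above can be sketched in tabular form. The example assumes a deterministic system, an undiscounted cost, and an initial policy that already drives the state to a zero-cost goal; it uses lookup tables where the paper targets nonlinear systems with function approximation. The six-state chain, the action set and the unit step cost are hypothetical.

```python
import numpy as np

N_STATES = 6
ACTIONS = (-1, -2, +1)   # hypothetical moves along a chain of states

def step(x, a):
    """Deterministic dynamics x' = f(x, u); state 0 is the absorbing goal."""
    if x == 0:
        return 0
    return int(np.clip(x + ACTIONS[a], 0, N_STATES - 1))

def cost(x, a):
    """Utility U(x, u): zero at the goal, one per step elsewhere."""
    return 0.0 if x == 0 else 1.0

def evaluate(policy, sweeps=100):
    """Policy evaluation: Q(x, u) = U(x, u) + Q(x', policy(x'))."""
    Q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(sweeps):
        for x in range(N_STATES):
            for a in range(len(ACTIONS)):
                nx = step(x, a)
                Q[x, a] = cost(x, a) + Q[nx, policy[nx]]
    return Q

def policy_iteration(policy, rounds=5):
    """Alternate evaluation and greedy improvement; keep each iterate of Q."""
    history = []
    for _ in range(rounds):
        Q = evaluate(policy)
        history.append(Q)
        policy = Q.argmin(axis=1)   # policy improvement step
    return policy, history

# Initial policy: always move one cell towards the goal (reaches it from anywhere).
init_policy = np.zeros(N_STATES, dtype=int)
final_policy, history = policy_iteration(init_policy)
print(history[-1].min(axis=1))   # cost-to-go under the final policy
```

On this toy problem each iterate of Q is elementwise no larger than the previous one, which mirrors the monotonically non-increasing property stated in the abstract.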

