ISBN (print): 9780738133669
To guarantee the efficient performance of the power plant, an adaptive tracking controller for the nonlinear boiler-turbine system based on an offline policy iteration adaptive dynamic programming (ADP) method is proposed in this paper. The optimal tracking controller is obtained through offline learning, which can accommodate the load-change characteristics of drum boiler-turbine power plants. To implement the proposed method, neural networks (NNs) are used to approximate the cost function, and an approximate optimal solution is achieved. The convergence of the method is then analyzed. Simulation studies on a typical boiler-turbine system demonstrate that the proposed control strategy achieves satisfactory performance within a short period.
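The policy-iteration structure behind such an ADP controller can be sketched compactly. The following is a minimal illustration, not the paper's method: it assumes a toy one-dimensional plant f(x, u), a quadratic tracking utility U(x, u), and a single-hidden-layer network whose output weights are fit by least squares, with one fitted evaluation sweep per iteration (optimistic policy iteration) rather than full policy evaluation.

```python
# Minimal sketch of offline policy-iteration ADP with an NN value
# approximator. Dynamics, utility, and network sizes are illustrative
# assumptions, not the paper's boiler-turbine model.
import numpy as np

rng = np.random.default_rng(0)

def f(x, u):
    """Hypothetical discrete-time nonlinear plant (stand-in only)."""
    return 0.9 * x + 0.1 * np.sin(x) + 0.2 * u

def U(x, u):
    """Quadratic tracking utility (reference taken as the origin)."""
    return x**2 + 0.1 * u**2

# Single-hidden-layer network; only the output weights are trained
# (least squares), which keeps the sketch short and deterministic.
W_in = rng.normal(size=(16, 1))
b_in = rng.normal(size=16)

def phi(x):
    return np.tanh(W_in @ np.atleast_1d(x) + b_in)

w_out = np.zeros(16)           # value-network output weights
V = lambda x: phi(x) @ w_out   # approximate cost-to-go

X = rng.uniform(-2, 2, 200)    # offline training states
u_grid = np.linspace(-1, 1, 41)

for it in range(30):
    # Policy improvement: greedy action over a grid (argmin of U + V(f)).
    def policy(x):
        return u_grid[np.argmin([U(x, u) + V(f(x, u)) for u in u_grid])]
    # Policy evaluation: fit V to the Bellman targets by least squares.
    targets = np.array([U(x, policy(x)) + V(f(x, policy(x))) for x in X])
    Phi = np.array([phi(x) for x in X])
    w_out, *_ = np.linalg.lstsq(Phi, targets, rcond=None)

print("V(1.0) after training:", V(1.0))
```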
This study presents an adaptive railway traffic controller for real-time operations based on approximate dynamic programming (ADP). By assessing requirements and opportunities, the controller aims to limit consecutive delays resulting from trains that entered a control area behind schedule by sequencing them at a critical location in a timely manner, thus representing the practical requirements of railway operations. This approach depends on an approximation to the value function of dynamic programming after optimisation from a specified state, which is estimated dynamically from operational experience using reinforcement learning techniques. By using this approximation, the ADP avoids extensive explicit evaluation of performance and so reduces the computational burden substantially. In this investigation, we explore formulations of the approximation function and variants of the learning techniques used to estimate it. Evaluation of the ADP methods in a stochastic simulation environment shows considerable improvements in consecutive delays by comparison with the current industry practice of First-Come-First-Served sequencing. We also found that estimates of parameters of the approximate value function are similar across a range of test scenarios with different mean train entry delays.
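The core ADP ingredient here, estimating a value-function approximation from operational experience with reinforcement learning, can be illustrated with a toy sketch. Everything below is an assumption for illustration (the two-train setup, the delay features, and the consecutive-delay cost); only the TD-style estimation step is shown, not the study's full sequencing controller.

```python
# TD(0) estimation of a linear value-function approximation from
# simulated operational experience. All quantities are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
theta = np.zeros(2)                 # weights of the approximate value function
alpha = 0.01                        # learning rate

def features(delays):
    """State features: mean and max entry delay of the two trains."""
    return np.array([np.mean(delays), np.max(delays)])

def consecutive_delay(delays, order):
    """Hypothetical cost: the second train in the sequence inherits part of
    the first train's delay through the critical location."""
    first, second = order
    return delays[second] + max(0.0, delays[first] - 2.0)

for episode in range(5000):
    delays = rng.exponential(3.0, size=2)        # stochastic entry delays
    v_hat = theta @ features(delays)
    # Choose the sequencing (the "action") greedily by immediate cost.
    order = min([(0, 1), (1, 0)], key=lambda o: consecutive_delay(delays, o))
    cost = consecutive_delay(delays, order)
    # TD(0) update toward the observed cost (one-step terminal episode).
    theta += alpha * (cost - v_hat) * features(delays)

print("learned value weights:", theta)
```

The learned weights would then rank sequencing decisions over a longer horizon; that lookahead is where the full ADP controller differs from this one-step caricature.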
Solving the unit commitment (UC) problem in a computationally efficient manner is a critical issue of electricity market operations. Optimization-based methods such as heuristics, dynamic programming, and mixed-intege...
learning for autonomous dynamic control systems that can adapt to unforeseen environmental changes are of great interest but the realisation of a practical and safe online learning algorithm is incredibly challenging....
ISBN (digital): 9798350362244
ISBN (print): 9798350362251
In a vehicle edge computing network (VECN), an important issue is how to deal with the computation-resource and energy-resource shortages that roadside units (RSUs) encounter while performing delay-sensitive computation tasks, especially during peak hours and under the dynamic conditions of the VECN. To complete the computation tasks on time with minimum expenditure, this paper investigates the problem of information-energy collaboration among RSUs, in which spectrum management is also involved. For the considered scenario, the RSUs' strategies for spectrum selection, computation task offloading, and energy sharing are derived from the formulated optimization problem. Since this problem is a highly complex mixed-integer nonlinear programming problem and the strategies are coupled with one another, a multi-agent deep deterministic policy gradient (MADDPG) based algorithm is proposed to find sub-optimal solutions quickly in a dynamic environment. The simulation results show that our approach is superior to existing schemes in terms of total system expenditure and spectral efficiency.
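For readers unfamiliar with MADDPG, the pattern the paper builds on, centralized critics over joint observations and actions with decentralized actors, can be sketched as below. The dimensions, the one-step reward target, and the omission of target networks are simplifications for illustration; the paper's actual encoding of spectrum selection, task offloading, and energy sharing is not reproduced.

```python
# Compact MADDPG skeleton: per-agent actor on local observations,
# per-agent critic on joint observations and actions. Illustrative only.
import torch
import torch.nn as nn

N_AGENTS, OBS, ACT = 3, 8, 4

def mlp(i, o):
    return nn.Sequential(nn.Linear(i, 64), nn.ReLU(), nn.Linear(64, o))

actors = [mlp(OBS, ACT) for _ in range(N_AGENTS)]
critics = [mlp(N_AGENTS * (OBS + ACT), 1) for _ in range(N_AGENTS)]
actor_opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in actors]
critic_opts = [torch.optim.Adam(c.parameters(), lr=1e-3) for c in critics]

def update(batch_obs, batch_act, batch_rew):
    """One MADDPG update on a batch of (joint obs, joint act, per-agent
    reward). Target networks and next-state bootstrapping are omitted."""
    joint = torch.cat([batch_obs.flatten(1), batch_act.flatten(1)], dim=1)
    for i in range(N_AGENTS):
        # Critic i regresses toward the (here, one-step) reward target.
        q = critics[i](joint).squeeze(-1)
        critic_loss = ((q - batch_rew[:, i]) ** 2).mean()
        critic_opts[i].zero_grad()
        critic_loss.backward()
        critic_opts[i].step()
        # Actor i ascends its own critic, other agents' actions held fixed.
        act = batch_act.clone()
        act[:, i] = torch.tanh(actors[i](batch_obs[:, i]))
        joint_pi = torch.cat([batch_obs.flatten(1), act.flatten(1)], dim=1)
        actor_loss = -critics[i](joint_pi).mean()
        actor_opts[i].zero_grad()
        actor_loss.backward()
        actor_opts[i].step()

# Example: one update on a random batch of 32 transitions.
update(torch.randn(32, N_AGENTS, OBS), torch.rand(32, N_AGENTS, ACT),
       torch.randn(32, N_AGENTS))
```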
While global attention on reducing food waste has increased, the demand for perishable commodities such as food and pharmaceuticals is growing. This emphasizes the need for effective perishable inventory management, which has become increasingly complex due to the perishability of these products. Traditional optimization methods, such as dynamic programming, require significant time and effort to solve these problems. In this study, we use Deep Q-Network and Proximal Policy Optimization, deep reinforcement learning methods that can provide numerical, approximate solutions to complex problems. In an inventory problem with costs for ordering, holding, lost sales, and spoilage, we define the inventory status as the state, the order quantity as the action, and the negative total cost as the reward. We compare the performance of the two methods with the total number of time steps aligned. Furthermore, numerical experiments confirmed that both methods reduced costs by at least approximately 30% compared to the base-stock policy.
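The MDP formulation described here (inventory status as state, order quantity as action, negative total cost as reward) can be made concrete with a small environment sketch. All cost coefficients, the Poisson demand, and the three-period shelf life are illustrative assumptions; DQN or PPO would be trained against an environment like this.

```python
# Perishable-inventory MDP sketch: state is on-hand stock by remaining
# shelf life, action is the order quantity, reward is negative total cost.
import numpy as np

rng = np.random.default_rng(2)

SHELF_LIFE, MAX_ORDER = 3, 10
C_ORDER, C_HOLD, C_LOST, C_SPOIL = 1.0, 0.2, 4.0, 2.0

def step(state, order):
    """One period: receive the order, serve FIFO demand, age the stock."""
    inv = state.copy()
    inv[-1] += order                     # fresh stock arrives
    demand = rng.poisson(5)
    for age in range(SHELF_LIFE):        # serve oldest units first (FIFO)
        used = min(inv[age], demand)
        inv[age] -= used
        demand -= used
    spoiled = inv[0]                     # oldest layer expires
    next_state = np.roll(inv, -1)
    next_state[-1] = 0
    cost = (C_ORDER * order + C_HOLD * next_state.sum()
            + C_LOST * demand + C_SPOIL * spoiled)
    return next_state, -cost             # reward = negative total cost

# Roll out a simple base-stock policy to sanity-check the dynamics.
state, total = np.zeros(SHELF_LIFE, dtype=int), 0.0
for t in range(1000):
    order = max(0, 12 - state.sum())     # order up to a base-stock level of 12
    state, reward = step(state, min(order, MAX_ORDER))
    total += reward
print("average reward of base-stock policy:", total / 1000)
```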
ISBN (print): 9780738133669
In this paper, optimal control problems with constraints on the summation of an auxiliary utility function are called constrained cost optimal control problems, and a constrained cost policy iteration adaptive dynamic programming (ADP) algorithm is developed to solve them for discrete-time nonlinear systems. A convergence analysis guarantees that the iterative value functions converge non-increasingly to the approximate optimal value function. It is also proven that every iterative control policy is feasible and stabilizes the nonlinear system. Finally, a simulation example illustrates the performance of the developed constrained cost policy iteration algorithm.
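A tabular caricature of the constrained-cost policy-iteration idea may help: evaluate both the main value function and an accumulated auxiliary cost under the current policy, then improve the policy only over actions whose predicted auxiliary cost respects a bound. The dynamics, utilities, discounting, and bound below are illustrative assumptions, not the paper's formulation.

```python
# Tabular constrained policy iteration on a discretized 1-D system:
# V tracks the main cost, W the accumulated auxiliary utility, and the
# improvement step filters actions by the auxiliary bound. Illustrative.
import numpy as np

xs = np.linspace(-2, 2, 41)            # discretized state grid
us = np.linspace(-1, 1, 21)            # action grid
GAMMA, BOUND = 0.95, 30.0

f = lambda x, u: np.clip(0.8 * x + 0.5 * u, -2, 2)   # toy plant
U = lambda x, u: x**2 + u**2                          # main utility
H = lambda x, u: np.abs(u)                            # auxiliary utility

idx = lambda x: np.abs(xs - x).argmin()
V = np.zeros(len(xs))                   # main value function
W = np.zeros(len(xs))                   # accumulated auxiliary cost
pi = np.zeros(len(xs), dtype=int)       # policy as action indices

for it in range(100):
    # Policy evaluation for both the main and the auxiliary value.
    for _ in range(50):
        for i, x in enumerate(xs):
            u = us[pi[i]]
            j = idx(f(x, u))
            V[i] = U(x, u) + GAMMA * V[j]
            W[i] = H(x, u) + GAMMA * W[j]
    # Policy improvement restricted to actions that keep W within BOUND.
    for i, x in enumerate(xs):
        q = [U(x, u) + GAMMA * V[idx(f(x, u))] for u in us]
        w = [H(x, u) + GAMMA * W[idx(f(x, u))] for u in us]
        feasible = [k for k in range(len(us)) if w[k] <= BOUND]
        pi[i] = min(feasible or range(len(us)), key=lambda k: q[k])

print("V at x=0:", V[idx(0.0)])
```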
We perform a comparison study on Bayesian sequential optimal experimental design algorithms applied to linear regression in two unknowns. We transform the Bayesian sequential optimal experimental design problem into a reinforcement learning problem to determine the power of deep reinforcement learning algorithms against baselines including batch design, greedy design, dynamic programming, and approximate dynamic programming. Using KL-divergence to measure information gain in the unknown parameters, we construct objectives for each algorithm to maximize information gain. This work showcases novel comparisons between the aforementioned algorithms and provides a new application of reinforcement learning to Bayesian sequential optimal experimental design for inverse problems in linear regression with multiple parameters.
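For linear regression with a Gaussian prior and known noise, both the sequential posterior update and the KL-divergence information gain are available in closed form, which is what makes greedy and DP-style baselines tractable. The sketch below illustrates this objective on a hypothetical one-covariate design grid; it is not the paper's experimental setup.

```python
# Closed-form information gain for Bayesian linear regression:
# conjugate Gaussian posterior update plus KL(posterior || prior).
import numpy as np

SIGMA = 0.5                              # known observation noise std

def posterior(m0, S0, x, y):
    """Conjugate update for y = [1, x] @ theta + noise."""
    phi = np.array([1.0, x])
    S1 = np.linalg.inv(np.linalg.inv(S0) + np.outer(phi, phi) / SIGMA**2)
    m1 = S1 @ (np.linalg.inv(S0) @ m0 + phi * y / SIGMA**2)
    return m1, S1

def kl_gaussian(m1, S1, m0, S0):
    """KL( N(m1,S1) || N(m0,S0) ): information gained about theta."""
    S0inv = np.linalg.inv(S0)
    d = len(m0)
    return 0.5 * (np.trace(S0inv @ S1) + (m0 - m1) @ S0inv @ (m0 - m1)
                  - d + np.log(np.linalg.det(S0) / np.linalg.det(S1)))

# The posterior covariance does not depend on y, so with m0 = 0 a y = 0
# evaluation isolates the covariance part of the gain for ranking designs.
m0, S0 = np.zeros(2), np.eye(2)
for x in np.linspace(-2, 2, 5):
    m1, S1 = posterior(m0, S0, x, y=0.0)
    print(f"x = {x:+.1f}, information gain = {kl_gaussian(m1, S1, m0, S0):.3f}")
```

As expected, design points with larger |x| yield more information about the slope parameter, which is the kind of structure the greedy and RL designers exploit.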
In this paper, a safe adaptive dynamic programming (SADP) method based on the barrier function (BF) is proposed for the optimal control problem of nonlinear safety-critical systems with the safety constraints and exte...
ISBN (print): 9781665414944
Traffic Engineering (TE) has been applied to optimize network performance by routing/rerouting flows based on traffic loads and network topologies. To cope with network dynamics from emerging applications, it is essential to reroute flows more frequently than today's TE does to maintain network performance. However, existing TE solutions may introduce considerable Quality of Service (QoS) degradation and service disruption, since they do not take the potential negative impact of flow rerouting into account. In this paper, we apply a new QoS metric named network disturbance to gauge the impact of flow rerouting while optimizing network load balancing in backbone networks. To employ this metric in TE design, we propose a disturbance-aware TE called DATE, which uses reinforcement learning (RL) to intelligently select critical flows between nodes for each traffic matrix and reroute them using linear programming (LP) to jointly optimize network performance and disturbance. DATE is equipped with a customized actor-critic architecture and Graph Neural Networks (GNNs) to handle dynamic traffic and single link failures. Extensive evaluations show that DATE outperforms state-of-the-art TE methods with close-to-optimal load-balancing performance while effectively mitigating the 99th percentile network disturbance by up to 31.6%.
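The LP piece of this pipeline, rerouting a selected critical flow to minimize the maximum link utilization, can be illustrated on a toy topology. The diamond network, capacities, background loads, and candidate paths below are invented for illustration, and DATE's disturbance term is omitted.

```python
# Min-max link utilization LP for one critical flow split across
# candidate paths, in the spirit of the rerouting step described above.
import numpy as np
from scipy.optimize import linprog

LINKS = ["a-b", "b-d", "a-c", "c-d"]           # 4-node diamond topology
CAP = np.array([10.0, 10.0, 10.0, 10.0])
BASE = np.array([6.0, 6.0, 1.0, 1.0])          # load from non-critical flows

# One critical flow of 4 units from a to d, with two candidate paths.
DEMAND = 4.0
PATHS = [[0, 1],                               # a-b-d (the congested side)
         [2, 3]]                               # a-c-d

# Variables: x = (split on path 0, split on path 1, t = max utilization).
# Minimize t subject to load(link)/cap(link) <= t and splits summing to 1.
c = np.array([0.0, 0.0, 1.0])
A_ub, b_ub = [], []
for l in range(len(LINKS)):
    row = [DEMAND / CAP[l] if l in p else 0.0 for p in PATHS] + [-1.0]
    A_ub.append(row)
    b_ub.append(-BASE[l] / CAP[l])
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              A_eq=[[1.0, 1.0, 0.0]], b_eq=[1.0],
              bounds=[(0, 1), (0, 1), (0, None)])

print("path splits:", res.x[:2], "max utilization:", res.x[2])
```

Here the LP pushes the flow onto the lightly loaded a-c-d path, capping utilization at 0.6; DATE's RL agent decides which flows get this treatment per traffic matrix.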