Search Results - Inner Mongolia University Library

A novel triggering condition of event-triggered control based on heuristic dynamic programming for discrete-time systems

OPTIMAL CONTROL APPLICATIONS & METHODS, 2018, Vol. 39, No. 4, pp. 1467-1478

Authors: Wang, Ziyang; Wei, Qinglai; Liu, Derong. Affiliations: Univ Sci & Technol Beijing Sch Automat & Elect Engn Beijing Peoples R China; Chinese Acad Sci Inst Automat State Key Lab Management & Control Complex Syst Beijing 100190 Peoples R China; Guangdong Univ Technol Sch Automat Guangzhou Guangdong Peoples R China

In this paper, an event-triggered heuristic dynamic programming algorithm for discrete-time nonlinear systems with a novel triggering condition is studied. Different from traditional heuristic dynamic programming algorithms, the control law in this algorithm will only be updated when the triggering condition is satisfied to reduce the computational burden. Three neural networks are employed, which are model network, action network, and critic network. Model functions, control laws, and value functions are estimated using neural networks, respectively. The main contribution of this algorithm is the novel triggering condition with simpler form and fewer assumptions. Additionally, a proof of the stability for discrete-time systems using Lyapunov technique is given. Finally, two simulations are shown to verify the effectiveness of the developed algorithm.

Keywords: adaptive dynamic design; adaptive dynamic programming; approximate dynamic programming; event-triggered control; optimal control
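
As a rough illustration of the event-triggered idea described in the abstract above, the following sketch recomputes the control input only when the gap between the current state and the last sampled state exceeds a threshold. The scalar system, the feedback gain, and the relative-threshold rule are all assumptions made for illustration; the paper's actual triggering condition and its neural-network approximators are not given in this listing.

```python
def simulate(a=1.02, b=1.0, gain=-0.12, sigma=0.2, x0=1.0, steps=60):
    """Simulate x[k+1] = a*x[k] + b*u[k] with event-triggered state feedback.

    All parameters are hypothetical. The control u = gain * x_sampled is held
    constant between events; an event fires when |x_sampled - x| > sigma * |x|.
    """
    x = x0
    x_sampled = x0          # state at the most recent triggering instant
    u = gain * x_sampled
    updates = 1             # the initial control computation counts as one update
    trajectory = [x]
    for _ in range(steps):
        gap = abs(x_sampled - x)
        if gap > sigma * abs(x):   # assumed triggering condition
            x_sampled = x
            u = gain * x_sampled
            updates += 1
        x = a * x + b * u          # control is held if no event fired
        trajectory.append(x)
    return trajectory, updates


trajectory, updates = simulate()
print(f"final state {trajectory[-1]:.4f} after {updates} control updates")
```

With these assumed numbers the state still decays toward zero while the control is recomputed on only some of the steps, which is the computational saving the abstract refers to.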

Spatial Resource Allocation for Emerging Epidemics: A Comparison of Greedy, Myopic, and Dynamic Policies

M&SOM-MANUFACTURING & SERVICE OPERATIONS MANAGEMENT, 2018, Vol. 20, No. 2, pp. 181-198

Authors: Long, Elisa F.; Nohdurft, Eike; Spinler, Stefan. Affiliations: Univ Calif Los Angeles Anderson Sch Management Los Angeles CA 90095 USA; Kuhne Inst Logist Management WHU Otto Beisheim Sch Management D-56179 Vallendar Germany

Rapidly evolving infectious disease epidemics, such as the 2014 West African Ebola outbreak, pose significant health threats and present challenges to the global health community because of their heterogeneous geographic spread. Policy makers must allocate limited intervention resources quickly, in anticipation of where the outbreak is moving next. We develop a two-stage model for optimizing when and where to assign Ebola treatment units across geographic regions during the outbreak's early phases. The first stage employs a novel dynamic transmission model to forecast the occurrence of new cases at the region level, capturing connectivity among regions. We introduce an empirically estimated coefficient for behavioral adaptation to changing epidemic conditions. The second stage compares four approaches to allocate units across affected regions: (i) a heuristic based on observed cases, (ii) a greedy policy that prioritizes regions based on the reproductive number, (iii) a myopic linear program that allocates resources in the next period based on an iterative estimation-optimization approach coupled with the underlying epidemic model, and (iv) an approximate dynamic programming algorithm that optimizes over all future periods. After testing the allocation schemes under different budgets and time periods, we find that the myopic policy performs best, even when limited data are available. Our methodology could be generalized to other disease outbreaks, including the Zika virus, and other interventions.

Keywords: healthcare operations; resource allocation; approximate dynamic programming; epidemiology
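
To make approach (ii) from the abstract above more concrete, here is a minimal sketch of a greedy rule that ranks regions by an estimated reproductive number and hands out treatment units in that order. The region names, the numbers, and the per-region capacity cap are hypothetical; the paper's transmission model and estimation procedure are not reproduced here.

```python
def greedy_allocate(reproductive_numbers, budget, cap_per_region):
    """Allocate `budget` units to regions in descending order of reproductive number.

    reproductive_numbers: dict mapping region name -> estimated R (hypothetical inputs).
    cap_per_region: maximum units a single region can absorb (an assumption).
    Returns a dict mapping region name -> units allocated.
    """
    allocation = {region: 0 for region in reproductive_numbers}
    ranked = sorted(reproductive_numbers, key=reproductive_numbers.get, reverse=True)
    remaining = budget
    for region in ranked:
        if remaining <= 0:
            break
        units = min(cap_per_region, remaining)
        allocation[region] = units
        remaining -= units
    return allocation


estimates = {"region_a": 1.8, "region_b": 1.2, "region_c": 2.4}  # illustrative values only
plan = greedy_allocate(estimates, budget=5, cap_per_region=3)
print(plan)
```

A greedy rule like this looks only at the current ranking; the myopic and dynamic policies in the abstract additionally optimize against a forecast of future cases.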

Boundary Control of 2-D Burgers' PDE: An Adaptive Dynamic Programming Approach

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, Vol. 29, No. 8, pp. 3669-3681

Authors: Talaei, Behzad; Jagannathan, Sarangapani; Singler, John. Affiliations: Missouri Univ Sci & Technol Dept Elect & Comp Engn Rolla MO 65401 USA; Missouri Univ Sci & Technol Dept Math & Stat Rolla MO 65401 USA

In this paper, an adaptive dynamic programming-based near optimal boundary controller is developed for partial differential equations (PDEs) modeled by the uncertain Burgers' equation under Neumann boundary condition in 2-D. Initially, Hamilton-Jacobi-Bellman equation is derived in infinite-dimensional space. Subsequently, a novel neural network (NN) identifier is introduced to approximate the nonlinear dynamics in the 2-D PDE. The optimal control input is derived by online estimation of the value function through an additional NN-based forward-in-time estimation and approximated dynamic model. Novel update laws are developed for estimation of the identifier and value function online. The designed control policy can be applied using a finite number of actuators at the boundaries. Local ultimate boundedness of the closed-loop system is studied in detail using Lyapunov theory. Simulation results confirm the optimizing performance of the proposed controller on an unstable 2-D Burgers' equation.

Keywords: 2-D partial differential equations (PDEs); approximate dynamic programming; Burgers' equation; PDE boundary control

Robust Scheduling of EV Charging Load With Uncertain Wind Power Integration

IEEE TRANSACTIONS ON SMART GRID, 2018, Vol. 9, No. 2, pp. 1043-1054

Authors: Huang, Qilong; Jia, Qing-Shan; Guan, Xiaohong. Affiliations: Tsinghua Univ Dept Automat Ctr Intelligent & Networked Syst Beijing 100084 Peoples R China; Xi An Jiao Tong Univ MOE KLINNS Lab Xian 710049 Peoples R China

In some micro grids, the charging of electric vehicles (EVs) and the generation of wind power may partially cancel each other. This is an effective way to reduce the variation of the wind power to the state grid. Due to the forecasting error, it is of great practical interest to schedule the EV charging demand under the worst-case scenario of the wind power generation. We consider this important robust scheduling problem in this paper and make three major contributions. First, we formulate this robust scheduling problem as a robust stochastic shortest path problem whereby the objective function is a weighted sum of the wind power utilization and the total charging cost. Second, a robust simulation-based policy improvement method is developed to improve the performance of a base policy in the worst case. This improvement is mathematically shown under mild assumptions. Third, the performance of this method is numerically demonstrated based on real wind and EV data.

Keywords: approximate dynamic programming; electric vehicle; robust Markov decision process; wind energy

Dynamic bus substitution strategy for bunching intervention

TRANSPORTATION RESEARCH PART B-METHODOLOGICAL, 2018, Vol. 115, pp. 1-16

Authors: Petit, Antoine; Ouyang, Yanfeng; Lei, Chao. Affiliation: Univ Illinois Dept Civil & Environm Engn Urbana IL 61801 USA

Bus headways are typically susceptible to external disturbances (e.g., due to traffic congestion, clustered passenger arrivals, and special passenger needs), which create gaps in the system that grow eventually into bunching. Although many control strategies, such as static and dynamic holding strategies, have been implemented to mitigate the effects of unreliable bus schedules, most of them would impose longer dwell times on the passengers. In this paper, we investigate the potential of an alternative bus substitution strategy that is currently implemented by some transit agencies in an ad-hoc manner. In this strategy, the agency deploys a fleet of standby buses to take over service from any early or late buses so as to contain deviations from schedule, and the intention is to impose minimum penalties on the onboard passengers. We develop a discrete-time infinite-horizon approximate dynamic programming approach to find the optimal policy to minimize the overall agency and passenger costs. It is shown through numerical examples that schedule deviations can be controlled by regularly inserting standby buses as substitutions. In some implementation scenarios, the proposed strategy holds the potential to achieve comparable performance with some of the most advanced strategies, and to outperform the conventional slack-based schedule control scheme. In light of the emerging opportunities associated with autonomous driving, the performance of the proposed strategy can become even stronger due to the reduction in costs for keeping the fleet of standby buses. (C) 2018 Elsevier Ltd. All rights reserved.

Keywords: Bus bunching; Transit operations; approximate dynamic programming; Autonomous vehicles

Adaptive Dynamic Programming for Discrete-Time Zero-Sum Games

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, Vol. 29, No. 4, pp. 957-969

Authors: Wei, Qinglai; Liu, Derong; Lin, Qiao; Song, Ruizhuo. Affiliations: Chinese Acad Sci Inst Automat State Key Lab Management & Control Complex Syst Beijing 100190 Peoples R China; Univ Chinese Acad Sci Beijing 100049 Peoples R China; Univ Sci & Technol Beijing Sch Automat & Elect Engn Beijing 100083 Peoples R China

In this paper, a novel adaptive dynamic programming (ADP) algorithm, called "iterative zero-sum ADP algorithm," is developed to solve infinite-horizon discrete-time two-player zero-sum games of nonlinear systems. The present iterative zero-sum ADP algorithm permits arbitrary positive semidefinite functions to initialize the upper and lower iterations. A novel convergence analysis is developed to guarantee the upper and lower iterative value functions to converge to the upper and lower optimums, respectively. When the saddle-point equilibrium exists, it is emphasized that both the upper and lower iterative value functions are proved to converge to the optimal solution of the zero-sum game, where the existence criteria of the saddle-point equilibrium are not required. If the saddle-point equilibrium does not exist, the upper and lower optimal performance index functions are obtained, respectively, where the upper and lower performance index functions are proved to be not equivalent. Finally, simulation results and comparisons are shown to illustrate the performance of the present method.

Keywords: Adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; neurodynamic programming; optimal control; zero-sum game
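
The distinction between upper and lower values drawn in the abstract above can be seen on a single-stage matrix game. The sketch below is a deliberate simplification: it assumes finite action sets and pure strategies, whereas the paper treats infinite-horizon nonlinear discrete-time systems with iterative value functions. The cost matrices are hypothetical.

```python
def upper_value(cost):
    """Min over the minimizing player's rows of the worst-case (max) cost."""
    return min(max(row) for row in cost)


def lower_value(cost):
    """Max over the maximizing player's columns of the best-case (min) cost."""
    columns = list(zip(*cost))
    return max(min(col) for col in columns)


# cost[i][j]: cost when the minimizing player picks i and the maximizing player picks j.
with_saddle = [[3, 5], [1, 2]]      # hypothetical numbers; upper and lower values coincide
without_saddle = [[0, 2], [3, 1]]   # hypothetical numbers; upper value exceeds lower value

for name, cost in [("with saddle", with_saddle), ("without saddle", without_saddle)]:
    print(name, "upper:", upper_value(cost), "lower:", lower_value(cost))
```

When the two values coincide, a saddle-point equilibrium exists in pure strategies; when they differ, only the separate upper and lower optima are available, which mirrors the two cases the abstract distinguishes.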

Relationship between least squares Monte Carlo and approximate linear programming

OPERATIONS RESEARCH LETTERS, 2017, Vol. 45, No. 5, pp. 409-414

Authors: Nadarajah, Selvaprabu; Secomandi, Nicola. Affiliations: Univ Illinois Coll Business Adm 601 South Morgan St Chicago IL 60607 USA; Carnegie Mellon Univ Tepper Sch Business 5000 Forbes Ave Pittsburgh PA 15213 USA

Least squares Monte Carlo (LSM) is commonly used to manage and value early or multiple exercise financial or real options. Recent research in this area has started applying approximate linear programming (ALP) and its relaxations, which aim at addressing a possible ALP drawback. We show that regress-later LSM is itself an ALP relaxation that potentially corrects this ALP shortcoming. Our analysis consolidates two streams of research and supports using this LSM version rather than ALP on the considered models. (C) 2017 Elsevier B.V. All rights reserved.

Keywords: Markov decision processes; approximate dynamic programming; Least squares Monte Carlo; approximate linear programming; Financial and real options; Energy storage

Adaptive Virtual Resource Allocation in 5G Network Slicing Using Constrained Markov Decision Process

IEEE ACCESS, 2018, Vol. 6, pp. 61184-61195

Authors: Tang, Lun; Tan, Qi; Shi, Yingjie; Wang, Chenmeng; Chen, Qianbin. Affiliation: Chongqing Univ Posts & Telecommun Key Lab Mobile Commun Sch Commun & Informat Engn Chongqing 400065 Peoples R China

Network virtualization technology is generally envisaged as a promising technology to consequently satisfy various types of service requirements. On the other hand, non-orthogonal multiple access (NOMA) technology has the potential to significantly increase the spectral efficiency of the system. However, previous works that jointly address these two issues have not considered the dynamic resource allocation issue in this context. In this paper, we propose a slice-based virtual resources scheduling scheme with NOMA technology to enhance the quality-of-service (QoS) of the system. We formulate the power granularity allocation and subcarrier allocation strategies into a constrained Markov decision process problem, aiming at the maximization of the total user rate. In order to further avoid the curse of dimensionality and the expectation calculation in the optimal value function, we develop an adaptive resource allocation algorithm based on approximate dynamic programming to solve the problem. Extensive simulation works have been conducted under various system settings, and the results demonstrate that the proposed algorithm can significantly reduce the outage probability and increase the user data rate.

Keywords: 5G slice; adaptive virtual resource allocation; constrained Markov decision process; approximate dynamic programming; NOMA

Iterative ADP learning algorithms for discrete-time multi-player games

ARTIFICIAL INTELLIGENCE REVIEW, 2018, Vol. 50, No. 1, pp. 75-91

Authors: Jiang, He; Zhang, Huaguang. Affiliation: Northeastern Univ Coll Informat Sci & Engn Shenyang 110819 Liaoning Peoples R China

Adaptive dynamic programming (ADP) is an important branch of reinforcement learning to solve various optimal control issues. Most practical nonlinear systems are controlled by more than one controller. Each controller is a player, and to make a tradeoff between cooperation and conflict of these players can be viewed as a game. Multi-player games are divided into two main categories: zero-sum game and non-zero-sum game. To obtain the optimal control policy for each player, one needs to solve Hamilton-Jacobi-Isaacs equations for zero-sum games and a set of coupled Hamilton-Jacobi equations for non-zero-sum games. Unfortunately, these equations are generally difficult or even impossible to be solved analytically. To overcome this bottleneck, two ADP methods, including a modified gradient-descent-based online algorithm and a novel iterative offline learning approach, are proposed in this paper. Furthermore, to implement the proposed methods, we employ single-network structure, which obviously reduces computation burden compared with traditional multiple-network architecture. Simulation results demonstrate the effectiveness of our schemes.

Keywords: Adaptive dynamic programming; approximate dynamic programming; Reinforcement learning; Neural network

Manifold Regularized Reinforcement Learning

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, Vol. 29, No. 4, pp. 932-943

Authors: Li, Hongliang; Liu, Derong; Wang, Ding. Affiliations: Tencent Inc AI Platform Dept Shenzhen 518057 Peoples R China; Univ Sci & Technol Beijing Sch Automat & Elect Engn Beijing 100083 Peoples R China; Chinese Acad Sci Inst Automat State Key Lab Management & Control Complex Syst Beijing 100190 Peoples R China

This paper introduces a novel manifold regularized reinforcement learning scheme for continuous Markov decision processes. Smooth feature representations for value function approximation can be automatically learned using the unsupervised manifold regularization method. The learned features are data-driven, and can be adapted to the geometry of the state space. Furthermore, the scheme provides a direct basis representation extension for novel samples during policy learning and control. The performance of the proposed scheme is evaluated on two benchmark control tasks, i.e., the inverted pendulum and the energy storage problem. Simulation results illustrate the concepts of the proposed scheme and show that it can obtain excellent performance.

Keywords: Adaptive dynamic programming; approximate dynamic programming; approximate policy iteration (API); manifold regularization; reinforcement learning (RL)
