Search Results - Inner Mongolia University Library

Adaptive Dynamic Programming for Robust Regulation and Its Application to Power Systems

IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2018, vol. 65, no. 7, pp. 5722-5732

Authors: Yang, Xiong; He, Haibo; Zhong, Xiangnan. Affiliations: Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China; Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881, USA; Univ North Texas, Dept Elect Engn, Denton, TX 76207, USA

This paper presents a novel robust regulation method for a class of continuous-time nonlinear systems subject to unmatched perturbations. To begin with, the robust regulation problem is transformed into an optimal regulation problem by constructing a value function for the auxiliary system. Then, a simultaneous policy iteration (SPI) algorithm is developed to solve the optimal regulation problem within the framework of adaptive dynamic programming. To implement the SPI algorithm, actor and critic networks are employed to approximate the optimal control and the optimal value function, respectively, and the Monte Carlo integration method is applied to obtain the unknown weight parameters. Finally, two examples, including a power system, are provided to demonstrate the applicability of the developed approach.

Keywords: adaptive dynamic programming (ADP); neural network; optimal control; reinforcement learning; robust regulation; unmatched perturbation

2019 IEEE 58th Conference on Decision and Control, CDC 2019

58th IEEE Conference on Decision and Control, CDC 2019

ISBN: (print) 9781728113982

The proceedings contain 1192 papers. The topics discussed include: stochastic subgradient methods for dynamic programming in continuous state and action spaces; characterizing the interplay between information and strength in Blotto games; a unifying approach to maximal permissiveness in modular control of discrete-event systems; adaptive optimization and control in online advertising; on the stability of coupled differential-difference systems with multiple time-varying delays: a positivity-based approach; on the optimal control of Volterra integro-differential equations; network modification using a novel Gramian-based edge centrality; secure linear quadratic regulator using sparse model-free reinforcement learning; verification of switched stochastic systems via barrier certificates; and the turnpike property in nonlinear optimal control - a geometric approach.

SOSA: Self-Optimizing Learning with Self-Adaptive Control for Hierarchical System-on-Chip Management

52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

Authors: Donyanavard, Bryan; Muck, Tiago; Rahmani, Amir M.; Dutt, Nikil; Sadighi, Armin; Maurer, Florian; Herkersdorf, Andreas. Affiliations: UC Irvine, Irvine, CA 92697, USA; Tech Univ Munich, Munich, Germany

ISBN: (print) 9781450369381

Resource management strategies for many-core systems dictate the sharing of resources among applications such as power, processing cores, and memory bandwidth in order to achieve system goals. System goals require consideration of both system constraints (e.g., power envelope) and user demands (e.g., response time, energy-efficiency). Existing approaches use heuristics, control theory, and machine learning for resource management. They all depend on static system models, requiring a priori knowledge of system dynamics, and are therefore too rigid to adapt to emerging workloads or changing system dynamics. We present SOSA, a cross-layer hardware/software hierarchical resource manager. Low-level controllers optimize knob configurations to meet potentially conflicting objectives (e.g., maximize throughput and minimize energy). SOSA accomplishes this for many-core systems and unpredictable dynamic workloads by using rule-based reinforcement learning to build subsystem models from scratch at runtime. SOSA employs a high-level supervisor to respond to changing system goals due to operating condition, e.g., switch from maximizing performance to minimizing power due to a thermal event. SOSA's supervisor translates the system goal into low-level objectives (e.g., core instructions-per-second (IPS)) in order to control subsystems by coordinating numerous knobs (e.g., core operating frequency, task distribution) towards achieving the goal. The software supervisor allows for flexibility, while the hardware learners allow quick and efficient optimization. We evaluate a simulation-based implementation of SOSA and demonstrate SOSA's ability to manage multiple interacting resources in the presence of conflicting objectives, its efficiency in configuring knobs, and adaptability in the face of unpredictable workloads. Executing a combination of machine-learning kernels and microbenchmarks on a multicore system-on-a-chip, SOSA achieves target performance with less than 1% error starti

Keywords: System-on-chip

Virtual Network Function Embedding under Nodal Outage using Reinforcement Learning

International Symposium on Advanced Networks and Telecommunication Systems (ANTS)

Authors: Swarna Bindu Chetty; Hamed Ahmadi; Avishek Nag. Affiliations: School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland; University of York, United Kingdom

ISBN: (digital) 9781728192901

ISBN: (print) 9781728192918

With the emergence of various types of applications such as delay-sensitive applications, future communication networks are expected to be increasingly complex and dynamic. Network Function Virtualization (NFV) provides the necessary support towards efficient management of such complex networks, by disintegrating the dependency on the hardware devices via virtualizing the network functions and placing them on shared data centres. However, one of the main challenges of the NFV paradigm is the resource allocation problem, which is known as NFV-Resource Allocation (NFV-RA). NFV-RA is a method of deploying software-based network functions on the substrate nodes, subject to the constraints imposed by the underlying infrastructure and the agreed Service Level Agreement (SLA). This work investigates the potential of reinforcement learning (RL) as a fast yet accurate means (as compared to integer linear programming) for deploying the softwarized network functions onto substrate networks under several Quality of Service (QoS) constraints. In addition to the regular resource constraints and latency constraints, we introduced the concept of a complete outage of certain nodes in the network. This outage can be either due to a disaster or unavailability of network topology information due to proprietary and ownership issues. We have analyzed the network performance on different network topologies, different capacities of the nodes and the links, and different degrees of the nodal outage. The computational time escalated with the increase in the network density to achieve the optimal solutions; this is because Q-learning is an iterative process which results in a slow exploration. Our results also show that for certain topologies and a certain combination of resources, we can achieve a service acceptance rate of 70-90% even with a 40% nodal outage.

Keywords: reinforcement learning; Quality of service; Network function virtualization; Topology; Resource management; Substrates; Service level agreements
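
The abstract above attributes the growth in computation time to Q-learning being an iterative process. As a rough illustration only, the tabular Q-learning update can be sketched as follows. The toy chain environment, the state and action encoding, and the hyperparameters `alpha`, `gamma` and `epsilon` are assumptions made for this sketch; the paper's VNF-embedding formulation is not reproduced here.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain (not the paper's environment).

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Reaching the last state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # Q-learning target: bootstrap from the best action in the next state.
            bootstrap = 0.0 if s_next == goal else gamma * max(q[s_next])
            q[s][a] += alpha * (r + bootstrap - q[s][a])
            s = s_next
    return q


q_table = q_learning()
# Greedy action for every non-terminal state (1 means "move right").
greedy_policy = [0 if left > right else 1 for left, right in q_table[:-1]]
```

Every table entry is refined only through repeated visits, which is one way to see why exploration slows down as the state space (here, the chain length) grows.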

UCT-ADP Progressive Bias Algorithm for Solving Gomoku

IEEE Symposium Series on Computational Intelligence (SSCI)

Authors: Xu Cao; Yanghao Lin. Affiliation: School of Data Science, Fudan University, Shanghai, China

ISBN: (digital) 9781728124858

ISBN: (print) 9781728124865

We combine adaptive dynamic programming (ADP), a reinforcement learning method, and the UCB applied to trees (UCT) algorithm with a more powerful heuristic function based on the Progressive Bias method and two pruning strategies for the traditional board game Gomoku. For the adaptive dynamic programming part, we train a shallow forward neural network to give a quick evaluation of Gomoku board situations. UCT is a general approach in MCTS as a tree policy. Our framework uses UCT to balance the exploration and exploitation of Gomoku game trees, while we also apply powerful pruning strategies and a heuristic function to re-select the available 2-adjacent grids of the state and use ADP instead of simulation to give estimated values of expanded nodes. Experimental results show that this method can eliminate the search depth defect of the simulation process and converge to the correct value faster than single UCT. This approach can be applied to design new Gomoku AI and solve other Gomoku-like board games.

Keywords: Games; Heuristic algorithms; Monte Carlo methods; dynamic programming; Artificial intelligence; adaptive systems; Neural networks
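
The abstract above does not state the selection formula it uses. A minimal sketch of UCT child selection with a progressive-bias term is given below, assuming the commonly cited form: mean value plus the UCB1 exploration bonus plus `heuristic / (visits + 1)`. The dictionary fields, the constant `c`, and the idea that `heuristic` would come from an ADP-style evaluation network are assumptions for illustration, not details confirmed by the paper.

```python
import math


def uct_pb_score(child, parent_visits, c=1.4):
    """UCT score with a progressive-bias term (hypothetical node layout).

    `child` is a dict with keys 'value' (sum of returns), 'visits',
    and 'heuristic' (prior evaluation of the move, e.g. from a network).
    """
    n = child["visits"]
    if n == 0:
        # Unvisited children are tried first.
        return float("inf")
    exploitation = child["value"] / n
    exploration = c * math.sqrt(math.log(parent_visits) / n)
    # The bias matters most for rarely visited children and fades as n grows.
    bias = child["heuristic"] / (n + 1)
    return exploitation + exploration + bias


def select_child(children, c=1.4):
    """Pick the child with the highest score under the tree policy."""
    parent_visits = sum(ch["visits"] for ch in children)
    return max(children, key=lambda ch: uct_pb_score(ch, parent_visits, c))


children = [
    {"move": "a", "value": 5.0, "visits": 10, "heuristic": 0.1},
    {"move": "b", "value": 5.0, "visits": 10, "heuristic": 0.9},
]
best = select_child(children)
```

With identical visit statistics, the child with the larger heuristic is preferred, so the heuristic steers early search before the sampled statistics take over.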

Learning-Based Predictive Control for Discrete-Time Nonlinear Systems With Stochastic Disturbances

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, vol. 29, no. 12, pp. 6202-6213

Authors: Xu, Xin; Chen, Hong; Lian, Chuanqiang; Li, Dazi. Affiliations: Natl Univ Def Technol, Coll Intelligence Sci, Changsha 410073, Hunan, Peoples R China; Jilin Univ NanLing, State Key Lab Automot Simulat & Control, Changchun 130025, Jilin, Peoples R China; Jilin Univ NanLing, Dept Control Sci & Engn, Changchun 130025, Jilin, Peoples R China; Naval Univ Engn, Natl Key Lab Sci & Technol Vessel Integrated Powe, Wuhan 430032, Hubei, Peoples R China; Beijing Univ Chem Technol, Dept Automat, Beijing 100029, Peoples R China

In this paper, a learning-based predictive control (LPC) scheme is proposed for adaptive optimal control of discrete-time nonlinear systems under stochastic disturbances. The proposed LPC scheme is different from conventional model predictive control (MPC), which uses open-loop optimization or simplified closed-loop optimal control techniques in each horizon. In LPC, the control task in each horizon is formulated as a closed-loop nonlinear optimal control problem and a finite-horizon iterative reinforcement learning (RL) algorithm is developed to obtain the closed-loop optimal/suboptimal solutions. Therefore, in LPC, RL and adaptive dynamic programming (ADP) are used as a new class of closed-loop learning-based optimization techniques for nonlinear predictive control with stochastic disturbances. Moreover, LPC also decomposes the infinite-horizon optimal control problem in previous RL and ADP methods into a series of finite-horizon problems, so that the computational costs are reduced and the learning efficiency can be improved. Convergence of the finite-horizon iterative RL algorithm in each prediction horizon and the Lyapunov stability of the closed-loop control system are proved. Moreover, by using successive policy updates between adjoint time horizons, LPC also has lower computational costs than conventional MPC, which has independent optimization procedures between two different prediction horizons. Simulation results illustrate that, compared with conventional nonlinear MPC as well as ADP, the proposed LPC scheme can obtain better performance in terms of both policy optimality and computational efficiency.

Keywords: adaptive dynamic programming (ADP); function approximation; model predictive control (MPC); optimal control; receding horizon; reinforcement learning (RL)

Online Approximate Optimal Station Keeping of a Marine Craft in the Presence of an Irrotational Current

IEEE TRANSACTIONS ON ROBOTICS, 2018, vol. 34, no. 2, pp. 486-496

Authors: Walters, Patrick; Kamalapurkar, Rushikesh; Voight, Forrest; Schwartz, Eric M.; Dixon, Warren E. Affiliations: Univ Florida, Dept Mech & Aerosp Engn, Gainesville, FL 32611, USA; Oklahoma State Univ, Dept Mech & Aerosp Engn, Stillwater, OK 74074, USA; Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611, USA

Online approximation of the optimal station-keeping strategy for a marine craft subject to an irrotational current is considered. An approximate policy that minimizes a user-defined cost function over an infinite time horizon is obtained using an actor-critic-identifier-based adaptive dynamic programming technique. The hydrodynamic drift dynamics are assumed to be unknown; therefore, a concurrent learning-based system identifier is developed to identify the unknown model parameters. The identified model is used to implement an adaptive model-based reinforcement learning technique to estimate the unknown value function. The developed policy guarantees uniformly ultimately bounded convergence of the vehicle to the desired station and uniformly ultimately bounded convergence of the approximated policies to the optimal policies without the requirement of persistence of excitation. The developed strategy is validated using an autonomous underwater vehicle, where the three degrees of freedom in the horizontal plane are regulated. The experiments are conducted in a second-magnitude spring located in central Florida.

Keywords: adaptive dynamic programming (ADP); marine craft; nonlinear control; station keeping

A Low-Power Circuit for Adaptive Dynamic Programming

31st International Conference on VLSI Design / 17th International Conference on Embedded Systems (VLSID & ES)

Authors: Zheng, Nan; Mazumder, Pinaki. Affiliation: Univ Michigan, Elect Engn & Comp Sci Dept, Ann Arbor, MI 48109, USA

ISBN: (print) 9781538636923

This paper presents a low-power CMOS design for accelerating an adaptive dynamic programming algorithm, called action-dependent heuristic dynamic programming, which is widely employed in many real-life control problems. The objective of this work is to solve the Bellman equation approximately using efficient hardware in order to generate near-optimal real-time control policies for many control applications. The hardware exploits the data-level parallelism that exists in both inference and learning of neural networks in order to improve throughput as well as to provide good scalability. The circuit is realized in a 65-nm technology. It is shown with simulations that the design is two orders of magnitude faster than software running on a general-purpose processor, thanks to the parallelization of the algorithm and the reduction in unnecessary control overheads. Performance of the CMOS circuit is benchmarked with two popular control tasks. Successful learning can be achieved with a power consumption of 28 mW.

Keywords: reinforcement learning; adaptive dynamic programming; neural networks; low-power CMOS circuit

On Model-Free Reinforcement Learning of Reduced-order Optimal Control for Singularly Perturbed Systems

57th IEEE Conference on Decision and Control (CDC)

Authors: Mukherjee, Sayak; Bai, He; Chakrabortty, Aranya. Affiliations: North Carolina State Univ, Dept Elect & Comp Engn, Raleigh, NC 27695, USA; Oklahoma State Univ, Sch Mech & Aerosp Engn, Stillwater, OK 74078, USA

ISBN: (print) 9781538613955

We propose a model-free reduced-order optimal control design for linear time-invariant singularly perturbed (SP) systems using reinforcement learning (RL). Both the state and input matrices of the plant model are assumed to be completely unknown. The only assumption imposed is that the model admits a similarity transformation that results in an SP representation. We propose a variant of adaptive dynamic programming (ADP) that employs only the slow states of this SP model to learn a reduced-order adaptive optimal controller. The method significantly reduces the learning time and the complexity required for the feedback control by taking advantage of this model reduction. We use approximation theorems from singular perturbation theory to establish sub-optimality of the learned controller, and to guarantee closed-loop stability. We validate our results using two representative examples: one with standard singularly perturbed dynamics, and the other with clustered multi-agent consensus dynamics. Both examples highlight various implementation details and the effectiveness of the proposed approach.

Keywords: reinforcement learning; adaptive dynamic programming; model reduction; model free control; singular perturbation
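
ADP schemes for linear time-invariant systems are often built around a policy-iteration loop that alternates policy evaluation and policy improvement. The sketch below shows that loop for a scalar system with a known model. It is an assumption-laden illustration, not the paper's algorithm: the paper's method is model-free and works on the slow states of a singularly perturbed system, neither of which is modelled here. The function name, parameters and numeric values are hypothetical.

```python
def policy_iteration_scalar_lqr(a, b, q, r, k0, iterations=20):
    """Model-based policy iteration for dx/dt = a*x + b*u with u = -k*x.

    Cost is the integral of q*x**2 + r*u**2. The initial gain k0 must be
    stabilizing, i.e. a - b*k0 < 0.
    """
    k = k0
    p = None
    for _ in range(iterations):
        closed_loop = a - b * k
        if closed_loop >= 0:
            raise ValueError("gain is not stabilizing")
        # Policy evaluation: solve 2*(a - b*k)*p + q + r*k**2 = 0 for p.
        p = (q + r * k * k) / (-2.0 * closed_loop)
        # Policy improvement: new gain from the current value estimate.
        k = b * p / r
    return p, k


# Hypothetical unstable plant (a > 0) with a stabilizing initial gain.
p, k = policy_iteration_scalar_lqr(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
# Residual of the scalar Riccati equation 2*a*p - (b*p)**2 / r + q = 0.
riccati_residual = 2.0 * 1.0 * p - (1.0 * p) ** 2 / 1.0 + 1.0
```

For these values the iteration settles at the positive root of the scalar Riccati equation, p = 1 + sqrt(2). A model-free variant would estimate the evaluation step from measured trajectories instead of using `a` and `b` directly.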

IEEE SSCI 2011: Symposium Series on Computational Intelligence - ADPRL 2011: 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Symposium Series on Computational Intelligence, IEEE SSCI2011 - 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, ADPRL 2011

ISBN: (print) 9781424498888

The proceedings contain 45 papers. The topics discussed include: active learning for personalizing treatment; active exploration by searching for experiments that falsify the computed control policy; optimistic planning for sparsely stochastic systems; adaptive sample collection using active learning for kernel-based approximate policy iteration; tree-based variable selection for dimensionality reduction of large-scale control systems; high-order local dynamic programming; safe reinforcement learning in high-risk tasks through policy improvement; agent self-assessment: determining policy quality without execution; reinforcement learning algorithms for solving classification problems; reinforcement learning in multidimensional continuous action spaces; grounding subgoals in information transitions; and directed exploration of policy space using support vector classifiers.
