Search results - Inner Mongolia University Library

IEEE Power and Energy Society General Meeting

Authors: Diogenes Molina, Jiaqi Liang, Ronald G. Harley, Ganesh Kumar Venayagamoorthy. Affiliations: Intelligent Power Infrastructure Consortium, Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA; Holcombe Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, USA

ISBN: (print) 9781467327275

This paper introduces a new concept called a Virtual Generator (VG). VGs are simplified representations of groups of coherent synchronous generators in a power system. They resemble commonly used power system dynamic equivalents obtained via generator aggregation techniques. Traditionally, power system dynamic equivalents are developed offline, kept fixed, and used to replace large portions of the system that are considered external to the portion of the system being analyzed in detail. In contrast, VGs are calculated online, are not limited to representing external areas of the system being analyzed/controlled, and do not replace any portion of the power system. Instead, they allow wide-area damping controllers (WADCs) to exploit the realization that a group of coherent synchronous generators in a power system can be controlled as a single generating unit for achieving wide-area damping control objectives. The implementation of VGs is made possible by the availability of Wide-Area Measurements (WAMs) from Phasor Measurement Units (PMUs). To the authors' knowledge, this is the first time that the use of power system equivalencing techniques has been extended to real-time WADC. Simulation studies carried out on the 68-bus New England/New York power system demonstrate that intelligent controllers developed using VGs can significantly improve the stability of a power system by effectively damping low-frequency interarea oscillations.

Keywords: virtual generator power system stabilizer wide-area control power system equivalents intelligent control approximate dynamic programming adaptive critic designs generator coherency interarea oscillations power systems damped control dynamos intelligent controller Phasor measurement units Power system dynamics Generating sets Synchronous generators representations Power system stability

A Least-Squares Temporal Difference based method for solving resource allocation problems

IFAC JOURNAL OF SYSTEMS AND CONTROL, 2020, vol. 13

Authors: Forootani, Ali; Tipaldi, Massimo; Zarch, Majid Ghaniee; Liuzza, Davide; Glielmo, Luigi. Affiliations: Univ Sannio, Dept Engn, Piazza Roma, I-82100 Benevento, Italy; Bu Ali Sina Univ, Dept Elect Engn, Hamadan, Hamadan, Iran; ENEA, Fus & Nucl Safety Dept, Rome, Italy

Value function approximation has a central role in approximate dynamic programming (ADP) in overcoming the so-called curse of dimensionality associated with real stochastic processes. In this regard, we propose a novel Least-Squares Temporal Difference (LSTD) based method: the "Multi-trajectory Greedy LSTD" (MG-LSTD). It is an exploration-enhanced recursive LSTD algorithm with the policy improvement embedded within the LSTD iterations. It makes use of multi-trajectory Monte Carlo simulations in order to enhance the exploration of the system state space. This method is applied to solve resource allocation problems modeled via a constrained stochastic dynamic programming (SDP) based framework. In particular, such problems are formulated as a set of parallel Birth-Death Processes (BDPs). Some operational scenarios are defined and solved to show the effectiveness of the proposed approach. Finally, we provide some experimental evidence on the convergence properties of the MG-LSTD algorithm as a function of its key parameters. © 2020 Elsevier Ltd. All rights reserved.

Keywords: Least-squares temporal difference; approximate dynamic programming; Markov decision process; Birth-death process; Monte Carlo simulations
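
For readers unfamiliar with the LSTD building block named in this abstract, the following is a minimal sketch of plain batch LSTD(0) for evaluating a fixed policy with a linear value function. It is not the MG-LSTD algorithm of the paper (there is no recursion, multi-trajectory sampling, or embedded policy improvement), and the names `lstd`, `one_hot`, `transitions`, and `gamma` are illustrative assumptions rather than anything taken from the record.

```python
import numpy as np

def lstd(transitions, features, gamma):
    """Batch LSTD(0): solve A w = b for the weights of a linear value function.

    transitions: list of (state, reward, next_state) samples (assumed format).
    features:    function mapping a state to a 1-D feature vector.
    """
    k = len(features(transitions[0][0]))
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, r, s_next in transitions:
        phi = features(s)
        phi_next = features(s_next)
        # Accumulate the sample estimates of A and b.
        A += np.outer(phi, phi - gamma * phi_next)
        b += phi * r
    # A least-squares solve tolerates a singular A (e.g. unvisited features).
    return np.linalg.lstsq(A, b, rcond=None)[0]

def one_hot(s, n=2):
    """Tabular (one-hot) features for a hypothetical two-state chain."""
    v = np.zeros(n)
    v[s] = 1.0
    return v

# Toy chain: state 0 moves to state 1 with reward 1; state 1 loops with reward 0.
transitions = [(0, 1.0, 1), (1, 0.0, 1)]
w = lstd(transitions, one_hot, gamma=0.9)
```

With one-hot features the weights are the state values themselves, so in this toy chain the solve recovers V(0) = 1 and V(1) = 0 up to floating-point error.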

Multiperiod Stochastic Resource Planning in Professional Services Organizations

DECISION SCIENCES, 2019, vol. 50, no. 6, pp. 1281-1318

Authors: Solomon, Stanislaus; Li, Haitao; Womer, Keith; Santos, Cipriano. Affiliations: Southern Illinois Univ Edwardsville, Management & Mkt Dept, Sch Business, Edwardsville, IL 62025, USA; Univ Missouri St Louis, Supply Chain & Analyt Dept, Coll Business Adm, St Louis, MO 63121, USA; Gurobi Optimizat, Beaverton, OR 97008, USA

Resource planning (RP) in a professional service organization matches workforce resources with project tasks while considering a myriad of factors such as skill requirements, service delivery role, skill type, workforce proficiency level, and geographical location. The multiperiod stochastic resource planning studied in this article extends the one-period deterministic resource planning by explicitly coping with both internal resource attrition and project demand uncertainty in a sequential decision-making framework. It allows resource managers to make effective use of their internal resources and identify the need to outsource to external contingent resources. We model the multiperiod stochastic resource planning as a Markov decision process and implement an approximate dynamic programming algorithm to obtain dynamic and adaptive solutions in reasonable computation times. A comprehensive computational study shows that our approximate dynamic programming algorithm achieves higher profitability and internal resource utilization compared to the rolling horizon approach used as a benchmark.

Keywords: approximate dynamic programming; Markov Decision Process; Project Management; Resource Planning; Uncertainty; Workforce Optimization

HIGH-DIMENSIONAL PORTFOLIO OPTIMIZATION WITH TRANSACTION COSTS

INTERNATIONAL JOURNAL OF THEORETICAL AND APPLIED FINANCE, 2016, vol. 19, no. 4

Authors: Broadie, Mark; Shen, Weiwei. Affiliations: Columbia Univ, Grad Sch Business, New York, NY 10027, USA; Columbia Univ, Appl Phys & Appl Math, New York, NY 10027, USA

This paper studies Merton's portfolio optimization problem with proportional transaction costs in a discrete-time finite horizon. Facing short-sale and borrowing constraints, investors have access to a risk-free asset and multiple risky assets whose returns follow a multivariate geometric Brownian motion. Lower and upper bounds on the optimal solutions are computed for problems with up to 20 risky assets and 40 investment periods. Three lower bounds are proposed: the value function optimization (VF), and the hyper-sphere and hyper-cube policy parameterizations (HS and HC). VF attacks the conundrums in traditional value function iteration for high-dimensional dynamic programs with continuous decision and state spaces. HS and HC respectively approximate the geometry of the trading policy in the high-dimensional state space by two surfaces. To evaluate the lower bounds, two new upper bounds are provided via a duality method based on a new auxiliary problem (OMG and OMG2). Compared with existing methods across various suites of parameters, the new methods are clearly superior. The three lower bound methods always achieve higher utilities, HS and HC cut run times by a factor of 100, and OMG and OMG2 mostly provide tighter upper bounds. In addition, the paper investigates how the no-trading region characterizing the optimal policy deforms when the short-sale and borrowing constraints bind.

Keywords: Portfolio optimization; transaction costs; value function iteration; approximate dynamic programming; lower and upper bounds

Optimal and approximate algorithms for sequential clinical scheduling with no-shows

IIE Transactions on Healthcare Systems Engineering, 2011, vol. 1, no. 1, pp. 20-36

Authors: Lin, J.; Muthuraman, Kumar; Lawley, Mark. Affiliations: Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN 47907-2032, 206 S. Martin Jischke Drive, United States; McCombs School of Business, University of Texas, Austin, TX, United States

The accessibility and efficiency of outpatient clinic operations are largely affected by appointment schedules. Clinical scheduling is a process of assigning physician appointment times to sequentially calling patients. A significant problem in clinical operations is patient no-show, that is, scheduled patients not showing up for their appointments. Overbooking can compensate for revenue loss due to no-shows, but naive overbooking can result in longer patient waiting times and uneven physician workloads. In the past few years, new overbooking methods have been developed for sequential scheduling that yield higher expected profit than simple scheduling rules, but these often fail to exploit information about the future call-in process (they are myopic). To fully use this important information, we develop a Markov decision process (MDP) model for sequential clinical scheduling that books patients to optimize the performance of clinic operations. The model is solved by dynamic programming (DP) for small problems. Approximate dynamic programming (ADP) algorithms based on aggregation and simulation are developed to find schedules for larger problems. Our computational experiments indicate good improvement over myopic methods. © 2011 Copyright Taylor and Francis Group, LLC.

Keywords: approximate dynamic programming; Markov Decision Process; outpatient clinics; patient no-shows; Sequential scheduling

Deep reinforcement learning based finite-horizon optimal control for a discrete-time affine nonlinear system

SICE Annual Conference

Authors: Jong Woo Kim, Byung Jun Park, Haeun Yoo, Jay H. Lee, Jong Min Lee. Affiliations: School of Chemical and Biological Engineering, Seoul National University, Seoul, Republic of Korea; Chemical and Biomolecular Engineering Department, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea

Approximate dynamic programming (ADP) aims to obtain an approximate numerical solution to the discrete-time Hamilton-Jacobi-Bellman (HJB) equation. Heuristic dynamic programming (HDP) is a two-stage iterative ADP scheme that separates the HJB equation into two equations, one for the value function and another for the policy function, which are referred to as the critic and the actor, respectively. Previous ADP implementations have been limited by the choice of function approximator, which requires significant prior domain knowledge or a large number of parameters to be fitted. However, recent advances in deep learning brought by the computer science community enable the use of deep neural networks (DNNs) to approximate high-dimensional nonlinear functions without prior domain knowledge. Motivated by this, we examine the potential of DNNs as function approximators of the critic and the actor. In contrast to the infinite-horizon optimal control problem, the critic and the actor of the finite-horizon optimal control (FHOC) problem are time-varying functions and have to satisfy a boundary condition. A DNN structure and a training algorithm suitable for FHOC are presented. Illustrative examples are provided to demonstrate the validity of the proposed method.

Keywords: Reinforcement learning; approximate dynamic programming; Deep learning; Actor-critic method; Finite horizon optimal control
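
The time-varying critic and actor and the boundary condition mentioned in this abstract can be made concrete with the standard finite-horizon Bellman recursion for an affine system. The notation below is an assumption chosen for illustration and is not taken from the record: dynamics $f$ and $g$, stage cost $\ell$, terminal cost $\phi$, horizon $N$, critic $V_k$, and actor $\pi_k$.

```latex
\begin{aligned}
x_{k+1} &= f(x_k) + g(x_k)\,u_k, \qquad k = 0, \dots, N-1,\\
V_N(x) &= \phi(x),\\
V_k(x) &= \min_{u}\bigl[\ell(x, u) + V_{k+1}\bigl(f(x) + g(x)\,u\bigr)\bigr],\\
\pi_k(x) &= \arg\min_{u}\bigl[\ell(x, u) + V_{k+1}\bigl(f(x) + g(x)\,u\bigr)\bigr].
\end{aligned}
```

The second line is the boundary condition, and the index $k$ on $V_k$ and $\pi_k$ is what makes the critic and the actor time-varying, unlike the stationary functions of the infinite-horizon case.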

Adaptive Traffic Signal Control for Multi-intersection Based on Microscopic Model

International Conference on Tools with Artificial Intelligence

Authors: Biao Yin, Mahjoub Dridi, Abdellah El Moudni. Affiliation: Laboratoire IRTES-SeT, Université de Technologie de Belfort-Montbéliard (UTBM), Belfort, France

ISBN: (print) 9781509001644

In this paper, we propose an online learning method for adaptive traffic signal control in a multi-intersection system. The method uses approximate dynamic programming (ADP) to achieve a near-optimal solution of the signal optimization in a distributed network, which is modeled in a microscopic way. The traffic network loading model and the traffic signal control model are presented to serve as the basis of a discrete-time control environment. The learning process of the linear function approximation in the ADP approach adopts tunable parameters of the traffic states, including the vehicle queue length and the signal indication. ADP overcomes the computational complexity that usually arises when large-scale problems are solved by exact algorithms such as dynamic programming. Moreover, the proposed adaptive phase sequence (APS) mode improves performance compared with other control methods. The simulation results show that our method performs quite well on the adaptive traffic signal control problem.

Keywords: adaptive signal control; multi-intersection; approximate dynamic programming; adaptive phase sequence

Discrete-Time Generalized Policy Iteration ADP Algorithm With Approximation Errors

IEEE Symposium Series on Computational Intelligence

Authors: Qinglai Wei, Benkai Li, Ruizhuo Song. Affiliations: The State Key Laboratory of Management and Control for Complex Systems, Chinese Academy of Sciences, Beijing, China; School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China

This paper is concerned with a novel generalized policy iteration (GPI) algorithm with approximation errors. Approximation errors are explicitly considered in the GPI algorithm. The properties of the stable GPI algorithm with approximation errors are analyzed. The convergence of the developed algorithm is established to show that the iterative value function converges to a finite neighborhood of the optimal performance index function. Finally, numerical examples and comparisons are presented.

Keywords: Adaptive critic designs; Adaptive dynamic programming; approximate dynamic programming; Neuro-dynamic programming; Generalized policy iteration; Nonlinear systems; Optimal control; Neural networks; Reinforcement learning

Approximate modified policy iteration and its application to the game of Tetris

The Journal of Machine Learning Research, 2015, vol. 16, no. 1

Authors: Bruno Scherrer, Mohammad Ghavamzadeh, Victor Gabillon, Boris Lesner, Matthieu Geist. Affiliations: INRIA Nancy-Grand Est, Team Maia, Vandœuvre-lès-Nancy, France; Adobe Research & INRIA Lille, San Jose, CA; INRIA Lille-Nord Europe, Team SequeL, Villeneuve d'Ascq, France; CentraleSupélec, IMS-MaLIS Research Group & UMI (GeorgiaTech-CNRS), Metz, France

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that are extensions of the well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide an error propagation analysis that unifies those for approximate policy and value iteration. We develop the finite-sample analysis of these algorithms, which highlights the influence of their parameters. In the classification-based version of the algorithm (CBMPI), the analysis shows that MPI's main parameter controls the balance between the estimation error of the classifier and the overall value function approximation. We illustrate and evaluate the behavior of these new algorithms in the Mountain Car and Tetris problems. Remarkably, in Tetris, CBMPI outperforms the existing DP approaches by a large margin, and competes with the current state-of-the-art methods while using fewer samples.

Keywords: Markov decision processes; approximate dynamic programming; finite-sample analysis; game of tetris; performance bounds; reinforcement learning
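
To illustrate how MPI "contains" both value and policy iteration, as this abstract puts it, here is a minimal sketch of exact tabular MPI. It is not one of the approximate AMPI variants proposed in the paper, and the array layout, the function name `modified_policy_iteration`, and the toy MDP are assumptions made for illustration. Each iteration takes a greedy policy and then applies that policy's Bellman operator `m` times; `m = 1` reduces to value iteration, and large `m` approaches policy iteration.

```python
import numpy as np

def modified_policy_iteration(P, R, gamma, m, iters):
    """Exact tabular MPI (illustrative sketch).

    P: transition tensor of shape (A, S, S), with P[a, s, t] = Pr(t | s, a).
    R: reward matrix of shape (S, A).
    m: number of partial-evaluation sweeps per iteration.
    """
    n_states = R.shape[0]
    states = np.arange(n_states)
    v = np.zeros(n_states)
    pi = np.zeros(n_states, dtype=int)
    for _ in range(iters):
        # Greedy step: one-step lookahead Q-values, shape (S, A).
        q = R + gamma * np.einsum("ast,t->sa", P, v)
        pi = q.argmax(axis=1)
        # Partial evaluation: apply the Bellman operator of pi m times.
        r_pi = R[states, pi]
        p_pi = P[pi, states]  # shape (S, S), row s is P[pi[s], s, :]
        for _ in range(m):
            v = r_pi + gamma * (p_pi @ v)
    return v, pi

# Hypothetical two-state MDP: action 0 stays put, action 1 moves to state 1.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0
    [[0.0, 1.0], [0.0, 1.0]],  # action 1
])
R = np.array([[0.0, 1.0], [2.0, 2.0]])
v, pi = modified_policy_iteration(P, R, gamma=0.5, m=3, iters=60)
```

For this toy MDP the optimal values are V(1) = 2 / (1 - 0.5) = 4 and V(0) = 1 + 0.5 * 4 = 3, and the greedy action in state 0 is action 1.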

Dynamic policy programming

The Journal of Machine Learning Research, 2012, vol. 13, no. 1

Authors: Mohammad Gheshlaghi Azar, Vicenç Gómez, Hilbert J. Kappen. Affiliation: Department of Biophysics, Radboud University Nijmegen, Nijmegen, The Netherlands

In this paper, we propose a novel policy iteration method, called dynamic policy programming (DPP), to estimate the optimal policy in infinite-horizon Markov decision processes. DPP is an incremental algorithm that forces a gradual change in the policy update. This allows us to prove finite-iteration and asymptotic ℓ∞-norm performance-loss bounds in the presence of approximation/estimation error which depend on the average accumulated error, as opposed to the standard bounds, which are expressed in terms of the supremum of the errors. The dependency on the average error is important in problems with a limited number of samples per iteration, for which the average of the errors can be significantly smaller in size than the supremum of the errors. Based on these theoretical results, we prove that a sampling-based variant of DPP (DPP-RL) asymptotically converges to the optimal policy. Finally, we illustrate numerically the applicability of these results on some benchmark problems and compare the performance of the approximate variants of DPP with some existing reinforcement learning (RL) methods.

Keywords: Markov decision processes; Monte-Carlo methods; approximate dynamic programming; function approximation; reinforcement learning
