In this study, we illustrate a real-time approximate dynamic programming (RTADP) method for solving multistage capacity decision problems in a stochastic manufacturing environment, using an exemplary three-stage manufacturing system with recycle. The system is a moderate-size queuing network that experiences stochastic variations in demand and product yield. The dynamic capacity decision problem is formulated as a Markov decision process (MDP). The proposed RTADP method starts with a set of heuristics and learns a superior-quality solution by interacting with the stochastic system via simulation. The curse of dimensionality associated with DP methods is alleviated by adopting several notions: an "evolving set of relevant states," for which the value function table is built and updated; an "adaptive action set" for keeping track of attractive action candidates; and a "nonparametric k nearest neighbor averager" for value function approximation. The performance of the learned solution is evaluated against (1) an "ideal" solution derived using a mixed integer programming (MIP) formulation, which assumes full knowledge of future realized values of the stochastic variables, (2) a myopic heuristic solution, and (3) a sample-path-based rolling horizon MIP solution. The policy learned through the RTADP method turned out to be superior to policies (2) and (3). (C) 2010 Wiley Periodicals, Inc. Naval Research Logistics 57: 211-224, 2010
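As a rough illustration of the nonparametric averager mentioned above, the sketch below estimates the value of an unvisited state by a k-nearest-neighbor average over a stored table of relevant states. All names, the Euclidean metric, and the unweighted average are assumptions; the abstract does not specify these details.

```python
import numpy as np

def knn_value(state, table_states, table_values, k=5):
    """Estimate the value of `state` by averaging the stored values of its
    k nearest neighbours in the evolving table of relevant states."""
    dists = np.linalg.norm(table_states - np.asarray(state), axis=1)
    nearest = np.argsort(dists)[:k]
    return table_values[nearest].mean()
```

Here `table_states` would be an (n, d) array of states already deemed relevant and `table_values` their learned values; a distance-weighted average would be an equally plausible reading.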
We consider the problem of temporal fair scheduling of queued data transmissions in wireless heterogeneous networks. We deal with both the throughput maximization problem and the delay minimization problem. Taking fairness constraints and the data arrival queues into consideration, we formulate the transmission scheduling problem as a Markov decision process (MDP) with fairness constraints. We study two categories of fairness constraints, namely temporal fairness and utilitarian fairness. We consider two criteria: infinite-horizon expected total discounted reward and expected average reward. Applying the dynamic programming approach, we derive and prove explicit optimality equations for the above constrained MDPs, and give corresponding optimal fair scheduling policies based on those equations. A practical stochastic-approximation-type algorithm is applied to calculate the policies' control parameters online. Furthermore, we develop a novel approximation method, temporal fair rollout, to achieve tractable computation. Numerical results show that the proposed scheme achieves significant performance improvement over existing schemes for both the throughput maximization and delay minimization problems.
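The abstract does not detail the temporal fair rollout, but a generic one-step rollout with a temporal-fairness guard might look like the following sketch; `simulate`, the share bookkeeping, and all parameter names are hypothetical.

```python
def temporal_fair_rollout(queues, share, served, t, simulate, n_sims=20):
    """Pick the user to serve by one-step rollout: score each feasible user
    by Monte Carlo simulation of a base heuristic after serving it, skipping
    users whose long-run temporal share of the channel is already met."""
    best_user, best_score = None, float("-inf")
    for u, backlog in enumerate(queues):
        if backlog == 0 or served[u] >= share[u] * (t + 1):  # fairness guard
            continue
        score = sum(simulate(queues, u) for _ in range(n_sims)) / n_sims
        if score > best_score:
            best_user, best_score = u, score
    return best_user
```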
In Part I of this tutorial, we provided a canonical modeling framework for sequential stochastic optimization (control) problems. A major feature of this framework is a clear separation between modeling a problem and designing policies to solve it. In Part II, we provide additional discussion of some of the more subtle concepts, such as the construction of a state variable. We illustrate the modeling process using an energy storage problem. We then create five variations of this problem designed to bring out the features of the different policies. The first four of these problems demonstrate that each of the four classes of policies is best for particular problem characteristics. The fifth problem is solved with a hybrid policy, illustrating the ability to combine the strengths of multiple policy classes.
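For reference, the canonical model from Part I can be summarized, in notation close to the tutorial's (details may differ), as choosing a policy to maximize expected cumulative contribution subject to the system transition:

$$\max_{\pi}\ \mathbb{E}\Big\{\sum_{t=0}^{T} C\big(S_t,\,X_t^{\pi}(S_t)\big)\Big\},\qquad S_{t+1}=S^{M}\big(S_t,\,x_t,\,W_{t+1}\big),$$

where $S_t$ is the state, $x_t = X_t^{\pi}(S_t)$ the decision produced by policy $\pi$, and $W_{t+1}$ the exogenous information arriving between decisions.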
This paper proposes a new approximate dynamic programming algorithm to solve the infinite-horizon optimal control problem for weakly coupled nonlinear systems. The algorithm is implemented as a three-critic/four-actor approximator structure, where the critic approximators learn the optimal costs and the actor approximators learn the optimal control policies. Both critic and actor approximators are adapted simultaneously in continuous time, a method commonly known as synchronous policy iteration. The adaptive-control nature of the algorithm requires a persistence-of-excitation condition to be guaranteed a priori, but this can be relaxed by using previously stored data concurrently with current data in the update of the critic approximators. Appropriate robustifying terms are added to the controllers to eliminate the effects of the residual errors, leading to asymptotic stability of the equilibrium point of the closed-loop system. Simulation results on a sixth-order dynamical example show the effectiveness of the proposed approach. Copyright (C) 2015 John Wiley & Sons, Ltd.
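A single critic update of the kind described, normalized gradient descent on a Bellman residual augmented with stored data in the spirit of concurrent learning, could be sketched as below; the residual form and all names are generic assumptions, not the paper's exact three-critic/four-actor equations.

```python
import numpy as np

def critic_step(Wc, sigma, cost, stored, alpha=1.0, dt=1e-3):
    """Euler step of a normalized-gradient critic update on the Bellman
    residual e = Wc @ sigma + cost; stored (sigma_j, cost_j) samples are
    replayed so persistence of excitation need not hold along the trajectory."""
    def grad(s, c):
        e = Wc @ s + c                       # Bellman residual for this sample
        return s * e / (1.0 + s @ s) ** 2    # normalized gradient direction
    g = grad(sigma, cost) + sum(grad(s, c) for s, c in stored)
    return Wc - alpha * dt * g
```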
This paper is concerned with a new data-driven zero-sum neuro-optimal control problem for continuous-time unknown nonlinear systems with disturbance. Based on the input-output data of the nonlinear system, an effective recurrent neural network is introduced to reconstruct the dynamics of the nonlinear system. Treating the system disturbance as a control input, a two-player zero-sum optimal control problem is established. Adaptive dynamic programming (ADP) is developed to obtain the optimal control under the worst case of the disturbance. Three single-layer neural networks, one critic and two action networks, are employed to approximate the performance index function, the optimal control law, and the disturbance, respectively, facilitating the implementation of the ADP method. Convergence properties of the ADP method are developed to show that the system state converges to a finite neighborhood of the equilibrium. The weight matrices of the critic and the two action networks also converge to finite neighborhoods of their optimal values. Finally, simulation results show the effectiveness of the developed data-driven ADP method.
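The underlying two-player zero-sum formulation is the standard H-infinity-style differential game, written here in a generic form that the paper's specific performance index may refine:

$$V^{*}(x_0)=\min_{u}\,\max_{w}\int_{0}^{\infty}\big(Q(x)+u^{\top}R\,u-\gamma^{2}\,w^{\top}w\big)\,dt,$$

with the critic network approximating $V^{*}$ and the two action networks approximating the saddle-point policies $u^{*}$ (control) and $w^{*}$ (worst-case disturbance).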
In this paper, a value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite-horizon undiscounted optimal control problems for discrete-time nonlinear systems. The present value iteration ADP algorithm permits an arbitrary positive semi-definite function to initialize the algorithm. A novel convergence analysis is developed to guarantee that the iterative value function converges to the optimal performance index function. It is proven that, depending on the initial function, the iterative value function will be monotonically nonincreasing, monotonically nondecreasing, or nonmonotonic, and will converge to the optimum. For the first time, the admissibility properties of the iterative control laws are developed for value iteration algorithms, and new termination criteria are established to guarantee the effectiveness of the iterative control laws. Neural networks are used to approximate the iterative value function and to compute the iterative control law, facilitating the implementation of the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the present method.
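On a discretized state grid, the value-iteration recursion the paper analyzes, $V_{i+1}(x)=\min_u\{U(x,u)+V_i(F(x,u))\}$ with an arbitrary positive semi-definite $V_0$, can be sketched as follows. The grid projection and all names are illustrative; the paper itself uses neural network approximators rather than a grid.

```python
import numpy as np

def value_iteration(F, U, states, actions, V0, n_iter=100):
    """V_{i+1}(x) = min_u [ U(x,u) + V_i(F(x,u)) ], started from an
    arbitrary positive semi-definite function V0 as the paper permits."""
    states = [np.asarray(s, dtype=float) for s in states]
    def nearest(x):  # project the successor state onto the grid
        return min(range(len(states)), key=lambda i: np.linalg.norm(x - states[i]))
    V = np.array([V0(s) for s in states], dtype=float)
    for _ in range(n_iter):
        V = np.array([min(U(s, a) + V[nearest(np.asarray(F(s, a)))] for a in actions)
                      for s in states])
    return V
```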
This research introduces the use of approximate dynamic programming to overcome a variety of limitations of distinct infrastructure management problem formulations. Both the form and the parameters of a model specifying the long-term costs associated with alternative infrastructure maintenance policies are learned via simulation. The introduced methodology makes it possible to manage large heterogeneous networks of facilities related by budgetary restrictions and resource constraints, as well as by dependencies in maintenance costs or deterioration. In addition, the methodology is particularly well suited to simultaneous consideration of multiple types of infrastructure condition data, including continuous-valued data and relevant historical data. The introduced techniques will prove valuable when high-quality deterioration and cost estimation models are available but are ill suited for use in a Markov decision problem framework. Computational studies show that the introduced approach finds an optimal solution to a relatively simple infrastructure management problem and finds increasingly good solutions to a more complex problem.
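One way to realize "learning the form and parameters of a long-term cost model via simulation" is regression of simulated discounted costs onto condition features, which accommodates continuous-valued and historical data directly. The sketch below is a minimal linear version with hypothetical names; the paper's actual model class is not specified in the abstract.

```python
import numpy as np

def fit_cost_to_go(simulate_cost, features, sample_states, n_rollouts=30):
    """Fit a linear cost-to-go model by regressing Monte Carlo estimates of
    long-term maintenance cost on facility-condition features."""
    X = np.array([features(s) for s in sample_states])
    y = np.array([np.mean([simulate_cost(s) for _ in range(n_rollouts)])
                  for s in sample_states])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta  # predicted cost-to-go of state s: features(s) @ theta
```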
The main focus of this article is a proposal to solve, via UDU^T factorisation, the convergence and numerical stability problems caused by ill-conditioning of the covariance matrix in the recursive least squares (RLS) approach for online approximation of the algebraic Riccati equation (ARE) solution associated with the discrete linear quadratic regulator (DLQR) problem, formulated in the actor-critic reinforcement learning and approximate dynamic programming context. The parameterisations of the Bellman equation, utility function, and dynamic system, together with the algebra of the Kronecker product, assemble a framework for the solution of the DLQR problem. The condition number and the positivity parameter of the covariance matrix are associated with statistical metrics for evaluating the approximation performance of the ARE solution via RLS-based estimators. The performance of the RLS approximators is also evaluated in terms of consistency and bias when associated with reinforcement learning methods. The methodology contemplates realisations of online designs for DLQR controllers that are evaluated on a multivariable dynamic system model.
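For orientation, a plain (unfactorised) RLS step for the Bellman-equation parameters is sketched below; the covariance update of `P` is exactly where the ill-conditioning the article targets arises, and is what the UDU^T-factorised variant is designed to avoid. Names and the forgetting factor are illustrative.

```python
import numpy as np

def rls_step(theta, P, phi, target, lam=1.0):
    """One recursive least squares update of parameter estimate `theta`
    with regressor `phi`, covariance `P`, and forgetting factor `lam`."""
    Pphi = P @ phi
    k = Pphi / (lam + phi @ Pphi)               # gain vector
    theta = theta + k * (target - phi @ theta)  # innovation correction
    P = (P - np.outer(k, Pphi)) / lam           # covariance update; this form
    return theta, P                             # can lose positive definiteness
```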
We investigate a class of scheduling problems in which dynamically and stochastically arriving appointment requests are either rejected or booked for future slots. A customer may cancel an appointment, and a customer who does not cancel may fail to show up. The planner may overbook appointments to mitigate the detrimental effects of cancellations and no-shows. Each customer needs multiple renewable resources. The system receives a reward for providing service and incurs costs for rejecting requests, appointment delays, and overtime. Customers are heterogeneous in all problem parameters. We provide a Markov decision process (MDP) formulation of these problems. Exact solution of this MDP is intractable. We show that this MDP has a weakly coupled structure that enables us to apply an approximate dynamic programming method rooted in Lagrangian relaxation, affine value function approximation, and constraint generation. We compare this method with a myopic scheduling heuristic on eighteen hundred problem instances. Our experiments show a statistically significant difference in the performance of the two methods in 77% of these instances; of these, the Lagrangian method outperforms the myopic method in 97%. (C) 2015 Elsevier Ltd. All rights reserved.
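In generic form (the paper's exact formulation may differ), weak coupling means the linking resource constraints can be relaxed with multipliers $\lambda \ge 0$, yielding a separable bound that is then fitted with an affine value function approximation:

$$V(s)\;\le\;\lambda^{\top} b\;+\;\sum_{i} V_i^{\lambda}(s_i),\qquad V_i^{\lambda}(s_i)\;\approx\;\theta_{i,0}+\theta_i^{\top} s_i,$$

where $b$ is the resource capacity vector, the $s_i$ are per-customer state components, and constraint generation keeps the resulting linear program over $(\lambda, \theta)$ tractable.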
Inventory management of a procurement system is decomposed into sub-problems according to the timescale of the decisions: long-term planning for ordering raw materials and short-term scheduling for unloading the orders. To ensure more sustainable and robust operation, the different decision layers should be integrated (the essence of a multi-scale approach), and supply and demand uncertainty should be considered. In this study, the planning problem is formulated as a Markov decision process (MDP) to incorporate possible realizations of uncertainty into the decision-making process. The MDP planning model is integrated with a scheduling model expressed as an MILP (or closely approximated by a heuristic approach). Decision policies are obtained by solving the MDP through exact value iteration, as well as through an approximate approach intended to alleviate the computational challenges. On benchmark problems, we compare the results of applying these policies with those of a reference policy obtained without any rigorous integration with scheduling. (C) 2016 Elsevier Ltd. All rights reserved.
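The integration described, value iteration for the ordering MDP with each stage cost supplied by the scheduling layer, can be sketched as follows. Here `sched_cost` stands in for the MILP (or heuristic) scheduling evaluation, a discount factor is assumed for convergence of the sketch, and all names are illustrative rather than taken from the paper.

```python
def integrated_value_iteration(states, order_actions, transition, sched_cost,
                               gamma=0.95, n_iter=200):
    """Exact value iteration for the raw-material ordering MDP, where each
    stage cost embeds the short-term cost of scheduling the order unloading."""
    V = {s: 0.0 for s in states}
    for _ in range(n_iter):
        V = {s: min(sched_cost(s, a) +
                    gamma * sum(p * V[s2] for s2, p in transition(s, a))
                    for a in order_actions)
             for s in states}
    return V
```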