The article formulates the well-known economic lot scheduling problem (ELSP) with sequence-dependent setup times and costs as a semi-Markov decision process. Using an affine approximation of the bias function, a semi-infinite linear program is obtained and a lower bound for the minimum average total cost rate is determined. The solution of this problem is directly used in a price-directed, dynamic heuristic to determine a good cyclic schedule. As the state space of the ELSP is non-trivial for the multi-product setting with setup times, the authors further illustrate how a lookahead version of the price-directed, dynamic heuristic can be used to construct and dynamically improve an approximation of the state space. Numerical results show that the resulting heuristic performs competitively with one reported in the literature.
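For readers unfamiliar with the construction, the lower-bound program described above takes the standard semi-MDP average-cost form. A sketch, with notation assumed here rather than taken from the article (stage cost $c$, expected sojourn time $\tau$, transition law $p$, bias $h$, average cost rate $g$):

```latex
\max_{g,\,\theta}\; g
\quad\text{s.t.}\quad
h_\theta(s) \;\le\; c(s,a) - g\,\tau(s,a) + \sum_{s'} p(s'\mid s,a)\,h_\theta(s')
\qquad \forall\,(s,a),
```

where the affine approximation $h_\theta(s) = \theta_0 + \theta^{\top}s$ turns the program into a semi-infinite linear program in $(g,\theta)$. Any feasible $g$ lower-bounds the minimum average total cost rate, and the prices attached to the constraints are the kind of quantities a price-directed heuristic can exploit.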
This paper is concerned with a new iterative theta-adaptive dynamic programming (ADP) technique to solve optimal control problems of infinite-horizon discrete-time nonlinear systems. The idea is to use an iterative ADP algorithm to obtain the iterative control law that optimizes the iterative performance index function. In the present iterative theta-ADP algorithm, the requirement of an initially admissible control, as in policy iteration, is avoided. It is proved that every iterative control obtained by the iterative theta-ADP algorithm stabilizes the nonlinear system, which means the algorithm is feasible for both online and offline implementation. Convergence analysis of the performance index function guarantees that the iterative performance index function converges monotonically to the optimum. Neural networks are used to approximate the performance index function and to compute the optimal control policy, facilitating implementation of the iterative theta-ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the established method.
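The core iteration can be illustrated with a toy discrete analogue (a sketch only; the paper's algorithm targets continuous nonlinear systems with neural-network approximators): value iteration started from V_0 ≡ 0 needs no admissible initial control, and its iterates grow monotonically toward the optimal cost.

```python
# Minimal value-iteration analogue of the iterative ADP idea (toy, discrete):
# V_{i+1}(x) = min_u [ U(x,u) + V_i(f(x,u)) ], starting from V_0 = 0,
# with no admissible initial control required.

def value_iteration(n_states=5, n_iters=50):
    V = [0.0] * n_states          # V_0 = 0 everywhere
    history = [list(V)]
    for _ in range(n_iters):
        Vn = [0.0] * n_states     # state 0 is the cost-free goal state
        for x in range(1, n_states):
            # stage cost 1 per move; either step toward the goal or away
            Vn[x] = min(1.0 + V[x - 1],
                        1.0 + V[min(x + 1, n_states - 1)])
        history.append(list(Vn))
        V = Vn
    return V, history

V, hist = value_iteration()
print(V)  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

In this instance the iterates increase monotonically from V_0 = 0 to the optimal cost-to-go (the distance to the goal), mirroring the monotone convergence the paper proves for its setting.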
ISBN:
(Print) 9781467380416
This paper proposes an approximate dynamic programming (ADP) based approach to evaluate the effective load carrying capability (ELCC) of high-penetration renewable resources by solving the long-term security-constrained unit commitment (SCUC) problem with various uncertainties related to solar radiation, wind speed, and load level. Compared with traditional approaches, the proposed approach allows the Independent System Operator (ISO) to make decisions on the basis of current-day information only, reducing the computational burden of forecasting future states. The objective of the proposed long-term SCUC formulation is to minimize the operation cost for the base case with forecast values while accounting for the variable cost arising from uncertainties. Numerical case studies on a 6-bus system illustrate the effectiveness of the proposed ADP-based long-term SCUC model for investigating ELCC under various uncertainties.
In this paper, the neural-network-based robust optimal control design for a class of uncertain nonlinear systems via an adaptive dynamic programming approach is investigated. First, the robust controller of the original uncertain system is derived by adding a feedback gain to the optimal controller of the nominal system. It is also shown that this robust controller achieves optimality under a specified cost function, which serves as the basic idea of the robust optimal control design. Then, a critic network is constructed to solve the Hamilton-Jacobi-Bellman equation corresponding to the nominal system, where an additional stabilizing term is introduced to guarantee stability. The uniform ultimate boundedness of the closed-loop system is also proved using the Lyapunov approach. Moreover, the obtained results are extended to the decentralized optimal control problem of continuous-time nonlinear interconnected large-scale systems. Finally, two simulation examples are presented to illustrate the effectiveness of the established control scheme. (C) 2014 Elsevier Inc. All rights reserved.
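As a reference for the equation the critic network solves, the standard HJB form for a nominal control-affine system is sketched below; the symbols ($f$, $g$, $Q$, $R$) are assumed notation, not necessarily the paper's.

```latex
% nominal system \dot{x} = f(x) + g(x)u, cost J = \int_0^\infty [Q(x) + u^\top R u]\,dt
0 = \min_u \Big\{ Q(x) + u^{\top} R u
      + (\nabla V(x))^{\top}\big( f(x) + g(x)u \big) \Big\},
\qquad
u^*(x) = -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla V(x),
```

and substituting $u^*$ back gives the HJB equation in closed form:

```latex
0 = Q(x) + (\nabla V)^{\top} f(x)
    - \tfrac{1}{4}\, (\nabla V)^{\top} g(x) R^{-1} g(x)^{\top} \nabla V .
```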
In this paper, we study a dynamic fleet management problem with uncertain demands and customer chosen service levels. We first show that the problem can be transformed into a dynamic network with partially dependent random arc capacities, and then develop a structural decomposition approach which decomposes the network recourse problem into a series of tree recourse problems (TRPs). As each TRP can be solved by an efficient algorithm, the decomposition approach can solve the problem very efficiently. We conduct numerical experiments to compare its performance with two alternative methods. Numerical experiments show that the performance of our method is quite encouraging. (C) 2013 Elsevier B.V. All rights reserved.
Consider a patrol problem, where a patroller traverses a graph through edges to detect potential attacks at nodes. An attack takes a random amount of time to complete. The patroller takes one time unit to move to and inspect an adjacent node, and will detect an ongoing attack with some probability. If an attack completes before it is detected, a cost is incurred. The attack time distribution, the cost due to a successful attack, and the detection probability all depend on the attack node. The patroller seeks a patrol policy that minimizes the expected cost incurred when, and if, an attack eventually happens. We consider two cases. A random attacker chooses where to attack according to predetermined probabilities, while a strategic attacker chooses where to attack to incur the maximal expected cost. In each case, computing the optimal solution, although possible, quickly becomes intractable for problems of practical size. Our main contribution is to develop efficient index policies, based on Lagrangian relaxation methodology and on approximate dynamic programming, which typically achieve within 1% of optimality with computation time orders of magnitude less than what is required to compute the optimal policy for problems of practical size. (c) 2014 Wiley Periodicals, Inc. Naval Research Logistics, 61: 557-576, 2014
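A minimal sketch of the index-policy idea follows. The index rule here is a hypothetical illustrative choice, not the Lagrangian-relaxation index derived in the article: each node accumulates "time since last inspection", and the patroller greedily moves to the adjacent node with the largest urgency index.

```python
# Hypothetical greedy index patrol (illustrative only): the urgency index
#   index_j = cost_j * elapsed_j / mean_attack_time_j
# rises while a node goes uninspected, and inspection resets its clock.

def patrol(adjacency, cost, mean_attack_time, start=0, horizon=12):
    n = len(cost)
    elapsed = [0] * n
    node, walk = start, [start]
    for _ in range(horizon):
        for i in range(n):
            elapsed[i] += 1
        # move to the adjacent node with the largest urgency index
        nxt = max(adjacency[node],
                  key=lambda j: cost[j] * elapsed[j] / mean_attack_time[j])
        elapsed[nxt] = 0           # inspection resets the clock at nxt
        node = nxt
        walk.append(node)
    return walk

# line graph 0-1-2 with a high-value node at 2
adj = {0: [1], 1: [0, 2], 2: [1]}
walk = patrol(adj, cost=[1.0, 1.0, 5.0], mean_attack_time=[4.0, 4.0, 4.0])
print(walk)
```

The resulting walk keeps returning to the high-cost node while still visiting the others, which is the qualitative behavior an index policy is meant to produce.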
A majority of approximate dynamic programming approaches to the reinforcement learning problem can be categorized into greedy value function methods and value-based policy gradient methods. The former approach, although fast, is well known to be susceptible to the policy oscillation phenomenon. We take a fresh view of this phenomenon by casting, within the context of non-optimistic policy iteration, a considerable subset of the former approach as a limiting special case of the latter. We explain the phenomenon in terms of this view and illustrate the underlying mechanism with artificial examples. We also use it to derive the constrained natural actor-critic algorithm, which can interpolate between the aforementioned approaches. In addition, it has been suggested in the literature that the oscillation phenomenon might be subtly connected to the grossly suboptimal performance on the Tetris benchmark problem of all attempted approximate dynamic programming methods. Based on empirical findings, we offer a hypothesis that might explain the inferior performance levels and the associated policy degradation phenomenon, and that would partially support the suggested connection. Finally, we report scores on the Tetris problem that improve on existing dynamic programming based results by an order of magnitude. (C) 2014 Elsevier Ltd. All rights reserved.
In recent years, research on reinforcement learning (RL) has focused on function approximation for learning prediction and control in Markov decision processes (MDPs). Function approximation techniques are essential for dealing with MDPs that have large or continuous state and action spaces. In this paper, a comprehensive survey is given of recent developments in RL algorithms with function approximation. From a theoretical point of view, the convergence and feature representation of RL algorithms are analyzed. From an empirical aspect, the performance of different RL algorithms is evaluated and compared on several benchmark learning prediction and learning control tasks. Applications of RL with function approximation are also discussed. Finally, directions for future work on RL with function approximation are suggested. (C) 2013 Elsevier Inc. All rights reserved.
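As a concrete instance of learning prediction with function approximation, here is a minimal TD(0) sketch with a linear value function V(s) = w · φ(s) on the classic 5-state random walk. One-hot features are used, so the linear case coincides with the tabular one; the problem setup is a standard textbook example, not one from this survey.

```python
import random

# TD(0) with linear function approximation on a 5-state random walk:
# nonterminal states 1..5, terminals at 0 and 6, reward +1 on the right exit.
# True values are i/6 for state i.
def td0_random_walk(episodes=5000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    n = 5
    w = [0.0] * n
    phi = lambda s: [1.0 if i == s - 1 else 0.0 for i in range(n)]  # one-hot
    for _ in range(episodes):
        s = 3                               # start in the middle
        while True:
            s2 = s + rng.choice((-1, 1))
            r = 1.0 if s2 == 6 else 0.0
            v2 = 0.0 if s2 in (0, 6) else sum(a * b for a, b in zip(w, phi(s2)))
            v = sum(a * b for a, b in zip(w, phi(s)))
            delta = r + v2 - v              # TD error (undiscounted, episodic)
            for i, f in enumerate(phi(s)):  # semi-gradient update on w
                w[i] += alpha * delta * f
            if s2 in (0, 6):
                break
            s = s2
    return w

w = td0_random_walk()
print(w)
```

With a small constant step size the weights settle near the true values (1/6, 2/6, ..., 5/6), up to stochastic fluctuation.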
Cyber-Physical Systems (CPSs), resulting from the interconnection of computational, communication, and control (cyber) devices with physical processes, are becoming widespread in our society. In several CPS applications it is crucial to minimize the communication burden while still providing desirable closed-loop control properties. To this end, a promising approach is to embrace the recently proposed event-triggered control paradigm, in which transmission times are chosen based on well-defined events, using state information. However, few general event-triggered control methods guarantee closed-loop improvements over traditional periodic transmission strategies. Here, we provide a new class of event-triggered controllers for linear systems which guarantee better quadratic performance than traditional periodic time-triggered control using the same average transmission rate. In particular, our main results explicitly quantify the obtained performance improvements for quadratic average cost problems. The proposed controllers are inspired by rollout ideas in the context of dynamic programming.
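A minimal event-triggered loop, purely illustrative and unrelated to the paper's rollout-based design: the sensor transmits the state only when the controller's held copy has drifted beyond a threshold, so transmissions occur on events rather than every period.

```python
# Event-triggered state feedback on a scalar system x_{k+1} = a*x_k + u_k.
# The controller holds u = -a * x_hat, where x_hat is the last transmitted
# state; a transmission (event) occurs only when |x - x_hat| > threshold.
def simulate(a=1.2, x0=1.0, threshold=0.1, steps=50):
    x, x_hat = x0, x0
    transmissions = 0
    for _ in range(steps):
        if abs(x - x_hat) > threshold:   # event: refresh controller's copy
            x_hat = x
            transmissions += 1
        u = -a * x_hat                   # deadbeat law on the held state
        x = a * x + u
    return x, transmissions

x_final, n_tx = simulate()
print(x_final, n_tx)  # 0.0 1
```

In this noise-free example the deadbeat law makes a single transmission suffice for all 50 steps; disturbances would trigger further events, and the design question the paper addresses is how to choose the triggering rule so that performance beats periodic transmission at the same average rate.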
In this paper, we develop an integral reinforcement learning algorithm based on policy iteration to learn online the Nash equilibrium solution for a two-player zero-sum differential game with completely unknown linear continuous-time dynamics. This algorithm is a fully model-free method that solves the game algebraic Riccati equation forward in time. The developed algorithm simultaneously updates the value function and the control and disturbance policies. The convergence of the algorithm is shown to be equivalent to that of Newton's method. To implement this algorithm, one critic network and two action networks are used to approximate the game value function and the control and disturbance policies, respectively, and the least squares method is used to estimate the unknown parameters. The effectiveness of the developed scheme is demonstrated in simulation by designing an H-infinity state feedback controller for a power system. Note to Practitioners: The noncooperative zero-sum differential game provides an ideal tool for studying multiplayer optimal decision and control problems. Existing approaches usually solve for the Nash equilibrium by means of offline iterative computation, and require exact knowledge of the system dynamics. However, exact knowledge of the system dynamics is difficult to obtain for many real-world industrial systems. The algorithm developed in this paper is a fully model-free method that solves the zero-sum differential game problem forward in time by making use of online measured data. This method is not affected by errors between an identification model and a real system, and responds quickly to changes in the system dynamics. Exploration signals are required to satisfy the persistence of excitation condition to update the value function and the policies, and these signals do not affect the convergence of the learning process. The least squares method is used to obtain the approximate solution for zero-sum games with unknown dynamics. The developed a
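For reference, the game algebraic Riccati equation mentioned above, in its standard H-infinity form for $\dot{x} = Ax + Bu + Dw$ with cost $\int_0^\infty (x^{\top}Qx + u^{\top}Ru - \gamma^2 w^{\top}w)\,dt$ (notation assumed here, not taken from the paper), is:

```latex
A^{\top}P + PA + Q - PBR^{-1}B^{\top}P + \gamma^{-2}\,P D D^{\top} P = 0,
```

with saddle-point policies

```latex
u^* = -R^{-1}B^{\top}P\,x, \qquad w^* = \gamma^{-2} D^{\top}P\,x .
```

The integral reinforcement learning algorithm learns $P$ from online measured data without requiring a model of the dynamics.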