The article formulates the well-known economic lot scheduling problem (ELSP) with sequence-dependent setup times and costs as a semi-Markov decision process. Using an affine approximation of the bias function, a semi-infinite linear program is obtained and a lower bound for the minimum average total cost rate is determined. The solution of this problem is directly used in a price-directed, dynamic heuristic to determine a good cyclic schedule. As the state space of the ELSP is non-trivial for the multi-product setting with setup times, the authors further illustrate how a lookahead version of the price-directed, dynamic heuristic can be used to construct and dynamically improve an approximation of the state space. Numerical results show that the resulting heuristic performs competitively with one reported in the literature.
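For readers unfamiliar with the construction, the lower-bound program described above takes the standard semi-MDP average-cost form. A sketch, with notation assumed here rather than taken from the article (stage cost $c$, expected sojourn time $\tau$, transition law $p$, bias $h$, average cost rate $g$):

```latex
\max_{g,\,\theta}\; g
\quad\text{s.t.}\quad
h_\theta(s) \;\le\; c(s,a) - g\,\tau(s,a) + \sum_{s'} p(s'\mid s,a)\,h_\theta(s')
\qquad \forall\,(s,a),
```

where the affine approximation $h_\theta(s) = \theta_0 + \theta^{\top}s$ turns the program into a semi-infinite linear program in $(g,\theta)$. Any feasible $g$ lower-bounds the minimum average total cost rate, and the prices attached to the constraints are the kind of quantities a price-directed heuristic can exploit.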
This paper is concerned with a new iterative theta-adaptive dynamic programming (ADP) technique to solve optimal control problems of infinite-horizon discrete-time nonlinear systems. The idea is to use an iterative ADP algorithm to obtain the iterative control law that optimizes the iterative performance index function. In the present iterative theta-ADP algorithm, the requirement of an initially admissible control, as in policy iteration, is avoided. It is proved that every iterative control obtained by the iterative theta-ADP algorithm stabilizes the nonlinear system, which means the algorithm is feasible for both online and offline implementation. Convergence analysis of the performance index function guarantees that the iterative performance index function converges monotonically to the optimum. Neural networks are used to approximate the performance index function and to compute the optimal control policy, facilitating implementation of the iterative theta-ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the established method.
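The core iteration can be illustrated with a toy discrete analogue (a sketch only; the paper's algorithm targets continuous nonlinear systems with neural-network approximators): value iteration started from V_0 ≡ 0 needs no admissible initial control, and its iterates grow monotonically toward the optimal cost.

```python
# Minimal value-iteration analogue of the iterative ADP idea (toy, discrete):
# V_{i+1}(x) = min_u [ U(x,u) + V_i(f(x,u)) ], starting from V_0 = 0,
# with no admissible initial control required.

def value_iteration(n_states=5, n_iters=50):
    V = [0.0] * n_states          # V_0 = 0 everywhere
    history = [list(V)]
    for _ in range(n_iters):
        Vn = [0.0] * n_states     # state 0 is the cost-free goal state
        for x in range(1, n_states):
            # stage cost 1 per move; either step toward the goal or away
            Vn[x] = min(1.0 + V[x - 1],
                        1.0 + V[min(x + 1, n_states - 1)])
        history.append(list(Vn))
        V = Vn
    return V, history

V, hist = value_iteration()
print(V)  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

In this instance the iterates increase monotonically from V_0 = 0 to the optimal cost-to-go (the distance to the goal), mirroring the monotone convergence the paper proves for its setting.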
ISBN:
(Print) 9781467380416
This paper proposes an approximate dynamic programming (ADP) based approach to evaluate the effective load carrying capability (ELCC) of high-penetration renewable resources by solving the long-term security-constrained unit commitment (SCUC) problem with various uncertainties related to solar radiation, wind speed, and load level. Compared with traditional approaches, the proposed approach allows the Independent System Operator (ISO) to make decisions on the basis of current-day information only, reducing the computational burden of forecasting future states. The objective of the proposed long-term SCUC formulation is to minimize the operation cost for the base case with forecast values while accounting for the variable cost arising from uncertainties. Numerical case studies on a 6-bus system illustrate the effectiveness of the proposed ADP-based long-term SCUC model for investigating ELCC under various uncertainties.
In this paper, the neural-network-based robust optimal control design for a class of uncertain nonlinear systems via an adaptive dynamic programming approach is investigated. First, the robust controller of the original uncertain system is derived by adding a feedback gain to the optimal controller of the nominal system. It is also shown that this robust controller achieves optimality under a specified cost function, which serves as the basic idea of the robust optimal control design. Then, a critic network is constructed to solve the Hamilton-Jacobi-Bellman equation corresponding to the nominal system, where an additional stabilizing term is introduced to guarantee stability. The uniform ultimate boundedness of the closed-loop system is also proved using the Lyapunov approach. Moreover, the obtained results are extended to the decentralized optimal control problem of continuous-time nonlinear interconnected large-scale systems. Finally, two simulation examples are presented to illustrate the effectiveness of the established control scheme. (C) 2014 Elsevier Inc. All rights reserved.
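As a reference for the equation the critic network solves, the standard HJB form for a nominal control-affine system is sketched below; the symbols ($f$, $g$, $Q$, $R$) are assumed notation, not necessarily the paper's.

```latex
% nominal system \dot{x} = f(x) + g(x)u, cost J = \int_0^\infty [Q(x) + u^\top R u]\,dt
0 = \min_u \Big\{ Q(x) + u^{\top} R u
      + (\nabla V(x))^{\top}\big( f(x) + g(x)u \big) \Big\},
\qquad
u^*(x) = -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla V(x),
```

and substituting $u^*$ back gives the HJB equation in closed form:

```latex
0 = Q(x) + (\nabla V)^{\top} f(x)
    - \tfrac{1}{4}\, (\nabla V)^{\top} g(x) R^{-1} g(x)^{\top} \nabla V .
```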
In this paper, we study a dynamic fleet management problem with uncertain demands and customer chosen service levels. We first show that the problem can be transformed into a dynamic network with partially dependent random arc capacities, and then develop a structural decomposition approach which decomposes the network recourse problem into a series of tree recourse problems (TRPs). As each TRP can be solved by an efficient algorithm, the decomposition approach can solve the problem very efficiently. We conduct numerical experiments to compare its performance with two alternative methods. Numerical experiments show that the performance of our method is quite encouraging. (C) 2013 Elsevier B.V. All rights reserved.
Consider a patrol problem, where a patroller traverses a graph through edges to detect potential attacks at nodes. An attack takes a random amount of time to complete. The patroller takes one time unit to move to and inspect an adjacent node, and will detect an ongoing attack with some probability. If an attack completes before it is detected, a cost is incurred. The attack time distribution, the cost due to a successful attack, and the detection probability all depend on the attack node. The patroller seeks a patrol policy that minimizes the expected cost incurred when, and if, an attack eventually happens. We consider two cases. A random attacker chooses where to attack according to predetermined probabilities, while a strategic attacker chooses where to attack to incur the maximal expected cost. In each case, computing the optimal solution, although possible, quickly becomes intractable for problems of practical size. Our main contribution is to develop efficient index policies, based on Lagrangian relaxation methodology and on approximate dynamic programming, which typically achieve within 1% of optimality with computation time orders of magnitude less than what is required to compute the optimal policy for problems of practical size. (c) 2014 Wiley Periodicals, Inc. Naval Research Logistics, 61: 557-576, 2014
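A minimal sketch of the index-policy idea follows. The index rule here is a hypothetical illustrative choice, not the Lagrangian-relaxation index derived in the article: each node accumulates "time since last inspection", and the patroller greedily moves to the adjacent node with the largest urgency index.

```python
# Hypothetical greedy index patrol (illustrative only): the urgency index
#   index_j = cost_j * elapsed_j / mean_attack_time_j
# rises while a node goes uninspected, and inspection resets its clock.

def patrol(adjacency, cost, mean_attack_time, start=0, horizon=12):
    n = len(cost)
    elapsed = [0] * n
    node, walk = start, [start]
    for _ in range(horizon):
        for i in range(n):
            elapsed[i] += 1
        # move to the adjacent node with the largest urgency index
        nxt = max(adjacency[node],
                  key=lambda j: cost[j] * elapsed[j] / mean_attack_time[j])
        elapsed[nxt] = 0           # inspection resets the clock at nxt
        node = nxt
        walk.append(node)
    return walk

# line graph 0-1-2 with a high-value node at 2
adj = {0: [1], 1: [0, 2], 2: [1]}
walk = patrol(adj, cost=[1.0, 1.0, 5.0], mean_attack_time=[4.0, 4.0, 4.0])
print(walk)
```

The resulting walk keeps returning to the high-cost node while still visiting the others, which is the qualitative behavior an index policy is meant to produce.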
A majority of approximate dynamic programming approaches to the reinforcement learning problem can be categorized into greedy value function methods and value-based policy gradient methods. The former approach, although fast, is well known to be susceptible to the policy oscillation phenomenon. We take a fresh view of this phenomenon by casting, within the context of non-optimistic policy iteration, a considerable subset of the former approach as a limiting special case of the latter. We explain the phenomenon in terms of this view and illustrate the underlying mechanism with artificial examples. We also use it to derive the constrained natural actor-critic algorithm, which can interpolate between the aforementioned approaches. In addition, it has been suggested in the literature that the oscillation phenomenon might be subtly connected to the grossly suboptimal performance on the Tetris benchmark problem of all attempted approximate dynamic programming methods. Based on empirical findings, we offer a hypothesis that might explain the inferior performance levels and the associated policy degradation phenomenon, and that would partially support the suggested connection. Finally, we report scores on the Tetris problem that improve on existing dynamic programming based results by an order of magnitude. (C) 2014 Elsevier Ltd. All rights reserved.
In recent years, research on reinforcement learning (RL) has focused on function approximation for learning prediction and control in Markov decision processes (MDPs). Function approximation techniques are essential for dealing with MDPs that have large or continuous state and action spaces. In this paper, a comprehensive survey is given of recent developments in RL algorithms with function approximation. From a theoretical point of view, the convergence and feature representation of RL algorithms are analyzed. From an empirical aspect, the performance of different RL algorithms is evaluated and compared on several benchmark learning prediction and learning control tasks. Applications of RL with function approximation are also discussed. Finally, directions for future work on RL with function approximation are suggested. (C) 2013 Elsevier Inc. All rights reserved.
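As a concrete instance of learning prediction with function approximation, here is a minimal TD(0) sketch with a linear value function V(s) = w · φ(s) on the classic 5-state random walk. One-hot features are used, so the linear case coincides with the tabular one; the problem setup is a standard textbook example, not one from this survey.

```python
import random

# TD(0) with linear function approximation on a 5-state random walk:
# nonterminal states 1..5, terminals at 0 and 6, reward +1 on the right exit.
# True values are i/6 for state i.
def td0_random_walk(episodes=5000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    n = 5
    w = [0.0] * n
    phi = lambda s: [1.0 if i == s - 1 else 0.0 for i in range(n)]  # one-hot
    for _ in range(episodes):
        s = 3                               # start in the middle
        while True:
            s2 = s + rng.choice((-1, 1))
            r = 1.0 if s2 == 6 else 0.0
            v2 = 0.0 if s2 in (0, 6) else sum(a * b for a, b in zip(w, phi(s2)))
            v = sum(a * b for a, b in zip(w, phi(s)))
            delta = r + v2 - v              # TD error (undiscounted, episodic)
            for i, f in enumerate(phi(s)):  # semi-gradient update on w
                w[i] += alpha * delta * f
            if s2 in (0, 6):
                break
            s = s2
    return w

w = td0_random_walk()
print(w)
```

With a small constant step size the weights settle near the true values (1/6, 2/6, ..., 5/6), up to stochastic fluctuation.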
Cyber-Physical Systems (CPSs), resulting from the interconnection of computational, communication, and control (cyber) devices with physical processes, are becoming widespread in our society. In several CPS applications it is crucial to minimize the communication burden while still providing desirable closed-loop control properties. To this end, a promising approach is to embrace the recently proposed event-triggered control paradigm, in which transmission times are chosen based on well-defined events, using state information. However, few general event-triggered control methods guarantee closed-loop improvements over traditional periodic transmission strategies. Here, we provide a new class of event-triggered controllers for linear systems which guarantee better quadratic performance than traditional periodic time-triggered control using the same average transmission rate. In particular, our main results explicitly quantify the obtained performance improvements for quadratic average cost problems. The proposed controllers are inspired by rollout ideas in the context of dynamic programming.
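A minimal event-triggered loop, purely illustrative and unrelated to the paper's rollout-based design: the sensor transmits the state only when the controller's held copy has drifted beyond a threshold, so transmissions occur on events rather than every period.

```python
# Event-triggered state feedback on a scalar system x_{k+1} = a*x_k + u_k.
# The controller holds u = -a * x_hat, where x_hat is the last transmitted
# state; a transmission (event) occurs only when |x - x_hat| > threshold.
def simulate(a=1.2, x0=1.0, threshold=0.1, steps=50):
    x, x_hat = x0, x0
    transmissions = 0
    for _ in range(steps):
        if abs(x - x_hat) > threshold:   # event: refresh controller's copy
            x_hat = x
            transmissions += 1
        u = -a * x_hat                   # deadbeat law on the held state
        x = a * x + u
    return x, transmissions

x_final, n_tx = simulate()
print(x_final, n_tx)  # 0.0 1
```

In this noise-free example the deadbeat law makes a single transmission suffice for all 50 steps; disturbances would trigger further events, and the design question the paper addresses is how to choose the triggering rule so that performance beats periodic transmission at the same average rate.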
In this paper, we develop an integral reinforcement learning algorithm based on policy iteration to learn online the Nash equilibrium solution for a two-player zero-sum differential game with completely unknown linear continuous-time dynamics. This algorithm is a fully model-free method that solves the game algebraic Riccati equation forward in time. The developed algorithm simultaneously updates the value function and the control and disturbance policies. The convergence of the algorithm is shown to be equivalent to that of Newton's method. To implement this algorithm, one critic network and two action networks are used to approximate the game value function and the control and disturbance policies, respectively, and the least squares method is used to estimate the unknown parameters. The effectiveness of the developed scheme is demonstrated in simulation by designing an H-infinity state feedback controller for a power system. Note to Practitioners: The noncooperative zero-sum differential game provides an ideal tool for studying multiplayer optimal decision and control problems. Existing approaches usually solve for the Nash equilibrium by means of offline iterative computation, and require exact knowledge of the system dynamics. However, exact knowledge of the system dynamics is difficult to obtain for many real-world industrial systems. The algorithm developed in this paper is a fully model-free method that solves the zero-sum differential game problem forward in time by making use of online measured data. This method is not affected by errors between an identification model and a real system, and responds quickly to changes in the system dynamics. Exploration signals are required to satisfy the persistence of excitation condition to update the value function and the policies, and these signals do not affect the convergence of the learning process. The least squares method is used to obtain the approximate solution for zero-sum games with unknown dynamics. The developed a
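For reference, the game algebraic Riccati equation mentioned above, in its standard H-infinity form for $\dot{x} = Ax + Bu + Dw$ with cost $\int_0^\infty (x^{\top}Qx + u^{\top}Ru - \gamma^2 w^{\top}w)\,dt$ (notation assumed here, not taken from the paper), is:

```latex
A^{\top}P + PA + Q - PBR^{-1}B^{\top}P + \gamma^{-2}\,P D D^{\top} P = 0,
```

with saddle-point policies

```latex
u^* = -R^{-1}B^{\top}P\,x, \qquad w^* = \gamma^{-2} D^{\top}P\,x .
```

The integral reinforcement learning algorithm learns $P$ from online measured data without requiring a model of the dynamics.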