We investigate the optimal control of a stochastic system in the presence of both exogenous (control-independent) stochastic state variables and endogenous (control-dependent) state variables. Our solution approach relies on simulations and regressions with respect to the state variables, but also grafts the endogenous state variable into the simulation paths. That is, unlike most other simulation approaches found in the literature, no discretization of the endogenous variable is required. The approach is meant to handle several stochastic variables, offers a high level of flexibility in their modeling, and should be at its best in non-time-homogeneous cases, when the optimal policy structure changes with time. We provide numerical results for a dam-based hydropower application, where the exogenous variable is the stochastic spot price of power, and the endogenous variable is the water level in the reservoir. (C) 2013 Elsevier Ltd. All rights reserved.
This work considers enhancing the stability and improving the economic performance of nonlinear model predictive control (MPC) in the presence of disturbances or model uncertainties. First, a robust control Lyapunov function (RCLF)-based predictive control strategy is proposed. Second, approximate dynamic programming (ADP) is employed to further improve regulation performance. Finally, ADP and RCLF-MPC are combined to provide a switching control scheme, whose effectiveness is illustrated on a CSTR example. (C) 2013 Elsevier Ltd. All rights reserved.
Model predictive control has been a major success story in process control. More recently, the methodology has been used in other contexts, including automotive engine control, power electronics and telecommunications. Most applications focus on set-point tracking and use single-sequence optimisation. Here we consider an alternative class of problems, motivated by the scheduling of emergency vehicles, in which disturbances are the dominant feature. We develop a novel closed-loop model predictive control strategy aimed at this class of problems. We motivate and illustrate the ideas via the problem of fluid deployment of ambulance resources.
Both femtocells and cognitive radio (CR) are envisioned as promising technologies for the NeXt Generation (xG) cellular networks. Cognitive femtocell networks (CogFem) incorporate CR technology into femtocell deployment to reduce its demand for more spectrum bands, thereby improving spectrum utilization. In this paper, we focus on the channel allocation problem in CogFem and formulate it as a stochastic dynamic programming (SDP) problem aimed at optimizing the long-term cumulative system throughput of individual femtocells. However, the multi-dimensional state variables resulting from complex exogenous stochastic information make the SDP problem computationally intractable using standard value iteration algorithms. To address this issue, we propose an approximate dynamic programming (ADP) algorithm in pursuit of an approximate solution to the SDP problem. The proposed ADP algorithm relies on an efficient value function approximation (VFA) architecture that we design and on a stochastic gradient learning strategy, enabling each femtocell to learn and improve its own channel allocation policy. The algorithm is computationally attractive for large-scale downlink channel allocation problems in CogFem since its time complexity does not grow exponentially with the number of femtocells. Simulation results show that the proposed ADP algorithm has two main advantages: (1) it is feasible for online implementation, with a fair rate of convergence and adaptability to both long-term and short-term network dynamics; and (2) it produces high-quality solutions quickly, reaching approximately 80% of the upper bounds provided by optimal backward dynamic programming (DP) solutions to a set of deterministic counterparts of the formulated SDP problem. (C) 2013 Elsevier B.V. All rights reserved.
In this paper, the adaptive dynamic programming (ADP) approach is utilized to design a neural-network-based optimal controller for a class of unknown discrete-time nonlinear systems with a quadratic cost function. To begin with, a neural network identifier is constructed to learn the unknown dynamic system, with a stability proof. Then, the iterative ADP algorithm is developed to handle the nonlinear optimal control problem, with a convergence analysis. Moreover, the single network dual heuristic dynamic programming (SN-DHP) technique, which eliminates the use of the action network, is introduced to implement the iterative ADP algorithm. Finally, two simulation examples are included to illustrate the effectiveness of the present approach. (C) 2013 Elsevier B.V. All rights reserved.
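As a rough illustration of the iterative (value-iteration) idea behind ADP with a quadratic cost, the sketch below uses a scalar linear system x' = a x + b u with stage cost q x^2 + r u^2. In that special case the value function stays quadratic, V_k(x) = p_k x^2, so the iteration reduces to a Riccati recursion. This is not the method of the abstract above, which treats unknown nonlinear systems with neural networks; the function name and the parameters a, b, q, r are assumptions chosen only for illustration.

```python
def value_iteration_lq(a, b, q, r, iterations=200):
    """Value iteration for x' = a*x + b*u with cost q*x^2 + r*u^2.

    Hypothetical helper: starts from V_0(x) = 0 and repeats the Bellman
    update V_{k+1}(x) = min_u [q x^2 + r u^2 + V_k(a x + b u)].
    With V_k(x) = p_k x^2 the minimiser is u = -gain * x.
    """
    p = 0.0  # V_0(x) = 0
    for _ in range(iterations):
        # Greedy feedback gain with respect to the current value estimate.
        gain = (a * b * p) / (r + b * b * p)
        # Bellman update of the quadratic coefficient (Riccati recursion).
        p = q + a * a * p - a * b * p * gain
    # Gain associated with the final value estimate.
    gain = (a * b * p) / (r + b * b * p)
    return p, gain


# Assumed example values: an open-loop unstable system (a > 1).
p_star, k_star = value_iteration_lq(a=1.2, b=1.0, q=1.0, r=1.0)
print(p_star, k_star)
```

With these assumed values the resulting feedback u = -k_star * x stabilises the system, since |a - b * k_star| < 1.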
Intelligent energy management systems can reduce consumption and thus save money for consumers. The residential load demand must be met, and some advantages can be obtained if specific optimization policies are adopted. With an efficient use of renewable sources and of power imported from the grid, an intelligent and adaptive system that manages the battery is able to satisfy the load demand and minimize the total energy cost of the scenario under study. In this paper, an adaptive dynamic programming-based algorithm is presented to handle dynamic situations in which some conditions of the environment or habits of the customer may vary with time, especially when using renewable energy. Based on the idea of the smart grid, we propose an intelligent management scheme for renewable resources combined with a battery, implemented with a faster and simpler dynamic programming scheme that uses only one critic network and some optimization policies in order to satisfy the load demand. Since this kind of problem does not require training an action network, the training loop between the two neural networks is removed and the training process is greatly simplified. Computer simulations confirm the effectiveness of this self-learning design in a typical residential scenario.
Dynamic pricing for a network of resources over a finite selling horizon has received considerable attention in recent years, yet few papers provide effective computational approaches to solve the problem. We consider a resource decomposition approach to solve the problem and investigate the performance of the approach in a computational study. We compare the performance of the approach to static pricing and choice-based availability control. Our numerical results show that dynamic pricing policies from network resource decomposition can achieve significant revenue lift compared with choice-based availability control and static pricing, even when the latter is frequently re-solved. As a by-product of our approach, network decomposition provides an upper bound on revenue, which is provably tighter than the well-known upper bound from a deterministic approximation.
An online adaptive reinforcement learning-based solution is developed for the infinite-horizon optimal control problem for continuous-time uncertain nonlinear systems. A novel actor-critic-identifier (ACI) is proposed to approximate the Hamilton-Jacobi-Bellman equation using three neural network (NN) structures: actor and critic NNs approximate the optimal control and the optimal value function, respectively, and a robust dynamic neural network identifier asymptotically approximates the uncertain system dynamics. An advantage of using the ACI architecture is that learning by the actor, critic, and identifier is continuous and simultaneous, without requiring knowledge of system drift dynamics. Convergence of the algorithm is analyzed using Lyapunov-based adaptive control methods. A persistence of excitation condition is required to guarantee exponential convergence to a bounded region in the neighborhood of the optimal control and uniformly ultimately bounded (UUB) stability of the closed-loop system. Simulation results demonstrate the performance of the actor-critic-identifier method for approximate optimal control. (C) 2012 Elsevier Ltd. All rights reserved.
Markdown policies for product groups with significant cross-price elasticity among their members should be determined jointly. However, finding optimal policies for product groups becomes computationally intractable as the number of products increases. Therefore, we formulate the problem as a Markov decision process and use an approximate dynamic programming approach to solve it. Since the state space is multidimensional and very large, the number of iterations required to learn the state values is enormous. Therefore, we use aggregation and neural networks to approximate the value function and to determine the optimal markdown policies approximately. In a numerical study, we provide insights on the behavior of markdown policies in the cases where one product is expensive and the other is cheap, and where both have the same price. We also provide insights on, and compare, the markdown policies for the cases in which there is a substitution effect between products and in which the products are independent. (C) 2013 Elsevier B.V. All rights reserved.
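To make the Markov decision process formulation in the abstract above more concrete, the following is a minimal sketch of exact backward dynamic programming for a single-product markdown problem, small enough to solve without approximation. It is not the paper's multi-product ADP method. The price ladder, the per-period sale probabilities, the horizon, and the function name are all hypothetical; the only structural assumptions are that at most one unit sells per period and that prices may never be raised.

```python
from functools import lru_cache

# Hypothetical problem data, for illustration only.
PRICES = [10.0, 8.0, 6.0]     # allowed prices, from highest to lowest
SALE_PROB = [0.2, 0.4, 0.7]   # assumed probability of one sale per period
HORIZON = 5                   # number of selling periods


@lru_cache(maxsize=None)
def value(t, inventory, price_idx):
    """Return (expected revenue-to-go, best price index) for a state.

    State: period t, units left, and index of the current price.
    Markdown constraint: only indices >= price_idx may be chosen.
    """
    if t == HORIZON or inventory == 0:
        return 0.0, price_idx
    best_value, best_idx = -1.0, price_idx
    for j in range(price_idx, len(PRICES)):
        p = SALE_PROB[j]
        sold_value, _ = value(t + 1, inventory - 1, j)
        unsold_value, _ = value(t + 1, inventory, j)
        expected = p * (PRICES[j] + sold_value) + (1.0 - p) * unsold_value
        if expected > best_value:
            best_value, best_idx = expected, j
    return best_value, best_idx


revenue, first_price_idx = value(0, 3, 0)
print(revenue, PRICES[first_price_idx])
```

The state space here has only a few dozen states. With several products whose demands interact, the state becomes a vector of inventories and prices, which is the growth that motivates the aggregation and neural network approximations described in the abstract.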
In this paper, a novel multi-objective adaptive dynamic programming (ADP) method is constructed to obtain the optimal controller for a class of nonlinear time-delay systems. Using the weighted-sum technique, the original multi-objective optimal control problem is transformed into a single-objective one. An ADP method is established for nonlinear time-delay systems to solve the optimal control problem. A convergence analysis is given to demonstrate that the presented iterative performance index function sequence is convergent and that the closed-loop system is asymptotically stable. Neural networks are used to obtain the approximate control policy and the approximate performance index function, respectively. Two simulation examples are presented to illustrate the performance of the presented optimal control method.
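The weighted-sum step mentioned in the abstract above can be sketched on a toy static problem: several objectives are combined into one scalar cost using non-negative weights, and that scalar cost is then minimised. The two quadratic objectives, the weights, the candidate grid, and the function name below are assumptions made for illustration; the paper applies the idea to performance index functions of a dynamic system, which this sketch does not attempt to reproduce.

```python
def weighted_sum_minimizer(objectives, weights, candidates):
    """Pick the candidate minimising sum_i weights[i] * objectives[i](u).

    Hypothetical helper: 'objectives' is a list of functions of a scalar
    decision u, and 'candidates' is a finite list of values of u to try.
    """
    def scalar_cost(u):
        return sum(w * f(u) for w, f in zip(weights, objectives))
    return min(candidates, key=scalar_cost)


# Two conflicting toy objectives: one prefers u = 1, the other u = -1.
def j1(u):
    return (u - 1.0) ** 2


def j2(u):
    return (u + 1.0) ** 2


grid = [-2.0 + 0.001 * i for i in range(4001)]  # candidates in [-2, 2]
u_star = weighted_sum_minimizer([j1, j2], [0.75, 0.25], grid)
print(u_star)
```

For weights (w, 1 - w) the analytic minimiser of this toy problem is u = 2w - 1, so changing the weights moves the solution along the trade-off between the two objectives.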