In this paper, we present a stochastic model for the dynamic fleet management problem with random travel times. Our approach decomposes the problem into time-staged subproblems by formulating it as a dynamic program and uses approximations of the value function. In order to deal with random travel times, the state variable of our dynamic program includes all individual decisions over a relevant portion of the history. We show how to approximate the value function in a tractable manner under this new high-dimensional state variable. Under our approximation scheme, the subproblem for each time period decomposes with respect to locations, making our model very appealing for large-scale applications. Numerical work shows that the proposed approach provides high-quality solutions and performs significantly better than standard benchmark methods. (c) 2005 Elsevier B.V. All rights reserved.
ISBN: 9781424404926 (print)
The increasing complexity of the modern power grid highlights the need for advanced modeling and control techniques for effective control of excitation and turbine systems. The crucial factors affecting modern power systems today are voltage control and system stabilization during small and large disturbances. Simulation studies and real-time laboratory experimental studies are described, and the results show the successful control of the power system excitation and turbine systems with adaptive and optimal neurocontrol approaches. The performance of the neurocontrollers is compared with that of conventional PI controllers for damping under different operating conditions for small and large disturbances.
In the present paper, a call admission control scheme that can learn from the network environment and user behavior is developed for code division multiple access (CDMA) cellular networks that handle both voice and data services. The idea is built upon a novel learning control architecture with only a single module instead of the two or three modules used in adaptive critic designs (ACDs). The use of an adaptive critic approach for call admission control in wireless cellular networks is new. The call admission controller can perform learning in real-time as well as in offline environments, and the controller improves its performance as it gains more experience. Another important contribution of the present work is the choice of utility function for the self-learning control approach, which makes the learning process much more efficient than existing learning control methods. The performance of our algorithm will be shown through computer simulation and compared with existing algorithms.
ISBN: 078039044X (print)
In this work, we present an approximate value iteration algorithm for a production and storage model with multiple production stages and a single final product, subject to random demand. We use linear function approximation schemes in subsets of the state space and represent a few key states in a look-up table form. We obtain some promising results and perform sensitivity analysis with respect to the parameters of the algorithm for the benchmark problem studied.
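The general idea of approximate value iteration with a linear architecture plus a look-up table for key states can be sketched as follows. This is a minimal illustration on a hypothetical single-stage inventory problem, not the multi-stage model of the paper: the quadratic basis, the cost parameters, the demand distribution, and the choice of key and fitting states are all assumptions made for the example, and this kind of fitted iteration carries no general convergence guarantee.

```python
import numpy as np

def features(s):
    # Hypothetical quadratic basis for the linear value-function architecture.
    return np.array([1.0, s, s * s])

def approximate_value_iteration(max_stock=10, demands=(0, 1, 2, 3), gamma=0.9,
                                order_cost=1.0, hold_cost=0.5, short_cost=4.0,
                                key_states=(0,), fit_states=(2, 4, 6, 8, 10),
                                sweeps=50):
    theta = np.zeros(3)                      # weights of the linear approximation
    table = {s: 0.0 for s in key_states}     # exact look-up entries for key states

    def value(s):
        # Key states read the table; all other states use the linear fit.
        return table[s] if s in table else float(features(s) @ theta)

    def backup(s):
        # One Bellman backup: minimize expected cost over order quantities.
        best = float("inf")
        for a in range(max_stock - s + 1):
            total = 0.0
            for d in demands:                # demand assumed uniform over `demands`
                nxt = max(s + a - d, 0)
                shortage = max(d - s - a, 0)
                cost = order_cost * a + hold_cost * nxt + short_cost * shortage
                total += (cost + gamma * value(nxt)) / len(demands)
            best = min(best, total)
        return best

    for _ in range(sweeps):
        # Compute all targets with the old approximation, then update both parts.
        targets = np.array([backup(s) for s in fit_states])
        new_table = {s: backup(s) for s in key_states}
        phi = np.array([features(s) for s in fit_states])
        theta, *_ = np.linalg.lstsq(phi, targets, rcond=None)
        table = new_table
    return theta, table, value

theta, table, value = approximate_value_iteration()
print("weights:", theta, "V(0):", table[0], "V(5):", value(5))
```

Sensitivity analysis of the kind mentioned in the abstract would amount to re-running this loop while varying parameters such as `sweeps`, `fit_states`, or the basis functions.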
ISBN: 0780370449 (print)
We demonstrate the possibility of improving on theoretically-optimal fixed policies for control of physical inventory systems in a non-stationary fitness terrain, based on the combined application of evolutionary search and adaptive critic terrain following. We show that adaptive critic based approximate dynamic programming techniques based on plant-controller Jacobians can be used with systems characterized by discrete valued states and controls. Improvements over the best fixed policies (found using either an LP model or a genetic algorithm) in a high-penalty environment average 83% under conditions of both stationary and non-stationary demand, using real-world data.
There has been considerable recent interest in the dynamic vehicle routing problem, but the complexities of this problem class have generally restricted research to myopic models. In this paper, we address the simpler dynamic assignment problem, where a resource (container, vehicle, or driver) can serve only one task at a time. We propose a very general class of dynamic assignment models, together with an adaptive, nonmyopic algorithm that involves iteratively solving sequences of assignment problems no larger than what would be required of a myopic model. We consider problems where the attribute space of future resources and tasks is small enough to be enumerated, and propose a hierarchical aggregation strategy for problems where the attribute spaces are too large to be enumerated. Finally, we use the formulation to also test the value of advance information, which offers a more realistic estimate than studies that use purely myopic models.
This paper advances a neural-network-based approximate dynamic programming control mechanism that can be applied to complex control problems such as helicopter flight control design. Based on direct neural dynamic programming (DNDP), an approximate dynamic programming methodology, the control system is tailored to learn to maneuver a helicopter. The paper consists of a comprehensive treatise of this DNDP-based tracking control framework and extensive simulation studies for an Apache helicopter. A trim network is developed and seamlessly integrated into the neural dynamic programming (NDP) controller as part of a baseline structure for controlling complex nonlinear systems such as a helicopter. Design robustness is addressed by performing simulations under various disturbance conditions. All designs are tested using FLYRT, a sophisticated, industrial-scale, nonlinear, validated model of the Apache helicopter. This is probably the first time that an approximate dynamic programming methodology has been systematically applied to, and evaluated on, a complex, continuous-state, multiple-input-multiple-output nonlinear system with uncertainty. Though illustrated for helicopters, the DNDP control system framework should be applicable to general-purpose tracking control.
A set of neural networks is employed to develop control policies that are better than fixed, theoretically optimal policies, when applied to a combined physical inventory and distribution system in a nonstationary demand environment. Specifically, we show that model-based adaptive critic approximate dynamic programming techniques can be used with systems characterized by discrete valued states and controls. The control policies embodied by the trained neural networks outperformed the best, fixed policies (found by either linear programming or genetic algorithms) in a high-penalty cost environment with time-varying demand.
ISBN: 0780364953 (print)
This paper focuses on the problem of providing real-time, closed-loop feedback control of Joint Air Operations (JAO) via near-optimal mission assignments. For this application, a rollout algorithm is employed which is based on the theory of stochastic dynamic programming. The primary benefits of this technology are agile and stable control of distributed stochastic systems. The rollout algorithm is applied to a small JAO scenario that includes limited assets, risk/reward that is dependent on mission composition, basic threat avoidance routing, and multiple targets, some of which are fleeting and emerging. Simulation results illustrate the benefits of the closed-loop feedback control. It is shown that the rollout strategy provides statistically significant performance improvements over an open-loop feedback strategy that uses the same baseline heuristic. The performance improvements are attributed to the fact that the rollout algorithm was able to learn near-optimal behaviors that were not modeled in the baseline heuristic.
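The core mechanism of a rollout algorithm, scoring each candidate decision by completing the plan with a baseline heuristic, can be sketched on a toy asset-to-target assignment. This is a simplified deterministic illustration, not the JAO model of the paper: the reward matrix, the greedy baseline, and the function names are assumptions, and a stochastic version would replace the single heuristic completion with an average over simulated scenarios.

```python
def greedy_completion(rewards, start, free):
    """Baseline heuristic: each remaining asset takes its best free target."""
    free = set(free)
    total, plan = 0.0, []
    for asset in range(start, len(rewards)):
        if not free:
            break
        target = max(sorted(free), key=lambda t: rewards[asset][t])
        free.remove(target)
        total += rewards[asset][target]
        plan.append(target)
    return total, plan

def rollout(rewards):
    """Rollout policy: score each candidate by reward-to-go under the baseline."""
    free = set(range(len(rewards[0])))
    total, plan = 0.0, []
    for asset in range(len(rewards)):
        if not free:
            break

        def score(t):
            # Immediate reward plus the baseline's value for the remaining assets.
            tail, _ = greedy_completion(rewards, asset + 1, free - {t})
            return rewards[asset][t] + tail

        target = max(sorted(free), key=score)
        free.remove(target)
        total += rewards[asset][target]
        plan.append(target)
    return total, plan

# Hypothetical rewards[asset][target]; greedy grabs 10 first and is left with 1.
rewards = [[10, 9],
           [9, 1]]
print(greedy_completion(rewards, 0, {0, 1}))  # (11.0, [0, 1])
print(rollout(rewards))                       # (18.0, [1, 0])
```

In this toy instance the rollout policy gives up one unit of immediate reward on the first assignment because the look-ahead shows the baseline would otherwise strand the second asset on a poor target.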
The convergence properties for reinforcement learning approaches, such as temporal differences and Q-learning, have been established under moderate assumptions for discrete state and action spaces. In practice, however, many systems have either continuous action spaces or a large number of discrete elements. This paper presents an approximate dynamic programming approach to reinforcement learning for continuous action set-point regulator problems, which learns near-optimal control policies based on scalar performance measures. The continuous-action space (CAS) algorithm uses derivative-free line search methods to obtain the optimal action in the continuous space. The theoretical convergence properties of the algorithm are presented. Several heuristic stopping criteria are investigated, and practical application is illustrated by two example problems: the inverted pendulum balancing problem and the power system stabilization problem.
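Selecting an action in a continuous space with a derivative-free line search can be sketched as below. The abstract does not say which line search the CAS algorithm uses; golden-section search is assumed here as one standard derivative-free method, and `q_value` is a hypothetical stand-in for a learned action-value function that is unimodal in the action.

```python
import math

def golden_section_max(f, lo, hi, tol=1e-6):
    """Maximize a unimodal function f on [lo, hi] without derivatives."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # Maximum lies in [a, d]; reuse c as the new right interior point.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Maximum lies in [c, b]; reuse d as the new left interior point.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0

def q_value(state, action):
    # Hypothetical action-value function, peaked at action = 0.5 * state.
    return -(action - 0.5 * state) ** 2

def greedy_action(state, lo=-2.0, hi=2.0):
    # Greedy continuous action for a given state via line search over actions.
    return golden_section_max(lambda a: q_value(state, a), lo, hi)

print(greedy_action(2.0))  # close to 1.0
```

Each iteration costs one new evaluation of the action-value function, which matters when that function is a neural network or a simulation rather than a closed-form expression.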