ISBN (Print): 9781509025923
The simplicity of approximate dynamic programming offers benefits for large-scale systems compared to other synthesis and control methodologies. A common technique for approximating the dynamic program is to solve the corresponding linear program. The major drawback of this approach is that online performance is very sensitive to the choice of tuning parameters, in particular the state relevance weighting parameter. Our work aims at alleviating this sensitivity. To achieve this, we propose to compute a set of approximate Q-functions, each for a different choice of the tuning parameters, and then to use the pointwise maximum of this set for the online policy. The pointwise maximum promises to be better than using any one of the individual Q-functions on its own. We demonstrate through a stylized portfolio optimization problem that this approach immunizes against tuning errors.
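As a rough illustration of the pointwise-maximum idea described above (not the authors' code), the sketch below acts greedily on the maximum over a small family of placeholder Q-functions; the family itself, the state/action sizes, and the cost-minimization convention are all assumptions.

```python
# Minimal sketch: greedy online policy on the pointwise maximum of a family of
# approximate Q-functions, each imagined to come from an LP solved with a
# different state-relevance weighting. All data here are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_weights = 20, 4, 5

# Stand-in for the family of approximate Q-functions (one per tuning choice).
Q_family = rng.normal(size=(n_weights, n_states, n_actions))

def greedy_action(state: int) -> int:
    """Act greedily on Q_max(s, a) = max_k Q_k(s, a)."""
    q_max = Q_family[:, state, :].max(axis=0)   # pointwise maximum over the family
    return int(np.argmin(q_max))                # greedy w.r.t. costs (argmax for rewards)

print(greedy_action(3))
```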
ISBN (Print): 9781509006212
Optimal output synchronization of multi-agent leader-follower systems is considered. The agents are assumed heterogeneous, so their dynamics may be non-identical. An optimal control protocol is designed for each agent based on the leader state and the agent's local state. A distributed observer is designed to provide the leader state to each agent. A model-free approximate dynamic programming algorithm is then developed to solve the optimal output synchronization problem online in real time. No knowledge of the agents' dynamics is required. The proposed approach does not require explicitly solving the output regulator equations; it solves them implicitly by imposing optimality. A simulation example verifies the suitability of the proposed approach.
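The abstract gives no implementation details, but a conventional distributed observer of the kind it mentions can be sketched as below; the leader dynamics S, the communication graph, the pinning gains, and the coupling strength c are all assumed for illustration.

```python
# Illustrative sketch only: each follower estimates the leader state using its
# neighbors' estimates and, for "pinned" agents, the leader itself.
import numpy as np

S = np.array([[0.0, 1.0], [-1.0, 0.0]])      # assumed leader dynamics x0_dot = S x0
A = np.array([[0, 1, 0],                      # assumed adjacency among 3 followers
              [1, 0, 1],
              [0, 1, 0]])
g = np.array([1, 0, 0])                       # pinning gains: only agent 0 sees the leader
c, dt, T = 2.0, 0.01, 10.0

x0 = np.array([1.0, 0.0])                     # leader state
est = np.zeros((3, 2))                        # followers' estimates of the leader state

for _ in range(int(T / dt)):
    new_est = est.copy()
    for i in range(3):
        consensus = sum(A[i, j] * (est[j] - est[i]) for j in range(3))
        pinning = g[i] * (x0 - est[i])
        new_est[i] = est[i] + dt * (S @ est[i] + c * (consensus + pinning))
    est = new_est
    x0 = x0 + dt * (S @ x0)                   # propagate the leader (forward Euler)

print(np.linalg.norm(est - x0, axis=1))       # estimation errors should be small
```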
Project scheduling problems with both resource constraints and uncertain task durations have applications in a variety of industries. While the existing research literature has focused on finding an a priori open-loop task sequence that minimizes the expected makespan, finding a dynamic and adaptive closed-loop policy has been regarded as computationally intractable. In this research, we develop effective and efficient approximate dynamic programming (ADP) algorithms based on the rollout policy for this category of stochastic scheduling problems. To enhance the rollout algorithm, we employ constraint programming (CP) to improve the base policy offered by a priority-rule heuristic. We further devise a hybrid ADP framework that integrates both the look-back and look-ahead approximation architectures, to simultaneously achieve the quality of a rollout (look-ahead) policy, which sequentially improves a task sequence, and the efficiency of a lookup-table (look-back) approach. Computational results on the benchmark instances show that our hybrid ADP algorithm obtains solutions competitive with state-of-the-art algorithms in reasonable computational time. It performs particularly well for instances with non-symmetric probability distributions of task durations. (C) 2015 Elsevier B.V. and Association of European Operational Research Societies (EURO) within the International Federation of Operational Research Societies (IFORS). All rights reserved.
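As a heavily simplified sketch of the rollout ingredient only (no resource network and no CP-improved base policy, unlike the paper), the snippet below dispatches tasks to parallel machines: the base policy is a longest-expected-duration priority rule, and the rollout step scores each candidate next task by simulating that rule open-loop over sampled duration scenarios. All task data are assumed.

```python
# Simplified rollout sketch for a stochastic scheduling stand-in.
import heapq
import numpy as np

rng = np.random.default_rng(1)
mean_dur = np.array([4.0, 3.0, 6.0, 2.0, 5.0, 1.0])   # assumed expected durations
n_machines = 2

def simulate(sequence, durations, n_machines):
    """Makespan when tasks are dispatched in `sequence` to the earliest-free machine."""
    free = [0.0] * n_machines
    for t in sequence:
        start = heapq.heappop(free)
        heapq.heappush(free, start + durations[t])
    return max(free)

def base_policy(remaining):
    """Priority rule: longest expected duration first."""
    return sorted(remaining, key=lambda t: -mean_dur[t])

def rollout_next_task(prefix, remaining, n_scenarios=200):
    """Pick the next task whose base-policy completion is best on average."""
    scenarios = rng.exponential(mean_dur, size=(n_scenarios, len(mean_dur)))
    best_task, best_cost = None, np.inf
    for t in remaining:
        seq = prefix + [t] + base_policy(remaining - {t})
        cost = np.mean([simulate(seq, d, n_machines) for d in scenarios])
        if cost < best_cost:
            best_task, best_cost = t, cost
    return best_task

remaining, schedule = set(range(len(mean_dur))), []
while remaining:
    t = rollout_next_task(schedule, remaining)
    schedule.append(t)
    remaining.remove(t)
print("rollout sequence:", schedule)
```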
There is growing interest in the use of grid-level storage to smooth variations in supply that are likely to arise with an increased use of wind and solar energy. Energy arbitrage, the process of buying, storing, and selling electricity to exploit variations in electricity spot prices, is becoming an important way of paying for expensive investments into grid-level storage. Independent system operators such as the New York Independent System Operator (NYISO) require that battery storage operators place bids into an hour-ahead market (although settlements may occur in increments as small as five minutes, which is considered near "real-time"). The operator has to place these bids without knowing the energy level in the battery at the beginning of the hour, while simultaneously accounting for the value of leftover energy at the end of the hour. The problem is formulated as a dynamic program. We describe and employ a convergent approximate dynamic programming (ADP) algorithm that exploits monotonicity of the value function to find a revenue-generating bidding policy; using optimal benchmarks, we empirically show the computational benefits of the algorithm. Furthermore, we propose a distribution-free variant of the ADP algorithm that does not require any knowledge of the distribution of the price process (and makes no assumptions regarding a specific real-time price model). We demonstrate that a policy trained on historical real-time price data from the NYISO using this distribution-free approach is indeed effective.
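The sketch below illustrates only the bid-settlement mechanics described in the abstract, not the ADP algorithm itself; the prices, battery parameters, and the fixed bid pair are assumed placeholders.

```python
# Hour-ahead bid pair (buy price, sell price) settled against real-time prices:
# charge when the spot price is at or below the buy bid, discharge at or above
# the sell bid. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(2)
spot_prices = 30 + 10 * rng.standard_normal(12)   # twelve 5-minute settlements in one hour
capacity, rate = 10.0, 1.0                        # MWh capacity, MWh per settlement
bid_buy, bid_sell = 25.0, 38.0                    # the hour-ahead bid pair (a decision)

def settle_hour(level, prices):
    """Return (revenue, end-of-hour energy level) for one hour of settlements."""
    revenue = 0.0
    for p in prices:
        if p <= bid_buy and level < capacity:         # buy: charge the battery
            q = min(rate, capacity - level)
            level += q
            revenue -= p * q
        elif p >= bid_sell and level > 0.0:           # sell: discharge the battery
            q = min(rate, level)
            level -= q
            revenue += p * q
    return revenue, level

rev, end_level = settle_hour(level=5.0, prices=spot_prices)
print(f"hourly revenue: {rev:.2f}, leftover energy: {end_level:.2f} MWh")
```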
A novel quantum-inspired approximate dynamic programming (ADP) algorithm is proposed for solving unit commitment (UC) problems. Quantum computing theory is applied to tackle several issues arising in ADP. In detail, the unit states in UC problems are expressed as quantum superpositions. The collapsing principle of quantum measurement is then applied to solve the Bellman equation of ADP quickly. Based on the quantum rotation gate, the pre-decision states of the ADP are generated by quantum amplitude amplification. In the proposed algorithm, the quantum computation automatically balances exploration and exploitation of the state space. UC test cases with ramp-rate constraints, ranging from 10 to 100 units over a 24-hour horizon, are used to verify the feasibility of the proposed approximate algorithm. The experimental results show that the quantum-inspired ADP algorithm can find good sub-optimal solutions to large-scale UC problems within a reasonable time.
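The sketch below is a generic quantum-inspired sampler in the spirit of the ingredients named above (Q-bit amplitudes, measurement collapse, rotation gate), not the paper's ADP algorithm; the cost function and all parameters are placeholders.

```python
# Quantum-inspired sketch: each unit/hour carries a Q-bit angle; "measurement"
# collapses it to a 0/1 commitment sample, and a rotation gate nudges the angle
# toward the best sample found so far. The cost function is a trivial stand-in.
import numpy as np

rng = np.random.default_rng(3)
n_units, n_hours = 5, 4
theta = 0.05 * np.pi                               # rotation-gate step (assumed)
angles = np.full((n_units, n_hours), np.pi / 4)    # unbiased superposition

def measure(angles):
    """Collapse each Q-bit: P(unit on) = sin^2(angle)."""
    return (rng.random(angles.shape) < np.sin(angles) ** 2).astype(int)

def cost(commitment):
    """Placeholder cost: pretend the cheapest schedule keeps ~2 units on per hour."""
    return float(np.sum((commitment.sum(axis=0) - 2) ** 2))

best_x = measure(angles)
best_c = cost(best_x)
for _ in range(200):
    x = measure(angles)
    c = cost(x)
    if c < best_c:
        best_x, best_c = x, c
    # Rotation gate: move each angle toward the best-known bit value.
    angles += theta * np.where(best_x == 1, 1.0, -1.0)
    angles = np.clip(angles, 0.01, np.pi / 2 - 0.01)

print("best placeholder cost:", best_c)
print("units on per hour:", best_x.sum(axis=0))
```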
In this paper, we propose a finite-horizon neuro-optimal tracking control strategy for a class of discrete-time linear systems. When the iterative approximate dynamic programming (ADP) algorithm is applied to determine the optimal tracking control law for linear systems, only a finite number of iterations can be performed in practice, rather than infinitely many. An epsilon-error bound is therefore introduced into the ADP algorithm to determine the number of iteration steps: the approximate optimal tracking control law approaches the solution of the Hamilton-Jacobi-Bellman (HJB) equation through a self-adaptive iteration that stops once the given epsilon-error bound is met, so the epsilon-approximation tracking control law is obtained in a finite number of iterations. Nevertheless, different values of epsilon produce different control performances. We therefore seek an optimal epsilon-error bound that yields the best performance of the ADP algorithm while the controlled system tracks the desired trajectory. One example is included to illustrate the ADP algorithm under different error bounds; from the simulation results we can identify the optimal epsilon-error bound. Finally, the simulation validates the efficiency of the proposed algorithm.
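As a simplified stand-in for the epsilon-stopping idea (no neural networks and no tracking term, unlike the paper), the sketch below iterates the quadratic value kernel of a discrete-time linear system and stops once the iteration-to-iteration change falls below a prescribed epsilon, so the number of iterations is determined by epsilon; the system matrices are assumed example data.

```python
# Value iteration on the quadratic cost kernel P_i, terminated by an
# epsilon-error bound on the change between successive iterations.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
epsilon = 1e-6

P = np.zeros((2, 2))
for i in range(10_000):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # greedy gain for V_i
    P_next = Q + A.T @ P @ (A - B @ K)                  # Bellman backup
    if np.max(np.abs(P_next - P)) < epsilon:            # epsilon stopping rule
        P = P_next
        break
    P = P_next

print(f"stopped after {i + 1} iterations, gain K = {K.ravel()}")
```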
Many sequential decision problems can be formulated as Markov decision processes (MDPs) where the optimal value function (or cost-to-go function) can be shown to satisfy a monotone structure in some or all of its dimensions. When the state space becomes large, traditional techniques, such as the backward dynamic programming algorithm (i.e., backward induction or value iteration), may no longer be effective in finding a solution within a reasonable time frame, and thus we are forced to consider other approaches, such as approximate dynamic programming (ADP). We propose a provably convergent ADP algorithm called Monotone-ADP that exploits the monotonicity of the value functions to increase the rate of convergence. In this paper, we describe a general finite-horizon problem setting where the optimal value function is monotone, present a convergence proof for Monotone-ADP under various technical assumptions, and show numerical results for three application domains: optimal stopping, energy storage/allocation, and glycemic control for diabetes patients. The empirical results indicate that by taking advantage of monotonicity, we can attain high quality solutions within a relatively small number of iterations, using up to two orders of magnitude less computation than is needed to compute the optimal solution exactly.
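A minimal sketch of the monotonicity-projection idea is given below; it is not the paper's Monotone-ADP algorithm (which handles general finite-horizon MDPs), just a one-dimensional lookup-table illustration with an assumed monotone target.

```python
# After each noisy observation updates one state's value estimate, the table is
# projected so it stays monotone nondecreasing in the (single) state dimension.
import numpy as np

rng = np.random.default_rng(4)
n_states = 10
true_v = np.linspace(0.0, 9.0, n_states)          # a monotone "true" value function
v_hat = np.zeros(n_states)

def project_monotone(v, s):
    """Enforce v[0] <= v[1] <= ... by pushing any violation outward from state s."""
    v = v.copy()
    v[:s] = np.minimum(v[:s], v[s])               # states below s may not exceed v[s]
    v[s + 1:] = np.maximum(v[s + 1:], v[s])       # states above s may not fall below v[s]
    return v

alpha = 0.1
for n in range(5_000):
    s = rng.integers(n_states)                    # sampled state
    obs = true_v[s] + rng.normal(scale=1.0)       # noisy observation of its value
    v_hat[s] = (1 - alpha) * v_hat[s] + alpha * obs
    v_hat = project_monotone(v_hat, s)            # monotonicity projection step

print(np.round(v_hat, 2))
```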
Recently, the optimization of power flows in portable hybrid power-supply systems (HPSSs) has become an important issue with the advent of a variety of mobile systems and hybrid energy technologies. In this paper, a control strategy is considered for dynamically managing power flows in portable HPSSs employing batteries and supercapacitors. Our dynamic power management strategy utilizes the concept of approximate dynamic programming (ADP). ADP methods are important tools in the fields of stochastic control and machine learning, and their utilization for practical engineering problems is now an active and promising research field. We propose an ADP-based procedure, built on optimization under constraints that include the iterated Bellman inequalities and can be solved offline by convex optimization, to find optimal power management rules for portable HPSSs. The effectiveness of the proposed procedure is tested through dynamic simulations for smartphone workload scenarios, and the simulation results show that the proposed strategy can successfully cope with uncertain workload demands.
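The sketch below keeps only the constrained-optimization flavor of this approach, on a tiny finite MDP stand-in with a single (non-iterated) Bellman inequality per state-action pair and a tabular value function, solved as a linear program; the MDP data and state-relevance weights are assumed.

```python
# LP form of ADP with Bellman inequality constraints on a small random MDP.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(5)
n_s, n_a, gamma = 6, 3, 0.95
P = rng.dirichlet(np.ones(n_s), size=(n_a, n_s))     # P[a, s, :] = next-state distribution
g = rng.uniform(1.0, 5.0, size=(n_s, n_a))           # stage costs

# Maximize c^T V subject to V(s) <= g(s,a) + gamma * E[V(s') | s, a] for all (s, a).
c = np.ones(n_s) / n_s                                # state-relevance weights (assumed)
A_ub, b_ub = [], []
for s in range(n_s):
    for a in range(n_a):
        row = -gamma * P[a, s, :]
        row[s] += 1.0                                 # V(s) - gamma * sum_s' P V(s') <= g(s,a)
        A_ub.append(row)
        b_ub.append(g[s, a])

res = linprog(-c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n_s, method="highs")
print("approximate value function:", np.round(res.x, 3))
```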
We consider an approximate dynamic programming heuristic to support the selection of defense projects when projects have different values and are originated intermittently but fairly frequently. We show that a simple policy reserving a positive fraction of the available budget for high-value projects not yet originated is superior to a greedy knapsack approach.
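A toy numerical sketch of the reservation policy is given below; the budget, value threshold, arrival process, and reserved fraction are all assumed, not taken from the paper.

```python
# A fixed fraction of the remaining budget is held back for high-value projects
# that have not yet originated; low-value arrivals can only draw on the rest.
import numpy as np

rng = np.random.default_rng(6)
budget, reserve_frac = 100.0, 0.3        # reserve 30% of the remaining budget
value_funded = 0.0

for _ in range(50):                       # projects originate sequentially
    cost = rng.uniform(5.0, 20.0)
    value = rng.uniform(1.0, 10.0)
    high_value = value > 7.0
    available = budget if high_value else (1.0 - reserve_frac) * budget
    if cost <= available:
        budget -= cost
        value_funded += value

print(f"total value funded: {value_funded:.1f}, leftover budget: {budget:.1f}")
```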
ISBN (Print): 9783319242644; 9783319242637
We study an extension of the delivery dispatching problem (DDP) with time windows, applied to LTL orders arriving at an urban consolidation center. Order properties (e.g., destination, size, dispatch window) may be highly varying, and directly distributing an incoming order batch may yield high costs. Instead, the hub operator may wait to consolidate with future arrivals. A consolidation policy is required to decide which orders to ship and which orders to hold. We model the dispatching problem as a Markov decision problem. Dynamic programming (DP) is applied to solve toy-sized instances to optimality. For larger instances, we propose an approximate dynamic programming (ADP) approach. Through numerical experiments, we show that ADP closely approximates the optimal values for small instances, and outperforms two myopic benchmark policies for larger instances. We contribute to the literature by (i) formulating a DDP with dispatch windows and (ii) proposing an approach to solve this DDP.
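The sketch below illustrates only the ship-or-hold decision structure with a simple deadline-driven consolidation rule (closer in spirit to the myopic benchmarks than to the ADP policy); order arrivals, sizes, dispatch windows, and costs are assumed.

```python
# Held orders have sizes and dispatch deadlines; each period we either hold
# everything or dispatch the due orders plus extras that fill the vehicle.
# For simplicity, due orders are always dispatched even if they exceed capacity.
import numpy as np

rng = np.random.default_rng(7)
vehicle_capacity, dispatch_cost = 10, 50.0

def dispatch_policy(held):
    """held: list of (size, periods_until_deadline). Return indices to ship now."""
    due = [i for i, (_, d) in enumerate(held) if d == 0]
    if not due:
        return []                                        # nothing is urgent: hold everything
    ship = list(due)
    load = sum(held[i][0] for i in ship)
    # Fill the remaining capacity with the orders closest to their deadlines.
    others = sorted((i for i in range(len(held)) if i not in due), key=lambda i: held[i][1])
    for i in others:
        if load + held[i][0] <= vehicle_capacity:
            ship.append(i)
            load += held[i][0]
    return ship

held, total_cost = [], 0.0
for t in range(30):
    held.append((int(rng.integers(1, 4)), int(rng.integers(0, 4))))   # a new order arrives
    ship = dispatch_policy(held)
    if ship:
        total_cost += dispatch_cost                       # one consolidated dispatch
        held = [o for i, o in enumerate(held) if i not in ship]
    held = [(s, d - 1) for s, d in held]                  # deadlines tick down
print(f"total dispatch cost over 30 periods: {total_cost:.0f}")
```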