We consider the linear programming approach to approximate dynamic programming with an average cost objective and a finite state space. Using a Lagrangian form of the linear program (LP), the average cost error is shown to be a multiple of the best fit differential cost error. This result is analogous to previous error bounds for a discounted cost objective. Second, bounds are derived for average cost error and performance of the policy generated from the LP that involve the mixing time of the Markov decision process (MDP) under this policy or the optimal policy. These results improve on a previous performance bound involving mixing times.
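As a concrete illustration of the LP form discussed above, the following is a minimal sketch of an average-cost approximate linear program for a tabular MDP, where the differential cost is restricted to the span of a basis matrix Phi. The function name, the SciPy solver, and the exact constraint layout are assumptions for illustration rather than details taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def average_cost_alp(P, g, Phi):
    """Average-cost approximate LP (illustrative sketch).

    P   : transition probabilities, shape (S, A, S)
    g   : one-stage costs, shape (S, A)
    Phi : basis matrix for the differential cost, shape (S, K)

    Decision variables are (eta, r); we maximize eta subject to
        eta + (Phi r)(x) - sum_y P(y | x, a) (Phi r)(y) <= g(x, a)  for all x, a.
    """
    S, A, _ = P.shape
    K = Phi.shape[1]
    c = np.zeros(1 + K)
    c[0] = -1.0                                   # linprog minimizes, so minimize -eta
    rows, rhs = [], []
    for x in range(S):
        for a in range(A):
            row = np.zeros(1 + K)
            row[0] = 1.0                          # coefficient of eta
            row[1:] = Phi[x] - P[x, a] @ Phi      # coefficients of r
            rows.append(row)
            rhs.append(g[x, a])
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(None, None)] * (1 + K))
    return res.x[0], res.x[1:]                    # approximate average cost, basis weights
```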
ISBN (print): 9781509001644
In this paper, we propose an online learning method for adaptive traffic signal control in a multi-intersection system. The method uses approximate dynamic programming (ADP) to obtain a near-optimal solution to the signal optimization problem in a distributed network, which is modeled at the microscopic level. A traffic network loading model and a traffic signal control model are presented as the basis of the discrete-time control environment. The linear function approximation in the ADP approach learns tunable parameters over features of the traffic state, including vehicle queue lengths and the signal indication. ADP avoids the computational burden that typically arises when large-scale problems are solved by exact algorithms such as dynamic programming. Moreover, the proposed adaptive phase sequence (APS) mode improves performance compared with other control methods. Simulation results show that our method performs well on the adaptive traffic signal control problem.
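To illustrate the kind of linear function approximation the abstract refers to, here is a hedged sketch of a single temporal-difference style parameter update over traffic-state features such as vehicle queue lengths and the current signal indication. The update rule, step size, and discount factor are assumptions for illustration; the paper's actual ADP recursion may differ.

```python
import numpy as np

def adp_signal_update(theta, phi, cost, phi_next, alpha=0.01, gamma=0.95):
    """One TD-style update of linear value-function weights (illustrative sketch).

    theta    : tunable weight vector of the linear approximation
    phi      : feature vector of the current traffic state
               (e.g. vehicle queue lengths and signal indication)
    cost     : one-step cost observed after applying the chosen signal action
    phi_next : feature vector of the successor traffic state
    """
    td_error = cost + gamma * theta @ phi_next - theta @ phi   # temporal-difference error
    return theta + alpha * td_error * phi                      # gradient-style correction
```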
In this paper, a novel interference-based formulation and solution methodology for the problem of link scheduling in wireless mesh networks is proposed. Traditionally, this problem has been formulated as a deterministic integer program, which has been shown to be NP-hard. The proposed formulation is based on dynamic programming and allows greater flexibility, since dynamic and stochastic components of the problem can be embedded in the optimization framework. Temporal decomposition reduces the size of the integer program, and approximate dynamic programming (ADP) methods are used to tackle the curse of dimensionality. The numerical results reveal that the proposed algorithm outperforms well-known heuristics under different network topologies. Finally, the proposed ADP methodology can be used not only as an upper bound but also as a generic framework into which different heuristics can be integrated.
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods as special cases. Despite its generality, MPI has not been thoroughly studied, especially in its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that extend the well-known approximate DP algorithms fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide an error propagation analysis that unifies those for approximate policy and value iteration. We develop the finite-sample analysis of these algorithms, which highlights the influence of their parameters. In the classification-based version of the algorithm (CBMPI), the analysis shows that MPI's main parameter controls the balance between the estimation error of the classifier and the overall value function approximation error. We illustrate and evaluate the behavior of these new algorithms on the Mountain Car and Tetris problems. Remarkably, in Tetris, CBMPI outperforms the existing DP approaches by a large margin and competes with the current state-of-the-art methods while using fewer samples.
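For readers unfamiliar with MPI itself, the following is a small tabular sketch of the exact scheme that AMPI approximates: at each iteration the policy is made greedy with respect to the current value estimate, and its Bellman operator is then applied m times; m = 1 recovers value iteration and letting m grow recovers policy iteration. The tabular setting and variable names are illustrative assumptions, not the paper's large-scale approximate algorithms.

```python
import numpy as np

def modified_policy_iteration(P, r, gamma, m=5, iters=100):
    """Tabular modified policy iteration (sketch of the exact scheme).

    P     : transition probabilities, shape (S, A, S)
    r     : rewards, shape (S, A)
    gamma : discount factor
    m     : number of partial policy-evaluation backups per iteration
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    pi = np.zeros(S, dtype=int)
    for _ in range(iters):
        Q = r + gamma * (P @ V)                    # state-action values, shape (S, A)
        pi = Q.argmax(axis=1)                      # greedy policy w.r.t. current V
        for _ in range(m):                         # m applications of the Bellman operator of pi
            V = r[np.arange(S), pi] + gamma * P[np.arange(S), pi] @ V
    return V, pi
```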
In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for infinite-horizon discrete-time nonlinear systems with finite approximation errors. First, a new generalized value iteration algorithm of ADP is developed to make the iterative performance index function converge to the solution of the Hamilton-Jacobi-Bellman equation. The generalized value iteration algorithm permits an arbitrary positive semi-definite function as its initialization, which overcomes a disadvantage of traditional value iteration algorithms. When the iterative control law and iterative performance index function cannot be accurately obtained in each iteration, a new "design method of the convergence criteria" for the finite-approximation-error-based generalized value iteration algorithm is established for the first time. A suitable approximation error can be designed adaptively to make the iterative performance index function converge to a finite neighborhood of the optimal performance index function. Neural networks are used to implement the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the developed method.
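As a schematic of the generalized value iteration idea (an arbitrary positive semi-definite initialization plus a tolerance-based stop standing in for the approximation-error-based convergence criterion), here is a hedged sketch on a discretized state grid. The nearest-neighbour discretization, function names, and tolerance eps are assumptions; the paper implements the approximations with neural networks.

```python
import numpy as np

def generalized_value_iteration(f, U, states, controls, Psi, eps=1e-3, max_iter=500):
    """Generalized value iteration on a discretized grid (illustrative sketch).

    f        : system dynamics, x_next = f(x, u)
    U        : one-step utility (cost) function U(x, u)
    states   : list of grid points covering the state space
    controls : finite set of admissible controls
    Psi      : arbitrary positive semi-definite initial function Psi(x)
    """
    states = [np.atleast_1d(s) for s in states]
    V = np.array([Psi(x) for x in states])          # arbitrary PSD initialization
    for _ in range(max_iter):
        V_new = np.empty_like(V)
        for i, x in enumerate(states):
            costs = []
            for u in controls:
                x_next = np.atleast_1d(f(x, u))
                j = int(np.argmin([np.linalg.norm(x_next - s) for s in states]))
                costs.append(U(x, u) + V[j])        # one-step cost plus cost-to-go
            V_new[i] = min(costs)
        if np.max(np.abs(V_new - V)) < eps:         # stop inside a finite neighborhood
            return V_new
        V = V_new
    return V
```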
In this brief, a novel adaptive-critic-based neural network (NN) controller is investigated for nonlinear pure-feedback systems. The controller design is based on the transformed predictor form, and the actor-critic NN control architecture includes two NNs: the critic NN is used to approximate the strategic utility function, and the action NN is employed to minimize both the strategic utility function and the tracking error. A deterministic learning technique is employed to guarantee that the partial persistent excitation condition of the internal states is satisfied during tracking control to a periodic reference orbit. Uniform ultimate boundedness of the closed-loop signals is shown via Lyapunov stability analysis. Simulation results are presented to demonstrate the effectiveness of the proposed control.
When the state dimension is large, classical approximate dynamic programming techniques may become computationally infeasible, since the complexity of the algorithm grows exponentially with the size of the state space (curse of dimensionality). Policy search techniques are able to overcome this problem because, instead of estimating the value function over the entire state space, they search for the optimal control policy in a restricted parameterized policy space. This paper presents a new policy parametrization that exploits a single point (particle) to represent an entire region of the state space and can be tuned through a recently introduced policy gradient method with parameter-based exploration. Experiments demonstrate the superior performance of the proposed approach in high-dimensional environments.
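To make the "policy gradient with parameter-based exploration" (PGPE) ingredient concrete, here is a hedged sketch of a single hyper-parameter update: policy parameters are drawn once per episode from a Gaussian hyper-distribution, and the gradient is taken with respect to the mean and standard deviation of that distribution rather than the parameters themselves. The particle-based parametrization of the paper is not reproduced here; the names, step size, and baseline handling are illustrative assumptions.

```python
import numpy as np

def pgpe_update(mu, sigma, episode_return, rng, alpha=0.05, baseline=0.0):
    """One PGPE-style update of the Gaussian hyper-distribution (sketch).

    mu, sigma      : mean and standard deviation over policy parameters
    episode_return : callable mapping a sampled parameter vector to its return
    rng            : numpy random generator, e.g. np.random.default_rng()
    """
    theta = mu + sigma * rng.standard_normal(mu.shape)      # sample policy parameters
    R = episode_return(theta) - baseline                    # baseline-corrected return
    grad_mu = (theta - mu) / sigma**2                       # d log N(theta; mu, sigma) / d mu
    grad_sigma = ((theta - mu) ** 2 - sigma**2) / sigma**3  # d log N / d sigma
    return mu + alpha * R * grad_mu, sigma + alpha * R * grad_sigma
```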
In this paper, a new generalized value iteration algorithm is developed to solve infinite-horizon optimal control problems for discrete-time nonlinear systems. The idea is to use iterative adaptive dynamic programming (ADP) to obtain the iterative control law that makes the iterative performance index function reach the optimum. The generalized value iteration algorithm permits an arbitrary positive semi-definite function as its initialization, which overcomes a disadvantage of traditional value iteration algorithms. When the iterative control law and iterative performance index function cannot be accurately obtained in each iteration, a new design method of the convergence criterion for the generalized value iteration algorithm with finite approximation errors is established to make the iterative performance index functions converge to a finite neighborhood of the greatest lower bound of all performance index functions. Simulation results are given to illustrate the performance of the developed algorithm.
The dynamic planning and development of a large collection of systems, or a ‘System of Systems’ (SoS), pose significant programmatic challenges due to the complex interactions that exist between its constituent systems. Decisions to add, remove, or reconstitute connections between systems can result in repercussive failures across the operational and developmental dimensions of an SoS. The work conducted in this research is part of a larger body of work funded by the DoD Systems Engineering Research Center (SERC) towards the development of an Analytic Workbench. In particular, this paper develops a tool that adopts an operations research-based perspective on SoS-level planning based on metrics of cost, performance, schedule, and risk. Specifically, our work employs an approximate dynamic programming approach that is well suited to addressing the computational tractability of the resulting dynamic planning optimization problem. This approach allows for the identification of near-optimal multi-stage decisions in evolving SoS architectures. A Naval Warfare Scenario SoS example problem illustrates the application of the method.
ISBN (print): 9781479932757
This paper proposes an online optimal tracking algorithm to provide the desired voltage magnitude and frequency at the load. It effectively works as a DC/AC inverter that, with appropriate switching of semiconductor devices, converts a low DC voltage into a high AC voltage. An L-C filter is used to reduce the effects caused by the switching of the semiconductor devices. The proposed control scheme ensures good tracking of an exosystem that provides the desired voltage magnitude and frequency. It builds upon the ideas of approximate dynamic programming (ADP) and uses only partial information about the system and exosystem. A Lyapunov stability proof ensures that the closed-loop system is asymptotically stable. Finally, simulations show the effectiveness of the proposed approach.