ISBN: (Print) 9781509001637
In this paper, we propose an online learning method for adaptive traffic signal control in a multi-intersection system. The method uses approximate dynamic programming (ADP) to achieve a near-optimal solution of the signal optimization problem in a distributed network, which is modeled in a microscopic way. A traffic network loading model and a traffic signal control model are presented to serve as the basis of the discrete-time control environment. The learning process of the linear function approximation in the ADP approach uses tunable parameters of the traffic states, including the vehicle queue length and the signal indication. ADP overcomes the computational complexity that usually appears when large-scale problems are solved by exact algorithms such as dynamic programming. Moreover, the proposed adaptive phase sequence (APS) mode improves performance compared with other control methods. Simulation results show that our method performs well on the adaptive traffic signal control problem.
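The linear-function-approximation step this abstract describes can be sketched as a temporal-difference update over a feature vector that stacks vehicle queue lengths with a signal-indication flag. All names, the feature layout, and the simple numbers below are illustrative assumptions, not the paper's actual model:

```python
# Sketch: linear value approximation V(s) ≈ w · phi(s) for a traffic state,
# updated by a TD(0) rule. Features and dynamics are invented for illustration.
import numpy as np

def features(queues, signal_phase, n_phases):
    """Stack queue lengths with a one-hot signal indication."""
    phase = np.zeros(n_phases)
    phase[signal_phase] = 1.0
    return np.concatenate([np.asarray(queues, dtype=float), phase])

def td_update(w, phi, cost, phi_next, gamma=0.95, alpha=0.01):
    """One TD(0) step: move w toward the one-step Bellman target."""
    target = cost + gamma * w @ phi_next
    return w + alpha * (target - w @ phi) * phi

# Toy usage: two approach queues, two signal phases.
w = np.zeros(4)
phi0 = features([5, 2], signal_phase=0, n_phases=2)
phi1 = features([3, 4], signal_phase=1, n_phases=2)
w = td_update(w, phi0, 7.0, phi1)
```

Here the weight vector is the "tunable parameters of the traffic states" mentioned in the abstract; in an online setting the update would run once per control interval as new queue observations arrive.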
ISBN: (Print) 9781467380409
This paper proposes an approximate dynamic programming (ADP) based approach to evaluate the effective load carrying capability (ELCC) of high-penetration renewable resources by solving the long-term security-constrained unit commitment (SCUC) problem with various uncertainties related to solar radiation, wind speed, and load level. Compared with traditional approaches, the proposed approach allows the Independent System Operator (ISO) to make decisions on the basis of current-day information only, hence reducing the computational burden of forecasting future states. The objective of the proposed long-term SCUC formulation is to minimize the operation cost for the base case with forecast values while accounting for the variable cost arising from uncertainties. Numerical case studies on a 6-bus system illustrate the effectiveness of the proposed ADP-based long-term SCUC model for the investigation of ELCC under various uncertainties.
ISBN: (Print) 9789881563897
A policy iteration method is proposed to solve the optimal tracking control of continuous-time systems based on the HJB equation. The performance index function is composed of the state tracking error and the tracking control error. The iterative performance index function and the iterative control are obtained by the presented policy iteration. It is proven that the iterative control makes the system asymptotically stable and that the iterative performance index function is convergent. A simulation study demonstrates the effectiveness of the proposed optimal tracking control method.
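For reference, the standard continuous-time policy-iteration recursion that HJB-based tracking designs of this kind build on can be written in generic form. This is the textbook recursion for dynamics $\dot e = f(e) + g(e)u$ and quadratic cost, not the paper's specific augmented error system:

```latex
% Policy evaluation: given the i-th control law u^{(i)}, solve for V^{(i)}
0 = e^{\top} Q e + \bigl(u^{(i)}\bigr)^{\top} R\, u^{(i)}
    + \nabla V^{(i)}(e)^{\top} \bigl( f(e) + g(e)\, u^{(i)} \bigr),
\qquad V^{(i)}(0) = 0 .

% Policy improvement: update the control law
u^{(i+1)}(e) = -\tfrac{1}{2}\, R^{-1} g(e)^{\top} \nabla V^{(i)}(e) .
```

Under the usual admissibility assumptions, the evaluated cost functions $V^{(i)}$ decrease monotonically to the solution of the HJB equation, which is the convergence property the abstract asserts.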
ISBN: (Print) 9781479970162
In this paper, an iterative adaptive dynamic programming (ADP) algorithm is developed to solve the optimal cooperative control problems for residential multi-battery systems. To avoid solving high-dimensional optimal control problems, we first constrain all the batteries to their worst performance, which transforms the multi-input optimal control problem into a single-input one. Based on the worst-performance optimal control law, the optimal cooperative control law for the residential multi-battery systems is obtained, where in each iteration only a single-input optimization problem is solved. Finally, numerical results are given to illustrate the performance of the developed algorithm.
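One way to picture the dimensionality reduction in this abstract is a single-input surrogate in which every battery is constrained to take the same scalar action, so only one variable is optimized per stage. Everything below (the myopic one-stage optimization, prices, demands, and battery bounds) is an invented toy for illustration, not the paper's formulation:

```python
# Hypothetical single-input surrogate: N batteries share one scalar action u,
# turning an N-dimensional per-stage decision into a 1-D search.
import numpy as np

N = 4                                   # number of batteries
price = [3.0, 1.0, 4.0]                 # per-stage electricity price (assumed)
demand = [5.0, 2.0, 6.0]                # household demand per stage (assumed)
soc = np.full(N, 2.0)                   # state of charge of each battery
CAP, U_MAX = 4.0, 1.0                   # capacity and per-stage power bound

candidates = [i / 10 - 1.0 for i in range(21)]   # shared action grid in [-1, 1]
total_cost = 0.0
for p, d in zip(price, demand):
    # feasible only if every battery can apply the same action
    feas = [u for u in candidates if np.all((soc + u >= 0) & (soc + u <= CAP))]
    # charging (u > 0) adds grid load; discharging offsets demand
    u = min(feas, key=lambda u: p * max(d + N * u, 0.0))
    soc += u
    total_cost += p * max(d + N * u, 0.0)
```

A full ADP scheme would replace the myopic stage cost with a stage cost plus a learned value of the next state-of-charge; the point of the sketch is only that the per-stage search is one-dimensional regardless of N.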
This thesis describes the research conducted to support the optimal selection of a portfolio of military solutions during wartime. During peacetime, the United States military selects a portfolio of military solutions holistically as part of an annual budgetary planning cycle supported by long-term planning. During wartime, this annual review and decision process is not responsive to the rapid cycles of battlefield adaptation and the resulting exploitation of opposing capability gaps by adversaries. The long-running vulnerability of U.S. forces in Iraq and Afghanistan to Improvised Explosive Device (IED) attacks, as the threat's tactics and techniques evolved during these conflicts, provides a poignant example. During conflict, opportunities to improve the force arrive irregularly over time and are difficult to anticipate. When potential solutions are identified, these must be rapidly pursued, subject to resource constraints. Bad decisions early rob resources from better opportunities that arrive later, while good opportunities unduly delayed may lead to lost opportunities on the battlefield. We develop quantitative methods to support decision makers in the optimal selection of solutions in this context, employing as our motivating case study the challenges faced by the United States Department of Defense's Joint Improvised Explosive Device Defeat Organization (JIEDDO). For example, in fiscal year 2013, the JIEDDO budget was $1.6 billion. This organization had to make counter-IED solution selection decisions continuously as these arrived, without knowing precisely what other opportunities might occur in the future months. Two key aspects of this problem are how to measure the military value of potential solutions, and how to make the best choices as opportunities arrive. Correspondingly, this dissertation examines these two sides: valuation of war-fighting solutions in an uncertain and time-sensitive context, and, given a valuation method, methods for optimal sequen
ISBN: (Print) 9783319253930; 9783319253923
In this paper, a novel Q-learning based policy iteration adaptive dynamic programming (ADP) algorithm is developed to solve the optimal control problems for discrete-time nonlinear systems. The idea is to use a policy iteration ADP technique to construct the iterative control law which stabilizes the system and simultaneously minimizes the iterative Q function. A convergence analysis shows that the iterative Q function is monotonically non-increasing and converges to the solution of the optimality equation. Finally, simulation results are presented to show the performance of the developed algorithm.
ISBN: (Print) 9781467371896
An adaptive feedback velocity control law is proposed for the kinematic system of a nonholonomic mobile robot to guarantee that the posture error of tracking a reference trajectory is asymptotically stable in an optimal way. We transform the tracking problem into a regulation problem by redefining new system states and inputs to make the pre-defined cost function finite, and then propose an online policy iteration (PI) algorithm that uses a single neural network (NN) to solve the Hamilton-Jacobi-Bellman (HJB) equation approximately. This method learns online in real time to approximate the cost function, from which the adaptive optimal controller can be computed directly. Simulation results are provided to demonstrate the effectiveness of the proposed approach.
In this paper, a novel iterative Q-learning algorithm, called the "policy iteration based deterministic Q-learning algorithm", is developed to solve the optimal control problems for discrete-time deterministic nonlinear systems. The idea is to use an iterative adaptive dynamic programming (ADP) technique to construct the iterative control law which optimizes the iterative Q function. When the optimal Q function is obtained, the optimal control law can be achieved by directly minimizing the optimal Q function, so a mathematical model of the system is not necessary. A convergence analysis shows that the iterative Q function is monotonically non-increasing and converges to the solution of the optimality equation. It is also proven that each of the iterative control laws is a stable control law. Neural networks are employed to implement the policy iteration based deterministic Q-learning algorithm by approximating the iterative Q function and the iterative control law, respectively. Finally, two simulation examples are presented to illustrate the performance of the developed algorithm.
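The alternation this abstract describes — evaluate the Q function of the current control law, then improve the law by minimizing that Q function — can be illustrated on a tiny finite deterministic problem. The toy MDP, costs, and discount factor below are invented, and a lookup table stands in for the paper's neural-network approximators:

```python
# Tabular surrogate of policy-iteration Q-learning: alternate
# (i) evaluation  Q_i(s,a) = c(s,a) + gamma * Q_i(f(s,a), pi_i(f(s,a)))
# (ii) improvement  pi_{i+1}(s) = argmin_a Q_i(s,a).
import numpy as np

f = np.array([[1, 2], [0, 2], [2, 2]])               # next state f(s, a)
c = np.array([[2.0, 1.0], [1.0, 3.0], [0.0, 0.0]])   # stage cost c(s, a)
gamma = 0.9
n_s, n_a = f.shape

def evaluate(pi, sweeps=500):
    """Policy evaluation: iterate to the fixed point of the Q equation."""
    Q = np.zeros((n_s, n_a))
    for _ in range(sweeps):
        Q = c + gamma * Q[f, pi[f]]
    return Q

pi = np.zeros(n_s, dtype=int)   # initial admissible policy: always action 0
for _ in range(10):             # policy iteration loop
    Q = evaluate(pi)
    new_pi = Q.argmin(axis=1)   # greedy improvement w.r.t. the iterative Q
    if np.array_equal(new_pi, pi):
        break                   # policy stable: optimality equation solved
    pi = new_pi
```

On this example the evaluated Q tables decrease from one iteration to the next, matching the monotone non-increasing property claimed in the abstract, and improvement needs only the Q table, not the transition model, which is the model-free point of the Q-function formulation.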
We addressed the problem of developing a model to simulate at a high level of detail the movements of over 6,000 drivers for Schneider National, the largest truckload motor carrier in the United States. The goal of the model was not to obtain a better solution but rather to closely match a number of operational statistics. In addition to the need to capture a wide range of operational issues, the model had to match the performance of a highly skilled group of dispatchers while also returning the marginal value of drivers domiciled at different locations. These requirements dictated that it was not enough to optimize at each point in time (something that could be easily handled by a simulation model) but also over time. The project required bringing together years of research in approximate dynamic programming, merging math programming with machine learning, to solve dynamic programs with extremely high-dimensional state variables. The result was a model that closely calibrated against real-world operations and produced accurate estimates of the marginal value of 300 different types of drivers.
Approximate dynamic programming (ADP) is a broad umbrella for a modeling and algorithmic strategy for solving problems that are sometimes large and complex, and are usually (but not always) stochastic. It is most often presented as a method for overcoming the classic curse of dimensionality that is well known to plague the use of Bellman's equation. For many problems, there are actually up to three curses of dimensionality. But the richer message of approximate dynamic programming is learning what to learn, and how to learn it, to make better decisions over time. This article provides a brief review of approximate dynamic programming, without intending to be a complete tutorial. Instead, our goal is to provide a broader perspective of ADP and how it should be approached from the perspective of different problem classes. (C) 2009 Wiley Periodicals, Inc. Naval Research Logistics 56: 239-249, 2009
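The core strategy this review surveys — replace a backward dynamic-programming sweep over all states with forward passes that update a value-function approximation only around states actually visited — can be shown on a toy inventory problem. The problem, the parameters, and the use of a post-decision state are illustrative assumptions for the sketch:

```python
# Forward-pass ADP on a toy inventory problem: decide using a value-function
# approximation over the post-decision state (inventory after ordering,
# before demand), observe sampled demand, then smooth the observed cost-to-go
# into the approximation. All parameters are invented for illustration.
import random

random.seed(1)
CAP = 20
V = [0.0] * (CAP + 1)        # lookup-table VFA over post-decision inventory
alpha, gamma = 0.1, 0.9      # smoothing step size, discount factor

inv = 10
for _ in range(5000):
    # decide with the current approximation (no peeking at demand)
    order = min(range(11), key=lambda a: 2 * a + V[min(inv + a, CAP)])
    y = min(inv + order, CAP)            # post-decision state
    demand = random.randrange(11)        # exogenous information arrives
    sold = min(y, demand)
    cost = 1 * (y - sold) + 5 * (demand - sold)   # holding + lost-sales cost
    inv = y - sold
    # sampled one-step cost-to-go, smoothed into the approximation
    next_order = min(range(11), key=lambda a: 2 * a + V[min(inv + a, CAP)])
    sample = cost + gamma * (2 * next_order + V[min(inv + next_order, CAP)])
    V[y] += alpha * (sample - V[y])
```

The post-decision formulation sidesteps the expectation inside the minimization (one of the three curses of dimensionality the article mentions, alongside the state and action spaces), since the decision is taken before the random demand is realized.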