This paper studies an event-triggered heuristic dynamic programming algorithm for discrete-time nonlinear systems with a novel triggering condition. Unlike traditional heuristic dynamic programming algorithms, the control law in this algorithm is updated only when the triggering condition is satisfied, which reduces the computational burden. Three neural networks are employed: a model network, an action network, and a critic network, which estimate the model function, the control law, and the value function, respectively. The main contribution is the novel triggering condition, which has a simpler form and requires fewer assumptions. Additionally, stability of the discrete-time system is proved using the Lyapunov technique. Finally, two simulations verify the effectiveness of the developed algorithm.
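As a rough illustration of the event-triggered idea described above, the following minimal sketch holds the control input constant between triggering instants on a scalar linear plant. The plant parameters, the feedback gain, and the relative-gap triggering rule are all assumptions chosen for illustration; they are not the paper's triggering condition, and no neural networks are involved.

```python
def simulate(steps=100, threshold=0.2):
    """Event-triggered state feedback on x[k+1] = a*x[k] + b*u[k].

    Hypothetical values throughout: a, b, gain, and the rule
    |x - x_last| > threshold * |x| are illustrative assumptions,
    not the triggering condition proposed in the paper.
    """
    a, b, gain = 1.02, 1.0, 0.07  # open-loop unstable plant, stabilising gain
    x = 1.0
    x_last = x                    # state sampled at the last triggering instant
    u = -gain * x_last            # control input held between events
    updates = 0
    for _ in range(steps):
        # Recompute the control only when the gap since the last sample is large.
        if abs(x - x_last) > threshold * abs(x):
            x_last = x
            u = -gain * x_last
            updates += 1
        x = a * x + b * u
    return x, updates


if __name__ == "__main__":
    final_state, n_updates = simulate()
    print(f"final state {final_state:.4f} after {n_updates} control updates in 100 steps")
```

In this toy setting the state still decays toward zero while the control is recomputed on only a fraction of the time steps, which is the computational saving the abstract refers to.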
Rapidly evolving infectious disease epidemics, such as the 2014 West African Ebola outbreak, pose significant health threats and present challenges to the global health community because of their heterogeneous geographic spread. Policy makers must allocate limited intervention resources quickly, in anticipation of where the outbreak is moving next. We develop a two-stage model for optimizing when and where to assign Ebola treatment units across geographic regions during the outbreak's early phases. The first stage employs a novel dynamic transmission model to forecast the occurrence of new cases at the region level, capturing connectivity among regions. We introduce an empirically estimated coefficient for behavioral adaptation to changing epidemic conditions. The second stage compares four approaches to allocate units across affected regions: (i) a heuristic based on observed cases, (ii) a greedy policy that prioritizes regions based on the reproductive number, (iii) a myopic linear program that allocates resources in the next period based on an iterative estimation-optimization approach coupled with the underlying epidemic model, and (iv) an approximate dynamic programming algorithm that optimizes over all future periods. After testing the allocation schemes under different budgets and time periods, we find that the myopic policy performs best, even when limited data are available. Our methodology could be generalized to other disease outbreaks, including the Zika virus, and other interventions.
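Approach (ii) above, the greedy policy, can be sketched in a few lines. This is a minimal sketch under stated assumptions: the region names, the reproductive numbers, and the per-region cap `capacity_per_region` are hypothetical and do not come from the paper.

```python
def greedy_allocate(reproductive_numbers, budget, capacity_per_region=1):
    """Assign treatment units to regions in descending order of reproductive number.

    reproductive_numbers: dict mapping region name -> estimated reproductive number.
    budget: total number of units available.
    capacity_per_region: hypothetical cap on units placed in a single region.
    """
    allocation = {region: 0 for region in reproductive_numbers}
    ranked = sorted(reproductive_numbers, key=reproductive_numbers.get, reverse=True)
    remaining = budget
    for region in ranked:
        if remaining == 0:
            break
        units = min(capacity_per_region, remaining)
        allocation[region] = units
        remaining -= units
    return allocation


if __name__ == "__main__":
    # Hypothetical regions and reproductive numbers.
    print(greedy_allocate({"A": 1.8, "B": 0.9, "C": 2.4}, budget=2))
    # → {'A': 1, 'B': 0, 'C': 1}
```

A rule like this ignores connectivity among regions and future periods, which is what the myopic linear program and the approximate dynamic programming approach in the abstract are meant to account for.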
In this paper, an adaptive dynamic programming-based near-optimal boundary controller is developed for partial differential equations (PDEs) modeled by the uncertain Burgers' equation under a Neumann boundary condition in 2-D. First, the Hamilton-Jacobi-Bellman equation is derived in infinite-dimensional space. Subsequently, a novel neural network (NN) identifier is introduced to approximate the nonlinear dynamics of the 2-D PDE. The optimal control input is derived by estimating the value function online through an additional NN-based forward-in-time estimation and the approximated dynamic model. Novel update laws are developed for online estimation of the identifier and value function. The designed control policy can be applied using a finite number of actuators at the boundaries. Local ultimate boundedness of the closed-loop system is studied in detail using Lyapunov theory. Simulation results confirm the optimizing performance of the proposed controller on an unstable 2-D Burgers' equation.
In some microgrids, the charging of electric vehicles (EVs) and the generation of wind power may partially cancel each other. This is an effective way to reduce the variation of the wind power delivered to the state grid. Because of forecasting error, it is of great practical interest to schedule the EV charging demand under the worst-case scenario of wind power generation. We consider this robust scheduling problem in this paper and make three major contributions. First, we formulate it as a robust stochastic shortest path problem whose objective function is a weighted sum of the wind power utilization and the total charging cost. Second, a robust simulation-based policy improvement method is developed to improve the performance of a base policy in the worst case. This improvement is shown mathematically under mild assumptions. Third, the performance of this method is demonstrated numerically on real wind and EV data.
Bus headways are typically susceptible to external disturbances (e.g., due to traffic congestion, clustered passenger arrivals, and special passenger needs), which create gaps in the system that grow eventually into bunching. Although many control strategies, such as static and dynamic holding strategies, have been implemented to mitigate the effects of unreliable bus schedules, most of them would impose longer dwell times on the passengers. In this paper, we investigate the potential of an alternative bus substitution strategy that is currently implemented by some transit agencies in an ad-hoc manner. In this strategy, the agency deploys a fleet of standby buses to take over service from any early or late buses so as to contain deviations from schedule, and the intention is to impose minimum penalties on the onboard passengers. We develop a discrete-time infinite-horizon approximate dynamic programming approach to find the optimal policy to minimize the overall agency and passenger costs. It is shown through numerical examples that schedule deviations can be controlled by regularly inserting standby buses as substitutions. In some implementation scenarios, the proposed strategy holds the potential to achieve comparable performance with some of the most advanced strategies, and to outperform the conventional slack-based schedule control scheme. In light of the emerging opportunities associated with autonomous driving, the performance of the proposed strategy can become even stronger due to the reduction in costs for keeping the fleet of standby buses. (C) 2018 Elsevier Ltd. All rights reserved.
In this paper, a novel adaptive dynamic programming (ADP) algorithm, called the "iterative zero-sum ADP algorithm," is developed to solve infinite-horizon discrete-time two-player zero-sum games of nonlinear systems. The algorithm permits arbitrary positive semidefinite functions to initialize the upper and lower iterations. A novel convergence analysis guarantees that the upper and lower iterative value functions converge to the upper and lower optimums, respectively. When the saddle-point equilibrium exists, both the upper and lower iterative value functions are proved to converge to the optimal solution of the zero-sum game, without requiring the existence criteria of the saddle-point equilibrium. If the saddle-point equilibrium does not exist, the upper and lower optimal performance index functions are obtained separately and are proved to be not equivalent. Finally, simulation results and comparisons illustrate the performance of the present method.
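The distinction between upper and lower values, and the role of the saddle point, can be seen on a one-shot matrix game. The sketch below is only an illustration of those two quantities on hypothetical cost matrices; it is a static game with pure strategies, not the iterative zero-sum ADP algorithm of the paper.

```python
def upper_lower_values(cost):
    """Return (upper, lower) values of a one-shot zero-sum matrix game.

    cost[i][j] is the cost when the minimising player picks row i and the
    maximising player picks column j (pure strategies only).
    """
    n_cols = len(cost[0])
    # Upper value: the minimiser commits first, the maximiser responds.
    upper = min(max(row) for row in cost)
    # Lower value: the maximiser commits first, the minimiser responds.
    lower = max(min(row[j] for row in cost) for j in range(n_cols))
    return upper, lower


if __name__ == "__main__":
    # Hypothetical cost matrices.
    print(upper_lower_values([[3, 1], [0, 2]]))  # → (2, 1): no saddle point
    print(upper_lower_values([[1, 2], [3, 4]]))  # → (2, 2): saddle point exists
```

The lower value never exceeds the upper value, and the two coincide exactly when a pure-strategy saddle point exists, which mirrors the two cases distinguished in the abstract.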
Least squares Monte Carlo (LSM) is commonly used to manage and value early or multiple exercise financial or real options. Recent research in this area has started applying approximate linear programming (ALP) and its relaxations, which aim at addressing a possible ALP drawback. We show that regress-later LSM is itself an ALP relaxation that potentially corrects this ALP shortcoming. Our analysis consolidates two streams of research and supports using this LSM version rather than ALP on the considered models. (C) 2017 Elsevier B.V. All rights reserved.
Network virtualization is generally envisaged as a promising technology for satisfying various types of service requirements. Non-orthogonal multiple access (NOMA) technology, in turn, has the potential to significantly increase the spectral efficiency of the system. However, previous works that jointly address these two technologies have not considered dynamic resource allocation in this context. In this paper, we propose a slice-based virtual resource scheduling scheme with NOMA technology to enhance the quality-of-service (QoS) of the system. We formulate the power granularity allocation and subcarrier allocation strategies as a constrained Markov decision process problem that aims to maximize the total user rate. To avoid the curse of dimensionality and the expectation calculation in the optimal value function, we develop an adaptive resource allocation algorithm based on approximate dynamic programming to solve the problem. Extensive simulations have been conducted under various system settings, and the results demonstrate that the proposed algorithm can significantly reduce the outage probability and increase the user data rate.
Adaptive dynamic programming (ADP) is an important branch of reinforcement learning for solving various optimal control problems. Most practical nonlinear systems are controlled by more than one controller. Each controller is a player, and the tradeoff between cooperation and conflict among these players can be viewed as a game. Multi-player games are divided into two main categories: zero-sum games and non-zero-sum games. To obtain the optimal control policy for each player, one needs to solve Hamilton-Jacobi-Isaacs equations for zero-sum games and a set of coupled Hamilton-Jacobi equations for non-zero-sum games. Unfortunately, these equations are generally difficult or even impossible to solve analytically. To overcome this bottleneck, two ADP methods are proposed in this paper: a modified gradient-descent-based online algorithm and a novel iterative offline learning approach. Furthermore, to implement the proposed methods, we employ a single-network structure, which clearly reduces the computational burden compared with the traditional multiple-network architecture. Simulation results demonstrate the effectiveness of our schemes.
This paper introduces a novel manifold regularized reinforcement learning scheme for continuous Markov decision processes. Smooth feature representations for value function approximation can be automatically learned using the unsupervised manifold regularization method. The learned features are data-driven, and can be adapted to the geometry of the state space. Furthermore, the scheme provides a direct basis representation extension for novel samples during policy learning and control. The performance of the proposed scheme is evaluated on two benchmark control tasks, i.e., the inverted pendulum and the energy storage problem. Simulation results illustrate the concepts of the proposed scheme and show that it can obtain excellent performance.