ISBN (print): 9780738133669
To guarantee the efficient performance of the power plant, an adaptive tracking controller for the nonlinear boiler-turbine system based on an offline policy iteration adaptive dynamic programming (ADP) method is proposed in this paper. The optimal tracking controller is obtained through offline learning, which can accommodate the load-change characteristics of drum boiler-turbine power plants. To implement the proposed method, neural networks (NNs) are used to approximate the cost function, and an approximate optimal solution is achieved. The convergence of the method is then analyzed. Simulation studies on a typical boiler-turbine system demonstrate that the proposed control strategy achieves satisfactory performance within a short period.
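The policy-iteration structure behind such an ADP controller can be sketched compactly. The following is a minimal illustration, not the paper's method: it assumes a toy one-dimensional plant f(x, u), a quadratic tracking utility U(x, u), and a single-hidden-layer network whose output weights are fit by least squares, with one fitted evaluation sweep per iteration (optimistic policy iteration) rather than full policy evaluation.

```python
# Minimal sketch of offline policy-iteration ADP with an NN value
# approximator. Dynamics, utility, and network sizes are illustrative
# assumptions, not the paper's boiler-turbine model.
import numpy as np

rng = np.random.default_rng(0)

def f(x, u):
    """Hypothetical discrete-time nonlinear plant (stand-in only)."""
    return 0.9 * x + 0.1 * np.sin(x) + 0.2 * u

def U(x, u):
    """Quadratic tracking utility (reference taken as the origin)."""
    return x**2 + 0.1 * u**2

# Single-hidden-layer network; only the output weights are trained
# (least squares), which keeps the sketch short and deterministic.
W_in = rng.normal(size=(16, 1))
b_in = rng.normal(size=16)

def phi(x):
    return np.tanh(W_in @ np.atleast_1d(x) + b_in)

w_out = np.zeros(16)           # value-network output weights
V = lambda x: phi(x) @ w_out   # approximate cost-to-go

X = rng.uniform(-2, 2, 200)    # offline training states
u_grid = np.linspace(-1, 1, 41)

for it in range(30):
    # Policy improvement: greedy action over a grid (argmin of U + V(f)).
    def policy(x):
        return u_grid[np.argmin([U(x, u) + V(f(x, u)) for u in u_grid])]
    # Policy evaluation: fit V to the Bellman targets by least squares.
    targets = np.array([U(x, policy(x)) + V(f(x, policy(x))) for x in X])
    Phi = np.array([phi(x) for x in X])
    w_out, *_ = np.linalg.lstsq(Phi, targets, rcond=None)

print("V(1.0) after training:", V(1.0))
```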
This study presents an adaptive railway traffic controller for real-time operations based on approximate dynamic programming (ADP). By assessing requirements and opportunities, the controller aims to limit consecutive delays resulting from trains that entered a control area behind schedule by sequencing them at a critical location in a timely manner, thus representing the practical requirements of railway operations. This approach depends on an approximation to the value function of dynamic programming after optimisation from a specified state, which is estimated dynamically from operational experience using reinforcement learning techniques. By using this approximation, the ADP avoids extensive explicit evaluation of performance and so reduces the computational burden substantially. In this investigation, we explore formulations of the approximation function and variants of the learning techniques used to estimate it. Evaluation of the ADP methods in a stochastic simulation environment shows considerable improvements in consecutive delays by comparison with the current industry practice of First-Come-First-Served sequencing. We also found that estimates of parameters of the approximate value function are similar across a range of test scenarios with different mean train entry delays.
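The core ADP ingredient here, estimating a value-function approximation from operational experience with reinforcement learning, can be illustrated with a toy sketch. Everything below is an assumption for illustration (the two-train setup, the delay features, and the consecutive-delay cost); only the TD-style estimation step is shown, not the study's full sequencing controller.

```python
# TD(0) estimation of a linear value-function approximation from
# simulated operational experience. All quantities are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
theta = np.zeros(2)                 # weights of the approximate value function
alpha = 0.01                        # learning rate

def features(delays):
    """State features: mean and max entry delay of the two trains."""
    return np.array([np.mean(delays), np.max(delays)])

def consecutive_delay(delays, order):
    """Hypothetical cost: the second train in the sequence inherits part of
    the first train's delay through the critical location."""
    first, second = order
    return delays[second] + max(0.0, delays[first] - 2.0)

for episode in range(5000):
    delays = rng.exponential(3.0, size=2)        # stochastic entry delays
    v_hat = theta @ features(delays)
    # Choose the sequencing (the "action") greedily by immediate cost.
    order = min([(0, 1), (1, 0)], key=lambda o: consecutive_delay(delays, o))
    cost = consecutive_delay(delays, order)
    # TD(0) update toward the observed cost (one-step terminal episode).
    theta += alpha * (cost - v_hat) * features(delays)

print("learned value weights:", theta)
```

The learned weights would then rank sequencing decisions over a longer horizon; that lookahead is where the full ADP controller differs from this one-step caricature.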
Solving the unit commitment (UC) problem in a computationally efficient manner is a critical issue of electricity market operations. Optimization-based methods such as heuristics, dynamic programming, and mixed-intege...
learning for autonomous dynamic control systems that can adapt to unforeseen environmental changes are of great interest but the realisation of a practical and safe online learning algorithm is incredibly challenging....
ISBN (digital): 9798350362244
ISBN (print): 9798350362251
In a vehicle edge computing network (VECN), an important issue is how to deal with the computation-resource and energy-resource shortages that roadside units (RSUs) encounter while performing delay-sensitive computation tasks, especially during peak hours and under the dynamic conditions of the VECN. To complete the computation tasks on time with minimum expenditure, this paper investigates the problem of information-energy collaboration among RSUs, in which spectrum management is also involved. For the considered scenario, the RSUs' strategies for spectrum selection, computation task offloading, and energy sharing are derived from the formulated optimization problem. Since this problem is a highly complex mixed-integer nonlinear programming problem and the strategies are coupled with one another, a multi-agent deep deterministic policy gradient (MADDPG) based algorithm is proposed to find sub-optimal solutions quickly in a dynamic environment. The simulation results show that our approach is superior to existing schemes in terms of total system expenditure and spectral efficiency.
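For readers unfamiliar with MADDPG, the pattern the paper builds on, centralized critics over joint observations and actions with decentralized actors, can be sketched as below. The dimensions, the one-step reward target, and the omission of target networks are simplifications for illustration; the paper's actual encoding of spectrum selection, task offloading, and energy sharing is not reproduced.

```python
# Compact MADDPG skeleton: per-agent actor on local observations,
# per-agent critic on joint observations and actions. Illustrative only.
import torch
import torch.nn as nn

N_AGENTS, OBS, ACT = 3, 8, 4

def mlp(i, o):
    return nn.Sequential(nn.Linear(i, 64), nn.ReLU(), nn.Linear(64, o))

actors = [mlp(OBS, ACT) for _ in range(N_AGENTS)]
critics = [mlp(N_AGENTS * (OBS + ACT), 1) for _ in range(N_AGENTS)]
actor_opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in actors]
critic_opts = [torch.optim.Adam(c.parameters(), lr=1e-3) for c in critics]

def update(batch_obs, batch_act, batch_rew):
    """One MADDPG update on a batch of (joint obs, joint act, per-agent
    reward). Target networks and next-state bootstrapping are omitted."""
    joint = torch.cat([batch_obs.flatten(1), batch_act.flatten(1)], dim=1)
    for i in range(N_AGENTS):
        # Critic i regresses toward the (here, one-step) reward target.
        q = critics[i](joint).squeeze(-1)
        critic_loss = ((q - batch_rew[:, i]) ** 2).mean()
        critic_opts[i].zero_grad()
        critic_loss.backward()
        critic_opts[i].step()
        # Actor i ascends its own critic, other agents' actions held fixed.
        act = batch_act.clone()
        act[:, i] = torch.tanh(actors[i](batch_obs[:, i]))
        joint_pi = torch.cat([batch_obs.flatten(1), act.flatten(1)], dim=1)
        actor_loss = -critics[i](joint_pi).mean()
        actor_opts[i].zero_grad()
        actor_loss.backward()
        actor_opts[i].step()

# Example: one update on a random batch of 32 transitions.
update(torch.randn(32, N_AGENTS, OBS), torch.rand(32, N_AGENTS, ACT),
       torch.randn(32, N_AGENTS))
```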
While global attention on reducing food waste has increased, the demand for perishable commodities such as food and pharmaceuticals is growing. This emphasizes the need for effective perishable inventory management, which has become increasingly complex due to the perishability of these products. Traditional optimization methods, such as dynamic programming, require significant time and effort to solve these problems. In this study, we use Deep Q-Network and Proximal Policy Optimization, deep reinforcement learning methods that can provide numerical, approximate solutions to complex problems. In an inventory problem with costs for ordering, holding, lost sales, and spoilage, we define the inventory status as the state, the order quantity as the action, and the negative total cost as the reward. We compare the performance of the two methods with the total number of time steps aligned. Furthermore, numerical experiments confirmed that both methods reduced costs by at least approximately 30% compared to the base-stock policy.
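The MDP formulation described here (inventory status as state, order quantity as action, negative total cost as reward) can be made concrete with a small environment sketch. All cost coefficients, the Poisson demand, and the three-period shelf life are illustrative assumptions; DQN or PPO would be trained against an environment like this.

```python
# Perishable-inventory MDP sketch: state is on-hand stock by remaining
# shelf life, action is the order quantity, reward is negative total cost.
import numpy as np

rng = np.random.default_rng(2)

SHELF_LIFE, MAX_ORDER = 3, 10
C_ORDER, C_HOLD, C_LOST, C_SPOIL = 1.0, 0.2, 4.0, 2.0

def step(state, order):
    """One period: receive the order, serve FIFO demand, age the stock."""
    inv = state.copy()
    inv[-1] += order                     # fresh stock arrives
    demand = rng.poisson(5)
    for age in range(SHELF_LIFE):        # serve oldest units first (FIFO)
        used = min(inv[age], demand)
        inv[age] -= used
        demand -= used
    spoiled = inv[0]                     # oldest layer expires
    next_state = np.roll(inv, -1)
    next_state[-1] = 0
    cost = (C_ORDER * order + C_HOLD * next_state.sum()
            + C_LOST * demand + C_SPOIL * spoiled)
    return next_state, -cost             # reward = negative total cost

# Roll out a simple base-stock policy to sanity-check the dynamics.
state, total = np.zeros(SHELF_LIFE, dtype=int), 0.0
for t in range(1000):
    order = max(0, 12 - state.sum())     # order up to a base-stock level of 12
    state, reward = step(state, min(order, MAX_ORDER))
    total += reward
print("average reward of base-stock policy:", total / 1000)
```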
ISBN (print): 9780738133669
In this paper, optimal control problems with constraints on the summation of an auxiliary utility function are called constrained cost optimal control problems, and a constrained cost policy iteration adaptive dynamic programming (ADP) algorithm is developed to solve them for discrete-time nonlinear systems. A convergence analysis guarantees that the iterative value functions converge non-increasingly to the approximate optimal value function. It is also proven that every iterative control policy is feasible and stabilizes the nonlinear system. Finally, a simulation example illustrates the performance of the developed constrained cost policy iteration algorithm.
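A tabular caricature of the constrained-cost policy-iteration idea may help: evaluate both the main value function and an accumulated auxiliary cost under the current policy, then improve the policy only over actions whose predicted auxiliary cost respects a bound. The dynamics, utilities, discounting, and bound below are illustrative assumptions, not the paper's formulation.

```python
# Tabular constrained policy iteration on a discretized 1-D system:
# V tracks the main cost, W the accumulated auxiliary utility, and the
# improvement step filters actions by the auxiliary bound. Illustrative.
import numpy as np

xs = np.linspace(-2, 2, 41)            # discretized state grid
us = np.linspace(-1, 1, 21)            # action grid
GAMMA, BOUND = 0.95, 30.0

f = lambda x, u: np.clip(0.8 * x + 0.5 * u, -2, 2)   # toy plant
U = lambda x, u: x**2 + u**2                          # main utility
H = lambda x, u: np.abs(u)                            # auxiliary utility

idx = lambda x: np.abs(xs - x).argmin()
V = np.zeros(len(xs))                   # main value function
W = np.zeros(len(xs))                   # accumulated auxiliary cost
pi = np.zeros(len(xs), dtype=int)       # policy as action indices

for it in range(100):
    # Policy evaluation for both the main and the auxiliary value.
    for _ in range(50):
        for i, x in enumerate(xs):
            u = us[pi[i]]
            j = idx(f(x, u))
            V[i] = U(x, u) + GAMMA * V[j]
            W[i] = H(x, u) + GAMMA * W[j]
    # Policy improvement restricted to actions that keep W within BOUND.
    for i, x in enumerate(xs):
        q = [U(x, u) + GAMMA * V[idx(f(x, u))] for u in us]
        w = [H(x, u) + GAMMA * W[idx(f(x, u))] for u in us]
        feasible = [k for k in range(len(us)) if w[k] <= BOUND]
        pi[i] = min(feasible or range(len(us)), key=lambda k: q[k])

print("V at x=0:", V[idx(0.0)])
```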
We perform a comparison study on Bayesian sequential optimal experimental design algorithms applied to linear regression in two unknowns. We transform the Bayesian sequential optimal experimental design problem into a reinforcement learning problem to determine the power of deep reinforcement learning algorithms against baselines including batch design, greedy design, dynamic programming, and approximate dynamic programming. Using KL-divergence to measure information gain in the unknown parameters, we construct objectives for each algorithm to maximize information gain. This work showcases novel comparisons between the aforementioned algorithms and provides a new application of reinforcement learning to Bayesian sequential optimal experimental design for inverse problems in linear regression with multiple parameters.
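For linear regression with a Gaussian prior and known noise, both the sequential posterior update and the KL-divergence information gain are available in closed form, which is what makes greedy and DP-style baselines tractable. The sketch below illustrates this objective on a hypothetical one-covariate design grid; it is not the paper's experimental setup.

```python
# Closed-form information gain for Bayesian linear regression:
# conjugate Gaussian posterior update plus KL(posterior || prior).
import numpy as np

SIGMA = 0.5                              # known observation noise std

def posterior(m0, S0, x, y):
    """Conjugate update for y = [1, x] @ theta + noise."""
    phi = np.array([1.0, x])
    S1 = np.linalg.inv(np.linalg.inv(S0) + np.outer(phi, phi) / SIGMA**2)
    m1 = S1 @ (np.linalg.inv(S0) @ m0 + phi * y / SIGMA**2)
    return m1, S1

def kl_gaussian(m1, S1, m0, S0):
    """KL( N(m1,S1) || N(m0,S0) ): information gained about theta."""
    S0inv = np.linalg.inv(S0)
    d = len(m0)
    return 0.5 * (np.trace(S0inv @ S1) + (m0 - m1) @ S0inv @ (m0 - m1)
                  - d + np.log(np.linalg.det(S0) / np.linalg.det(S1)))

# The posterior covariance does not depend on y, so with m0 = 0 a y = 0
# evaluation isolates the covariance part of the gain for ranking designs.
m0, S0 = np.zeros(2), np.eye(2)
for x in np.linspace(-2, 2, 5):
    m1, S1 = posterior(m0, S0, x, y=0.0)
    print(f"x = {x:+.1f}, information gain = {kl_gaussian(m1, S1, m0, S0):.3f}")
```

As expected, design points with larger |x| yield more information about the slope parameter, which is the kind of structure the greedy and RL designers exploit.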
In this paper, a safe adaptive dynamic programming (SADP) method based on the barrier function (BF) is proposed for the optimal control problem of nonlinear safety-critical systems with the safety constraints and exte...
ISBN (print): 9781665414944
Traffic Engineering (TE) has been applied to optimize network performance by routing/rerouting flows based on traffic loads and network topologies. To cope with network dynamics from emerging applications, it is essential to reroute flows more frequently than today's TE does to maintain network performance. However, existing TE solutions may introduce considerable Quality of Service (QoS) degradation and service disruption, since they do not take the potential negative impact of flow rerouting into account. In this paper, we apply a new QoS metric named network disturbance to gauge the impact of flow rerouting while optimizing network load balancing in backbone networks. To employ this metric in TE design, we propose a disturbance-aware TE called DATE, which uses reinforcement learning (RL) to intelligently select critical flows between nodes for each traffic matrix and reroute them using linear programming (LP) to jointly optimize network performance and disturbance. DATE is equipped with a customized actor-critic architecture and Graph Neural Networks (GNNs) to handle dynamic traffic and single link failures. Extensive evaluations show that DATE outperforms state-of-the-art TE methods with close-to-optimal load-balancing performance while effectively mitigating the 99th percentile network disturbance by up to 31.6%.
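The LP piece of this pipeline, rerouting a selected critical flow to minimize the maximum link utilization, can be illustrated on a toy topology. The diamond network, capacities, background loads, and candidate paths below are invented for illustration, and DATE's disturbance term is omitted.

```python
# Min-max link utilization LP for one critical flow split across
# candidate paths, in the spirit of the rerouting step described above.
import numpy as np
from scipy.optimize import linprog

LINKS = ["a-b", "b-d", "a-c", "c-d"]           # 4-node diamond topology
CAP = np.array([10.0, 10.0, 10.0, 10.0])
BASE = np.array([6.0, 6.0, 1.0, 1.0])          # load from non-critical flows

# One critical flow of 4 units from a to d, with two candidate paths.
DEMAND = 4.0
PATHS = [[0, 1],                               # a-b-d (the congested side)
         [2, 3]]                               # a-c-d

# Variables: x = (split on path 0, split on path 1, t = max utilization).
# Minimize t subject to load(link)/cap(link) <= t and splits summing to 1.
c = np.array([0.0, 0.0, 1.0])
A_ub, b_ub = [], []
for l in range(len(LINKS)):
    row = [DEMAND / CAP[l] if l in p else 0.0 for p in PATHS] + [-1.0]
    A_ub.append(row)
    b_ub.append(-BASE[l] / CAP[l])
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              A_eq=[[1.0, 1.0, 0.0]], b_eq=[1.0],
              bounds=[(0, 1), (0, 1), (0, None)])

print("path splits:", res.x[:2], "max utilization:", res.x[2])
```

Here the LP pushes the flow onto the lightly loaded a-c-d path, capping utilization at 0.6; DATE's RL agent decides which flows get this treatment per traffic matrix.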