Search results - Inner Mongolia University Library

Optimal intra-day operations of behind-the-meter battery storage for primary frequency regulation provision: A hybrid lookahead method

ENERGY, 2022, Vol. 247, No. 0, p. 123482

Authors: Wen, Kerui; Li, Weidong; Yu, Samson Shenglong; Li, Ping; Shi, Peng. Affiliations: Dalian Univ Technol, Fac Elect Informat & Elect Engn, Dalian 116024, Peoples R China; Deakin Univ, Sch Engn, 75 Pigdons Rd, Waurn Ponds, Vic 3216, Australia; State Grid Liaoning Elect Power Supply Co Ltd, Elect Power Res Inst, Shenyang 110006, Peoples R China; Univ Adelaide, Sch Elect & Elect Engn, North Terrace, Adelaide, SA 5000, Australia

Battery energy storage systems (BESSs) are being widely installed behind-the-meter to reduce electricity bills. By providing grid ancillary services, behind-the-meter BESSs can increase potential revenue streams. This study targets simultaneous electricity bill reduction and primary frequency regulation (PFR) provision. As the application spectrum expands, intra-day operations become increasingly complicated. In this paper, a hybrid lookahead method with a value function approximation strategy is proposed for intra-day operations, wherein the concept of "offline calculation-online application" is devised and implemented. The approximate value function is trained offline to represent the expected long-term benefit. A two-stage robust approximate dynamic programming (ADP) model is formulated for one-day operation, which is optimized to adjust the power baseline with a forward rolling horizon. Furthermore, multi-dimensional indicators are introduced to evaluate the proposed strategy. Simulations and benchmarking comparisons are performed for a 0.5 MW/1.0 MWh BESS to verify the superior performance of the proposed strategy. The results show that the approximate value function can be obtained offline with 99.07% convergence precision. Moreover, the proposed strategy can ensure the economic benefit and PFR provision within a short online computing time. The resulting intra-day economic benefit can reach 95.55% of the theoretical optimum, and the online optimization consumes only 4.65 s for a prediction horizon of 5 min, which ensures the feasibility of real-time predictive optimization. (C) 2022 Elsevier Ltd. All rights reserved.

Keywords: Optimal intra-day operations; Behind-the-meter battery storage; Primary frequency regulation; Electricity bill reduction; Value function approximation; Offline calculation

Value-gradient iteration with quadratic approximate value functions

ANNUAL REVIEWS IN CONTROL, 2023, Vol. 56

Authors: Yang, Alan; Boyd, Stephen. Affiliation: Stanford Univ, Dept Elect Engn, Stanford, CA 94305, USA

We propose a method for designing policies for convex stochastic control problems characterized by random linear dynamics and convex stage cost. We consider policies that employ quadratic approximate value functions as a substitute for the true value function. Evaluating the associated control policy involves solving a convex problem, typically a quadratic program, which can be carried out reliably in real time. Such policies often perform well even when the approximate value function is not a particularly good approximation of the true value function. We propose value-gradient iteration, which fits the gradient of the value function, with regularization that can include constraints reflecting known bounds on the true value function. Our value-gradient iteration method can yield a good approximate value function with few samples and little hyperparameter tuning. We find that the method can find a good policy with computational effort comparable to that required to just evaluate a control policy via simulation.

Keywords: Approximate dynamic programming; Stochastic control; Convex optimization; Value function approximation; Supply chain optimization
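
Illustrative sketch (not from the record above): the abstract describes policies that minimize the stage cost plus a quadratic approximate value function of the next state. The following minimal Python example shows that idea for a scalar problem where the minimization has a closed form. The dynamics x_next = a*x + b*u, the cost q*x**2 + r*u**2, and the names ScalarProblem, adp_policy, and rollout_cost are all assumptions made for illustration; the dynamics here are deterministic rather than random, and the value-gradient fitting step itself is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class ScalarProblem:
    """Hypothetical scalar problem: x_next = a*x + b*u, stage cost q*x**2 + r*u**2."""
    a: float
    b: float
    q: float
    r: float


def adp_policy(x, prob, p_hat):
    """Minimize r*u**2 + p_hat*(a*x + b*u)**2 over u.

    p_hat*x**2 plays the role of the quadratic approximate value function.
    Setting the derivative with respect to u to zero gives a linear policy.
    """
    gain = p_hat * prob.a * prob.b / (prob.r + p_hat * prob.b ** 2)
    return -gain * x


def rollout_cost(x0, prob, p_hat, steps=200):
    """Accumulate the stage cost of the closed loop under adp_policy."""
    x, total = x0, 0.0
    for _ in range(steps):
        u = adp_policy(x, prob, p_hat)
        total += prob.q * x ** 2 + prob.r * u ** 2
        x = prob.a * x + prob.b * u
    return total


if __name__ == "__main__":
    prob = ScalarProblem(a=1.0, b=1.0, q=1.0, r=1.0)
    # Deliberately rough guess p_hat = 1.0; the exact coefficient for this
    # problem is the golden ratio, about 1.618.
    print(rollout_cost(1.0, prob, p_hat=1.0))
```

For this particular problem the exact value function coefficient solves P**2 = 1 + P, so the rough guess p_hat = 1.0 yields a closed-loop cost of about 1.667 from x0 = 1 against an optimum of about 1.618. This is a small instance of the abstract's remark that such policies can perform well even when the approximation is imperfect.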

The Dynamic Freight Routing Problem for Less-Than-Truckload Carriers

TRANSPORTATION SCIENCE, 2023, Vol. 57, No. 3, pp. 717-740

Authors: Baubaid, Ahmad; Boland, Natashia; Savelsbergh, Martin. Affiliations: King Fahd Univ Petr & Minerals, Ind & Syst Engn Dept, Dhahran 31261, Saudi Arabia; King Fahd Univ Petr & Minerals, Interdisciplinary Res Ctr Smart Mobil & Logist, Dhahran 31261, Saudi Arabia; Georgia Inst Technol, H Milton Stewart Sch Ind & Syst Engn, Atlanta, GA 30332, USA

Less-than-truckload (LTL) carriers transport freight shipments from origins to destinations by consolidating freight using a network of terminals. As daily freight quantities are uncertain, carriers dynamically decide freight routes on the day of operations. We introduce the dynamic freight routing problem (DFRP) and model this problem as a Markov decision process (MDP). To overcome the curses of dimensionality of the MDP model, we introduce an approximate dynamic programming (ADP) solution approach that uses a lookup table to store value function approximations, and we present and compare a number of aggregation approaches that use features of the postdecision states (PDSs) to aggregate the PDS space and reduce the number of entries in the lookup table. Furthermore, because the decision subproblems are integer programs (IPs), we present a framework for integrating lookup tables into the decision subproblem IPs. This framework consists of (1) a modeling approach for the integration of lookup table value function approximations into subproblem IPs to form extended subproblem IPs; (2) a solution approach, PDS-IP-bounding, which decomposes the extended subproblem IPs into many smaller IPs and uses dynamic bounds to reduce the number of small IPs that have to be solved; and (3) an adaptation of the ε-greedy exploration-exploitation algorithm for the IP setting. Our computational experiments show that despite the DFRP having high-dimensional PDS vectors, a two-dimensional aggregation of the space can produce policies that outperform standard myopic policies. Moreover, they demonstrate that the PDS-IP-bounding algorithm provides computational advantages over solving the extended subproblem IPs using a commercial solver.

Keywords: less-than-truckload; freight transportation; approximate dynamic programming; value function approximation; load planning
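
Illustrative sketch (not from the record above): the abstract mentions a lookup table of value function approximations indexed by aggregated postdecision states, combined with ε-greedy exploration. The Python example below shows only the textbook form of that selection rule over a small enumerated decision set. It does not reproduce the paper's integer-programming integration or the PDS-IP-bounding algorithm. The function name epsilon_greedy, the cost-minimization convention, and the example decisions, costs, and two-feature keys are all hypothetical.

```python
import random


def epsilon_greedy(decisions, immediate_cost, aggregate, lookup, epsilon, rng):
    """Choose one decision from a finite list.

    With probability epsilon, explore by picking a decision uniformly at random.
    Otherwise, exploit: pick the decision with the smallest immediate cost plus
    the stored value of its aggregated postdecision state. Unseen aggregated
    states default to a value of 0.0.
    """
    if rng.random() < epsilon:
        return rng.choice(decisions)

    def estimated_cost(decision):
        return immediate_cost(decision) + lookup.get(aggregate(decision), 0.0)

    return min(decisions, key=estimated_cost)


if __name__ == "__main__":
    # Hypothetical data: two routing options and a two-dimensional aggregation key.
    costs = {"direct": 10.0, "via_hub": 6.0}
    keys = {"direct": (0, 1), "via_hub": (2, 1)}
    table = {(0, 1): 1.0, (2, 1): 7.0}  # stored value approximations
    rng = random.Random(0)
    choice = epsilon_greedy(list(costs), costs.get, keys.get, table, 0.1, rng)
    print(choice)
```

In the hypothetical data, the myopically cheaper option ("via_hub") becomes the more expensive one once the stored downstream value is added, which is the effect a value function approximation is meant to capture.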

Approximate dynamic programming for constrained linear systems: A piecewise quadratic approximation approach

AUTOMATICA, 2024, Vol. 160

Authors: He, Kanghui; Shi, Shengling; van den Boom, Ton; De Schutter, Bart. Affiliation: Delft Univ Technol, Delft Ctr Syst & Control, Delft, Netherlands

Approximate dynamic programming (ADP) faces challenges in dealing with constraints in control problems. Model predictive control (MPC) is, in comparison, well-known for its accommodation of constraints and stability guarantees, although its computation is sometimes prohibitive. This paper introduces an approach combining the two methodologies to overcome their individual limitations. The predictive control law for constrained linear quadratic regulation (CLQR) problems has been proven to be piecewise affine (PWA) while the value function is piecewise quadratic. We exploit these formal results from MPC to design an ADP method for CLQR problems with a known model. A novel convex and piecewise quadratic neural network with a local-global architecture is proposed to provide an accurate approximation of the value function, which is used as the cost-to-go function in the online dynamic programming problem. An efficient decomposition algorithm is developed to generate the control policy and speed up the online computation. Rigorous stability analysis of the closed-loop system is conducted for the proposed control scheme under the condition that a good approximation of the value function is achieved. Comparative simulations are carried out to demonstrate the potential of the proposed method in terms of online computation and optimality. (c) 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://***/licenses/by/4.0/).

Keywords: Approximate dynamic programming; Reinforcement learning; Model predictive control; Value function approximation; Neural networks; Constrained linear quadratic regulation

F-Discrepancy for Efficient Sampling in Approximate Dynamic Programming

IEEE TRANSACTIONS ON CYBERNETICS, 2016, Vol. 46, No. 7, pp. 1628-1639

Authors: Cervellera, Cristiano; Maccio, Danilo. Affiliation: CNR, Inst Intelligent Syst Automat, I-16149 Genoa, Italy

In this paper, we address the problem of generating efficient state sample points for the solution of continuous-state finite-horizon Markovian decision problems through approximate dynamic programming. It is known that the selection of sampling points at which the value function is observed is a key factor when such a function is approximated by a model based on a finite number of evaluations. A standard approach consists in generating these points through a random or deterministic procedure, aiming at a balanced covering of the state space. Yet, this solution may not be efficient if the state trajectories are not uniformly distributed. Here, we propose to exploit F-discrepancy, a quantity that measures how closely a set of random points represents a probability distribution, and introduce an example of an algorithm based on this concept to automatically select point sets that are efficient with respect to the underlying Markovian process. An error analysis of the approximate solution is provided, showing how the proposed algorithm enables convergence under suitable regularity hypotheses. Then, simulation results are provided concerning an inventory forecasting test problem. The tests confirm in general the important role of F-discrepancy, and show how the proposed algorithm is able to yield better results than uniform sampling, using sets even 50 times smaller.

Keywords: Approximate dynamic programming (ADP); F-discrepancy; Markovian decision problem (MDP); State sampling; Value function approximation
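
Illustrative sketch (not from the record above): the abstract describes F-discrepancy as a measure of how closely a point set represents a probability distribution. Assuming the usual definition, the largest absolute gap between the empirical distribution function of the points and the target distribution function F, the one-dimensional case can be computed exactly from the sorted sample. The Python function below does that. The name f_discrepancy_1d and the uniform example are hypothetical, and the paper's multi-dimensional setting and point-selection algorithm are not reproduced.

```python
def f_discrepancy_1d(points, cdf):
    """Return sup_x |F_n(x) - F(x)| for a one-dimensional point set.

    F_n is the empirical distribution function of `points` and `cdf` is the
    target distribution function F. For a continuous F the supremum is attained
    just before or at one of the sample points, so checking both sides of each
    jump of F_n is sufficient.
    """
    xs = sorted(points)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # F_n jumps from (i - 1)/n to i/n at the i-th sorted point.
        worst = max(worst, i / n - f, f - (i - 1) / n)
    return worst


def uniform_cdf(x):
    """Distribution function of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, x))


if __name__ == "__main__":
    evenly_spread = [0.125, 0.375, 0.625, 0.875]
    clustered = [0.05, 0.10, 0.15, 0.20]
    print(f_discrepancy_1d(evenly_spread, uniform_cdf))
    print(f_discrepancy_1d(clustered, uniform_cdf))
```

Against a uniform target, the evenly spread set scores 0.125 and the clustered set scores 0.8, so a lower value indicates a point set that better matches the target distribution.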

Attitude control for hypersonic reentry vehicles: An efficient deep reinforcement learning method

APPLIED SOFT COMPUTING, 2022, Vol. 123

Authors: Liu, Yiheng; Wang, Honglun; Wu, Tiancai; Lun, Yuebin; Fan, Jiaxuan; Wu, Jianfa. Affiliations: Beihang Univ, Sch Automat Sci & Elect Engn, Beijing 100191, Peoples R China; Beihang Univ, Shenyuan Honors Coll, Beijing 100191, Peoples R China; Beihang Univ, Sci & Technol Aircraft Control Lab, Beijing 100191, Peoples R China

To address the attitude control problem of hypersonic reentry vehicles (HRVs), a deep reinforcement learning (DRL) based anti-disturbance control method is proposed. First, a compound control framework consisting of a DRL-based auxiliary controller and a fixed-time anti-disturbance controller is proposed to improve the control performance under the premise of ensuring stability. Then, a novel value function approximation mechanism, named experience-based value expansion (EVE), is proposed to modify the value function update equation based on a two-dimensional replay buffer, which solves the DRL convergence problem caused by the HRV's strong nonlinearities, tight coupling, and large flight envelope. Furthermore, a result-oriented encoder (ROE) is proposed to solve the DRL generalization problem caused by the HRV's high uncertainties and the unavailability of a real training environment. A bottleneck-shaped neural network structure is used for the DRL networks to extract high-dimensional features and prevent overfitting to the training environment. Finally, extensive comparative numerical simulations demonstrate the effectiveness of the proposed efficient DRL algorithms and the DRL-based attitude controller. (c) 2022 Elsevier B.V. All rights reserved.

Keywords: Deep reinforcement learning; Value function approximation; Attitude control; Anti-disturbance control; Hypersonic reentry vehicle

Distributed Gradient Temporal Difference Off-policy Learning With Eligibility Traces: Weak Convergence

IFAC-PapersOnLine, 2020, Vol. 53, No. 2, pp. 1563-1568

Authors: Miloš S. Stanković; Marko Beko; Srdjan S. Stanković. Affiliations: Innovation Center, School of Electrical Engineering, University of Belgrade, Vlatacom Institute, Belgrade, and Singidunum University, Belgrade, Serbia; COPELABS, Universidade Lusófona de Humanidades e Tecnologias, Lisboa, Portugal, and UNINOVA, Caparica, Portugal; School of Electrical Engineering, University of Belgrade, Serbia, and Vlatacom Institute, Belgrade, Serbia

In this paper we propose two novel distributed algorithms for multi-agent off-policy learning of a linear approximation of the value function in Markov decision processes. The algorithms differ in how distributed consensus iterations are incorporated into a basic, recently proposed, single-agent scheme. The proposed completely decentralized off-policy learning schemes subsume local eligibility traces, and allow applications in which all the agents may have different behavior policies while evaluating a single target policy. Under nonrestrictive assumptions on the time-varying network topology and the individual state-visiting distributions of the agents, we prove that the parameter estimates of the algorithms weakly converge to a consensus. The variance reduction properties of the proposed algorithms are demonstrated. We also formulate specific guidelines on how to design the network weights and topology. The results are illustrated using simulations.

Keywords: Reinforcement learning; Distributed consensus; Value function approximation; Convergence; Eligibility traces; Off-policy learning; Weak convergence; Multi-agent systems
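
Illustrative sketch (not from the record above): the abstract concerns linear value function approximation with eligibility traces. The Python example below is a deliberately simpler relative of the paper's algorithms: plain single-agent, on-policy TD(lambda) with accumulating traces. It is not the distributed, off-policy gradient temporal difference schemes the paper proposes. The function names, the step size, and the two-state example chain are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def td_lambda(episodes, features, n_features, alpha=0.1, gamma=1.0, lam=0.8):
    """Estimate theta so that dot(theta, features(s)) approximates the value of s.

    Each episode is a list of (state, reward, next_state) transitions, with
    next_state set to None when the episode terminates.
    """
    theta = [0.0] * n_features
    for episode in episodes:
        trace = [0.0] * n_features  # eligibility trace, reset every episode
        for state, reward, next_state in episode:
            phi = features(state)
            value = dot(theta, phi)
            next_value = 0.0 if next_state is None else dot(theta, features(next_state))
            td_error = reward + gamma * next_value - value
            # Accumulating trace: decay the old trace, then add current features.
            trace = [gamma * lam * z + p for z, p in zip(trace, phi)]
            theta = [t + alpha * td_error * z for t, z in zip(theta, trace)]
    return theta


def one_hot(state, n_states=2):
    """Tabular features, a special case of linear approximation."""
    return [1.0 if i == state else 0.0 for i in range(n_states)]


if __name__ == "__main__":
    # Hypothetical chain: state 0 -> state 1 (reward 0), state 1 -> end (reward 1).
    episode = [(0, 0.0, 1), (1, 1.0, None)]
    print(td_lambda([episode] * 500, one_hot, n_features=2))
```

With gamma = 1 both states of the example chain have true value 1, and the estimates approach that value as the episode is replayed.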

Sarsa(lambda)-based Logistics Planning Approximated by Value Function With Policy Iteration

JOURNAL OF ALGORITHMS & COMPUTATIONAL TECHNOLOGY, 2015, Vol. 9, No. 4, pp. 449-466

Author: Tang, Yu. Affiliation: Taizhou Univ, Taizhou, Chunhui Rd 100, Taizhou 225300, Jiangsu, Peoples R China

The logistics planning problem has been extensively investigated for a long time. However, with the increasing number of stochastic events occurring on roads, an increasing number of stochastic factors should be taken into consideration. This paper uses a dynamic approach to solve the logistics planning problem in the common form of stochastic demand within the reinforcement learning framework, which is able to optimize a policy in unknown environments and uncertain cases. We use a clustering method to extract states as the main features for the basis functions, so as to address the curse of dimensionality caused by stochastic settings. We also propose an approximation approach with policy iteration, restricted by the goal of minimal time differential error, to approximate the stochastic cases of the real world, and then use the resulting approximation parameters as input for the proposed Sarsa(lambda)-based logistics planning algorithm to determine the policy and action in accordance with real-world stochastic events. The benchmarking experimental results show that the proposed algorithm achieves improvements in almost all the test cases.

Keywords: Logistics Planning; Reinforcement Learning; Sarsa(lambda); Value function approximation; Policy Iteration

An Exemplar Test Problem on Parameter Convergence Analysis of Temporal Difference Algorithms

World Congress on Intelligent Control and Automation

Authors: Martin Brown; Onder Tutsoy. Affiliation: Control Systems Group, School of Electrical and Electronic Engineering, The University of Manchester

ISBN: (print) 9781467313971

Reinforcement learning techniques have been developed to solve difficult learning control problems with little a priori knowledge about the system dynamics. In this paper, a simple unstable exemplar test problem is proposed to investigate issues in parametric convergence of the value function. A specific closed-form solution for the value function is determined, which has a polynomial form. It is proved that the temporal difference error introduces a null space associated with the finite horizon basis function during the control trajectory. The learning problem can only be non-singular if the termination is handled correctly, and a number of possible solutions are introduced. This result was only revealed because of the derived closed-form solution for the value function.

Keywords: Reinforcement learning; Temporal difference learning; Value function approximation; Polynomial basis functions; Rate of convergence
