Search Results - Inner Mongolia University Library

Value iteration algorithm for continuous-time linear quadratic stochastic optimal control problems

Science China (Information Sciences), 2024, Vol. 67, No. 2, pp. 170-180

Authors: Guangchen WANG, Heng ZHANG (School of Control Science and Engineering, Shandong University)

In this study, we investigate a continuous-time infinite-horizon linear quadratic stochastic optimal control problem with multiplicative noise in control and state variables. Using the techniques of stochastic stability, exact observability, and stochastic approximation, a value iteration algorithm is developed to solve the corresponding generalized algebraic Riccati equation. Unlike the existing policy iteration algorithm, this algorithm does not rely on an initial stabilizing control. Further, this algorithm can also be used to compute policy evaluation steps that arise in the policy iteration algorithm. Herein, a simulation example is provided to validate the obtained results.
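The abstract does not spell out the iteration itself. As a rough sketch only, assume the usual Itô model dx = (Ax + Bu)dt + (Cx + Du)dW with cost E∫(x'Qx + u'Ru)dt, whose generalized algebraic Riccati equation (GARE) reads A'P + PA + C'PC + Q - (PB + C'PD)(R + D'PD)^{-1}(B'P + D'PC) = 0. The code below runs the simplest value-iteration-style scheme, P_{k+1} = P_k + h·Ric(P_k) started from P_0 = 0 (a forward-Euler step of the Riccati differential equation), which illustrates why no initial stabilizing gain is needed. It is not the authors' stochastic-approximation algorithm; the step size h, the stopping rule and the function names are hypothetical.

```python
import numpy as np


def gare_residual(P, A, B, C, D, Q, R):
    """Left-hand side of the GARE for dx = (Ax + Bu)dt + (Cx + Du)dW."""
    S = R + D.T @ P @ D            # R + D'PD (assumed positive definite)
    L = P @ B + C.T @ P @ D        # PB + C'PD
    return A.T @ P + P @ A + C.T @ P @ C + Q - L @ np.linalg.solve(S, L.T)


def value_iteration_gare(A, B, C, D, Q, R, step=0.01, tol=1e-9, max_iter=200_000):
    """Hypothetical fixed-step iteration P <- P + step * Ric(P) from P = 0."""
    P = np.zeros_like(A, dtype=float)   # no stabilizing initial gain required
    for _ in range(max_iter):
        res = gare_residual(P, A, B, C, D, Q, R)
        if np.linalg.norm(res) < tol:
            break
        P = P + step * res
        P = 0.5 * (P + P.T)             # keep the iterate symmetric
    # Feedback u = -Kx with K = (R + D'PD)^{-1}(B'P + D'PC)
    K = np.linalg.solve(R + D.T @ P @ D, B.T @ P + D.T @ P @ C)
    return P, K
```

For a scalar check with A = B = Q = R = 1, C = 0.2 and D = 0.1, the GARE reduces to 1.02P² - 2.05P - 1 = 0, so the iterate can be compared with the positive root (2.05 + √8.2825)/2.04 ≈ 2.416. A small step keeps the scalar iteration monotone; larger steps may overshoot or diverge.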

Keywords: stochastic systems; optimal control; linear quadratic stochastic problem; generalized algebraic Riccati equation; value iteration algorithm

Data-driven optimal tracking control of discrete-time linear systems with multiple delays via the value iteration algorithm

INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2022, Vol. 53, No. 14, pp. 2845-2859

Authors: Hao, Longyan; Wang, Chaoli; Zhang, Guang; Jing, Chonglin; Shi, Yibo (Univ Shanghai Sci & Technol, Dept Control Sci & Engn, Shanghai 200093, Peoples R China; Univ Shanghai Sci & Technol, Business Sch, Shanghai, Peoples R China; Univ Shanghai Sci & Technol, Coll Sci, Shanghai, Peoples R China)

In this paper, the optimal tracking problem for discrete-time linear systems with multiple delays is studied without a model of the system dynamics. The full state of a system with unknown dynamics is hard to measure unless the system is equipped with many sensors, which, however, increases cost and complicates the analysis. To deal with this problem and avoid the adverse effects of relying on state information, a new data-driven value iteration (DDVI) algorithm is proposed that uses three kinds of data: past control inputs, system outputs, and external reference trajectories. Before the algorithm is presented, the original system is transformed according to the characteristics of time-delay systems, so that the number of delays is reduced or the system becomes delay-free. A novel data-driven state equation is derived from the historical data of these three kinds, and it is then used to solve the optimal control problem for multi-delay systems. Further results show that the proposed DDVI algorithm converges and that the tracking error is asymptotically stable. Finally, simulations are provided to show the effectiveness of the controller.
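The DDVI algorithm works only from input, output and reference data, and its delay transformation is not reproduced here. For orientation, a minimal model-based sketch of the kind of value iteration that data-driven linear-quadratic schemes approximate, assuming a known delay-free system x_{k+1} = Ax_k + Bu_k with stage cost x'Qx + u'Ru (tracking terms omitted), is the Riccati recursion P_{j+1} = Q + A'P_jA - A'P_jB(R + B'P_jB)^{-1}B'P_jA started from P_0 = 0. The function name and tolerances are hypothetical.

```python
import numpy as np


def riccati_value_iteration(A, B, Q, R, tol=1e-10, max_iter=10_000):
    """Value iteration P_{j+1} = Q + A'P_jA - A'P_jB(R + B'P_jB)^{-1}B'P_jA."""
    # P_0 = 0: under standard stabilizability/detectability assumptions
    # no stabilizing initial gain is needed (unlike policy iteration).
    P = np.zeros_like(Q, dtype=float)
    for _ in range(max_iter):
        BtPA = B.T @ P @ A
        S = R + B.T @ P @ B
        P_next = Q + A.T @ P @ A - BtPA.T @ np.linalg.solve(S, BtPA)
        converged = np.abs(P_next - P).max() < tol
        P = P_next
        if converged:
            break
    # Optimal feedback u_k = -K x_k
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K
```

With A = B = Q = R = 1 the fixed point satisfies P² - P - 1 = 0, so P converges to the golden ratio (about 1.618) and the gain to about 0.618.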

Keywords: Optimal tracking control; value iteration algorithm; multiple delays system

Value iteration algorithm for mean-field games

SYSTEMS & CONTROL LETTERS, 2020, Vol. 143, p. 104744

Authors: Anahtarci, Berkay; Kariksiz, Can Deha; Saldi, Naci (Ozyegin Univ, Istanbul, Turkey)

In the literature, the existence of mean-field equilibria has been established for discrete-time mean-field games under both the discounted cost and the average cost optimality criteria. In this paper, we provide a value iteration algorithm to compute a stationary mean-field equilibrium for both the discounted cost and the average cost criteria, whose existence was proved previously. We establish that the value iteration algorithm converges to the fixed point of a mean-field equilibrium operator. Then, using this fixed point, we construct a stationary mean-field equilibrium. In our value iteration algorithm, we use Q-functions instead of value functions. (C) 2020 Elsevier B.V. All rights reserved.
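The paper's equilibrium operator is not given in the abstract. As a toy sketch only, assume finite state and action sets, transitions that do not depend on the mean field, and a crowd-averse cost c(x, a, μ) = base(x, a) + κμ(x). The loop below alternates a Q-function Bellman update with a mean-field update that pushes μ through the greedy closed-loop kernel; this joint update, the greedy policy choice and all names are illustrative assumptions, and convergence here relies on the toy example having a clearly separated optimal policy.

```python
import numpy as np


def mfg_q_value_iteration(P, base_cost, kappa, beta, tol=1e-10, max_iter=10_000):
    """Toy joint iteration on (Q, mu) for a finite discounted mean-field game.

    P[a, x, y]     : transition probabilities (assumed independent of mu here)
    base_cost[x, a]: cost before the mean-field term; total c = base + kappa * mu[x]
    """
    n_actions, n_states, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    mu = np.full(n_states, 1.0 / n_states)       # start from the uniform distribution
    states = np.arange(n_states)
    for _ in range(max_iter):
        cost = base_cost + kappa * mu[:, None]   # c(x, a, mu): crowd-averse cost
        v = Q.min(axis=1)                        # min over a' of Q(y, a')
        Q_new = cost + beta * np.einsum("axy,y->xa", P, v)
        policy = Q_new.argmin(axis=1)            # greedy (cost-minimizing) policy
        # Mean-field update: push mu one step through the closed-loop kernel.
        mu_new = mu @ P[policy, states, :]
        done = np.abs(Q_new - Q).max() < tol and np.abs(mu_new - mu).max() < tol
        Q, mu = Q_new, mu_new
        if done:
            break
    return Q, mu, Q.argmin(axis=1)
```

In a two-state example where action 0 stays and action 1 switches (each with 10% noise), state 1 carries an extra unit of cost and switching costs 0.1, the iteration settles on staying in state 0 and leaving state 1, and the induced mean field is (0.9, 0.1).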

Keywords: Mean-field games; value iteration algorithm; Discounted cost; Average cost

Two-Player Stackelberg Game for Linear System via Value Iteration Algorithm

28th IEEE International Symposium on Industrial Electronics (IEEE-ISIE)

Authors: Li, Man; Qin, Jiahu; Ding, Lei (Univ Sci & Technol China, Dept Automat, Hefei 230027, Anhui, Peoples R China; Nanjing Univ Posts & Telecommun, Inst Adv Technol, Nanjing 210023, Jiangsu, Peoples R China)

ISBN: (print) 9781728136660

This paper investigates a hierarchical decision-making problem for two players governed by a continuous-time linear system. The problem is formulated as a Stackelberg game, in which one player, called the leader, has priority to make its decision first, and the other player, called the follower, subsequently reacts optimally to the leader's decision. We first establish two Hamilton-Jacobi-Bellman (HJB) equations in coupled form and show that their solutions not only stabilize the system but also constitute the Stackelberg equilibrium policy. Because the HJB equations are difficult to solve analytically, we develop a new partially model-free value iteration (VI) algorithm with a two-level decision-making structure. To implement the proposed VI algorithm, we employ neural networks (NNs) to approximate the value functions and use a least-squares method to update the NN weights. Finally, a simulation example is presented to verify the effectiveness of the proposed algorithm.

Keywords: Stackelberg game; linear system; value iteration algorithm; neural networks

A Modified Value Iteration Algorithm for Discounted Markov Decision Processes

JOURNAL OF ELECTRONIC COMMERCE IN ORGANIZATIONS, 2015, Vol. 13, No. 3, pp. 47-57

Authors: Chafik, Sanaa; Daoui, Cherki (Univ Sultan Moulay Slimane, Lab Informat Proc & Decis Support, Beni Mellal, Morocco)

Because many real applications involve a large number of states, classical methods become intractable for solving large Markov Decision Processes. Decomposition based on the topology of each state in the associated graph, together with parallelization, is a very useful way to cope with this problem. In this paper, the authors propose a modified value iteration algorithm that adds parallelism. They test their implementation on artificial data using OpenMP, which offers a significant speed-up.
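The paper's decomposition and OpenMP code are not reproduced here. As a minimal sketch of the baseline being parallelized, assuming a finite MDP stored as dense arrays, each sweep below is a Jacobi-style update in which every state's Bellman backup reads only the previous value vector, which is what makes per-state parallelism possible; NumPy vectorization stands in for OpenMP threads. The stopping threshold ε(1 - β)/(2β) is the standard rule that makes the resulting greedy policy ε-optimal. Function and parameter names are hypothetical.

```python
import numpy as np


def value_iteration(P, R, beta, eps=1e-6, max_iter=100_000):
    """Discounted-reward value iteration.

    P[a, s, t]: probability of moving from state s to t under action a.
    R[s, a]:    expected one-step reward (to be maximized).
    """
    V = np.zeros(P.shape[1])
    threshold = eps * (1.0 - beta) / (2.0 * beta)
    for _ in range(max_iter):
        # Jacobi sweep: every state's backup uses only the previous V, so the
        # per-state updates are independent (vectorized here; threads in C).
        V_new = (R + beta * np.einsum("ast,t->sa", P, V)).max(axis=1)
        if np.abs(V_new - V).max() < threshold:
            # The greedy policy with respect to the final iterate is eps-optimal.
            policy = (R + beta * np.einsum("ast,t->sa", P, V_new)).argmax(axis=1)
            return V_new, policy
        V = V_new
    raise RuntimeError("value iteration did not reach the stopping threshold")
```

For a two-state example where staying in state 0 earns 1, staying in state 1 earns 2, switching earns 0 and β = 0.9, the result is V ≈ (18, 20), with the policy switching out of state 0 and staying in state 1.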

Keywords: Discounted Reward Criterion; Markov Decision Processes; OpenMP; Parallelizing; value iteration algorithm

Incremental value iteration for optimal output regulation of linear systems with unknown exosystems

NEUROCOMPUTING, 2025, Vol. 626

Authors: Jing, Chonglin; Wang, Chaoli; Liang, Dong; Xu, Yujing; Hao, Longyan (Univ Shanghai Sci & Technol, Dept Control Sci & Engn, Shanghai 200093, Peoples R China; High Tech Inst, Fan Gong Ting South St 12th, Weifang 261000, Peoples R China)

This paper addresses the optimal output regulation problem for discrete-time linear systems with completely unknown dynamics and unmeasurable exosystem states. The primary objective is to design incremental dataset-based value iteration (VI) reinforcement learning algorithms to derive both state feedback and output feedback controllers. In the context of data-driven optimal control, existing approaches typically require either the exosystem state to be measurable or the design of an autonomous system to reconstruct it. In contrast, this work proposes an incremental dataset-based VI algorithm, which eliminates the need for exosystem state measurement or reconstruction. Additionally, the proposed method allows for the selection of an arbitrary initial admissible control policy, thereby overcoming the challenge of requiring an initial admissible control in policy iteration algorithms. Furthermore, the system state is reconstructed using the incremental dataset, and an optimal output feedback controller is developed based on the proposed VI algorithm. The theoretical convergence of the dataset-based incremental VI algorithm is rigorously analyzed, and comprehensive simulations are conducted to validate its effectiveness.

Keywords: value iteration algorithm; Output regulation; Incremental dataset; Optimal control

Zero-sum risk-sensitive continuous-time stochastic games with unbounded reward and transition rates in Borel spaces

AUTOMATICA, 2025, Vol. 177

Authors: Zhang, Junyu; Guo, Xianping; Xia, Li (Sun Yat Sen Univ, Sch Math, Guangzhou 510275, Peoples R China; Sun Yat Sen Univ, Sch Business, Guangzhou 510275, Peoples R China; Sun Yat Sen Univ, Guangdong Prov Key Lab Computat Sci, Guangzhou, Peoples R China)

This paper investigates a finite-horizon two-player zero-sum risk-sensitive stochastic game in continuous-time Markov chains with Borel state and action spaces. The model accommodates unbounded reward rates, transition rates, and terminal reward functions, while permitting history-dependent policies. The risk metric is the exponential utility function. Under appropriate conditions, we establish the existence of a solution to the corresponding Shapley equation (SE) through an approximation technique. Using the SE and an extension of Dynkin's formula, we prove the existence of saddle-point equilibrium and demonstrate that the stochastic game's value is the unique solution to the SE. Furthermore, we develop a value iteration algorithm for approximating the stochastic game's value, with convergence guaranteed by a specialized contraction operator within our risk-sensitive stochastic game framework. Finally, we illustrate our main findings through an example. (c) 2025 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

Keywords: Zero-sum stochastic game; Risk-sensitive criterion; Unbounded transition rates; Unbounded reward rates; Saddle-point equilibrium; value iteration algorithm

Risk probability optimization of finite horizon piecewise deterministic Markov decision processes

OPTIMIZATION, 2025, Vol. 74, No. 7, pp. 1697-1721

Authors: Huo, Haifeng; Wen, Xian (Guangxi Univ Sci & Technol, Sch Sci, Liuzhou, Peoples R China)

This paper investigates piecewise deterministic Markov decision processes (PDMDPs) under the risk probability criterion. The optimality problem is to minimize the probability that the finite-horizon total costs are no more than the cost goal. Under suitable conditions, the value iteration algorithm is used to verify the existence of a solution to the optimality problem. In addition, some new facts are established to prove that the value function is the unique solution to the optimality problem and that an optimal policy exists. Finally, two examples are presented to explain the application of risk probability PDMDPs: the first illustrates the verification of the main conditions, and the second shows the calculation of the value function and an optimal risk probability policy.

Keywords: Piecewise deterministic Markov decision processes; risk probability criterion; optimal policy; value iteration algorithm

Dynamic soft-kill weapon-target assignment in naval environments

COMPUTERS & INDUSTRIAL ENGINEERING, 2024, Vol. 197

Authors: Tashakori, Sadegh; Ranjbar, Mohammad; Balochian, Saeed; Sharif-Razavian, Javad; Peymankar, Mahboobeh (Ferdowsi Univ Mashhad, Fac Engn, Dept Ind Engn, Mashhad, Iran; Islamic Azad Univ, Dept Elect Engn, Mashhad Branch, Mashhad, Iran; Islamic Azad Univ, Dept Comp Engn, Mashhad Branch, Mashhad, Iran; Hakim Sabzevari Univ, Dept Ind Engn, Sabzevar, Iran)

One of the most significant threats faced by ships is anti-ship missiles. Nowadays, these missiles, equipped with diverse guidance systems, can locate their trajectory and attack the ship. Consequently, ships need to utilize their weapons to attempt to neutralize these threats. This article aims to develop dynamic assignment algorithms to assign a ship's defensive soft-kill weapons to a set of incoming missiles, to minimize the average damage inflicted on the ship. To this end, initially, a binary linear programming model is developed to solve the static weapon-target assignment problem. Subsequently, a simulation-optimization algorithm and a reinforcement learning-based approach, grounded in the value iteration algorithm, are developed to solve the dynamic weapon-target assignment problem. To compare and evaluate the performance of the developed solution methods, we employ a set of randomly generated test instances. Computational results indicate that the reinforcement learning approach, due to its inherent foresight, outperforms the simulation-optimization approach in reducing the inflicted damages. However, in terms of CPU run time, the simulation-optimization approach is more efficient.

Keywords: Dynamic weapon-target assignment; Simulation-based optimization; value iteration algorithm

Optimal control of a dynamic production-inventory system with various cost criteria

ANNALS OF OPERATIONS RESEARCH, 2024, Vol. 337, No. 1, pp. 75-103

Authors: Golui, Subrata; Pal, Chandan; Manikandan, R.; Sobhanan, Abhay (Indian Inst Technol Guwahati, Dept Math, Gauhati 781039, Assam, India; Cent Univ Kerala, Dept Math, Kasaragod 671320, Kerala, India; Univ S Florida, Dept Ind & Management Syst Engn, Tampa, FL 33620, USA)

In this article, we investigate the dynamic control problem of a production-inventory system. Here, demands arrive at the production unit according to a Poisson process and are processed in an FCFS manner. The processing time of a customer's demand is exponentially distributed. Manufacturers produce items on a make-to-order basis to meet customer demands, and production runs until the inventory level becomes sufficiently large. We assume that the production time of an item follows an exponential distribution and that the time for a produced item to reach the retail shop is negligible. In addition, we assume that no new customer joins the queue when the inventory is empty. Moreover, a customer waiting for service in an infinite FIFO queue does not leave the queue even if the inventory is exhausted. This yields an explicit product-form solution for the steady-state probability vector of the system. The optimal policy that minimizes the discounted/average/pathwise average total cost per production is derived using a Markov decision process approach. We find an optimal policy using value and policy iteration algorithms. Numerical examples are discussed to verify the proposed algorithms.
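The paper's continuous-time queueing model and cost structure are not reproduced here. As a hedged sketch of the policy iteration step for the discounted criterion, assume the controlled chain has already been reduced to a finite discrete-time MDP (for example by uniformization). Policy iteration then alternates exact evaluation by a linear solve with greedy improvement; the function name and the toy costs in the usage note are hypothetical.

```python
import numpy as np


def policy_iteration(P, C, beta, max_iter=1_000):
    """Discounted-cost policy iteration for a finite MDP.

    P[a, s, t]: transition probabilities; C[s, a]: one-step cost (minimized).
    """
    n_states = P.shape[1]
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)      # arbitrary initial policy
    for _ in range(max_iter):
        # Policy evaluation: solve (I - beta * P_pi) V = c_pi exactly.
        P_pi = P[policy, states, :]
        c_pi = C[states, policy]
        V = np.linalg.solve(np.eye(n_states) - beta * P_pi, c_pi)
        # Policy improvement: greedy w.r.t. V, keeping the current action on ties
        # so the loop cannot cycle between equally good policies.
        Qsa = C + beta * np.einsum("ast,t->sa", P, V)
        new_policy = Qsa.argmin(axis=1)
        ties = np.isclose(Qsa[states, policy], Qsa[states, new_policy])
        new_policy[ties] = policy[ties]
        if np.array_equal(new_policy, policy):
            return V, policy
        policy = new_policy
    raise RuntimeError("policy iteration did not terminate")
```

With deterministic stay/switch actions, costs C = [[0, 5], [1, 2]] and β = 0.9, the optimal policy stays in state 0 and switches out of state 1, giving V = (0, 2). Unlike value iteration, policy iteration terminates in finitely many steps on a finite MDP, at the price of one linear solve per iteration.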

Keywords: Production-inventory system; Controlled Markov chain; Cost criterion; value iteration algorithm; Policy iteration algorithm
