Search Results - Inner Mongolia University Library

Optimal decision strategy for discrete-time Markovian jump linear systems

INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2023, vol. 54, no. 3, pp. 565-582

Authors: Zhu, Jin Zhang, Qingkun Univ Sci & Technol China Dept Automat Hefei Peoples R China Hefei Comprehens Natl Sci Ctr Inst Artificial Intelligence Hefei Peoples R China

This paper investigates the discrete-time Markovian jump linear systems (MJLSs) whose mode transition probability matrix (MTPM) can be adjusted by decisions. Motivated by switching law design in switched systems, the optimal decision strategy is proposed for stabilisation and optimisation of such MJLSs where decision cost is taken into account. First, aiming at system stability, the feasible domain of decision is given for stable and unstable MJLSs with initial MTPM. Second, a generalised performance index is put forward which contains both state cost and decision cost, and we obtain the quantitative relationship between the index and decision via stochastic dynamic programming. Finally, for the optimisation of the performance index, a value iteration algorithm is proposed with its convergence proof. This algorithm, searching for the optimal decision within the feasible domain, achieves superior performance on the basis of ensuring stability. Simulation results illustrate the effectiveness of the proposed optimal decision strategy.

Keywords: Discrete-time MJLSs optimal decision strategy stochastic stability generalised performance index value iteration algorithm


Convergence and Numerical Complexity of Policy and Value Iterations in Linear-Quadratic Discrete-Time Reinforcement Learning

4th Modeling, Estimation, and Control Conference (MECC)

Authors: Xu, Lingyi Gajic, Zoran Rutgers State Univ Dept Elect & Comp Engn Piscataway NJ 08854 USA

This paper demonstrates that the value iteration (VI) algorithm of reinforcement learning for the discrete-time (DT) linear-quadratic (LQ) optimal control problem converges very slowly, mostly linearly, compared to the quadratic rate of convergence of the corresponding policy iteration (PI) algorithm. The VI algorithm produces non-monotonically decreasing or increasing sequences that converge to the optimal value either from below or from above depending on the choice of initial conditions. It is remarkable that the VI algorithm converges even in the case when the initial condition is very far from the optimal value by several orders of magnitude, and when the initial condition is not stabilizing. The PI algorithm generates a non-increasing sequence that monotonically converges from above to the optimal value assuming the initial condition (feedback gain) is stabilizing. The convergence rate for the PI algorithm is quadratic, which assures its fast convergence. It is shown in this paper that the convergence of the VI algorithm can be made quadratic by using the doubling algorithm. We precisely state a condition needed for convergence of the VI algorithm, which is milder than the corresponding convergence condition for the PI algorithm. We have also shown that the newly proposed VI algorithm requires less computational effort than the PI algorithm. Several numerical examples are solved to document the presented results.

Keywords: Reinforcement learning policy iteration algorithm value iteration algorithm linear-quadratic discrete-time optimal control numerical complexity
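
The contrast drawn in this abstract can be reproduced on a toy problem. The following is a minimal sketch, not the authors' code: it assumes the standard Riccati recursion for value iteration started from P = 0, and the standard evaluate-then-improve policy iteration started from a stabilizing gain. The scalar system (A = 1.1, B = Q = R = 1), the initial gain K0 = 1, and all function names are illustrative assumptions. The doubling algorithm mentioned in the abstract is not implemented here.

```python
import numpy as np


def improve_gain(P, A, B, R):
    """Greedy feedback gain K = (R + B'PB)^{-1} B'PA for a given value matrix P."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


def value_iteration(A, B, Q, R, tol=1e-10, max_iter=10_000):
    """Riccati recursion P <- Q + A'P(A - BK), started from P = 0.

    Returns the final P and the number of iterations performed.
    """
    P = np.zeros_like(Q, dtype=float)
    for k in range(1, max_iter + 1):
        K = improve_gain(P, A, B, R)
        P_next = Q + A.T @ P @ (A - B @ K)
        if np.max(np.abs(P_next - P)) < tol:
            return P_next, k
        P = P_next
    return P, max_iter


def policy_iteration(A, B, Q, R, K0, tol=1e-10, max_iter=100):
    """Policy iteration; K0 is assumed to make A - B K0 stable.

    Returns the final P and the number of iterations performed.
    """
    n = A.shape[0]
    K = np.asarray(K0, dtype=float)
    P = np.zeros_like(Q, dtype=float)
    for k in range(1, max_iter + 1):
        Ac = A - B @ K
        M = Q + K.T @ R @ K
        # Policy evaluation: solve the Lyapunov equation P = Ac'P Ac + M
        # as a linear system in vec(P).
        vec_P = np.linalg.solve(np.eye(n * n) - np.kron(Ac.T, Ac.T), M.reshape(-1))
        P_next = vec_P.reshape(n, n)
        if np.max(np.abs(P_next - P)) < tol:
            return P_next, k
        P = P_next
        # Policy improvement with the freshly evaluated P.
        K = improve_gain(P, A, B, R)
    return P, max_iter


# Illustrative (assumed) scalar system, unstable in open loop.
A = np.array([[1.1]])
B = np.array([[1.0]])
Q = np.array([[1.0]])
R = np.array([[1.0]])

P_vi, n_vi = value_iteration(A, B, Q, R)
P_pi, n_pi = policy_iteration(A, B, Q, R, K0=np.array([[1.0]]))
print("VI iterations:", n_vi, "PI iterations:", n_pi)
```

On this example both routines reach the same P, with policy iteration needing fewer iterations. Each policy-iteration step solves a linear (Lyapunov) equation, so iteration counts alone do not settle the computational-effort comparison discussed in the abstract.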


Prospect-theoretic DRL Approach for Container Provisioning in Energy-constrained Edge Platforms

97th IEEE Vehicular Technology Conference (VTC-Spring)

Authors: Hlophe, M. C. Maharaj, B. T. Univ Pretoria Dept Elect Elect & Comp Engn Pretoria South Africa

ISBN: (print) 9798350311143

Due to the increase in resource-constrained internet of things (IoT) devices, multi-access edge computing (MEC) platforms have become very competitive environments in terms of successful data offloading and allocation of computational resources. This competition together with the varying workload and service requests makes the real deployment of computational resources in edge servers a major challenge. In order to address this challenge, an intelligent offloading and container provisioning scheme is developed using a prospect-theoretic deep reinforcement learning (DRL) strategy. An offloading utility function is formulated by exploiting offloading overhead options, and a neural network (NN) is used to monitor queue states and workload matching to construct a behavioral function that will influence accurate container provisioning. A scaled cost function is formulated to balance the energy consumption and the quality of service (QoS) cost subject to hard per-task latency constraints, which leads to better container utilization and energy consumption.

Keywords: 6G Container provisioning DRL Energy consumption IoT Latency MEC Prospect theory value iteration algorithm


A Hybrid Handover Scheme for Vehicular VLC/RF Communication Networks


SENSORS, 2024, vol. 24, no. 13, p. 4323

Authors: Jia, Linqiong Feng, Shicheng Zhang, Yijin Wang, Jin-Yuan Nanjing Univ Sci & Technol Sch Elect & Opt Engn Nanjing 210094 Peoples R China Nanjing Univ Posts & Telecommun Sch Commun & Informat Engn Nanjing 210003 Peoples R China

Visible light communication (VLC) is a promising complementary technology to its radio frequency (RF) counterpart to satisfy the high quality-of-service (QoS) requirements of intelligent vehicular communications by reusing LED street lights. In this paper, a hybrid handover scheme for vehicular VLC/RF communication networks is proposed to balance QoS and handover costs by considering the vertical handover and horizontal handover together judging from the mobile state of the vehicle. A Markov decision process (MDP) is formulated to describe this hybrid handover problem, with a cost function balancing the handover consumption, delay, and reliability. A value iteration algorithm was applied to solve the optimal handover policy. The simulation results demonstrated the performance of the proposed hybrid handover scheme in comparison to other benchmark schemes.

Keywords: vehicular VLC/RF communication networks vertical handover scheme horizontal handover Markov decision process (MDP) value iteration algorithm
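
As a rough illustration of the solution step named in this abstract, the sketch below runs value iteration on a discounted, cost-minimizing finite MDP. It is not the paper's model: the two states, the stay/switch actions, the cost numbers, the discount factor 0.9, and the function name `mdp_value_iteration` are all made-up assumptions.

```python
import numpy as np


def mdp_value_iteration(P, C, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Value iteration for a finite MDP with stage costs (minimization).

    P[a, s, t] is the probability of moving from state s to state t under
    action a; C[s, a] is the cost of taking action a in state s.
    Returns the value vector V and a greedy policy (one action per state).
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[s, a] = C[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = C + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.min(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = C + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmin(axis=1)


# Hypothetical toy problem: state 0 is a cheap link, state 1 an expensive one.
# Action 0 keeps the current link, action 1 switches to the other link.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
# Switching adds an assumed handover cost of 0.5 to the per-step link cost.
C = np.array([
    [1.0, 1.5],  # state 0: stay, switch
    [3.0, 3.5],  # state 1: stay, switch
])

V, policy = mdp_value_iteration(P, C)
print("values:", V, "policy:", policy)
```

In this toy setup the greedy policy stays on the cheap link and switches away from the expensive one, because the one-off switching cost is smaller than the discounted cost of remaining there.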


Value Iteration Networks with Double Estimator for Planetary Rover Path Planning

SENSORS, 2021, vol. 21, no. 24, p. 8418

Authors: Jin, Xiang Lan, Wei Wang, Tianlin Yu, Pengyao Dalian Maritime Univ Sch Naval Architecture & Ocean Engn Dalian 116026 Peoples R China

Path planning technology is significant for planetary rovers that perform exploration missions in unfamiliar environments. In this work, we propose a novel global path planning algorithm, based on the value iteration network (VIN), which is embedded within a differentiable planning module, built on the value iteration (VI) algorithm, and has emerged as an effective method to learn to plan. Despite the capability of learning environment dynamics and performing long-range reasoning, the VIN suffers from several limitations, including sensitivity to initialization and poor performance in large-scale domains. We introduce the double value iteration network (dVIN), which decouples action selection and value estimation in the VI module, using the weighted double estimator method to approximate the maximum expected value, instead of maximizing over the estimated action value. We have devised a simple, yet effective, two-stage training strategy for VI-based models to address the problem of high computational cost and poor performance in large-size domains. We evaluate the dVIN on planning problems in grid-world domains and realistic datasets, generated from terrain images of a moon landscape. We show that our dVIN empirically outperforms the baseline methods and generalizes better to large-scale environments.

Keywords: planetary rover path planning reinforcement learning value iteration algorithm deep neural network double estimator method


Convergence and Numerical Complexity of Policy and Value Iterations in Linear-Quadratic Discrete-Time Reinforcement Learning

IFAC-PapersOnLine, 2024, vol. 58, no. 28, pp. 96-101

Authors: Lingyi Xu Zoran Gajić Department of Electrical & Computer Engineering Rutgers The State University of New Jersey Piscataway NJ 08854 USA

This paper demonstrates that the value iteration (VI) algorithm of reinforcement learning for the discrete-time (DT) linear-quadratic (LQ) optimal control problem converges very slowly, mostly linearly, compared to the quadratic rate of convergence of the corresponding policy iteration (PI) algorithm. The VI algorithm produces non-monotonically decreasing or increasing sequences that converge to the optimal value either from below or from above depending on the choice of initial conditions. It is remarkable that the VI algorithm converges even in the case when the initial condition is very far from the optimal value by several orders of magnitude, and when the initial condition is not stabilizing. The PI algorithm generates a non-increasing sequence that monotonically converges from above to the optimal value assuming the initial condition (feedback gain) is stabilizing. The convergence rate for the PI algorithm is quadratic, which assures its fast convergence. It is shown in this paper that the convergence of the VI algorithm can be made quadratic by using the doubling algorithm. We precisely state a condition needed for convergence of the VI algorithm, which is milder than the corresponding convergence condition for the PI algorithm. We have also shown that the newly proposed VI algorithm requires less computational effort than the PI algorithm. Several numerical examples are solved to document the presented results.

Keywords: Reinforcement learning policy iteration algorithm value iteration algorithm linear-quadratic discrete-time optimal control numerical complexity


Value Iteration Solver Networks

3rd International Conference on Intelligent Autonomous Systems (ICoIAS)

Authors: Urtans, Evalds Vecins, Valters Riga Tech Univ Riga Latvia

ISBN: (print) 9781728160788

The value iteration algorithm is iterative and cannot be parallelized. Computation time grows exponentially when the size of the input maps is increased. We propose the UNet-RNN-Skip artificial neural network architecture, which can be used to parallelize value iteration algorithm results. The proposed model can solve the value iteration problem in fewer iterations than the original algorithm, and computation time increases by only a small amount when increasing the size of the input map. The fundamental UNet-RNN-Skip architecture can also be used to solve and parallelize other sequential problems. With this paper, a synthetic dataset of maps and its generator have been published to enable further studies in mapping and path planning tasks.

Keywords: ResNet ConvNet RNN value iteration algorithm


Model-free optimal tracking policies for Markov jump systems by solving non-zero-sum games


INFORMATION SCIENCES, 2023, vol. 647, no. 1

Authors: Zhou, Peixin Xue, Huiwen Wen, Jiwei Shi, Peng Luan, Xaoli Jiangnan Univ Sch Internet Things Engn Key Lab Adv Proc Control Light Ind Minist Educ Wuxi 214122 Peoples R China Univ Adelaide Sch Elect & Mech Engn Adelaide SA 5005 Australia Obuda Univ Res & Innovat Ctr H-1034 Budapest Hungary

This paper develops model-free optimal tracking policies for Markov jump systems by solving nonzero-sum games (NZSGs). First, coupled action and mode-dependent value functions (CAMDVFs) are built for solving a two-player NZSG and getting Nash equilibrium solutions. Second, we propose a value iteration (VI) algorithm to parallelly update policies under each mode by collecting data on different operation modes within each iterative window. Moreover, the iterative increasing convergence of the CAMDVFs is proved by introducing auxiliary functions between two adjacent iterations. It is worth pointing out that an influence function is introduced to remove abnormal data to improve the learning capability of the VI algorithm effectively. Finally, the tracking policies' validity, self-adaptability and application potential are verified by a numerical example and a generalized economic model.

Keywords: value iteration algorithm Influence function Adaptive optimal tracking Non-zero-sum game Nash equilibrium


Optimal rearrangement and preventive maintenance policies for heterogeneous balanced systems with three failure modes


RELIABILITY ENGINEERING & SYSTEM SAFETY, 2023, vol. 238, no. 1

Authors: Wang, Jingjing Liu, Huimin Lin, Tianran Qingdao Univ Technol Sch Management Engn Qingdao 266525 Peoples R China Qingdao Univ Technol Ctr Struct Acoust & Machine Fault Diag Qingdao 266525 Peoples R China

This paper studies a heterogeneous balanced system composed of multiple interchangeable components. The degradation process of components is described by a gamma process, and deterioration rates in different positions are different due to the effect of loading stress or temperature. Three competing failures may occur: a) shock failure caused by environment shocks, b) soft failure when the deterioration level of any component exceeds a critical value, and c) out of balance when the difference value among components reaches the failure threshold. To avoid system failure, a rearrangement action is adopted to change the position of components. Besides, preventive maintenance is considered when a component deteriorates severely. A semi-Markov decision process (SMDP) is developed to obtain the optimal policy by minimizing the average maintenance cost. To facilitate calculation, the data transmission method is used to convert the semi-Markov decision model to the Markov decision model, and a value iteration algorithm is established to obtain the optimal maintenance action at each state. Considering practical implications, an imperfect preventive and opportunistic maintenance model is formulated under the SMDP framework. Finally, a typical tire rotation problem in motorcycles proves that the imperfect maintenance policy outperforms the other policies when replacement fees are more expensive.

Keywords: Rearrangement policy Preventive maintenance Heterogeneous balanced systems value iteration algorithm Competing failures


Reinforcement learning approach to the control of heavy material for robots


COMPUTERS & ELECTRICAL ENGINEERING, 2022, vol. 104, Part B

Authors: Wu, Xiaoming Chi, Jing Jin, Xiao-Zheng Deng, Chao Qilu Univ Technol Shandong Acad Sci Sch Comp Sci & Technol Jinan 250353 Shandong Peoples R China Nat Supercomp Ctr Jinan Shandong Comp Sci Ctr Jinan 250014 Shandong Peoples R China Shandong Prov Key Lab Comp Networks Jinan 250014 Shandong Peoples R China Shandong Univ Finance & Econ Dept Comp Sci & Technol Jinan 250014 Shandong Peoples R China 3501 Daxue Rd Jinan Shandong Peoples R China

In this paper, we consider the optimal control problem of heavy material handling manipulators for agricultural robots. Unlike the existing results on agricultural robots, the robot parameters may be unknown to the designer in this paper. To learn the linear quadratic control gain under unknown robot parameters, two reinforcement learning algorithms, i.e., the policy iteration (PI) algorithm and the value iteration (VI) algorithm, are proposed. Then, by combining the advantages of the PI algorithm and the VI algorithm, i.e., a satisfactory convergence rate and no requirement of a feasible initial control policy, respectively, a hybrid iteration (HI) algorithm is proposed, which both achieves a satisfactory convergence rate and removes the requirement of a feasible initial control policy. It is shown that the convergence of the proposed HI algorithm can be achieved in theory. Finally, a simulation example is given to show that our designed HI algorithm can achieve a satisfactory simulation time.

Keywords: Agricultural robots Policy iteration algorithm value iteration algorithm Optimal control
