Search Results - Inner Mongolia University Library

Influence Enhanced Sparse Coordination Graphs for Multi-Agent Reinforcement Learning

Neural Networks, 2025, vol. 188, p. 107454

Authors: Zhang, Xiwen; Chen, Jie; Gan, Ming-Gang; Chen, Haoxiang. Affiliations: State Key Laboratory of Intelligent Control and Decision of Complex Systems, School of Automation, Beijing Institute of Technology, Beijing 100081, China

In contemporary Multi-Agent Reinforcement Learning (MARL), effectively enhancing the expressive capacity of value functions has been a persistent research focus. Many studies have employed value decomposition methods; however, because they neglect inter-agent collaboration, these methods fall short of optimal performance. Subsequent research introduced coordination graphs into value decomposition methods; nevertheless, these approaches often rely on simplistic rules to evaluate inter-agent collaboration and fail to adequately describe the collaborative relationships of agents in complex environments. Consequently, we propose Influence Enhanced Sparse Coordination Graphs (IESCG) to address this problem. In this study, we propose influence networks as quantitative descriptions of the importance of collaboration among agents, using them as a crucial basis for constructing the topology of Sparse Time-Varying Coordination Graphs. Additionally, we propose Recurrent Payoff Function Networks (RPFN) to incorporate temporal information while providing the necessary input to the influence networks. Furthermore, Sparse Graph Advantage Selection Coefficients (SGASC) are introduced to stabilize the overall value function across different time steps, ensuring training stability. Experiments conducted on the StarCraft II micromanagement and MACO benchmarks indicate that our algorithm not only accelerates convergence and improves winning probabilities but also exhibits more pronounced advantages in complex scenarios. © 2025

Keywords: Markov processes

Data-Driven Learning and Control with Event-Triggered Measurements

IEEE Transactions on Automatic Control, 2025

Authors: Feng, Shilun; Shi, Dawei; Chen, Tongwen; Shi, Ling. Affiliations: Beijing Institute of Technology, State Key Laboratory of Intelligent Control and Decision of Complex Systems, MIIT Key Laboratory of Servo Motion System Drive and Control, School of Automation, Beijing 100081, China; University of Alberta, Department of Electrical and Computer Engineering, Edmonton, AB T6G 1H9, Canada; Hong Kong University of Science and Technology, Department of Electronic and Computer Engineering, Clear Water Bay, Kowloon, Hong Kong

Event-triggered control has attracted considerable attention for its effectiveness in resource-restricted applications. To make event-triggered control an end-to-end solution, a key issue is how to effectively learn unknown system dynamics from event-triggered measurements and, consequently, develop a learning-based event-triggered controller. Existing works learn system dynamics from periodic time-triggered measurements, and it is not yet known how to learn a controller with a performance guarantee from event-triggered measurements. To address this issue, we consider the problem of learning an event-triggered state feedback controller for an unknown linear system based on event-triggered state measurements. In particular, we first analyze the event-triggered measurements within a set-membership framework. We prove that the estimation error belongs to a bounded ellipsoid determined by the historical measurements and the event-triggering condition. Subsequently, we demonstrate that all admissible systems compatible with the collected data samples can be explicitly represented in the form of quadratic matrix inequalities using the state estimates. With the acquired set of admissible systems, a co-design problem for the data-driven controller and event-triggering condition is solved using the linear matrix inequality technique, with guaranteed closed-loop stability and L2-gain performance. Finally, numerical examples and comparisons are provided to illustrate the effectiveness of the proposed event-triggered learning and control approach. © 1963-2012 IEEE.

Keywords: Networked control systems

Distributed Online Convex Optimization with Time-Varying Constraints: Tighter Cumulative Constraint Violation Bounds under Slater's Condition

IEEE Transactions on Automatic Control, 2025

Authors: Yi, Xinlei; Li, Xiuxian; Yang, Tao; Xie, Lihua; Hong, Yiguang; Chai, Tianyou; Johansson, Karl H. Affiliations: National Key Laboratory of Autonomous Intelligent Unmanned Systems, Shanghai Institute of Intelligent Science and Technology, China; Ministry of Education Frontiers Science Center for Intelligent Autonomous Systems, China; Massachusetts Institute of Technology, Lab for Information & Decision Systems, Cambridge, MA 02139, United States; Northeastern University, State Key Laboratory of Synthetical Automation for Process Industries, Shenyang 110819, China; Nanyang Technological University, School of Electrical and Electronic Engineering, 50 Nanyang Avenue, 639798, Singapore; KTH Royal Institute of Technology, Division of Decision and Control Systems, School of Electrical Engineering and Computer Science, Sweden; Digital Futures, Stockholm 10044, Sweden

This paper considers distributed online convex optimization with time-varying constraints. In this setting, a network of agents makes decisions at each round, and then only a portion of the loss function and a coordinate block of the constraint function are privately revealed to each agent. The loss and constraint functions are convex and can vary arbitrarily across rounds. The agents collaborate to minimize static network regret and network cumulative constraint violation. A novel distributed online algorithm with a vanishing stepsize is proposed, and it achieves an $O(T^{\max\{c,1-c\}})$ static network regret bound and an $O(T^{1-c/2})$ network cumulative constraint violation bound, where $T$ is the number of rounds and $c \in (0,1)$ is a user-defined trade-off parameter. When Slater's condition holds (i.e., there is a point that strictly satisfies the inequality constraints), the network cumulative constraint violation bound is reduced to $O(T^{1-c})$. Moreover, if the loss functions are strongly convex, then the static network regret bound is reduced to $O(\log(T))$, and the network cumulative constraint violation bound is reduced to $O(\sqrt{\log(T)\,T})$ and $O(\log(T))$ without and with Slater's condition, respectively. To the best of our knowledge, this paper is the first to achieve tighter (network) cumulative constraint violation bounds for (distributed) online convex optimization with time-varying constraints under Slater's condition. Finally, the theoretical results are verified through numerical simulations. © 1963-2012 IEEE.

Keywords: Convex optimization
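
To make the trade-off stated in this record's abstract concrete, the following worked instance plugs a single value of the trade-off parameter into the convex-case bounds. The choice c = 1/2 is purely illustrative and is not taken from the paper.

```latex
\begin{align*}
c = \tfrac{1}{2}:\quad
  &\text{static network regret} = O\!\left(T^{\max\{1/2,\,1/2\}}\right) = O\!\left(\sqrt{T}\right),\\
  &\text{cumulative constraint violation} = O\!\left(T^{1 - 1/4}\right) = O\!\left(T^{3/4}\right),\\
  &\text{cumulative constraint violation under Slater's condition} = O\!\left(T^{1 - 1/2}\right) = O\!\left(\sqrt{T}\right).
\end{align*}
```

Because max{c, 1-c} is smallest at c = 1/2, that value gives the smallest regret exponent; choosing a larger c lowers the violation exponents 1 - c/2 and 1 - c at the cost of a regret exponent of c.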

Path Planning for Distribution Network Robots with Adaptive Simulated Annealing-Enhanced Artificial Potential Field Method

26th International Conference on Industrial Technology, ICIT 2025

Authors: Zhang, Baichao; Chen, Xin; Chen, Jinge; Li, Jinbin; Wang, Xiaokai; Jian, Xu. Affiliations: School of Automation, China University of Geosciences, Wuhan 430074, China; Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, China; Engineering Research Center of Intelligent Technology for Geo-Exploration, Ministry of Education, Wuhan 430074, China; State Grid Hubei Electric Power Co. Ltd., Electric Power Science Research Institute, Wuhan 430077, China; Hubei Fangyuan Dongli Electric Power Science Research Co. Ltd, Wuhan 430077, China

ISBN: (print) 9798331521950

In this paper, we propose a hybrid algorithm that combines an improved Artificial Potential Field (APF) method with the Simulated Annealing (SA) algorithm for path planning of an electric power operation robot manipulator in complex distribution grid environments. To address the unreachable target issue inherent in traditional APF, we introduce a distance regulation factor to optimize the repulsive function. This modification allows the manipulator to smoothly approach the target point as it nears, while the repulsion from obstacles gradually decreases. Additionally, to overcome the limitations of the traditional SA algorithm, such as its tendency to get trapped in local minimum solutions and its inefficiency in complex environments, we propose an adaptive temperature rise strategy. This strategy increases the temperature, enhancing the probability of escaping local optimal solutions. When the APF algorithm becomes trapped in a local optimum, the improved SA algorithm is applied to escape the local minimum. Once the local optimum is avoided, the algorithm switches back to APF to continue the path planning process. Simulation results demonstrate that the proposed improved APF-SA algorithm adapts effectively to various complex environments, achieving shorter planning times and higher success rates compared to traditional APF and SA algorithms. It successfully resolves the unreachable target and local minimum problems associated with APF. Finally, the feasibility of the proposed APF-SA fusion algorithm is validated through experiments conducted on an electric power operation robot experimental platform for distribution grid applications. © 2025 IEEE.

Keywords: Motion planning
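
The unreachable-target issue mentioned in this record's abstract can be illustrated with a minimal sketch. It assumes a commonly used form of the modification, in which the classical repulsive potential is multiplied by a power of the distance to the goal; the paper's actual distance regulation factor is not given in the abstract, so the function names, gains (`k_att`, `eta`, `rho0`, `n`) and coordinates below are all hypothetical.

```python
import math

def attractive(q, goal, k_att=1.0):
    """Classical quadratic attractive potential toward the goal."""
    return 0.5 * k_att * math.dist(q, goal) ** 2

def repulsive(q, obstacle, goal, eta=1.0, rho0=2.0, n=2, regulated=True):
    """Repulsive potential of a point obstacle with influence radius rho0.

    With regulated=True the classical term is scaled by d(q, goal)**n,
    an assumed form of a distance regulation factor (not the paper's).
    """
    rho = max(math.dist(q, obstacle), 1e-9)  # guard against division by zero
    if rho >= rho0:
        return 0.0
    base = 0.5 * eta * (1.0 / rho - 1.0 / rho0) ** 2
    return base * math.dist(q, goal) ** n if regulated else base

def total(q, goal, obstacle, regulated):
    """Total potential field value at configuration q."""
    return attractive(q, goal) + repulsive(q, obstacle, goal, regulated=regulated)

goal = (5.0, 5.0)
obstacle = (5.5, 5.0)  # obstacle close enough to influence the goal
near = (4.9, 5.0)      # a point just short of the goal

# Classical field: the goal has a higher potential than a nearby point,
# so descent on the field pushes away from the goal.
print(total(goal, goal, obstacle, False) > total(near, goal, obstacle, False))
# Regulated field: repulsion vanishes at the goal, which becomes a minimum.
print(total(goal, goal, obstacle, True) < total(near, goal, obstacle, True))
```

Scaling the repulsion by the distance to the goal makes it fade as the goal is approached, which is one way to obtain the behaviour the abstract describes; the simulated annealing escape step is not sketched here.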

A memetic algorithm for path planning of curvature-constrained UAVs performing surveillance of multiple ground targets

Chinese Journal of Aeronautics, 2014, vol. 27, no. 3, pp. 622-633

Authors: Zhang Xing; Chen Jie; Xin Bin; Peng Zhihong. Affiliations: School of Automation, Beijing Institute of Technology; State Key Laboratory of Intelligent Control and Decision of Complex Systems

The problem of generating optimal paths for curvature-constrained unmanned aerial vehicles (UAVs) performing surveillance of multiple ground targets is addressed in this paper. UAVs are modeled as Dubins vehicles so that the constraint of the UAVs' minimal turning radius can be taken into account. In view of the effective surveillance range of the sensors carried by the UAVs, the problem is formulated as a Dubins traveling salesman problem with neighborhood (DTSPN). Considering its prohibitively high computational complexity, Dubins paths in the sense of terminal heading relaxation are introduced to simplify the calculation of the Dubins distance, and a boundary-based encoding scheme is proposed to determine the visiting point of every target neighborhood. Then, an evolutionary algorithm is used to derive the optimal Dubins tour. To further enhance the quality of the solutions, a local search strategy based on an approximate gradient is employed to improve the visiting points of the target neighborhoods. Finally, with a minor modification to the individual encoding, the algorithm is easily extended to two more sophisticated DTSPN variants (a multi-UAV scenario and a scenario with multiple groups of targets). The performance of the algorithm is demonstrated through comparative experiments with two other state-of-the-art DTSPN algorithms identified in the literature. Numerical simulations show that the algorithm proposed in this paper can find high-quality solutions to the DTSPN at lower computational cost and performs significantly better than the other algorithms.

Keywords: Approximate gradient; Dubins traveling salesman problem with neighborhood; Local search; Memetic algorithm; Unmanned aerial vehicles

Interactive multiobjective evolutionary algorithm based on decomposition and compression

Science China (Information Sciences), 2021, vol. 64, no. 10, pp. 166-181

Authors: Lu CHEN; Bin XIN; Jie CHEN. Affiliations: School of Automation, Beijing Institute of Technology; State Key Laboratory of Intelligent Control and Decision of Complex Systems

Many real-world optimization problems involve multiple conflicting objectives. Such problems are called multiobjective optimization problems (MOPs). Typically, MOPs have a set of so-called Pareto optimal solutions rather than one unique optimal solution. To assist the decision maker (DM) in finding his/her most preferred solution, we propose an interactive multiobjective evolutionary algorithm (MOEA) called iDMOEA-εC, which utilizes the DM's preferences to compress the objective space directly and progressively in order to identify the DM's preferred region. The proposed algorithm employs a state-of-the-art decomposition-based MOEA called DMOEA-εC as the search engine. DMOEA-εC decomposes an MOP into a series of scalar constrained subproblems using a set of evenly distributed upper bound vectors to approximate the entire Pareto front. To guide the population toward only the DM's preferred part of the Pareto front, an adaptive adjustment mechanism for the upper bound vectors and two-level feasibility rules are proposed and integrated into DMOEA-εC to control the spread of the population. To ease the DM's burden, only a small set of representative solutions is presented to the DM in each interaction, and the DM is expected to specify a preferred one from the set. Furthermore, the proposed algorithm includes a two-stage selection procedure, which allows the DM's preferences to be elicited as accurately as possible. To evaluate the performance of the proposed algorithm, it was compared with other interactive MOEAs in a series of experiments. The experimental results demonstrated the superiority of iDMOEA-εC over its competitors.

Keywords: multiobjective optimization; interactive decision making; preference incorporation; decomposition; compression

Online adaptive Q-learning method for fully cooperative linear quadratic dynamic games

Science China (Information Sciences), 2019, vol. 62, no. 12, pp. 164-177

Authors: Xinxing LI; Zhihong PENG; Lei JIAO; Lele XI; Junqi CAI. Affiliations: School of Automation, Beijing Institute of Technology; State Key Laboratory of Intelligent Control and Decision of Complex Systems

A model-based offline policy iteration (PI) algorithm and a model-free online Q-learning algorithm are proposed for solving fully cooperative linear quadratic dynamic games. The PI-based adaptive Q-learning method can learn the feedback Nash equilibrium online using the state samples generated by behavior policies, without querying the system model. Unlike the existing Q-learning methods, this novel Q-learning algorithm executes both policy evaluation and policy improvement in an adaptive *** prove the convergence of the offline PI algorithm by proving its equivalence to Newton's method for solving the game algebraic Riccati equation (GARE). Furthermore, we prove that the proposed Q-learning method converges to the Nash equilibrium under a small learning rate if it satisfies certain persistence of excitation conditions, which can be easily met by suitable behavior policies. Our simulation results demonstrate the good performance of the proposed online adaptive Q-learning algorithm.

Keywords: adaptive dynamic programming; reinforcement learning; Q-learning; fully cooperative linear quadratic dynamic games; policy iteration; off-policy

Contextual Multi-Armed Bandit-Based Dynamic Cooperative Link Configuration for AUV in UWASNs With Energy Harvesting

IEEE Transactions on Cognitive Communications and Networking, 2025

Authors: Dai, Jun; Li, Xinbin; Han, Song; Yu, Junzhi; Liu, Zhixin. Affiliations: Yanshan University, Key Lab of Industrial Computer Control Engineering of Hebei Province, Qinhuangdao 066004, China; Yanshan University, Key Laboratory of Intelligent Rehabilitation and Neuromodulation of Hebei Province, Qinhuangdao 066004, China; Peking University, State Key Laboratory for Turbulence and Complex Systems, Department of Advanced Manufacturing and Robotics, BIC-ESAT, College of Engineering, Beijing 100871, China

This paper investigates the cooperative link configuration problem for an Autonomous Underwater Vehicle (AUV) in Underwater Acoustic (UWA) sensor networks with Energy Harvesting (EH), which aims to maximize long-term cumulative capacity by jointly optimizing the cooperation relay, AUV transmission power, and relay transmission power. Subject to unknown time-varying Channel State Information (CSI), unpredictable stochastic EH, and a variable communication topology caused by AUV mobility, the proposed problem is an unknown dynamic combinatorial decision optimization problem. To cope with changes in the dynamic UWA communication topology under unknown time-varying CSI and EH, a novel collaborative contextual multi-armed bandit learning framework is proposed, which allows relay nodes to learn cooperatively and enriches the learning information of the relays. Consequently, the proposed learning framework can cope with changes in the dynamic UWA communication topology efficiently, thereby reaching a superior link configuration quickly. In addition, a learning-rate adjustment rule for the dynamic UWA communication topology is proposed to adaptively balance exploration and exploitation, thereby avoiding missing the truly superior joint relay-power configuration. Finally, simulation results show the significant superiority of the proposed scheme. © 2015 IEEE.

Keywords: Cooperative communication

Adaptive Robust Dead-Zone Compensation Control of Electro-Hydraulic Servo Systems with Load Disturbance Rejection

Journal of Systems Science & Complexity, 2015, vol. 28, no. 2, pp. 341-359

Authors: HE Yudong; WANG Junzheng; HAO Renjian. Affiliations: Key Laboratory of Intelligent Control and Decision of Complex Systems, School of Automation, Beijing Institute of Technology

A backstepping-based adaptive robust dead-zone compensation controller is proposed for electro-hydraulic servo systems (EHSSs) with an unknown dead-zone and uncertain system parameters. The variable load is treated as the sum of a constant part and a variable part. The constant part is regarded as a system parameter to be estimated in real time. The variable part, together with the friction, is treated as a disturbance so that a robust term in the controller can be adopted to reject it. In contrast to the traditional dead-zone compensation method, a dead-zone compensator is incorporated in the EHSS without constructing a dead-zone inverse. Combined with the backstepping method, this forms an adaptive robust controller (ARC) with dead-zone compensation. An easy-to-use ARC tuning method is also proposed after a further analysis of the ARC structure. Simulations show that the proposed method achieves excellent tracking performance: all the uncertain parameters can be estimated, and the disturbance is rejected while the dead-zone term is well estimated and compensated.

Keywords: Adaptive robust control; dead-zone compensation; disturbance rejection; electro-hydraulic servo systems; tuning method

Optimal fusion estimation for stochastic systems with cross-correlated sensor noises

Science China (Information Sciences), 2017, vol. 60, no. 12, pp. 57-70

Authors: Liping YAN; Yuanqing XIA; Mengyin FU. Affiliations: School of Automation, Key Laboratory of Intelligent Control and Decision of Complex Systems, Beijing Institute of Technology

This paper is concerned with the optimal fusion of sensors with cross-correlated sensor *** taking linear transformations to the measurements and the related parameters, new measurement models are established, where the sensor noises are decoupled. The centralized fusion with raw data, the centralized fusion with transformed data, and a distributed fusion estimation algorithm are introduced, which are shown to be equivalent to each other in estimation precision, and therefore are globally optimal in the sense of linear minimum mean square error (LMMSE). It is shown that the centralized fusion with transformed data has lower communication requirements than the centralized fusion using raw data directly, and that the distributed fusion algorithm has the best flexibility and robustness, with moderate communication requirements and computational complexity among the three algorithms (lower communication and computational complexity than the existing distributed Kalman filtering fusion algorithms). An example is given to illustrate the effectiveness of the proposed algorithms.

Keywords: optimal estimation; distributed fusion; Kalman filter; cross-correlated noises; linear transformation
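
The idea of decoupling cross-correlated sensor noises by a linear transformation, as described in this record's abstract, can be sketched for two scalar sensors. The sketch uses standard Cholesky whitening as a stand-in; it is not necessarily the transformation used in the paper, and the covariance values and function names are hypothetical.

```python
import math

def decoupling_transform(R):
    """Return T = L^{-1}, where R = L L^T is the Cholesky factorization
    of a 2x2 positive-definite noise covariance R (assumed whitening step)."""
    l11 = math.sqrt(R[0][0])
    l21 = R[1][0] / l11
    l22 = math.sqrt(R[1][1] - l21 ** 2)
    return [[1.0 / l11, 0.0],
            [-l21 / (l11 * l22), 1.0 / l22]]

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    """Transpose of a 2x2 matrix."""
    return [[A[j][i] for j in range(2)] for i in range(2)]

def transformed_cov(T, R):
    """Covariance T R T^T of the transformed noise T v, where cov(v) = R."""
    return matmul(matmul(T, R), transpose(T))

# Hypothetical stacked noise covariance with cross-correlation 1.2.
R = [[4.0, 1.2],
     [1.2, 1.0]]
T = decoupling_transform(R)
# The off-diagonal entries of T R T^T vanish up to rounding, so the
# transformed measurements T z carry uncorrelated noises.
print(transformed_cov(T, R))
```

In such a scheme the same T would also be applied to the stacked measurement matrix, so that the transformed model z' = T z = (T H) x + T v has decoupled noise.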
