Search Results - Inner Mongolia University Library

IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC)

Authors: Siavash Barqi Janiar; Ping Wang. Affiliation: York University

One of the security issues in a wireless network is jamming attacks, where the jammer causes congestion and a significant drop in network throughput by obstructing channels and disrupting user signals. Deep reinforcement learning (DRL) models have been adopted to confront the jammer. However, training a DRL model from scratch may take a long time. In this paper, we propose a transfer learning (TL) approach to enable the DRL agent to learn fast in dynamic wireless networks to confront jamming attacks effectively. To make our proposed TL method adaptive to different network environments, we propose a novel method to quantitatively measure the difference between the source and target domains, using an integrated feature extractor. Based on the measured difference, we choose an optimal setting for the TL model. Experimental results show that the proposed TL method can effectively reduce the training time for the DRL model and outperforms other existing TL methods.


MEWA: A Benchmark For Meta-learning in Collaborative Working Agents


IEEE Symposium Series on Computational Intelligence (SSCI)

Authors: Radu Stoican; Angelo Cangelosi; Thomas H. Weisswange. Affiliations: Manchester Centre for Robotics and AI, University of Manchester, Manchester, United Kingdom; Honda Research Institute Europe GmbH, Offenbach, Germany

Meta-reinforcement learning aims to overcome important limitations in reinforcement learning, like low sample efficiency and poor generalization, by creating agents that adapt to new tasks. The development of intelligent robots would benefit from such agents. Long-standing issues like data collection and generalization to real-world dynamic environments could be mitigated by sample-efficient adaptable algorithms. However, most such algorithms have only been proven to work in low-complexity environments. These provide no guarantee that a near-optimal global policy does not exist, which makes it difficult to evaluate adaptable policies. This hinders the in-depth analysis of an agent's potential to adapt, while also introducing a gap between controlled experiments and real-world applications. We propose MEWA, a collection of task distributions used as a benchmark for adaptable agents. Our tasks contain a shared structure that an agent can leverage to learn the task-specific structure of new tasks. To ensure our environment is adaptive, we select some of the task parameters using the solution to a constrained optimization problem. Other parameters are randomized, allowing the creation of arbitrary task distributions. We evaluate three state-of-the-art meta-reinforcement learning algorithms on our benchmark, that were previously shown to adapt to new tasks with a simpler structure. Results show that the algorithms can reach meaningful performance on the task, but cannot yet fully adapt to the task-specific structure. We believe this benchmark will help identify some of the issues that hinder adaptability, ultimately aiding in the design of new algorithms, more suitable for real-world human-robot applications.


Adaptive dynamic programming based on parallel control theory for underwater vehicles


1st IEEE International Conference on Digital Twins and Parallel Intelligence, DTPI 2021

Authors: Bo, Peng; Tu, Xingbin; Qu, Fengzhong; Wang, Fei-Yue. Affiliations: Zhejiang University, Key Laboratory of Ocean Observation-Imaging Testbed of Zhejiang Province, Zhoushan, China; Institute of Automation, Chinese Academy of Sciences, Beijing, China

ISBN: (print) 9781665433372

Parallel control theory can provide an effective solution for the control problem of complex systems with unknown models and time-varying characteristics. The adaptive dynamic programming (ADP) method, which combines reinforcement learning and dynamic programming algorithms, is the most advanced method for implementing parallel control theory. In this paper, we systematically review the ADP-based parallel control theory, as well as how it can be developed for underwater vehicles. First, the foundation and fundamental principles of parallel control are outlined in detail. Second, the ADP method under parallel control theory is presented, along with an overview of the ADP method in the control of underwater vehicles. Finally, we review the latest developments and forecast the prospects of ADP-based underwater vehicle parallel control. © 2021 IEEE.

Keywords: dynamic programming


Adaptive Critic Learning and Experience Replay for Decentralized Event-Triggered Control of Nonlinear Interconnected Systems


IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, vol. 50, no. 11, pp. 4043-4055

Authors: Yang, Xiong; He, Haibo. Affiliations: Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China; Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881, USA

In this paper, we develop a decentralized event-triggered control (ETC) strategy for a class of nonlinear systems with uncertain interconnections. To begin with, we show that the decentralized ETC policy for the whole system can be represented by a group of optimal ETC laws of auxiliary subsystems. Then, under the framework of adaptive critic learning, we construct the critic networks to solve the event-triggered Hamilton-Jacobi-Bellman equations related to these optimal ETC laws. The weight vectors used in the critic networks are updated by using the gradient descent approach and the experience replay (ER) technique together. With the aid of the ER technique, we can conquer the difficulty arising in the persistence of excitation condition. Meanwhile, by using classic Lyapunov approaches, we prove that the estimated weight vectors used in the critic networks are uniformly ultimately bounded. Moreover, we demonstrate that the obtained decentralized ETC can force the overall system to be asymptotically stable. Finally, we present an interconnected nonlinear plant to validate the proposed decentralized ETC scheme.

Keywords: Erbium; Interconnected systems; Optimal control; Artificial neural networks; Adaptive systems; Nonlinear systems; Dynamic programming; adaptive critic learning (ACL); adaptive dynamic programming (ADP); event-triggered control (ETC); experience replay (ER); interconnected systems; reinforcement learning (RL)
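
The abstract above says the critic weights are updated by gradient descent together with experience replay, which relaxes the persistence of excitation condition. The following is a minimal sketch of that general idea only, not the authors' update law. It assumes a critic that is linear in its weights, so that each sample reduces to a hypothetical pair `(sigma, r)` with residual `e = w . sigma + r`; the function names, the learning rate, and the toy data are all illustrative assumptions.

```python
def residual(w, sigma, r):
    """Bellman-type residual e = w . sigma + r for a linear-in-weights critic (assumed form)."""
    return sum(wi * si for wi, si in zip(w, sigma)) + r


def critic_step(w, current, replay, lr=0.1):
    """One gradient-descent step on 0.5 * sum(e^2) over the current sample plus replayed samples."""
    grad = [0.0] * len(w)
    for sigma, r in [current] + list(replay):
        e = residual(w, sigma, r)
        for i, si in enumerate(sigma):
            grad[i] += si * e  # d(0.5 * e^2)/dw_i = sigma_i * e
    return [wi - lr * gi for wi, gi in zip(w, grad)]


# Toy data generated from hypothetical "true" weights [1.0, -2.0], so r = -(w_true . sigma).
replay = [([1.0, 0.0], -1.0), ([0.0, 1.0], 2.0), ([1.0, 1.0], 1.0)]
current = ([1.0, 0.0], -1.0)  # the current sample alone excites only the first weight

w_replay = [0.0, 0.0]
w_no_replay = [0.0, 0.0]
for _ in range(200):
    w_replay = critic_step(w_replay, current, replay)
    w_no_replay = critic_step(w_no_replay, current, [])
```

In this toy setup the replayed samples span both weight directions, so `w_replay` approaches the generating weights, while `w_no_replay` never moves its second component because the current sample carries no information about it.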


Event-Triggered Decentralized Tracking Control of Modular Reconfigurable Robots Through Adaptive Dynamic Programming


IEEE Transactions on Industrial Electronics, 2020, vol. 67, no. 4, pp. 3054-3064

Authors: Zhao, Bo; Liu, Derong. Affiliations: Beijing Normal Univ, Sch Syst Sci, Beijing 100875, Peoples R China; Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China; Guangdong Univ Technol, Sch Automat, Guangzhou 510006, Peoples R China

This paper develops an event-triggered decentralized tracking control (DTC) approach for modular reconfigurable robots (MRRs) by using adaptive dynamic programming. By establishing a decentralized neural network (NN) observer, which uses local input-output data and desired states of coupling subsystems, the local dynamics of each MRR subsystem can be obtained. In order to obtain the DTC, the tracking error subsystem is augmented by the exosystem with the desired trajectory. Based on the event-triggered mechanism and a modified local cost function, the DTC is derived by solving the local Hamilton-Jacobi-Bellman equation via a local critic NN with asymptotically stable structure. The stability of the entire closed-loop MRR system is analyzed by Lyapunov's direct method. The simulation of a two-degree-of-freedom MRR system shows that the developed event-triggered DTC scheme is effective.

Keywords: Decentralized control; Couplings; Optimal control; Robots; Dynamic programming; Artificial neural networks; Trajectory; adaptive dynamic programming; decentralized tracking control; event-triggered mechanism; modular reconfigurable robots; optimal control; reinforcement learning


A Model-Free Solution for Stackelberg Games Using Reinforcement Learning and Projection Approaches


International Workshop on Robot Sensing (ROSE)

Authors: Mohammed Abouheaf; Wail Gueaieb; Suruz Miah; Esam H. Abdelhameed. Affiliations: Robotics Engineering, Bowling Green State University, Bowling Green, OH, USA; School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Ontario, Canada; Department of Electrical and Computer Engineering, Bradley University, Peoria, Illinois, USA; Faculty of Energy Engineering, Aswan University, Aswan, Egypt

ISBN: (digital) 9798350362367

ISBN: (print) 9798350362374

The Stackelberg game is adopted in many robotics applications. It features a dynamic multi-player setup based on a leader-follower structure. The main challenge involves implementing model-free strategies that can effectively respond to unstructured environments in a data-driven manner. This paper presents a model-free method for solving the Stackelberg game in real-time, wherein the follower's strategy assumes knowledge of the leader's tactics. Moreover, the strategies are implemented in real-time without knowledge of the players' dynamics. The optimization goals are expressed through coupled Bellman optimality equations, highlighting the dependency between leader and follower strategies. The method utilizes a linear adaptive critics framework, where the actor-critic weights are adjusted using a projection method to ensure stability and convergence. This approach is evaluated on systems with delays and unstructured disturbances to demonstrate its robustness.

Keywords: Adaptation models; Games; Reinforcement learning; Mathematical models; Real-time systems; Robustness; Stability analysis


Fault Diagnosis for Underactuated Surface Vessel


40th Chinese Control Conference (CCC)

Authors: Mao, Ruiqi; Cui, Rongin. Affiliation: Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710000, Peoples R China

ISBN: (print) 9789881563804

In recent years, deep neural networks have achieved state-of-the-art accuracy at classifying the running state of a robot. Yet we propose a composite learning model (CLM) that combines the strengths of broad learning and conventional deep learning techniques to identify the fault types of underactuated surface vessels (USVs). Considering the measurement noise in training and testing data, we develop a deep sparse auto-encoder (DSAE) stacked from a denoising auto-encoder (DAE) and contractive auto-encoders (CAEs). To further reduce the computation time, a modified broad learning system (BLS) based classifier is developed, whose input layer receives the signal from the top layer of the DSAE. We use the output of the classifier as feedback. Meanwhile, value iteration (VI) based adaptive dynamic programming (ADP) is employed to calculate the near-optimal increment of the connection weights. Finally, we validate the developed approach by experiments on USV simulation data that compare the proposed CLM with the standard BLS and conventional deep learning methods.

Keywords: dimension reduction; broad learning; reinforcement learning; signal feedback; deep sparse auto-encoder (DSAE); denoising auto-encoder (DAE); contractive auto-encoder (CAE); adaptive dynamic programming (ADP)


Safe Adaptive Dynamic Programming Method for Nonlinear Safety-Critical Systems with Disturbance


6th International Conference on Robotics and Automation Engineering, ICRAE 2021

Authors: Wang, Jinguang; Zhang, Dehua; Zhang, Jishi; Zhu, Heyang; Hu, Shaolin; Qin, Chunbin. Affiliations: Henan University, School of Artificial Intelligence, Kaifeng, China; Guangdong University of Petrochemical Technology, School of Automation, Maoming, China

ISBN: (print) 9781665406970

In this paper, a safe adaptive dynamic programming (SADP) method based on the barrier function (BF) is proposed for the optimal control problem of nonlinear safety-critical systems with safety constraints and external disturbance. First, the barrier function is used to transform the nonlinear system with safety constraints into a transformed system without them. Second, based on the transformed system, a new barrier-disturbance-related term is proposed to approximate the effect of the external disturbance. On the premise of satisfying the safety constraints and stability, the neural network (NN) approximation method is used to approximate the optimal cost function and optimal control strategy of the system online. Finally, the simulation results show that the proposed method makes the system state converge well without violating the safety constraints. © 2021 IEEE.

Keywords: reinforcement learning
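
The abstract above describes using a barrier function to turn a constrained state into an unconstrained one. The abstract does not give the function itself, so the sketch below assumes a logarithmic barrier form that is common in the safe-ADP literature, s = log(A(a - x) / (a(A - x))) for a < 0 < A; the names `barrier`, `barrier_inverse`, `lo`, and `hi` are illustrative and not taken from the paper.

```python
import math


def barrier(x, lo, hi):
    """Map a constrained state x in (lo, hi), with lo < 0 < hi, to an unconstrained value s."""
    if not (lo < 0.0 < hi):
        raise ValueError("bounds must satisfy lo < 0 < hi")
    if not (lo < x < hi):
        raise ValueError("x must lie strictly inside (lo, hi)")
    return math.log((hi * (lo - x)) / (lo * (hi - x)))


def barrier_inverse(s, lo, hi):
    """Map an unconstrained value s back to a state inside (lo, hi)."""
    num = lo * hi * (math.exp(s / 2.0) - math.exp(-s / 2.0))
    den = lo * math.exp(s / 2.0) - hi * math.exp(-s / 2.0)
    return num / den


lo, hi = -1.0, 2.0
round_trip = [barrier_inverse(barrier(x, lo, hi), lo, hi) for x in (-0.9, 0.0, 1.5)]
```

Under this assumed form the origin maps to zero, and the transformed value grows without bound as the state approaches either limit. A controller that keeps the transformed value finite therefore keeps the original state strictly inside its bounds.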


***: Power-Aware Traffic Engineering via Deep Reinforcement Learning


29th IEEE/ACM International Symposium on Quality of Service (IWQoS)

Authors: Pan, Tian; Peng, Xiaoyu; Shi, Qianqian; Bian, Zizheng; Lin, Xingchen; Song, Enge; Li, Fuliang; Xu, Yang; Huang, Tao. Affiliations: BUPT, State Key Lab Networking & Switching Technol, Beijing, Peoples R China; Sci & Technol Commun Networks Lab, Shijiazhuang, Hebei, Peoples R China; Northeastern Univ, Shenyang, Liaoning, Peoples R China; Fudan Univ, Shanghai, Peoples R China

ISBN: (print) 9781665414944

Power-aware traffic engineering via coordinated sleeping is usually formulated as integer programming problems, which are generally NP-hard with unbounded computation time for large-scale networks. This results in delayed control decision making in dynamic network environments. Motivated by advances in deep reinforcement learning, we consider building intelligent systems that learn to adaptively change the power state of routers and switches according to changing network conditions. A neural network's forward propagation can greatly speed up power on/off decision making. Generally, conducting RL requires a learning agent to iteratively explore and perform the "good" actions based on the feedback from the environment. By coupling Software-Defined Networking for performing centrally calculated actions to the environment and In-band Network Telemetry for collecting feedback from the environment, we develop ***, a closed-loop control/training system to automate power-aware traffic engineering. Furthermore, we propose novel techniques to enhance the learning ability and reduce the learning complexity. With both energy efficiency and traffic load balancing considered, *** can generate reasonable power saving actions within 276 ms on a network testbed of 11 software P4 switches.

Keywords: Green products; Decision making; Reinforcement learning; Telecommunication traffic; Quality of service; Control systems; Energy efficiency


DATE: Disturbance-Aware Traffic Engineering with Reinforcement Learning in Software-Defined Networks


29th IEEE/ACM International Symposium on Quality of Service (IWQoS)

Authors: Ye, Minghao; Zhang, Junjie; Guo, Zehua; Chao, H. Jonathan. Affiliations: NYU, Dept Elect & Comp Engn, New York, NY 11201, USA; Fortinet Inc, Sunnyvale, CA 94086, USA; Beijing Inst Technol, Beijing 100081, Peoples R China

ISBN: (print) 9781665414944

Traffic Engineering (TE) has been applied to optimize network performance by routing/rerouting flows based on traffic loads and network topologies. To cope with network dynamics from emerging applications, it is essential to reroute flows more frequently than today's TE to maintain network performance. However, existing TE solutions may introduce considerable Quality of Service (QoS) degradation and service disruption since they do not take the potential negative impact of flow rerouting into account. In this paper, we apply a new QoS metric named network disturbance to gauge the impact of flow rerouting while optimizing network load balancing in backbone networks. To employ this metric in TE design, we propose a disturbance-aware TE called DATE, which uses reinforcement learning (RL) to intelligently select some critical flows between nodes for each traffic matrix and reroute them using Linear programming (LP) to jointly optimize network performance and disturbance. DATE is equipped with a customized actor-critic architecture and Graph Neural Networks (GNNs) to handle dynamic traffic and single link failures. Extensive evaluations show that DATE can outperform state-of-the-art TE methods with close-to-optimal load balancing performance while effectively mitigating the 99th percentile network disturbance by up to 31.6%.

Keywords: Traffic Engineering; Software-Defined Networking; Reinforcement Learning; Routing; Network Disturbance; Link Failure
