ISBN (print): 9781665485395
A network of reinforcement learning (RL) agents that cooperate by sharing information can improve the learning performance of control and coordination tasks compared to non-cooperative agents. However, networked Multi-agent Reinforcement Learning (MARL) is vulnerable to adversaries that can compromise some agents and send malicious information to the network. In this paper, we consider the problem of resilient MARL in the presence of adversarial agents that aim to compromise the learning algorithm. First, the paper presents an attack model in which the adversary degrades the performance of a target agent by modifying the parameters shared by an attacked agent. To improve resilience, the paper presents aggregation methods based on the medoid and the soft medoid. Our analysis shows that the medoid-based MARL algorithms converge to an optimal solution under standard assumptions and improve overall learning performance and robustness. Simulation results show the effectiveness of these aggregation methods compared with average- and median-based aggregation.
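For illustration, a minimal numpy sketch of medoid and soft-medoid aggregation over the parameter vectors the agents share; the exponential weighting and temperature below are our assumptions, not the paper's exact estimators:

```python
import numpy as np

def medoid(params):
    """Shared parameter vector minimizing the total Euclidean
    distance to all others (robust to a few malicious outliers)."""
    P = np.stack(params)                              # (n_agents, dim)
    dists = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
    return P[dists.sum(axis=1).argmin()]

def soft_medoid(params, temp=1.0):
    """Differentiable relaxation: a weighted average whose weights
    decay with an agent's total distance to the others, so outlier
    (attacked) parameters receive little mass."""
    P = np.stack(params)
    total = np.linalg.norm(P[:, None] - P[None, :], axis=-1).sum(axis=1)
    w = np.exp(-total / temp)
    return (w[:, None] * P).sum(axis=0) / w.sum()

# Four honest agents near zero, one compromised agent far away:
honest = [0.1 * np.random.randn(8) for _ in range(4)]
attacked = [10.0 * np.ones(8)]
print(medoid(honest + attacked))       # picks an honest vector
print(soft_medoid(honest + attacked))  # down-weights the outlier
```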
ISBN (digital): 9789887581536
ISBN (print): 9781665482561
In this paper, we design a reinforcement learning algorithm to solve the adaptive optimal control problem of a linear quadratic stochastic non-zero-sum differential game with n players and completely unknown dynamics. It is difficult to solve the associated set of coupled Riccati equations by traditional methods, because the complete system dynamics are unknown. We first use an action-dependent value function Q for each player in place of the state-dependent value function. For each player, a critic network is used to estimate the Q function and an actor network is used to estimate the control policy. The states and actions of the system constitute the training data. On this basis, a model-free online Q-learning algorithm for this class of problems is proposed, and it is proved that the algorithm converges to the Nash equilibrium under some conditions. A simulation example with two players is given to verify the effectiveness of the algorithm.
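For a sense of how an action-dependent Q function removes the need for the system matrices, here is a hedged sketch of one temporal-difference update with a quadratic basis (since the game is linear-quadratic, Q is quadratic in states and actions); the learning rate, discount, and feature choice are ours:

```python
import numpy as np

def quad_feats(z):
    """Quadratic basis: upper-triangular entries of z z^T, so
    Q(x, u) = theta^T phi([x; u]) is quadratic, as in LQ problems."""
    return np.outer(z, z)[np.triu_indices(len(z))]

def q_step(theta, x, u, cost, x_next, u_next, lr=0.01, gamma=0.95):
    """One model-free TD step on a player's action-dependent Q;
    no knowledge of the system matrices is required."""
    z, z_next = np.concatenate([x, u]), np.concatenate([x_next, u_next])
    td = cost + gamma * theta @ quad_feats(z_next) - theta @ quad_feats(z)
    return theta + lr * td * quad_feats(z)

theta = np.zeros(len(quad_feats(np.zeros(3))))  # e.g. 2 states + 1 input
```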
ISBN (digital): 9781665471565
ISBN (print): 9781665471565
With the rapid development of Internet of Things (IoT) systems, the low-latency requirement of massive Machine Type Communication (mMTC) is an urgent problem for future mobile communication networks. In this paper, we use a resource allocation strategy that sets a priority parameter for each slice according to its average access delay. We propose a dynamic resource allocation strategy that models the mMTC random access process as a Markov Decision Process (MDP) and solves it with the actor-critic (AC) reinforcement learning algorithm. Simulations show that the proposed resource block allocation algorithm can reasonably allocate resources to each mMTC access slice and thereby meet the Quality-of-Service (QoS) requirements of mMTC applications.
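The abstract does not spell out the priority rule, so the following numpy sketch is only one plausible reading: priorities grow with a slice's average access delay and resource blocks are split proportionally (the exponential form and the rounding rule are our assumptions):

```python
import numpy as np

def slice_priorities(avg_delays, alpha=1.0):
    """Priority parameter per slice, increasing with its average
    access delay, so latency-lagging slices get more weight."""
    d = np.asarray(avg_delays, dtype=float)
    w = np.exp(alpha * d / d.mean())
    return w / w.sum()

def allocate_blocks(n_blocks, avg_delays):
    """Split resource blocks across mMTC slices in proportion to
    their priorities, using largest-remainder rounding."""
    p = slice_priorities(avg_delays)
    raw = p * n_blocks
    alloc = np.floor(raw).astype(int)
    leftover = n_blocks - alloc.sum()
    for i in np.argsort(raw - alloc)[::-1][:leftover]:
        alloc[i] += 1
    return alloc

print(allocate_blocks(20, [5.0, 12.0, 8.0]))  # highest-delay slice gets most
```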
The success of the transition towards smart cities relies on the availability of information and communication technologies that meet the demands of this transformation, and the terrestrial infrastructure is a preeminent component of this change. Unmanned aerial vehicles (UAVs) empowered with artificial intelligence (AI) are expected to become an integral component of future smart cities, providing seamless coverage for vehicles on highways with poor cellular infrastructure. Motivated by the above, in this paper we introduce a UAV cell-free network that provides coverage to vehicles entering a highway not covered by other infrastructure. However, UAVs have limited energy resources and cannot serve the entire highway all the time. Furthermore, the deployed UAVs have insufficient knowledge about the environment (e.g., the vehicles' instantaneous locations). It is therefore challenging to control a swarm of UAVs to achieve efficient communication coverage. To address these challenges, we formulate the trajectory decision-making as a Markov decision process (MDP) whose state space captures the vehicular network dynamics. We then leverage deep reinforcement learning (DRL) to learn the optimal trajectories of the deployed UAVs that maximize vehicular coverage, adopting an actor-critic algorithm to learn the vehicular environment and its dynamics and to handle the complex continuous action space. Finally, simulation results verify our findings, demonstrate the effectiveness of the proposed design, and show that during the mission the deployed UAVs adapt their velocities to cover the vehicles.
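A hedged sketch of the continuous-action side of such a controller: a linear-Gaussian policy over bounded UAV velocity commands, with the score function needed for the actor update (the architecture, velocity bound, and parameterization are illustrative, not the paper's networks):

```python
import numpy as np

class GaussianActor:
    """Linear-Gaussian policy over continuous UAV velocity commands;
    a stand-in for the paper's actor network."""
    def __init__(self, state_dim, act_dim, v_max=20.0):
        self.W = np.zeros((act_dim, state_dim))
        self.log_std = np.zeros(act_dim)
        self.v_max = v_max

    def act(self, state):
        mean = self.W @ state
        noise = np.exp(self.log_std) * np.random.randn(len(mean))
        return np.clip(mean + noise, -self.v_max, self.v_max)

    def grad_log_pi(self, state, action):
        """Score function used by the (unclipped) actor-critic update."""
        mean = self.W @ state
        return np.outer((action - mean) / np.exp(2 * self.log_std), state)

actor = GaussianActor(state_dim=6, act_dim=2)  # e.g. relative vehicle positions
velocity = actor.act(np.random.randn(6))
```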
Batch processes pose a challenge for process control due to their complex nonlinear dynamics and batch-to-batch variability. In the absence of accurate models, the resulting plant-model mismatch makes these problems harder to address with advanced model-based control strategies. Reinforcement Learning (RL), in which an agent learns the policy by directly interacting with the environment, offers a potential alternative in this context. RL frameworks with an actor-critic architecture have recently become popular for controlling systems with continuous state and action spaces. The current study proposes a stochastic actor-critic RL algorithm, termed Twin Actor Soft Actor-Critic (TASAC), that incorporates an ensemble of actors in a maximum-entropy framework to improve learning through enhanced exploration. The efficacy of the proposed approach is showcased by applying it to the control of batch transesterification.
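The abstract leaves the ensemble mechanics open; one plausible reading of the twin-actor step, sketched below, is to sample a candidate action from each actor and keep the one the critic values higher (this selection rule is our guess, not necessarily TASAC's):

```python
import numpy as np

def twin_actor_action(actors, q_value, state):
    """Sample one candidate action from each of the twin stochastic
    actors and keep the one the critic scores higher."""
    candidates = [actor(state) for actor in actors]
    scores = [q_value(state, a) for a in candidates]
    return candidates[int(np.argmax(scores))]

# Toy usage with stand-in stochastic actors and a dummy critic:
actor1 = lambda s: s + 0.1 * np.random.randn(*s.shape)
actor2 = lambda s: -s + 0.1 * np.random.randn(*s.shape)
q_value = lambda s, a: -np.sum((a - 0.5) ** 2)
print(twin_actor_action([actor1, actor2], q_value, np.zeros(2)))
```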
ISBN (print): 9789881563903
In dynamic graphical games, the traditional way to obtain the optimal strategy for each agent is to solve a set of coupled HJB equations. Such problems are very difficult to solve by traditional methods, especially when the input of each agent is constrained. Actor-critic is a reinforcement learning method that can solve such problems through online iteration. This paper proposes an online iterative algorithm for graphical games on linear discrete-time systems with input constraints; the algorithm does not require the drift dynamics of the agents. Each agent uses two neural networks to approximate its value function and its control strategy, respectively. Finally, a simulation example is given to show the effectiveness of our method.
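For the input-constraint part, a common trick in constrained adaptive dynamic programming is to squash the actor output through tanh and penalize control effort with a matching nonquadratic integral cost; the sketch below shows that pairing (our choice of formulation, not confirmed by the abstract):

```python
import numpy as np

def bounded_policy(actor_out, u_max):
    """Squash the actor network output so each agent's control
    respects the input constraint |u| <= u_max."""
    return u_max * np.tanh(actor_out)

def constrained_cost(u, u_max, R=1.0):
    """Closed form of 2*R*integral_0^u u_max*arctanh(v/u_max) dv, the
    nonquadratic penalty whose minimizer stays inside the bound."""
    x = np.clip(u / u_max, -0.999, 0.999)
    return 2 * R * u_max**2 * (x * np.arctanh(x) + 0.5 * np.log(1 - x**2))

u = bounded_policy(np.array([3.0, -0.4]), u_max=1.0)  # stays in [-1, 1]
print(constrained_cost(u, u_max=1.0))
```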
Searching for the optimal injection molding settings for a new product usually requires much time and money. This article proposes a new method that uses reinforcement learning with prior knowledge to optimize these settings. The method uses an actor-critic algorithm to optimize the filling phase and the holding phase. For five different injection molded products, the filling and holding phases were adjusted with this method. The learning algorithm optimized the settings for one product (pre-learning) and used the acquired knowledge (prior knowledge) to optimize the injection molding settings for a new product (post-learning). This research shows that the method can optimize the injection molding parameters in a reasonable time even when the prior knowledge is derived from a product with a different material, gate design, or geometry. On average, fewer than 16 injection molding cycles were needed for the algorithm to optimize the filling phase and fewer than 10 cycles to optimize the holding phase. The presented method can greatly facilitate the development of self-adjusting injection molding machines.
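The transfer step amounts to warm-starting the actor-critic for the new product from the weights learned on the previous one; a minimal sketch under that assumption (the dictionary layout and key names are illustrative, not the article's implementation):

```python
import numpy as np

def warm_start(pretrained, new_agent, keep=("actor", "critic")):
    """Copy weights learned on a previous product into the agent for
    a new product, so optimization starts from prior knowledge
    rather than from scratch."""
    for name in keep:
        new_agent[name] = pretrained[name].copy()
    return new_agent

# Pre-learning on product A; post-learning on product B then starts
# from A's weights and typically needs only a few molding cycles.
agent_a = {"actor": np.random.randn(4, 2), "critic": np.random.randn(4)}
agent_b = {"actor": np.zeros((4, 2)), "critic": np.zeros(4)}
agent_b = warm_start(agent_a, agent_b)
```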
Reinforcement learning (RL) applications require a huge effort to become established in real-world environments because of the risk of injury and breakdown during online training, when the RL agent interacts with the environment. In addition, the RL platform tools (e.g., Python OpenAI Gym, Unity ML-Agents, PyBullet, DART, MuJoCo, RaiSim, Isaac, and AirSim) that are meant to reduce these real-world challenges suffer from drawbacks such as a limited number of examples and applications and difficulties in implementing RL algorithms in the tool's programming language. This paper presents an integrated RL framework based on Python-Unity interaction, demonstrating a new RL platform tool built on a stable user datagram protocol (UDP) connection between the RL agent algorithm (developed in the Python programming language as the server) and the simulation environment (created in the Unity simulation software as the client). This Python-Unity integration increases the flexibility, scalability, and robustness of the overall RL platform, supports the creation of environments with different specifications, and eases the implementation and development of RL algorithms. The proposed framework is validated by applying two popular deep RL algorithms, Vanilla Policy Gradient (VPG) and actor-critic (A2C), to an elevation control task for a quadcopter drone. The experimental results support using the proposed framework in RL applications: both implemented algorithms achieve high stability and converge to the required performance through the semi-online training process.
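A hedged sketch of the server half of such a Python-Unity UDP loop; the port, JSON message format, and field names are our assumptions, not the framework's actual protocol:

```python
import json
import socket

def compute_action(obs):
    """Placeholder for the trained policy (VPG or A2C in the paper)."""
    return 0.5

HOST, PORT = "127.0.0.1", 5005
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind((HOST, PORT))

while True:
    # Unity (client) sends the drone's observation after each physics step
    data, addr = sock.recvfrom(4096)
    obs = json.loads(data.decode())               # e.g. {"altitude": 3.2}
    action = compute_action(obs)
    # reply with the agent's control command for the next step
    sock.sendto(json.dumps({"thrust": action}).encode(), addr)
```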
The manipulator control model has the characteristics of high order, nonlinearity, multiple variables, and strong coupling, which make it difficult for the manipulator to achieve good adaptability and stability. Aiming at the poor reusability and poor autonomy of manipulator applications, a motion planning algorithm based on reinforcement learning is proposed. In this paper, the continuous-control reinforcement learning algorithm actor-critic is applied to the motion planning of the manipulator to increase its environmental applicability and autonomy and to realize intelligent control of the manipulator under simple kinematic constraints. First, the simulation environment of the manipulator's hand-eye system is constructed; then the reinforcement learning algorithm model is established according to the simulation environment; and finally, the motion planning training of the manipulator is completed in simulation. The results demonstrate that the proposed manipulator motion planning algorithm based on actor-critic reinforcement learning has good environmental adaptability and stability.
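As a generic illustration of the continuous-control update underlying such a planner, one advantage actor-critic step with linear function approximation and a Gaussian policy (the features, learning rates, and fixed exploration noise are our simplifications, not the paper's model):

```python
import numpy as np

def ac_step(actor_w, critic_w, phi_s, phi_next, action, reward,
            lr_a=1e-3, lr_c=1e-2, gamma=0.99, std=0.1):
    """One advantage actor-critic update: TD(0) for the critic and a
    TD-error-weighted policy gradient for the Gaussian actor."""
    td = reward + gamma * critic_w @ phi_next - critic_w @ phi_s
    critic_w = critic_w + lr_c * td * phi_s
    mean = actor_w @ phi_s
    grad_log = np.outer((action - mean) / std**2, phi_s)  # Gaussian score
    actor_w = actor_w + lr_a * td * grad_log
    return actor_w, critic_w
```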
ISBN (print): 9781728171227
We address the problem of sequentially selecting and observing processes from a given set to find the anomalies among them. The decision maker observes one process at a time and obtains a noisy binary indicator of whether or not the corresponding process is anomalous. In this setting, we develop an anomaly detection algorithm that chooses the process to be observed at a given time instant, decides when to stop taking observations, and makes a decision regarding the anomalous processes. The objective of the detection algorithm is to arrive at a decision with an accuracy exceeding a desired value while minimizing the delay in decision making. Our algorithm relies on a Markov decision process defined using the marginal probability of each process being normal or anomalous, conditioned on the observations. We implement the detection algorithm using the deep actor-critic reinforcement learning framework. Unlike prior work on this topic that has exponential complexity in the number of processes, our algorithm has computational and memory requirements that are both polynomial in the number of processes. We demonstrate the efficacy of our algorithm using numerical experiments by comparing it with the state-of-the-art methods.
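The marginal probabilities that drive the MDP state can be maintained with a per-process Bayesian update, which is what keeps the state polynomial in the number of processes; a minimal sketch assuming a symmetric flip probability for the noisy binary indicator (the noise model is our assumption):

```python
import numpy as np

def posterior_update(p_anom, obs, flip=0.2):
    """Update the marginal probability that the observed process is
    anomalous, given a binary indicator that is wrong with
    probability `flip`."""
    like_anom = (1 - flip) if obs == 1 else flip
    like_norm = flip if obs == 1 else (1 - flip)
    num = like_anom * p_anom
    return num / (num + like_norm * (1 - p_anom))

p = 0.5                       # prior: equally likely normal/anomalous
for obs in (1, 1, 0, 1):      # noisy indicators from repeated looks
    p = posterior_update(p, obs)
print(p)                      # belief that the process is anomalous
```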