ISBN (print): 9781665485395
A network of reinforcement learning (RL) agents that cooperate by sharing information can improve the learning performance of control and coordination tasks compared to non-cooperative agents. However, networked Multi-agent Reinforcement Learning (MARL) is vulnerable to adversaries that can compromise some agents and send malicious information to the network. In this paper, we consider the problem of resilient MARL in the presence of adversarial agents that aim to compromise the learning algorithm. First, the paper presents an attack model in which the adversary degrades the performance of a target agent by modifying the parameters shared by an attacked agent. To improve resilience, the paper presents aggregation methods based on the medoid and the soft medoid. Our analysis shows that the medoid-based MARL algorithms converge to an optimal solution under standard assumptions and improve overall learning performance and robustness. Simulation results show the effectiveness of these aggregation methods compared with average- and median-based aggregation.
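For illustration, a minimal numpy sketch of medoid and soft-medoid aggregation over the parameter vectors the agents share; the exponential weighting and temperature below are our assumptions, not the paper's exact estimators:

```python
import numpy as np

def medoid(params):
    """Shared parameter vector minimizing the total Euclidean
    distance to all others (robust to a few malicious outliers)."""
    P = np.stack(params)                              # (n_agents, dim)
    dists = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
    return P[dists.sum(axis=1).argmin()]

def soft_medoid(params, temp=1.0):
    """Differentiable relaxation: a weighted average whose weights
    decay with an agent's total distance to the others, so outlier
    (attacked) parameters receive little mass."""
    P = np.stack(params)
    total = np.linalg.norm(P[:, None] - P[None, :], axis=-1).sum(axis=1)
    w = np.exp(-total / temp)
    return (w[:, None] * P).sum(axis=0) / w.sum()

# Four honest agents near zero, one compromised agent far away:
honest = [0.1 * np.random.randn(8) for _ in range(4)]
attacked = [10.0 * np.ones(8)]
print(medoid(honest + attacked))       # picks an honest vector
print(soft_medoid(honest + attacked))  # down-weights the outlier
```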
ISBN (digital): 9789887581536
ISBN (print): 9781665482561
In this paper, we design a reinforcement learning algorithm to solve the adaptive optimal control problem of a linear quadratic stochastic non-zero-sum differential game with n players and completely unknown dynamics. It is difficult to solve the associated set of coupled Riccati equations by traditional methods, because the complete system dynamics are unknown. We first use an action-dependent value function Q for each player in place of the state-dependent value function. For each player, a critic network is used to estimate the Q function and an actor network is used to estimate the control policy. The states and actions of the system constitute the training data. On this basis, a model-free online Q-learning algorithm for this class of problems is proposed, and it is proved that the algorithm converges to the Nash equilibrium under some conditions. A simulation example with two players is given to verify the effectiveness of the algorithm.
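For a sense of how an action-dependent Q function removes the need for the system matrices, here is a hedged sketch of one temporal-difference update with a quadratic basis (since the game is linear-quadratic, Q is quadratic in states and actions); the learning rate, discount, and feature choice are ours:

```python
import numpy as np

def quad_feats(z):
    """Quadratic basis: upper-triangular entries of z z^T, so
    Q(x, u) = theta^T phi([x; u]) is quadratic, as in LQ problems."""
    return np.outer(z, z)[np.triu_indices(len(z))]

def q_step(theta, x, u, cost, x_next, u_next, lr=0.01, gamma=0.95):
    """One model-free TD step on a player's action-dependent Q;
    no knowledge of the system matrices is required."""
    z, z_next = np.concatenate([x, u]), np.concatenate([x_next, u_next])
    td = cost + gamma * theta @ quad_feats(z_next) - theta @ quad_feats(z)
    return theta + lr * td * quad_feats(z)

theta = np.zeros(len(quad_feats(np.zeros(3))))  # e.g. 2 states + 1 input
```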
ISBN (digital): 9781665471565
ISBN (print): 9781665471565
With the rapid development of Internet of Things (IoT) systems, the low-latency requirement of massive Machine Type Communication (mMTC) is an urgent problem for future mobile communication networks. In this paper, we use a resource allocation strategy that sets a priority parameter for each slice according to its average access delay. We propose a dynamic resource allocation strategy that models the mMTC random access process as a Markov Decision Process (MDP) and solves it with the actor-critic (AC) reinforcement learning algorithm. Simulations show that the proposed resource block allocation algorithm can reasonably allocate resources to each mMTC access slice and thereby meet the Quality-of-Service (QoS) requirements of mMTC applications.
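The abstract does not spell out the priority rule, so the following numpy sketch is only one plausible reading: priorities grow with a slice's average access delay and resource blocks are split proportionally (the exponential form and the rounding rule are our assumptions):

```python
import numpy as np

def slice_priorities(avg_delays, alpha=1.0):
    """Priority parameter per slice, increasing with its average
    access delay, so latency-lagging slices get more weight."""
    d = np.asarray(avg_delays, dtype=float)
    w = np.exp(alpha * d / d.mean())
    return w / w.sum()

def allocate_blocks(n_blocks, avg_delays):
    """Split resource blocks across mMTC slices in proportion to
    their priorities, using largest-remainder rounding."""
    p = slice_priorities(avg_delays)
    raw = p * n_blocks
    alloc = np.floor(raw).astype(int)
    leftover = n_blocks - alloc.sum()
    for i in np.argsort(raw - alloc)[::-1][:leftover]:
        alloc[i] += 1
    return alloc

print(allocate_blocks(20, [5.0, 12.0, 8.0]))  # highest-delay slice gets most
```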
The success of the transition towards smart cities relies on the availability of information and communication technologies that meet the demands of this transformation, and the terrestrial infrastructure is a preeminent component of this change. Unmanned aerial vehicles (UAVs) empowered with artificial intelligence (AI) are expected to become an integral component of future smart cities, providing seamless coverage for vehicles on highways with poor cellular infrastructure. Motivated by the above, in this paper we introduce a UAV cell-free network that provides coverage to vehicles entering a highway not covered by other infrastructure. However, UAVs have limited energy resources and cannot serve the entire highway all the time. Furthermore, the deployed UAVs have insufficient knowledge about the environment (e.g., the vehicles' instantaneous locations). It is therefore challenging to control a swarm of UAVs to achieve efficient communication coverage. To address these challenges, we formulate the trajectory decision-making as a Markov decision process (MDP) whose state space captures the vehicular network dynamics. We then leverage deep reinforcement learning (DRL) to learn the optimal trajectories of the deployed UAVs that maximize vehicular coverage, adopting an actor-critic algorithm to learn the vehicular environment and its dynamics and to handle the complex continuous action space. Finally, simulation results verify our findings, demonstrate the effectiveness of the proposed design, and show that during the mission the deployed UAVs adapt their velocities to cover the vehicles.
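A hedged sketch of the continuous-action side of such a controller: a linear-Gaussian policy over bounded UAV velocity commands, with the score function needed for the actor update (the architecture, velocity bound, and parameterization are illustrative, not the paper's networks):

```python
import numpy as np

class GaussianActor:
    """Linear-Gaussian policy over continuous UAV velocity commands;
    a stand-in for the paper's actor network."""
    def __init__(self, state_dim, act_dim, v_max=20.0):
        self.W = np.zeros((act_dim, state_dim))
        self.log_std = np.zeros(act_dim)
        self.v_max = v_max

    def act(self, state):
        mean = self.W @ state
        noise = np.exp(self.log_std) * np.random.randn(len(mean))
        return np.clip(mean + noise, -self.v_max, self.v_max)

    def grad_log_pi(self, state, action):
        """Score function used by the (unclipped) actor-critic update."""
        mean = self.W @ state
        return np.outer((action - mean) / np.exp(2 * self.log_std), state)

actor = GaussianActor(state_dim=6, act_dim=2)  # e.g. relative vehicle positions
velocity = actor.act(np.random.randn(6))
```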
Batch processes pose a challenge for process control due to their complex nonlinear dynamics and batch-to-batch variability. In the absence of accurate models, the resulting plant-model mismatch makes these problems harder to address with advanced model-based control strategies. Reinforcement Learning (RL), in which an agent learns the policy by directly interacting with the environment, offers a potential alternative in this context. RL frameworks with an actor-critic architecture have recently become popular for controlling systems with continuous state and action spaces. The current study proposes a stochastic actor-critic RL algorithm, termed Twin Actor Soft Actor-Critic (TASAC), that incorporates an ensemble of actors in a maximum-entropy framework to improve learning through enhanced exploration. The efficacy of the proposed approach is showcased by applying it to the control of batch transesterification.
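The abstract leaves the ensemble mechanics open; one plausible reading of the twin-actor step, sketched below, is to sample a candidate action from each actor and keep the one the critic values higher (this selection rule is our guess, not necessarily TASAC's):

```python
import numpy as np

def twin_actor_action(actors, q_value, state):
    """Sample one candidate action from each of the twin stochastic
    actors and keep the one the critic scores higher."""
    candidates = [actor(state) for actor in actors]
    scores = [q_value(state, a) for a in candidates]
    return candidates[int(np.argmax(scores))]

# Toy usage with stand-in stochastic actors and a dummy critic:
actor1 = lambda s: s + 0.1 * np.random.randn(*s.shape)
actor2 = lambda s: -s + 0.1 * np.random.randn(*s.shape)
q_value = lambda s, a: -np.sum((a - 0.5) ** 2)
print(twin_actor_action([actor1, actor2], q_value, np.zeros(2)))
```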
ISBN (print): 9789881563903
In dynamic graphical games, the traditional way to obtain the optimal strategy for each agent is to solve a set of coupled HJB equations. Such problems are very difficult to solve by traditional methods, especially when the input of each agent is constrained. Actor-critic is a reinforcement learning method that can solve such problems through online iteration. This paper proposes an online iterative algorithm for graphical games on linear discrete-time systems with input constraints; the algorithm does not require the drift dynamics of the agents. Each agent uses two neural networks to approximate its value function and its control strategy, respectively. Finally, a simulation example is given to show the effectiveness of our method.
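For the input-constraint part, a common trick in constrained adaptive dynamic programming is to squash the actor output through tanh and penalize control effort with a matching nonquadratic integral cost; the sketch below shows that pairing (our choice of formulation, not confirmed by the abstract):

```python
import numpy as np

def bounded_policy(actor_out, u_max):
    """Squash the actor network output so each agent's control
    respects the input constraint |u| <= u_max."""
    return u_max * np.tanh(actor_out)

def constrained_cost(u, u_max, R=1.0):
    """Closed form of 2*R*integral_0^u u_max*arctanh(v/u_max) dv, the
    nonquadratic penalty whose minimizer stays inside the bound."""
    x = np.clip(u / u_max, -0.999, 0.999)
    return 2 * R * u_max**2 * (x * np.arctanh(x) + 0.5 * np.log(1 - x**2))

u = bounded_policy(np.array([3.0, -0.4]), u_max=1.0)  # stays in [-1, 1]
print(constrained_cost(u, u_max=1.0))
```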
Searching for the optimal injection molding settings for a new product usually requires much time and money. This article proposes a new method that uses reinforcement learning with prior knowledge to optimize these settings. The method uses an actor-critic algorithm to optimize the filling phase and the holding phase. For five different injection molded products, the filling and holding phases were adjusted with this method. The learning algorithm optimized the settings for one product (pre-learning) and used the acquired knowledge (prior knowledge) to optimize the injection molding settings for a new product (post-learning). This research shows that the method can optimize the injection molding parameters in a reasonable time even when the prior knowledge is derived from a product with a different material, gate design, or geometry. On average, fewer than 16 injection molding cycles were needed for the algorithm to optimize the filling phase and fewer than 10 cycles to optimize the holding phase. The presented method can greatly facilitate the development of self-adjusting injection molding machines.
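The transfer step amounts to warm-starting the actor-critic for the new product from the weights learned on the previous one; a minimal sketch under that assumption (the dictionary layout and key names are illustrative, not the article's implementation):

```python
import numpy as np

def warm_start(pretrained, new_agent, keep=("actor", "critic")):
    """Copy weights learned on a previous product into the agent for
    a new product, so optimization starts from prior knowledge
    rather than from scratch."""
    for name in keep:
        new_agent[name] = pretrained[name].copy()
    return new_agent

# Pre-learning on product A; post-learning on product B then starts
# from A's weights and typically needs only a few molding cycles.
agent_a = {"actor": np.random.randn(4, 2), "critic": np.random.randn(4)}
agent_b = {"actor": np.zeros((4, 2)), "critic": np.zeros(4)}
agent_b = warm_start(agent_a, agent_b)
```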
Reinforcement learning (RL) applications require a huge effort to become established in real-world environments because of the risk of injury and breakdown during online training, when the RL agent interacts with the environment. In addition, the RL platform tools (e.g., Python OpenAI Gym, Unity ML-Agents, PyBullet, DART, MuJoCo, RaiSim, Isaac, and AirSim) that are meant to reduce these real-world challenges suffer from drawbacks such as a limited number of examples and applications and difficulties in implementing RL algorithms in the tool's programming language. This paper presents an integrated RL framework based on Python-Unity interaction, demonstrating a new RL platform tool built on a stable user datagram protocol (UDP) connection between the RL agent algorithm (developed in the Python programming language as the server) and the simulation environment (created in the Unity simulation software as the client). This Python-Unity integration increases the flexibility, scalability, and robustness of the overall RL platform, supports the creation of environments with different specifications, and eases the implementation and development of RL algorithms. The proposed framework is validated by applying two popular deep RL algorithms, Vanilla Policy Gradient (VPG) and actor-critic (A2C), to an elevation control task for a quadcopter drone. The experimental results support using the proposed framework in RL applications: both implemented algorithms achieve high stability and converge to the required performance through the semi-online training process.
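A hedged sketch of the server half of such a Python-Unity UDP loop; the port, JSON message format, and field names are our assumptions, not the framework's actual protocol:

```python
import json
import socket

def compute_action(obs):
    """Placeholder for the trained policy (VPG or A2C in the paper)."""
    return 0.5

HOST, PORT = "127.0.0.1", 5005
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind((HOST, PORT))

while True:
    # Unity (client) sends the drone's observation after each physics step
    data, addr = sock.recvfrom(4096)
    obs = json.loads(data.decode())               # e.g. {"altitude": 3.2}
    action = compute_action(obs)
    # reply with the agent's control command for the next step
    sock.sendto(json.dumps({"thrust": action}).encode(), addr)
```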
The manipulator control model has the characteristics of high order, nonlinearity, multiple variables, and strong coupling, which make it difficult for the manipulator to achieve good adaptability and stability. Aiming at the poor reusability and poor autonomy of manipulator applications, a motion planning algorithm based on reinforcement learning is proposed. In this paper, the continuous-control reinforcement learning algorithm actor-critic is applied to the motion planning of the manipulator to increase its environmental applicability and autonomy and to realize intelligent control of the manipulator under simple kinematic constraints. First, the simulation environment of the manipulator's hand-eye system is constructed; then the reinforcement learning algorithm model is established according to the simulation environment; and finally, the motion planning training of the manipulator is completed in simulation. The results demonstrate that the proposed manipulator motion planning algorithm based on actor-critic reinforcement learning has good environmental adaptability and stability.
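As a generic illustration of the continuous-control update underlying such a planner, one advantage actor-critic step with linear function approximation and a Gaussian policy (the features, learning rates, and fixed exploration noise are our simplifications, not the paper's model):

```python
import numpy as np

def ac_step(actor_w, critic_w, phi_s, phi_next, action, reward,
            lr_a=1e-3, lr_c=1e-2, gamma=0.99, std=0.1):
    """One advantage actor-critic update: TD(0) for the critic and a
    TD-error-weighted policy gradient for the Gaussian actor."""
    td = reward + gamma * critic_w @ phi_next - critic_w @ phi_s
    critic_w = critic_w + lr_c * td * phi_s
    mean = actor_w @ phi_s
    grad_log = np.outer((action - mean) / std**2, phi_s)  # Gaussian score
    actor_w = actor_w + lr_a * td * grad_log
    return actor_w, critic_w
```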
ISBN (print): 9781728171227
We address the problem of sequentially selecting and observing processes from a given set to find the anomalies among them. The decision maker observes one process at a time and obtains a noisy binary indicator of whether or not the corresponding process is anomalous. In this setting, we develop an anomaly detection algorithm that chooses the process to be observed at a given time instant, decides when to stop taking observations, and makes a decision regarding the anomalous processes. The objective of the detection algorithm is to arrive at a decision with an accuracy exceeding a desired value while minimizing the delay in decision making. Our algorithm relies on a Markov decision process defined using the marginal probability of each process being normal or anomalous, conditioned on the observations. We implement the detection algorithm using the deep actor-critic reinforcement learning framework. Unlike prior work on this topic that has exponential complexity in the number of processes, our algorithm has computational and memory requirements that are both polynomial in the number of processes. We demonstrate the efficacy of our algorithm using numerical experiments by comparing it with the state-of-the-art methods.
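The marginal probabilities that drive the MDP state can be maintained with a per-process Bayesian update, which is what keeps the state polynomial in the number of processes; a minimal sketch assuming a symmetric flip probability for the noisy binary indicator (the noise model is our assumption):

```python
import numpy as np

def posterior_update(p_anom, obs, flip=0.2):
    """Update the marginal probability that the observed process is
    anomalous, given a binary indicator that is wrong with
    probability `flip`."""
    like_anom = (1 - flip) if obs == 1 else flip
    like_norm = flip if obs == 1 else (1 - flip)
    num = like_anom * p_anom
    return num / (num + like_norm * (1 - p_anom))

p = 0.5                       # prior: equally likely normal/anomalous
for obs in (1, 1, 0, 1):      # noisy indicators from repeated looks
    p = posterior_update(p, obs)
print(p)                      # belief that the process is anomalous
```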