In recent years, deep graph neural networks (GNNs) have been used as solvers or helper functions for the traveling salesman problem (TSP), but they typically serve as encoders that generate static node representations for downstream tasks and are unable to capture the dynamic permutational information that arises as solutions are iteratively updated. To address this problem, we propose a permutational encoding graph attention encoder and attention-based decoder (PEG2A) model for the TSP, trained with the advantage actor-critic algorithm. In this work, the permutational encoding graph attention (PEGAT) network is designed to encode node embeddings, gathering information from neighbors while simultaneously capturing the dynamic graph permutational information. The attention-based decoder is tailored to compute probability distributions over node pairs for 2-opt moves. Experimental results show that our method outperforms the compared learning-based algorithms and traditional heuristic methods.
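As a concrete reference point for the 2-opt moves the decoder selects node pairs for, the following minimal Python sketch applies a single 2-opt move to a tour; the tour representation and toy distance matrix are illustrative assumptions, and the PEGAT encoder and attention decoder themselves are not reproduced here.

import random

def two_opt_move(tour, i, j):
    """Apply a 2-opt move: reverse the tour segment between positions i and j.

    This is the classic local-search step the decoder selects node pairs
    for; the learned encoder/decoder is not shown.
    """
    assert 0 <= i < j < len(tour)
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

def tour_length(tour, dist):
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))

# Toy usage: a random 5-city tour and a random 2-opt step.
n = 5
dist = [[abs(a - b) for b in range(n)] for a in range(n)]
tour = list(range(n))
random.shuffle(tour)
i, j = sorted(random.sample(range(n), 2))
new_tour = two_opt_move(tour, i, j)
print(tour_length(tour, dist), tour_length(new_tour, dist))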
In the pursuit of ubiquitous broadband connectivity, there has been a significant shift toward the vertical expansion of communication networks into space, particularly through the exploitation of low Earth orbit (LEO) satellite constellations, which are favored for their relatively low latency. However, this approach faces many challenges, including atmospheric turbulence, high path loss, and dynamic cloud formations. High-altitude pseudo-satellites (HAPS) have emerged as a promising relaying layer between LEO satellites and ground stations, enhancing coverage, latency, and direct terrestrial user connectivity. While radio frequency (RF) bands suffer from congestion and limited bandwidth, free space optical (FSO) communications offer higher data rates but are susceptible to misalignment and weather-induced signal degradation. To address these challenges, a hybrid RF/FSO approach has been proposed that takes advantage of both technologies by dynamically switching between RF and FSO based on propagation channel conditions. This paper introduces a reinforcement learning-based algorithm designed to optimize the HAPS trajectory, maneuver around cloudy areas, and seamlessly switch between the RF and FSO communication modes to maximize the achievable capacity. The proposed approach aims to maximize system performance by intelligently adapting to environmental conditions, offering a promising solution for next-generation space communication networks.
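To make the mode-switching idea concrete, here is a hedged tabular Q-learning sketch in which an agent learns when to prefer FSO over RF as cloud cover varies; the state discretization, reward numbers, and cloud dynamics are illustrative assumptions, not the paper's algorithm.

import numpy as np

# A minimal tabular Q-learning sketch of RF/FSO mode switching. States
# discretize cloud cover; actions are the two link modes. All numbers
# are illustrative, not from the paper.
rng = np.random.default_rng(0)
N_STATES, RF, FSO = 4, 0, 1          # cloud-cover levels 0 (clear) .. 3 (dense)
Q = np.zeros((N_STATES, 2))
alpha, gamma, eps = 0.1, 0.9, 0.1

def capacity(state, action):
    """Toy reward: FSO offers a high rate but degrades sharply with clouds."""
    return 1.0 if action == RF else max(0.0, 3.0 - 1.2 * state)

state = rng.integers(N_STATES)
for _ in range(5000):
    action = rng.integers(2) if rng.random() < eps else int(Q[state].argmax())
    reward = capacity(state, action)
    next_state = rng.integers(N_STATES)      # clouds evolve randomly here
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

print(Q.argmax(axis=1))  # learned mode per cloud level: FSO when clear, RF when cloudy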
This article presents a comprehensive approach that integrates formation tracking control and optimal control for a fleet of multiple surface vehicles (SVs), accounting for both the kinematic and dynamic models of each SV agent. The proposed control framework comprises two core components: a high-level displacement-based formation controller and a low-level reinforcement learning (RL)-based optimal control strategy for the individual SV agents. The high-level formation control law, employing a modified gradient method, guides the SVs toward the desired formations. Meanwhile, the low-level control structure, featuring time-varying references, incorporates the RL algorithm by transforming the time-varying closed-loop agent system into an equivalent autonomous system. The application of Lyapunov's direct method, along with the existence of the Bellman function, guarantees the stability and optimality of the proposed design. Through extensive numerical simulations encompassing various comparisons and scenarios, this study demonstrates the efficacy of the novel formation control strategy for multiple-SV systems, showcasing its potential for real-world applications.
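For intuition about the displacement-based, gradient-style formation law, the following minimal sketch drives single-integrator agents down the gradient of the squared displacement error to their neighbors; the gains, topology, and simplified kinematics are assumptions, not the paper's SV dynamics.

import numpy as np

# Displacement-based formation control on single-integrator kinematics.
# Each agent descends the gradient of the squared displacement error to
# its neighbors; all gains are illustrative.
desired = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 1.5]])    # target offsets
pos = np.random.default_rng(1).uniform(-3, 3, size=(3, 2))  # random start
k, dt = 1.0, 0.05
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

for _ in range(400):
    u = np.zeros_like(pos)
    for i, nbrs in neighbors.items():
        for j in nbrs:
            # Gradient of 0.5 * ||(p_i - p_j) - (d_i - d_j)||^2 w.r.t. p_i
            u[i] -= k * ((pos[i] - pos[j]) - (desired[i] - desired[j]))
    pos += dt * u

print(np.round(pos - pos[0], 3))  # relative positions converge to the desired offsets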
Uncertainty in the evolution of opponent behavior creates a non-stationary environment for the agent, reducing the reliability of value estimation and strategy selection while compromising safety during exploration. Previous studies have developed various uncertainty quantification techniques and designed uncertainty-aware exploration methods for multi-agent reinforcement learning (MARL). However, existing methods leave gaps in the theoretical analysis and experimental verification of decoupling the uncertainty due to opponents from that due to the environment, which can decrease learning efficiency and lead to an unstable training process. Owing to inaccurate opponent modeling, the agent is vulnerable to harm from opponents, which is undesirable in real-world tasks. To address these issues, this study proposes a novel uncertainty-guided safe exploration strategy for MARL that decouples the two types of uncertainty originating from the environment and from opponents. Specifically, we introduce an uncertainty decoupling quantification technique based on a novel variance decomposition method for action-value functions. Furthermore, we present an uncertainty-aware policy optimization mechanism to facilitate safe exploration in MARL. Finally, we propose a new adaptive parameter scaling method to ensure efficient exploration by the agents. Theoretical analysis establishes the convergence rate of the proposed approach, and its effectiveness is demonstrated empirically. Extensive experiments on benchmark tasks spanning differential games, multi-agent particle environments, and RoboSumo validate the significant advantages of the proposed uncertainty-guided method in attaining higher scores and facilitating safe agent exploration.
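One plausible reading of such a variance decomposition, sketched below under stated assumptions: given an ensemble of action-value estimates per sampled opponent model, the law of total variance splits total uncertainty into an environment part (mean within-opponent variance) and an opponent part (variance of per-opponent means). The tensor layout here is hypothetical, not the paper's exact construction.

import numpy as np

# q[m, e]: Q-estimate from environment-ensemble member e under opponent
# model m. The law of total variance decouples the two sources.
rng = np.random.default_rng(0)
q = rng.normal(loc=rng.normal(0, 1, size=(5, 1)), scale=0.3, size=(5, 8))

env_uncertainty = q.var(axis=1).mean()   # E_m[ Var_e(Q | m) ]  (environment)
opp_uncertainty = q.mean(axis=1).var()   # Var_m( E_e[Q | m] )  (opponents)
total = q.var()

# With equal group sizes the two parts sum exactly to the total variance.
print(env_uncertainty, opp_uncertainty, env_uncertainty + opp_uncertainty, total)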
This paper introduces a novel approach to enhancing the control performance of a class of unknown multiple-input multiple-output nonlinear systems. The proposed method employs a fractional-order fuzzy sliding mode controller implemented through online fractional-order reinforcement learning (FOFSMC-FRL). First, the approach leverages two Takagi-Sugeno-Kang (TSK) fuzzy neural network actors, which approximate the equivalent and switching control parts of the sliding mode control, respectively. Additionally, a critic TSK fuzzy neural network is employed to approximate the value function of the reinforcement learning process. Second, the FOFSMC-FRL parameters undergo online adaptation using an innovative fractional-order Levenberg-Marquardt learning method. This adaptive mechanism allows the controller to continuously update its parameters based on the system's behavior, optimizing its control strategy accordingly. Third, the stability and convergence of the proposed approach are rigorously examined using Lyapunov's theorem. Notably, the proposed structure does not depend on knowledge of the system dynamics, uncertainty bounds, or disturbance characteristics. Moreover, the chattering phenomenon often associated with sliding mode control is effectively eliminated without compromising the system's robustness. Finally, a comparative simulation study demonstrates the feasibility and superiority of the proposed method over other control methods, validating its effectiveness and performance advantages.
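Since the update rules are fractional-order, the Grünwald-Letnikov approximation of a fractional derivative is the kind of operator involved; the sketch below is a generic illustration with an assumed test signal, not the paper's FOFSMC-FRL adaptation law.

import numpy as np

# Grünwald-Letnikov approximation of the fractional derivative D^alpha x
# at the final sample, using the recursive binomial weights.
def gl_fractional_derivative(x, alpha, h):
    n = len(x)
    w = np.ones(n)
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - alpha) / k   # (-1)^k C(alpha, k), recursively
    return np.dot(w, x[::-1]) / h ** alpha      # newest sample gets weight w[0]

t = np.linspace(0, 1, 201)
x = t ** 2
# Sanity check: alpha = 1 recovers the ordinary derivative, d(t^2)/dt = 2 at t = 1.
print(gl_fractional_derivative(x, alpha=1.0, h=t[1] - t[0]))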
We develop an approach for solving time-consistent risk-sensitive stochastic optimization problems using model-free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules that aid in obtaining optimal policies. We further develop an actor-critic style algorithm using neural networks to optimize over policies. Finally, we demonstrate the performance and flexibility of our approach by applying it to three optimization problems: statistical arbitrage trading strategies, financial hedging, and obstacle avoidance robot control.
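Conditional value-at-risk (CVaR) is a canonical convex risk measure; composing one-step evaluations of this kind backward in time yields a time-consistent dynamic risk measure of the sort optimized above. The sketch below computes CVaR from sampled costs; the sample distribution is illustrative.

import numpy as np

# CVaR at level alpha: the average of the worst (1 - alpha) fraction of costs.
def cvar(costs, alpha=0.9):
    costs = np.sort(costs)
    tail = costs[int(np.ceil(alpha * len(costs))):]
    return tail.mean()

rng = np.random.default_rng(0)
costs = rng.normal(0.0, 1.0, size=10_000)
print(cvar(costs, alpha=0.9))  # about 1.75 for a standard normal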
The potential for the use of drones in logistics and transportation is continuously growing, with multiple applications in both urban and rural environments. The safe navigation of drones in such environments is a major challenge that requires sophisticated algorithms and systems that can quickly and efficiently assess the situation, find the shortest path to the target, and detect and avoid obstacles. Traditional path planning algorithms are unable to handle the dynamic and uncertain nature of real environments, while traditional machine learning models are insufficient due to the constantly changing conditions that affect the locations of the drone and of obstacles. Reinforcement learning (RL) algorithms have been widely used for autonomous navigation problems; however, the computational complexity and energy demands of such methods can become a bottleneck to the execution of UAV flights. In this paper, we propose the use of a minimum set of sensors together with RL algorithms for the safe and efficient navigation of drones in urban and rural environments. Our approach accounts for the complex and dynamic nature of such environments by incorporating real-time data from low-cost onboard sensors. After a thorough review of existing solutions for drone path planning and navigation in 3-D environments, we experimentally evaluate the proposed approach in a simulated environment across various scenarios. The test results demonstrate the effectiveness of the proposed RL-based approach for navigating drones in complex and unconstrained environments. The implemented approach can serve as a basis for the development of advanced and robust navigation systems for drones, improving the safety and efficiency of transportation applications in the near future.
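As one illustration of how low-cost sensor readings might enter such a learning problem, the hedged sketch below shapes a navigation reward from goal progress, obstacle proximity, and a step cost; the weights and sensor model are assumptions, not the paper's design.

import numpy as np

# Shaped navigation reward: reward progress toward the goal, penalize
# close range readings, and charge a small per-step (energy) cost.
def navigation_reward(prev_dist, dist, ranges, collision, w_prog=1.0,
                      w_obs=0.5, step_cost=0.01, safe_range=2.0):
    if collision:
        return -10.0
    progress = prev_dist - dist                       # closing on the goal
    proximity = max(0.0, safe_range - min(ranges))    # nearest-obstacle intrusion
    return w_prog * progress - w_obs * proximity - step_cost

# Usage: moved 0.4 closer, but a range sensor reports an obstacle at 0.8 m.
print(navigation_reward(5.0, 4.6, ranges=[3.1, 0.8, 2.5], collision=False))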
Model-free reinforcement learning algorithms based on entropy regularization have achieved good performance in control tasks. These algorithms add an entropy-regularization term to the policy objective to learn a stochastic policy. This work provides a new perspective that aims to explicitly learn a representation of the intrinsic information in state transitions to obtain a multimodal stochastic policy, addressing the tradeoff between exploration and exploitation. We study a class of Markov decision processes (MDPs) with divergence maximization, called divergence MDPs. The goal of a divergence MDP is to find an optimal stochastic policy that maximizes the sum of the expected discounted total rewards and a divergence term, where the divergence function learns the implicit information of the state transitions. It can thus yield stochastic policies with improved robustness and performance in high-dimensional continuous settings. Under this framework, we derive the optimality equations and then develop a divergence actor-critic (DivAC) algorithm based on the divergence policy iteration method to address large-scale continuous problems. The experimental results show that, compared with other methods, our approach achieves better performance and robustness, particularly in complex environments. The code for DivAC can be found at https://***/yzyvl/DivAC.
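To make the divergence-augmented objective concrete, the following toy sketch runs value iteration on a small tabular MDP whose per-step reward carries a divergence bonus on the transition distribution (here a closed-form KL from the uniform distribution); the paper learns its divergence term, so this closed form is only an illustration.

import numpy as np

# Tabular value iteration with a divergence bonus beta * KL(P(.|s,a) || uniform)
# added to each immediate reward. All quantities are illustrative.
rng = np.random.default_rng(0)
S, A, gamma, beta = 4, 2, 0.9, 0.1
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = next-state distribution
R = rng.uniform(0, 1, size=(S, A))

def kl_from_uniform(p):
    return float(np.sum(p * np.log(p * len(p) + 1e-12)))

V = np.zeros(S)
for _ in range(200):
    Q = np.empty((S, A))
    for s in range(S):
        for a in range(A):
            bonus = beta * kl_from_uniform(P[s, a])
            Q[s, a] = R[s, a] + bonus + gamma * P[s, a] @ V
    V = Q.max(axis=1)

print(np.round(V, 3))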
This paper investigates a model-free reinforcement learning-based approach that enables a quadruped robot to manipulate objects while maintaining its balance and dynamic stability during walking. First, the dynamics of the quadruped robot are developed in two subspaces: the position control space and the force control space. Then, a new long-term performance index is introduced, and a radial basis function neural network serving as a critic is presented to estimate the unobtainable long-term performance index. Based on the resulting reinforcement signal, an actor neural network generates a feedforward compensation term to cope with the nonlinear dynamics and the system uncertainties. The robustness of the actor-critic reinforcement learning algorithm is enhanced by using a fractional-order sliding-mode controller in the closed-loop system. Online adaptive laws for both the critic and actor network weights are obtained using Lyapunov stability theory. As a result, the uniform ultimate boundedness of the position and force tracking errors is proven. Finally, numerical simulations illustrate the feasibility and effectiveness of the proposed adaptive actor-critic learning-based control scheme.
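A minimal sketch of a radial-basis-function critic of the kind described above follows: Gaussian features over a tracking-error state, linear output weights, and a TD-style update. The centers, widths, decay dynamics, and reward are illustrative, not the paper's adaptive laws.

import numpy as np

# RBF critic: value(x) = w . phi(x) with Gaussian basis functions, trained
# by a temporal-difference update toward the long-term performance index.
class RBFCritic:
    def __init__(self, centers, width=0.5):
        self.centers = centers            # (n_basis, state_dim)
        self.width = width
        self.w = np.zeros(len(centers))   # output weights

    def phi(self, x):
        d2 = ((x - self.centers) ** 2).sum(axis=1)
        return np.exp(-d2 / (2 * self.width ** 2))

    def value(self, x):
        return self.w @ self.phi(x)

    def td_update(self, x, r, x_next, gamma=0.95, lr=0.05):
        delta = r + gamma * self.value(x_next) - self.value(x)
        self.w += lr * delta * self.phi(x)
        return delta

critic = RBFCritic(centers=np.linspace(-1, 1, 9).reshape(-1, 1))
x = np.array([0.8])
for _ in range(100):
    x_next = 0.9 * x                      # tracking error decays under control
    critic.td_update(x, r=-float(x @ x), x_next=x_next)
    x = x_next
print(critic.value(np.array([0.5])))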
We address the problem of sequentially selecting and observing processes from a given set to find the anomalies among them. At each time instant, the decision-maker observes a subset of the processes and obtains a noisy binary indicator of whether or not each observed process is anomalous. We develop an anomaly detection algorithm that chooses the processes to be observed at each time instant, decides when to stop taking observations, and declares which processes are anomalous. The objective of the detection algorithm is to identify the anomalies with an accuracy exceeding a desired value while minimizing the decision delay. We devise a centralized algorithm, in which the processes are jointly selected by a common agent, as well as a decentralized algorithm, in which the decision of whether to select a process is made independently for each process. Our algorithms rely on a Markov decision process defined over the marginal probability of each process being normal or anomalous, conditioned on the observations. We implement the detection algorithms using the deep actor-critic reinforcement learning framework. Unlike prior work on this topic, which has exponential complexity in the number of processes, our algorithms have computational and memory requirements that are both polynomial in the number of processes. We demonstrate the efficacy of these algorithms through numerical experiments comparing them with state-of-the-art methods.
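The belief state underlying these algorithms can be illustrated directly: the marginal posterior probability that a process is anomalous, updated by Bayes' rule after each noisy binary indicator. The flip probability and prior below are illustrative.

import numpy as np

# Bayes update of P(anomalous) from a binary indicator that is flipped
# with probability `flip`.
def update_belief(p_anom, obs, flip=0.2):
    like_anom = (1 - flip) if obs == 1 else flip        # P(obs | anomalous)
    like_norm = flip if obs == 1 else (1 - flip)        # P(obs | normal)
    num = like_anom * p_anom
    return num / (num + like_norm * (1 - p_anom))

rng = np.random.default_rng(0)
belief = 0.5
for _ in range(10):                 # observe a truly anomalous process
    obs = int(rng.random() > 0.2)   # indicator is correct w.p. 0.8
    belief = update_belief(belief, obs)
print(belief)                       # approaches 1 as evidence accumulates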