Jamming attacks are a major security issue in wireless networks: the jammer obstructs channels and disrupts user signals, causing congestion and a significant reduction in network throughput. Deep reinforcement learning (DRL) models have been adopted to counter the jammer. However, training a DRL model from scratch may take a long time. In this paper, we propose a transfer learning (TL) approach that enables the DRL agent to learn quickly in dynamic wireless networks and counter jamming attacks effectively. To make the proposed TL method adaptive to different network environments, we propose a novel method that uses an integrated feature extractor to quantitatively measure the difference between the source and target domains. Based on the measured difference, we choose an optimal setting for the TL model. Experimental results show that the proposed TL method effectively reduces the training time of the DRL model and outperforms existing TL methods.
Meta-reinforcement learning aims to overcome important limitations of reinforcement learning, such as low sample efficiency and poor generalization, by creating agents that adapt to new tasks. The development of intelligent robots would benefit from such agents. Long-standing issues like data collection and generalization to real-world dynamic environments could be mitigated by sample-efficient adaptable algorithms. However, most such algorithms have only been proven to work in low-complexity environments. Such environments provide no guarantee that a near-optimal global policy does not exist, which makes it difficult to evaluate adaptable policies. This hinders in-depth analysis of an agent's potential to adapt, while also introducing a gap between controlled experiments and real-world applications. We propose MEWA, a collection of task distributions used as a benchmark for adaptable agents. Our tasks contain a shared structure that an agent can leverage to learn the task-specific structure of new tasks. To ensure our environment is adaptive, we select some of the task parameters using the solution to a constrained optimization problem. Other parameters are randomized, allowing the creation of arbitrary task distributions. We evaluate on our benchmark three state-of-the-art meta-reinforcement learning algorithms that were previously shown to adapt to new tasks with a simpler structure. Results show that the algorithms can reach meaningful performance on the task, but cannot yet fully adapt to the task-specific structure. We believe this benchmark will help identify some of the issues that hinder adaptability, ultimately aiding the design of new algorithms that are more suitable for real-world human-robot applications.
Parallel control theory can provide an effective solution to the control problem of complex systems with unknown models and time-varying characteristics. The adaptive dynamic programming (ADP) method, which combines r...
In this paper, we develop a decentralized event-triggered control (ETC) strategy for a class of nonlinear systems with uncertain interconnections. To begin with, we show that the decentralized ETC policy for the whole system can be represented by a group of optimal ETC laws of auxiliary subsystems. Then, under the framework of adaptive critic learning, we construct critic networks to solve the event-triggered Hamilton-Jacobi-Bellman equations related to these optimal ETC laws. The weight vectors used in the critic networks are updated using gradient descent together with the experience replay (ER) technique. With the aid of the ER technique, we can overcome the difficulty of satisfying the persistence of excitation condition. Meanwhile, by using classic Lyapunov approaches, we prove that the estimated weight vectors used in the critic networks are uniformly ultimately bounded. Moreover, we demonstrate that the obtained decentralized ETC can force the overall system to be asymptotically stable. Finally, we present an interconnected nonlinear plant to validate the proposed decentralized ETC scheme.
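The critic update described above, gradient descent combined with experience replay, can be sketched as follows. This is only an illustration under assumptions that are not stated in the abstract: a critic that is linear in its weights, a Bellman residual of the form e = w · sigma + r, and the normalized-gradient rule that is common in the adaptive critic literature. The names `residual`, `critic_update`, `train`, and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

# One stored sample: (regressor sigma, utility r). Both are assumptions of this sketch.
Sample = Tuple[Sequence[float], float]


def residual(w: Sequence[float], sigma: Sequence[float], r: float) -> float:
    """Bellman residual e = w . sigma + r for a linear-in-weights critic."""
    return sum(wi * si for wi, si in zip(w, sigma)) + r


def critic_update(w: Sequence[float], current: Sample,
                  replay: Sequence[Sample], lr: float = 0.5) -> List[float]:
    """One normalized gradient step on the squared residuals of the
    current sample and every sample held in the replay buffer."""
    grad = [0.0] * len(w)
    for sigma, r in [current, *replay]:
        e = residual(w, sigma, r)
        norm = (1.0 + sum(s * s for s in sigma)) ** 2  # normalization term
        for i, s in enumerate(sigma):
            grad[i] += s * e / norm
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def train(current: Sample, replay: Sequence[Sample], steps: int = 500) -> List[float]:
    """Repeat the update from zero initial weights."""
    w = [0.0, 0.0]
    for _ in range(steps):
        w = critic_update(w, current, replay)
    return w


# Toy data built from hypothetical true weights [1.0, -2.0],
# chosen so that every residual vanishes at those weights.
CURRENT: Sample = ([1.0, 1.0], 1.0)
REPLAY: List[Sample] = [([1.0, 0.0], -1.0), ([0.0, 1.0], 2.0)]
```

In this toy setting, the current sample alone excites only one direction of the weight space, so the weights cannot be identified from it. Adding replayed samples with different regressors makes the stacked data rich enough to recover both weights, which is the intuition behind using ER in place of a persistence of excitation requirement.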
This paper develops an event-triggered decentralized tracking control (DTC) approach for modular reconfigurable robots (MRRs) using adaptive dynamic programming. By establishing a decentralized neural network (NN) observer, which uses local input-output data and the desired states of coupled subsystems, the local dynamics of each MRR subsystem can be obtained. To obtain the DTC, the tracking error subsystem is augmented by the exosystem with the desired trajectory. Based on the event-triggered mechanism and a modified local cost function, the DTC is derived by solving the local Hamilton-Jacobi-Bellman equation via a local critic NN with an asymptotically stable structure. The stability of the entire closed-loop MRR system is analyzed by Lyapunov's direct method. A simulation of a two-degree-of-freedom MRR system demonstrates that the developed event-triggered DTC scheme is effective.
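The event-triggered mechanism mentioned above can be illustrated on a much simpler problem than the MRR system. The sketch below assumes a scalar linear plant x' = a·x + u, a held state-feedback law u = -k·x_held, and a relative-error triggering rule; none of these choices come from the abstract, and the function name and parameter values are hypothetical.

```python
from typing import Tuple


def simulate_event_triggered(a: float = 1.0, k: float = 3.0, sigma: float = 0.3,
                             x0: float = 1.0, dt: float = 1e-3,
                             t_end: float = 5.0) -> Tuple[float, int, int]:
    """Euler simulation of x' = a*x + u with u = -k*x_held.

    The held state x_held is refreshed only when the gap between it and
    the true state exceeds sigma * |x| (the triggering condition).
    Returns the final state, the number of events, and the number of steps.
    """
    x = x0
    x_held = x0          # state sampled at the most recent event
    events = 0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        if abs(x_held - x) > sigma * abs(x):
            x_held = x   # event: resample the state, update the control
            events += 1
        u = -k * x_held  # control is held constant between events
        x += dt * (a * x + u)
    return x, events, steps
```

With these assumed values the state still decays toward zero, while the control is recomputed at only a small fraction of the simulation steps. That reduction in updates is the practical motivation for event-triggered designs.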
ISBN (digital): 9798350362367
ISBN (print): 9798350362374
The Stackelberg game is adopted in many robotics applications. It features a dynamic multi-player setup based on a leader-follower structure. The main challenge involves implementing model-free strategies that can effectively respond to unstructured environments in a data-driven manner. This paper presents a model-free method for solving the Stackelberg game in real time, wherein the follower's strategy assumes knowledge of the leader's tactics. Moreover, the strategies are implemented in real time without knowledge of the players' dynamics. The optimization goals are expressed through coupled Bellman optimality equations, highlighting the dependency between the leader's and follower's strategies. The method utilizes a linear adaptive critics framework, where the actor-critic weights are adjusted using a projection method to ensure stability and convergence. This approach is evaluated on systems with delays and unstructured disturbances to demonstrate its robustness.
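The abstract does not say which projection is used for the actor-critic weights. As a rough illustration only, the sketch below assumes the simplest common choice: a gradient step followed by Euclidean projection onto a norm ball, which keeps the weight vector bounded. The function names and the `radius` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def project_to_ball(w: Sequence[float], radius: float) -> List[float]:
    """Euclidean projection of w onto the ball ||w|| <= radius."""
    norm = math.sqrt(sum(x * x for x in w))
    if norm <= radius:
        return list(w)                       # already inside: unchanged
    return [x * radius / norm for x in w]    # rescale onto the boundary


def projected_step(w: Sequence[float], grad: Sequence[float],
                   lr: float, radius: float) -> List[float]:
    """Gradient step followed by projection back into the admissible set."""
    stepped = [wi - lr * gi for wi, gi in zip(w, grad)]
    return project_to_ball(stepped, radius)
```

Bounding the weights in this way is one standard route to boundedness arguments in adaptive schemes; the method in the paper may use a different admissible set or projection operator.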
ISBN (print): 9789881563804
In recent years, deep neural networks have achieved state-of-the-art accuracy at classifying the running state of a robot. We propose a composite learning model (CLM) that combines the strengths of broad learning and conventional deep learning techniques to identify the fault types of underactuated surface vessels (USVs). Considering the measurement noise in training and testing data, we develop a deep sparse auto-encoder (DSAE) built by stacking a denoising auto-encoder (DAE) and contractive auto-encoders (CAEs). To further reduce the computation time, a modified broad learning system (BLS)-based classifier is developed, whose input layer receives the signal from the top layer of the DSAE. We use the output of the classifier as feedback. Meanwhile, value iteration (VI)-based adaptive dynamic programming (ADP) is employed to calculate the near-optimal increment of the connection weights. Finally, we validate the developed approach in experiments on USV simulation data that compare the proposed CLM with the standard BLS and conventional deep learning methods.
In this paper, a safe adaptive dynamic programming (SADP) method based on the barrier function (BF) is proposed for the optimal control problem of nonlinear safety-critical systems with safety constraints and exte...
ISBN (print): 9781665414944
Power-aware traffic engineering via coordinated sleeping is usually formulated as an integer programming problem, which is generally NP-hard, with unbounded computation time for large-scale networks. This delays control decisions in dynamic network environments. Motivated by advances in deep reinforcement learning, we consider building intelligent systems that learn to adaptively change the power state of routers and switches according to changing network conditions. A neural network's forward propagation can greatly speed up power on/off decision making. Generally, conducting RL requires a learning agent to iteratively explore and perform the "good" actions based on feedback from the environment. By coupling Software-Defined Networking, which applies centrally calculated actions to the environment, with In-band Network Telemetry, which collects feedback from the environment, we develop ***, a closed-loop control/training system to automate power-aware traffic engineering. Furthermore, we propose novel techniques to enhance the learning ability and reduce the learning complexity. With both energy efficiency and traffic load balancing considered, *** can generate reasonable power-saving actions within 276 ms on a network testbed of 11 software P4 switches.
ISBN (print): 9781665414944
Traffic Engineering (TE) has been applied to optimize network performance by routing and rerouting flows based on traffic loads and network topologies. To cope with network dynamics from emerging applications, it is essential to reroute flows more frequently than today's TE does in order to maintain network performance. However, existing TE solutions may introduce considerable Quality of Service (QoS) degradation and service disruption because they do not take the potential negative impact of flow rerouting into account. In this paper, we apply a new QoS metric named network disturbance to gauge the impact of flow rerouting while optimizing network load balancing in backbone networks. To employ this metric in TE design, we propose a disturbance-aware TE scheme called DATE, which uses reinforcement learning (RL) to intelligently select some critical flows between nodes for each traffic matrix and reroutes them using linear programming (LP) to jointly optimize network performance and disturbance. DATE is equipped with a customized actor-critic architecture and Graph Neural Networks (GNNs) to handle dynamic traffic and single link failures. Extensive evaluations show that DATE can outperform state-of-the-art TE methods with close-to-optimal load balancing performance while reducing the 99th percentile network disturbance by up to 31.6%.