The rapid growth of the Internet of Things (IoT) has led to the deployment of large-scale networks, enabling seamless connectivity and data exchange among various devices. To manage the complexity and ensure ...
Controlling 6 Degrees-of-Freedom (DoF) robotic manipulators in an online, model-free manner poses significant challenges due to their complex coupling, non-linearities, and the need to account for unmodeled dynamics. ...
ISBN (Print): 9781665405843
Irregularly structured registers are hard to abstract and allocate. Partitioned Boolean quadratic programming (PBQP) is a useful abstraction for representing complex register constraints, even those in the highly irregular processors of automated test equipment (ATE) for DRAM memory chips. The PBQP problem is NP-hard and normally calls for a heuristic solution. If no spill is allowed, however, as in ATE, we have to enumerate rather than approximate, since a spill means a total compilation failure. We propose solving the PBQP problem with deep reinforcement learning (Deep-RL), more specifically a model-based approach using Monte Carlo tree search and a deep neural network as in AlphaZero, a proven Deep-RL technology. Through elaborate training on random PBQP graphs, our Deep-RL solver cuts the search space sharply, making an enumeration-based solution far more affordable. Furthermore, by employing backtracking with a proper coloring order, Deep-RL can find a solution with modestly trained neural networks and even less search. Our experiments show that Deep-RL successfully finds a solution for 10 product-level ATE programs while searching far fewer states (e.g., 1/3,500) than the previous PBQP enumeration solver. Also, when applied to C programs in Byrn-test-suite for regular CPUs, it achieves performance competitive with the existing PBQP register allocator in LLVM.
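As a point of reference for the search problem described above, the following is a minimal sketch (an illustrative toy, not the paper's ATE instances or the LLVM allocator) of how a PBQP register-allocation instance can be represented and how a candidate assignment is scored; an infinite edge-cost entry marks the kind of infeasible combination (a spill) that the MCTS-guided enumeration must avoid.

import math

class PBQPGraph:
    """Toy PBQP instance: per-node choice costs plus pairwise edge cost matrices."""
    def __init__(self):
        self.node_costs = {}          # node id -> list of per-choice costs
        self.edge_costs = {}          # (u, v) -> 2-D cost matrix (list of lists)

    def add_node(self, nid, costs):
        self.node_costs[nid] = costs

    def add_edge(self, u, v, matrix):
        self.edge_costs[(u, v)] = matrix

    def assignment_cost(self, assignment):
        """Total cost of a complete assignment {node id -> chosen register index}."""
        total = 0.0
        for nid, choice in assignment.items():
            total += self.node_costs[nid][choice]
        for (u, v), m in self.edge_costs.items():
            total += m[assignment[u]][assignment[v]]
        return total

# Toy instance: two virtual registers, two physical registers, and one
# interference edge that forbids assigning both to physical register 0.
INF = math.inf
g = PBQPGraph()
g.add_node("a", [0.0, 1.0])
g.add_node("b", [0.0, 2.0])
g.add_edge("a", "b", [[INF, 0.0], [0.0, 0.0]])

print(g.assignment_cost({"a": 1, "b": 0}))   # feasible, cost 1.0
print(g.assignment_cost({"a": 0, "b": 0}))   # infeasible (inf), i.e., a failed allocation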
This paper studies the application of the lookup-table reinforcement learning method to continuous state space control of a quadrotor simulator and designs an attitude controller for the quadrotor simulator based on Q-...
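Since the abstract is truncated, the following is only a hedged sketch of the lookup-table idea it names: tabular Q-learning over a coarse discretization of a continuous attitude state. The grid resolution, action set (torque commands), and reward used here are illustrative assumptions, not the paper's design.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
ACTIONS = [-1.0, 0.0, 1.0]                     # assumed torque commands
Q = defaultdict(float)                         # lookup table: (state, action) -> value

def discretize(angle, rate, bins=0.05):
    """Map a continuous (angle, angular-rate) pair onto a coarse grid cell."""
    return (round(angle / bins), round(rate / bins))

def choose_action(state):
    """Epsilon-greedy action selection from the lookup table."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """Standard tabular Q-learning update."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# One illustrative interaction step (reward penalizes attitude error).
s = discretize(0.12, -0.40)
a = choose_action(s)
q_update(s, a, reward=-abs(0.12), next_state=discretize(0.10, -0.35))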
Multi-robot task allocation has an important impact on the efficiency of multi-robot collaboration. For single-shot allocation without complicated constraints, some exact algorithms and heuristic algorithms can find t...
ISBN (Digital): 9798331506056
ISBN (Print): 9798331506063
The pursuit-evasion game of non-cooperative spacecraft under nonlinear dynamics is currently a hot topic in orbital gaming. We describe the pursuit-evasion game with differential game theory, transforming the gaming problem into a bilateral optimal control problem. Using elliptical-orbit line-of-sight (LOS) dynamics with simple field-of-view constraints as the system model, we solve for the Nash equilibrium of the two-body pursuit-evasion game under the assumption of complete information. Because analytical Nash equilibrium solutions are difficult to obtain, we adopt a reinforcement learning (RL)-based adaptive dynamic programming method. We eventually obtain an approximate Nash equilibrium solution with the RL method and provide a successful simulation example.
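The Nash-equilibrium fixed point that RL-based adaptive dynamic programming approximates can be illustrated on a much simpler problem. The sketch below is an assumption-only toy: a scalar discrete-time zero-sum pursuit-evasion game (not the elliptical-orbit LOS model of the paper), solved by iterating the saddle-point conditions until the quadratic value coefficient converges; all numbers are chosen only so that the iteration converges.

import numpy as np

a, b, c = 1.05, 1.0, 0.3          # assumed scalar dynamics x+ = a*x + b*u + c*w
q, ru, rw = 1.0, 1.0, 10.0        # assumed stage cost q*x^2 + ru*u^2 - rw*w^2

p = 0.0                           # quadratic value coefficient, V(x) = p*x^2
for _ in range(200):
    # Saddle-point (Nash) conditions for u = Ku*x and w = Kw*x given next-step value p
    M = np.array([[ru + p * b * b, p * b * c],
                  [p * b * c,      p * c * c - rw]])
    rhs = np.array([-p * a * b, -p * a * c])
    Ku, Kw = np.linalg.solve(M, rhs)
    closed = a + b * Ku + c * Kw                       # closed-loop coefficient
    p_next = q + ru * Ku**2 - rw * Kw**2 + p * closed**2
    if abs(p_next - p) < 1e-10:
        break
    p = p_next

print(f"value coefficient p = {p:.4f}, pursuer gain Ku = {Ku:.4f}, evader gain Kw = {Kw:.4f}")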
With the increasing penetration of renewable energy and electric vehicles (EVs), the charging and discharging behavior of EVs has shown a great impact on the microgrid power load, motivating the development of Veh...
ISBN (Print): 9781665480536
The proceedings contain 237 papers. The topics discussed include: robust device position and pose detection using visible light without model knowledge: a branch-structured residual learning method; access point clustering in cell-free massive MIMO using multi-agent reinforcement learning; joint data and model driven channel-free signal detection based learned factor graph; sum-rate maximization in RIS-aided wireless-powered D2D communication networks; physical layer security in spherical-wave channel using massive MIMO; spatially-coupled faster-than-Nyquist signaling; indoor localization with CSI fingerprint utilizing depthwise separable convolution neural network; a hybrid machine learning based model for congestion prediction in mobile networks; mobile traffic forecasting for network slices: a federated-learning approach; a vector-based dynamic programming approach for small cell placement in dense urban; spatiotemporal graph attention networks for urban traffic flow prediction; deep learning based minimum length scheduling for half duplex wireless powered communication networks; on the effectiveness of semantic addressing for wake-up radio-enabled wireless sensor networks; and physical layer authentication based on continuous channel polarization response in low SNR scenes.
ISBN (Print): 9783903176362
We study automated intrusion prevention using reinforcement learning. In a novel approach, we formulate the problem of intrusion prevention as an optimal stopping problem. This formulation gives us insight into the structure of the optimal policies, which turn out to be threshold-based. Since computing the optimal defender policy with dynamic programming is not feasible for practical cases, we approximate the optimal policy through reinforcement learning in a simulation environment. To define the dynamics of the simulation, we emulate the target infrastructure and collect measurements. Our evaluations show that the learned policies are close to optimal and that they indeed can be expressed using thresholds.
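To make the threshold structure concrete, here is a minimal sketch (with illustrative numbers, not the paper's emulated infrastructure or observation model) of the optimal-stopping view: dynamic programming over a discretized belief that an intrusion is underway, which yields a single stopping threshold on that belief.

import numpy as np

GAMMA = 0.95          # discount factor (assumed)
P_START = 0.05        # per-step probability an intrusion begins (assumed)
R_STOP = 10.0         # reward for stopping an ongoing intrusion
C_FALSE = 5.0         # cost of a false alarm (stopping with no intrusion)
C_INTRUSION = 1.0     # per-step cost while an intrusion is running

beliefs = np.linspace(0.0, 1.0, 201)          # discretized belief grid
V = np.zeros_like(beliefs)

def next_belief(b):
    """Without new observations, the prior belief that an intrusion has started only drifts up."""
    return b + (1.0 - b) * P_START

stop_value = beliefs * R_STOP - (1.0 - beliefs) * C_FALSE   # value of stopping now

for _ in range(2000):                                        # value iteration over the belief grid
    V_drift = np.interp(next_belief(beliefs), beliefs, V)
    continue_value = -beliefs * C_INTRUSION + GAMMA * V_drift
    V_new = np.maximum(stop_value, continue_value)
    if np.max(np.abs(V_new - V)) < 1e-9:
        V = V_new
        break
    V = V_new

# The optimal policy is a threshold: stop as soon as stopping beats continuing.
continue_value = -beliefs * C_INTRUSION + GAMMA * np.interp(next_belief(beliefs), beliefs, V)
threshold = beliefs[np.argmax(stop_value >= continue_value)]
print(f"stop whenever the intrusion belief exceeds roughly {threshold:.2f}")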
Optimal LQR feedback gains can be learned within a reinforcement learning (RL) framework for systems with unknown dynamics using policy iteration methods. However, policy iteration becomes challenging for inherently unstable systems. In this study we establish reinforcement learning of optimal feedback gains for a nonlinear double inverted-pendulum (DIP) biomechanical model. Starting from an admissible initial policy, the biomechanical model was simulated in MATLAB and trajectory data were recorded. The state variables were transformed to a quadratic basis and used in approximate dynamic programming (ADP) to learn the solution to the algebraic Riccati equation (ARE) underlying the LQR problem. The RL results obtained for this inherently unstable DIP system indicate relatively fast convergence and demonstrate the potential to apply RL techniques to more complex systems.
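A hedged sketch of the quadratic-basis ADP idea follows. It uses a generic 2-state discrete-time linear system rather than the DIP biomechanical model, and a Bradtke-style Q-function least-squares policy iteration as a stand-in for the continuous-time ARE learning described above; the system matrices, admissible initial gain, and noise levels are assumptions.

import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])     # unknown to the learner, used only to simulate
B = np.array([[0.0], [0.1]])
Qc, Rc = np.eye(2), np.array([[1.0]])      # quadratic stage cost weights

n, m = 2, 1
K = np.array([[-1.0, -1.5]])               # admissible (stabilizing) initial policy

def quad_basis(z):
    """Quadratic basis: features z_i*z_j for i <= j, so weights recover Q(z) = z'Hz."""
    return np.outer(z, z)[np.triu_indices(len(z))]

for it in range(10):                        # policy iteration
    Phi, targets = [], []
    x = rng.normal(size=n)
    for _ in range(400):                    # collect one exploratory trajectory
        u = K @ x + 0.5 * rng.normal(size=m)          # exploration noise for excitation
        x_next = A @ x + (B @ u).ravel()
        u_next = K @ x_next                            # on-policy action at next state
        z, z_next = np.concatenate([x, u]), np.concatenate([x_next, u_next])
        cost = x @ Qc @ x + u @ Rc @ u
        Phi.append(quad_basis(z) - quad_basis(z_next))  # temporal-difference regressor
        targets.append(cost)
        x = x_next
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(targets), rcond=None)

    # Rebuild the symmetric Q-function matrix H from the learned basis weights
    H = np.zeros((n + m, n + m))
    H[np.triu_indices(n + m)] = theta
    H = (H + H.T) / 2.0

    K_new = -np.linalg.solve(H[n:, n:], H[n:, :n])     # policy improvement step
    if np.linalg.norm(K_new - K) < 1e-6:
        K = K_new
        break
    K = K_new

print("learned feedback gain K ~", K)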