Exploiting and sharing unlicensed spectrum resources among cellular and WiFi networks is critical for fifth-generation (5G) and beyond networks due to the severe spectrum shortage and huge traffic demands. While distributed consensus with blockchain has been considered as a way to realize fair and efficient spectrum sharing, the existing mechanism is not adaptive to wireless network traffic with diverse QoS requirements in dynamic environments, which can result in significant consensus overhead and low levels of QoS. To tackle these problems of the static consensus adopted by existing works, we propose a two-layer blockchain framework with an intelligent consensus scheme for distributed spectrum sharing. Specifically, we propose a two-layer blockchain architecture comprising a global blockchain and a local blockchain, and adopt a lightweight Proof of Strategy (PoG) consensus mechanism. The local blockchain is dedicated to making spectrum allocation strategies, while the global blockchain is responsible for the management and coordination of the local blockchain. A deep reinforcement learning model is designed for the global blockchain to learn the relationship between the consensus period of the local blockchain and the utilization of the allocated spectrum, and to maximize the throughput of the local heterogeneous networks. Furthermore, we model and analyze the performance of PoG in complicated interference environments. The Lagrange method and the relaxation method are used to transform an NP-hard problem into a fractional programming problem that can be solved iteratively. Simulation results show that the proposed architecture and intelligent consensus mechanism can significantly improve system throughput and adapt to dynamic environments with complicated interference.
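A minimal sketch of the adaptive-consensus idea, using tabular Q-learning in place of the paper's deep RL model: the global chain observes a discretized spectrum-utilization state, picks a consensus period for the local chain, and is rewarded with the resulting throughput. The period set, bin count, and hook names are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

PERIODS = [1, 2, 5, 10, 20]    # hypothetical candidate consensus periods (in slots)
N_UTIL_BINS = 10               # hypothetical discretization of spectrum utilization

Q = np.zeros((N_UTIL_BINS, len(PERIODS)))

def select_period(util_bin, epsilon=0.1):
    """Epsilon-greedy choice of the local chain's consensus period."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(PERIODS))
    return int(np.argmax(Q[util_bin]))

def update(util_bin, action, throughput, next_util_bin, alpha=0.1, gamma=0.9):
    """Reward is the throughput observed under the chosen period."""
    target = throughput + gamma * np.max(Q[next_util_bin])
    Q[util_bin, action] += alpha * (target - Q[util_bin, action])
```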
ISBN (print): 9781450379625
Self-adaptive systems continuously adapt to internal and external changes in their execution environment. In context-based self-adaptation, adaptations take place in response to the characteristics of the execution environment, captured as a context. However, in large-scale adaptive systems operating in dynamic environments, multiple contexts are often active at the same time, requiring simultaneous execution of multiple adaptations. Complex interactions between such adaptations might not have been foreseen or accounted for at design time. For example, adaptations can partially overlap, requiring only partial execution of each, or they can be conflicting, requiring some of the adaptations not to be executed at all, in order to preserve system execution. To ensure a correct composition of adaptations, we propose ComInA, a novel reinforcement learning-based approach, which autonomously learns interactions between adaptations as well as the most appropriate adaptation composition for each combination of active contexts, as they arise. We present an initial evaluation of ComInA in an urban public transport network simulation, where multiple adaptations to buses, routes, and stations are required. Early results show that ComInA correctly identifies whether adaptations are compatible or conflicting and learns to execute adaptations which maximize system performance. However, further investigation is needed into how best to utilize such identified relationships to optimize a wider range of metrics and to utilize more complex composition strategies.
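A hedged sketch of the learning loop the abstract describes (not ComInA's actual implementation): a tabular learner keyed by the set of active contexts chooses among candidate adaptation compositions and updates its estimate from observed system performance. All identifiers here are illustrative.

```python
import random
from collections import defaultdict

Q = defaultdict(float)  # (active contexts, composition) -> performance estimate

def choose_composition(active_contexts, candidates, epsilon=0.1):
    """candidates: tuples of adaptation IDs, e.g. ('reroute', 'add_bus')."""
    key = frozenset(active_contexts)
    if random.random() < epsilon:
        return random.choice(candidates)
    return max(candidates, key=lambda comp: Q[(key, comp)])

def record_outcome(active_contexts, composition, performance, alpha=0.1):
    """Shift the estimate toward the measured system performance."""
    key = frozenset(active_contexts)
    Q[(key, composition)] += alpha * (performance - Q[(key, composition)])
```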
For the uncertain time-delay system, this paper investigates a novel robust adaptive dynamic programming (ADP) method to guarantee the stability and performance of the system. By devising a novel cost function which integrat...
The past decade has witnessed a surge in research activities related to adaptive dynamic programming (ADP) and reinforcement learning (RL), particularly for control applications. Several books [items 1)-5) in the Appendix] and survey papers [items 6)-10) in the Appendix] have been published on the subject. Both ADP and RL provide approximate solutions to dynamic programming problems. In a 1995 article [item 11) in the Appendix], Barto et al. introduced so-called "adaptive real-time dynamic programming," which applies ADP specifically to real-time control. Later, in 2002, Murray et al. [item 12) in the Appendix] developed an ADP algorithm for optimal control of continuous-time affine nonlinear systems. On the other hand, the most famous algorithms in RL are the temporal difference algorithm [item 13) in the Appendix] and the Q-learning algorithm [items 14) and 15) in the Appendix].
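For concreteness, a minimal tabular Q-learning loop of the kind cited above [items 14) and 15) in the Appendix]; the `env` object with `reset`/`step`/`n_states`/`n_actions` is a hypothetical stand-in for any finite MDP interface.

```python
import numpy as np

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    Q = np.zeros((env.n_states, env.n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            a = (np.random.randint(env.n_actions)
                 if np.random.rand() < epsilon else int(np.argmax(Q[s])))
            s_next, r, done = env.step(a)
            # temporal-difference update toward the greedy bootstrap target
            target = r + (0.0 if done else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```

The temporal difference algorithm [item 13)] uses the same bootstrapped error, but evaluates a fixed policy rather than maximizing over actions.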
This letter provides an approximate online adaptive solution to the infinite-horizon optimal control problem for control-affine continuous-time nonlinear systems while formalizing system safety using barrier certificates. Specifically, a barrier function transform is applied to the system to aid in developing a controller that keeps the state in a pre-defined constrained region. To aid online learning of the value function, the state space is segmented into a number of user-defined segments. Off-policy trajectories are selected in each segment, and sparse Bellman error extrapolation is performed within each respective segment to generate an optimal policy for that segment. A Lyapunov-like stability analysis proves uniformly ultimately bounded regulation in the presence of the barrier function transform and discontinuities. Simulation results for a two-state dynamical system compare the performance of the developed method to existing methods.
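One common logarithmic barrier-function transform of the kind the letter builds on, shown here as a sketch (the letter's exact transform may differ): it maps a constrained state x in (x_min, x_max) bijectively onto the real line, so learning and control can proceed in the unconstrained coordinate.

```python
import numpy as np

def barrier_transform(x, x_min, x_max):
    """Map x in (x_min, x_max) onto the real line; diverges at the boundary."""
    return np.log((x - x_min) / (x_max - x))

def barrier_inverse(s, x_min, x_max):
    """Recover x from the transformed coordinate s; image stays in (x_min, x_max)."""
    e = np.exp(s)
    return (x_min + x_max * e) / (1.0 + e)
```

Keeping the transformed state bounded then certifies that the original state never leaves the constrained region.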
Parallel control theory can provide an effective solution for the control of complex systems with unknown models and time-varying characteristics. The adaptive dynamic programming (ADP) method, which combines reinforcement learning and dynamic programming algorithms, is the most advanced method for implementing parallel control theory. In this paper, we systematically review ADP-based parallel control theory, as well as how it can be developed for underwater vehicles. First, the foundation and fundamental principles of parallel control are outlined in detail. Second, the ADP method under parallel control theory is presented, along with an overview of the ADP method in the control of underwater vehicles. Finally, we review the latest developments and forecast the prospects of ADP-based underwater vehicle parallel control.
In this paper, we introduce a novel reinforcement learning (RL) scheme for linear continuous-time dynamical systems. Different from traditional batch learning algorithms, an incremental learning approach is developed, which provides a more efficient way to tackle the online learning problem in real-world applications. We provide a concrete convergence and robustness analysis of this incremental learning algorithm. An extension to solving robust optimal control problems is also given. Two simulation examples illustrate the effectiveness of our theoretical results.
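For reference, a sketch of the model-based policy iteration that such learning schemes approximate (Kleinman's algorithm for the continuous-time LQR); the paper's contribution is to learn these quantities incrementally and online rather than via this explicit model-based form. The sketch assumes known (A, B) and a stabilizing initial gain.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def lqr_policy_iteration(A, B, Q, R, K0, iters=20):
    """Alternate policy evaluation (a Lyapunov equation) and policy improvement."""
    K = K0  # must be stabilizing: eigenvalues of A - B K in the open left half-plane
    for _ in range(iters):
        Ak = A - B @ K
        # policy evaluation: Ak' P + P Ak + Q + K' R K = 0
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        # policy improvement: K <- R^{-1} B' P
        K = np.linalg.solve(R, B.T @ P)
    return K, P
```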
ISBN (print): 9781728125473
We provide new perspectives and inference algorithms for Maximum Entropy (MaxEnt) Inverse Reinforcement Learning (IRL), which provides a principled method to find a most non-committal reward function consistent with given expert demonstrations, among many consistent reward functions. We first present a generalized MaxEnt formulation based on minimizing a KL-divergence instead of maximizing an entropy. This improves the previous heuristic derivation of the MaxEnt IRL model (for stochastic MDPs), allows a unified view of MaxEnt IRL and Relative Entropy IRL, and leads to a model-free learning algorithm for the MaxEnt IRL model. Second, a careful review of existing inference algorithms and implementations showed that they compute the marginals required for learning the model only approximately. We provide examples to illustrate this, and present an efficient and exact inference algorithm. Our algorithm can handle variable-length demonstrations; in addition, while a basic version takes time quadratic in the maximum demonstration length, an improved version of this algorithm reduces this to linear using a padding trick. Experiments show that our exact algorithm improves reward learning as compared to the approximate ones. Furthermore, our algorithm scales up to a large, real-world dataset involving driver behaviour forecasting. We provide an optimized implementation compatible with the OpenAI Gym interface. Our new insight and algorithms could possibly lead to further interest in and exploration of the original MaxEnt IRL model.
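A minimal sketch of the soft (log-sum-exp) backward pass that underlies inference in MaxEnt IRL: over a finite horizon it computes soft values, from which the stochastic policy, and hence the marginals needed for the reward gradient, follow. Array shapes are illustrative assumptions; the padding trick for variable-length demonstrations is omitted.

```python
import numpy as np

def soft_value_iteration(P, r, horizon):
    """P: (A, S, S) transition tensor, r: (S, A) rewards. Returns log pi_t(a|s)."""
    A, S, _ = P.shape
    V = np.zeros(S)                              # terminal soft value
    log_pis = []
    for _ in range(horizon):
        # soft Bellman backup: Q(s, a) = r(s, a) + E_{s'}[V(s')]
        Qsa = r + np.einsum('ast,t->sa', P, V)
        V = np.logaddexp.reduce(Qsa, axis=1)     # V(s) = log sum_a exp Q(s, a)
        log_pis.append(Qsa - V[:, None])         # log pi(a|s) = Q(s, a) - V(s)
    return log_pis[::-1]                         # time-ordered, t = 0 first
```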
A connected space comprises embedded systems that are attached to the physical space and cloud systems through the Internet. Using the connected space, various services can be continuously provided. These services can...
ISBN (print): 9781728159225
This paper is concerned with an optimal coordination control problem for nonlinear multi-agent systems (MASs) with constraints on the control inputs. The idea of the adaptive dynamic programming (ADP) algorithm is to use policy iteration to solve the coupled Hamilton-Jacobi equations. First, a suitable non-quadratic functional is introduced into the cost function to transform the question into an optimization problem. Second, a distributed control law is designed for each agent, with the aim that the cost functions of the MASs converge to a Nash equilibrium. Next, the convergence analysis shows that the iterative cost functions of the nonlinear multi-agent systems are convergent. Neural networks (NNs) are used to approximate the cost functions for the calculation of the control laws. Finally, simulation results show the effectiveness of the coordination control algorithm.
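The abstract does not state the exact non-quadratic functional; one standard choice in the constrained-input ADP literature, for inputs bounded as |u_i| <= lambda, is

```latex
% non-quadratic control cost encoding the input bound |u_i| <= \lambda
W(u) = 2 \int_{0}^{u} \lambda \tanh^{-1}(v/\lambda)^{\top} R \, \mathrm{d}v
```

which is positive definite and whose minimization yields saturating, tanh-shaped control laws that respect the bound by construction.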