Exploiting and sharing unlicensed spectrum resources among cellular and WiFi networks is critical for fifth-generation (5G) and beyond networks due to the severe spectrum shortage and huge traffic demands. While distributed consensus with blockchain has been considered as a way to realize fair and efficient spectrum sharing, the existing mechanism is not adaptive to wireless network traffic with diverse QoS requirements in dynamic environments, which can result in significant consensus overhead and low levels of QoS. To tackle these problems of the static consensus adopted by existing works, we propose a two-layer blockchain framework with an intelligent consensus scheme for distributed spectrum sharing. Specifically, we propose a two-layer blockchain architecture comprising a global blockchain and a local blockchain, and adopt a lightweight Proof of Strategy (PoG) consensus mechanism. The local blockchain is dedicated to making spectrum allocation strategies, while the global blockchain is responsible for the management and coordination of the local blockchain. A deep reinforcement learning model is designed for the global blockchain to learn the relationship between the consensus period of the local blockchain and the utilization of the allocated spectrum, and to maximize the throughput of the local heterogeneous networks. Furthermore, we model and analyze the performance of PoG in complicated interference environments. The Lagrange method and the relaxation method are used to transform an NP-hard problem into a fractional programming problem that can be solved iteratively. Simulation results show that the proposed architecture and intelligent consensus mechanism can significantly improve system throughput and adapt to dynamic environments with complicated interference.
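A minimal sketch of the adaptive-consensus idea, using tabular Q-learning in place of the paper's deep RL model: the global chain observes a discretized spectrum-utilization state, picks a consensus period for the local chain, and is rewarded with the resulting throughput. The period set, bin count, and hook names are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

PERIODS = [1, 2, 5, 10, 20]    # hypothetical candidate consensus periods (in slots)
N_UTIL_BINS = 10               # hypothetical discretization of spectrum utilization

Q = np.zeros((N_UTIL_BINS, len(PERIODS)))

def select_period(util_bin, epsilon=0.1):
    """Epsilon-greedy choice of the local chain's consensus period."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(PERIODS))
    return int(np.argmax(Q[util_bin]))

def update(util_bin, action, throughput, next_util_bin, alpha=0.1, gamma=0.9):
    """Reward is the throughput observed under the chosen period."""
    target = throughput + gamma * np.max(Q[next_util_bin])
    Q[util_bin, action] += alpha * (target - Q[util_bin, action])
```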
ISBN (print): 9781450379625
Self-adaptive systems continuously adapt to internal and external changes in their execution environment. In context-based self-adaptation, adaptations take place in response to the characteristics of the execution environment, captured as a context. However, in large-scale adaptive systems operating in dynamic environments, multiple contexts are often active at the same time, requiring simultaneous execution of multiple adaptations. Complex interactions between such adaptations might not have been foreseen or accounted for at design time. For example, adaptations can partially overlap, requiring only partial execution of each, or they can be conflicting, requiring some of the adaptations not to be executed at all, in order to preserve system execution. To ensure a correct composition of adaptations, we propose ComInA, a novel reinforcement learning-based approach, which autonomously learns interactions between adaptations as well as the most appropriate adaptation composition for each combination of active contexts, as they arise. We present an initial evaluation of ComInA in an urban public transport network simulation, where multiple adaptations to buses, routes, and stations are required. Early results show that ComInA correctly identifies whether adaptations are compatible or conflicting and learns to execute adaptations which maximize system performance. However, further investigation is needed into how best to utilize such identified relationships to optimize a wider range of metrics and to utilize more complex composition strategies.
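A hedged sketch of the learning loop the abstract describes (not ComInA's actual implementation): a tabular learner keyed by the set of active contexts chooses among candidate adaptation compositions and updates its estimate from observed system performance. All identifiers here are illustrative.

```python
import random
from collections import defaultdict

Q = defaultdict(float)  # (active contexts, composition) -> performance estimate

def choose_composition(active_contexts, candidates, epsilon=0.1):
    """candidates: tuples of adaptation IDs, e.g. ('reroute', 'add_bus')."""
    key = frozenset(active_contexts)
    if random.random() < epsilon:
        return random.choice(candidates)
    return max(candidates, key=lambda comp: Q[(key, comp)])

def record_outcome(active_contexts, composition, performance, alpha=0.1):
    """Shift the estimate toward the measured system performance."""
    key = frozenset(active_contexts)
    Q[(key, composition)] += alpha * (performance - Q[(key, composition)])
```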
For the uncertain time-delay system, this paper investigates a novel robust adaptive dynamic programming (ADP) method to guarantee the stability and performance of the system. By devising a novel cost function which integrat...
The past decade has witnessed a surge in research activities related to adaptive dynamic programming (ADP) and reinforcement learning (RL), particularly for control applications. Several books [items 1)-5) in the Appendix] and survey papers [items 6)-10) in the Appendix] have been published on the subject. Both ADP and RL provide approximate solutions to dynamic programming problems. In a 1995 article [item 11) in the Appendix], Barto et al. introduced so-called "adaptive real-time dynamic programming," which applies ADP specifically to real-time control. Later, in 2002, Murray et al. [item 12) in the Appendix] developed an ADP algorithm for optimal control of continuous-time affine nonlinear systems. On the other hand, the most famous algorithms in RL are the temporal difference algorithm [item 13) in the Appendix] and the Q-learning algorithm [items 14) and 15) in the Appendix].
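For concreteness, a minimal tabular Q-learning loop of the kind cited above [items 14) and 15) in the Appendix]; the `env` object with `reset`/`step`/`n_states`/`n_actions` is a hypothetical stand-in for any finite MDP interface.

```python
import numpy as np

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    Q = np.zeros((env.n_states, env.n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            a = (np.random.randint(env.n_actions)
                 if np.random.rand() < epsilon else int(np.argmax(Q[s])))
            s_next, r, done = env.step(a)
            # temporal-difference update toward the greedy bootstrap target
            target = r + (0.0 if done else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```

The temporal difference algorithm [item 13)] uses the same bootstrapped error, but evaluates a fixed policy rather than maximizing over actions.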
This letter provides an approximate online adaptive solution to the infinite-horizon optimal control problem for control-affine continuous-time nonlinear systems while formalizing system safety using barrier certificates. Specifically, a barrier function transform is applied to the system to aid in developing a controller that keeps the state in a pre-defined constrained region. To aid online learning of the value function, the state space is segmented into a number of user-defined segments. Off-policy trajectories are selected in each segment, and sparse Bellman error extrapolation is performed within each respective segment to generate an optimal policy for that segment. A Lyapunov-like stability analysis proves uniformly ultimately bounded regulation in the presence of the barrier function transform and discontinuities. Simulation results for a two-state dynamical system compare the performance of the developed method to existing methods.
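One common logarithmic barrier-function transform of the kind the letter builds on, shown here as a sketch (the letter's exact transform may differ): it maps a constrained state x in (x_min, x_max) bijectively onto the real line, so learning and control can proceed in the unconstrained coordinate.

```python
import numpy as np

def barrier_transform(x, x_min, x_max):
    """Map x in (x_min, x_max) onto the real line; diverges at the boundary."""
    return np.log((x - x_min) / (x_max - x))

def barrier_inverse(s, x_min, x_max):
    """Recover x from the transformed coordinate s; image stays in (x_min, x_max)."""
    e = np.exp(s)
    return (x_min + x_max * e) / (1.0 + e)
```

Keeping the transformed state bounded then certifies that the original state never leaves the constrained region.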
Parallel control theory can provide an effective solution for the control of complex systems with unknown models and time-varying characteristics. The adaptive dynamic programming (ADP) method, which combines reinforcement learning and dynamic programming algorithms, is the most advanced method for implementing parallel control theory. In this paper, we systematically review ADP-based parallel control theory, as well as how it can be developed for underwater vehicles. First, the foundation and fundamental principles of parallel control are outlined in detail. Second, the ADP method under parallel control theory is presented, along with an overview of the ADP method in the control of underwater vehicles. Finally, we review the latest developments and forecast the prospects of ADP-based underwater vehicle parallel control.
In this paper, we introduce a novel reinforcement learning (RL) scheme for linear continuous-time dynamical systems. Different from traditional batch learning algorithms, an incremental learning approach is developed, which provides a more efficient way to tackle the online learning problem in real-world applications. We provide a concrete convergence and robustness analysis of this incremental learning algorithm. An extension to solving robust optimal control problems is also given. Two simulation examples illustrate the effectiveness of our theoretical results.
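For reference, a sketch of the model-based policy iteration that such learning schemes approximate (Kleinman's algorithm for the continuous-time LQR); the paper's contribution is to learn these quantities incrementally and online rather than via this explicit model-based form. The sketch assumes known (A, B) and a stabilizing initial gain.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def lqr_policy_iteration(A, B, Q, R, K0, iters=20):
    """Alternate policy evaluation (a Lyapunov equation) and policy improvement."""
    K = K0  # must be stabilizing: eigenvalues of A - B K in the open left half-plane
    for _ in range(iters):
        Ak = A - B @ K
        # policy evaluation: Ak' P + P Ak + Q + K' R K = 0
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        # policy improvement: K <- R^{-1} B' P
        K = np.linalg.solve(R, B.T @ P)
    return K, P
```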
ISBN (print): 9781728125473
We provide new perspectives and inference algorithms for Maximum Entropy (MaxEnt) Inverse Reinforcement Learning (IRL), which provides a principled method to find a most non-committal reward function consistent with given expert demonstrations, among many consistent reward functions. We first present a generalized MaxEnt formulation based on minimizing a KL-divergence instead of maximizing an entropy. This improves the previous heuristic derivation of the MaxEnt IRL model (for stochastic MDPs), allows a unified view of MaxEnt IRL and Relative Entropy IRL, and leads to a model-free learning algorithm for the MaxEnt IRL model. Second, a careful review of existing inference algorithms and implementations showed that they compute the marginals required for learning the model only approximately. We provide examples to illustrate this, and present an efficient and exact inference algorithm. Our algorithm can handle variable-length demonstrations; in addition, while a basic version takes time quadratic in the maximum demonstration length, an improved version of this algorithm reduces this to linear using a padding trick. Experiments show that our exact algorithm improves reward learning as compared to the approximate ones. Furthermore, our algorithm scales up to a large, real-world dataset involving driver behaviour forecasting. We provide an optimized implementation compatible with the OpenAI Gym interface. Our new insight and algorithms could possibly lead to further interest in and exploration of the original MaxEnt IRL model.
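A minimal sketch of the soft (log-sum-exp) backward pass that underlies inference in MaxEnt IRL: over a finite horizon it computes soft values, from which the stochastic policy, and hence the marginals needed for the reward gradient, follow. Array shapes are illustrative assumptions; the padding trick for variable-length demonstrations is omitted.

```python
import numpy as np

def soft_value_iteration(P, r, horizon):
    """P: (A, S, S) transition tensor, r: (S, A) rewards. Returns log pi_t(a|s)."""
    A, S, _ = P.shape
    V = np.zeros(S)                              # terminal soft value
    log_pis = []
    for _ in range(horizon):
        # soft Bellman backup: Q(s, a) = r(s, a) + E_{s'}[V(s')]
        Qsa = r + np.einsum('ast,t->sa', P, V)
        V = np.logaddexp.reduce(Qsa, axis=1)     # V(s) = log sum_a exp Q(s, a)
        log_pis.append(Qsa - V[:, None])         # log pi(a|s) = Q(s, a) - V(s)
    return log_pis[::-1]                         # time-ordered, t = 0 first
```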
A connected space comprises embedded systems that are attached to the physical space and cloud systems through the Internet. Using the connected space, various services can be continuously provided. These services can...
ISBN (print): 9781728159225
This paper is concerned with an optimal coordination control problem for nonlinear multi-agent systems (MASs) with constraints on the control inputs. The idea of the adaptive dynamic programming (ADP) algorithm is to use policy iteration to solve the coupled Hamilton-Jacobi equations. First, a suitable non-quadratic functional is introduced into the cost function to transform the question into an optimization problem. Second, a distributed control law is designed for each agent, with the aim that the cost functions of the MASs converge to a Nash equilibrium. Next, the convergence analysis shows that the iterative cost functions of the nonlinear multi-agent systems are convergent. Neural networks (NNs) are used to approximate the cost functions for the calculation of the control laws. Finally, simulation results show the effectiveness of the coordination control algorithm.
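The abstract does not state the exact non-quadratic functional; one standard choice in the constrained-input ADP literature, for inputs bounded as |u_i| <= lambda, is

```latex
% non-quadratic control cost encoding the input bound |u_i| <= \lambda
W(u) = 2 \int_{0}^{u} \lambda \tanh^{-1}(v/\lambda)^{\top} R \, \mathrm{d}v
```

which is positive definite and whose minimization yields saturating, tanh-shaped control laws that respect the bound by construction.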