Search Results - Inner Mongolia University Library

Discrete-Time Stable Generalized Self-learning Optimal Control With Approximation Errors

IEEE Transactions on Neural Networks and Learning Systems, 2018, vol. 29, no. 4, pp. 1226-1238

Authors: Wei, Qinglai; Li, Benkai; Song, Ruizhuo (Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China; Univ Chinese Acad Sci, Beijing 100049, Peoples R China; Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China)

In this paper, a generalized policy iteration (GPI) algorithm with approximation errors is developed for solving infinite-horizon optimal control problems for nonlinear systems. The developed stable GPI algorithm provides a general structure for discrete-time iterative adaptive dynamic programming algorithms, by which most discrete-time reinforcement learning algorithms can be described. This is the first time that approximation errors are explicitly considered in the GPI algorithm. The properties of the stable GPI algorithm with approximation errors are analyzed. The admissibility of the approximate iterative control law can be guaranteed if the approximation errors satisfy the admissibility criteria. The convergence of the developed algorithm is established, showing that the iterative value function converges to a finite neighborhood of the optimal performance index function if the approximation errors satisfy the convergence criterion. Finally, numerical examples and comparisons are presented.

Keywords: adaptive critic designs; adaptive dynamic programming (ADP); approximate dynamic programming; generalized policy iteration (GPI); neural networks; neurodynamic programming; nonlinear systems; optimal control; reinforcement learning

AutoScale: Energy Efficiency Optimization for Stochastic Edge Inference Using Reinforcement Learning

IEEE/ACM International Symposium on Microarchitecture (MICRO)

Authors: Young Geun Kim; Carole-Jean Wu (Arizona State University; Facebook AI)

ISBN (digital): 9781728173832

ISBN (print): 9781728173849

Deep learning inference is increasingly run at the edge. As programming and system stack support matures, it enables acceleration opportunities in a mobile system, where the system performance envelope is scaled up with a plethora of programmable co-processors. Thus, intelligent services designed for mobile users can choose between running inference on the CPU or any of the co-processors in the mobile system, and exploiting connected systems such as the cloud or a nearby, locally connected mobile system. By doing so, these services can scale out the performance and increase the energy efficiency of edge mobile systems. This gives rise to a new challenge: deciding when inference should run where. Such execution scaling decisions become more complicated with the stochastic nature of the mobile-cloud execution environment, where signal strength variation in the wireless networks and resource interference can affect real-time inference performance and system energy efficiency. To enable energy-efficient deep learning inference at the edge, this paper proposes AutoScale, an adaptive and lightweight execution scaling engine built on a custom-designed reinforcement learning algorithm. It continuously learns and selects the most energy-efficient inference execution target by considering characteristics of neural networks and available systems in the collaborative cloud-edge execution environment while adapting to stochastic runtime variance. Real system implementation and evaluation, considering realistic execution scenarios, demonstrate an average of 9.8x and 1.6x energy efficiency improvement over the baseline mobile CPU and cloud offloading, respectively, while meeting the real-time performance and accuracy requirements.

Keywords: Deep learning; Cloud computing; Runtime; Reinforcement learning; Quality of service; Energy efficiency; Real-time systems

Block-Decentralized Model-Free Reinforcement Learning Control of Two Time-Scale Networks

American Control Conference (ACC)

Authors: Mukherjee, Sayak; Bai, He; Chakrabortty, Aranya (North Carolina State Univ, Dept Elect & Comp Engn, Raleigh, NC 27695, USA; Oklahoma State Univ, Sch Mech & Aerosp Engn, Stillwater, OK 74078, USA)

ISBN (print): 9781538679265

In this paper, we present a cluster-wise decentralized model-free reinforcement learning (RL) based control design for a linear time-invariant consensus network. We assume that the fast dynamics of the network are stable and design the control to shape the slow dynamics. The design exploits timescale separation properties inherent in the slow dynamics of the clusters and the weak couplings between the clusters. The aggregated slow variable from each cluster is used for feedback, and decentralized controllers are learned for each cluster. Using singular perturbation theory, we show the sub-optimality of the learned controller and provide closed-loop stability conditions. We prove that this decentralized learning design will produce close-to-optimal performance if the clustering is strong with weak inter-cluster couplings. This design reduces the learning time and the number of communication links required. The effectiveness of the design is demonstrated using a numerical example.

Keywords: clustered network; reinforcement learning; adaptive dynamic programming; block-decentralized control; two time-scale

A Service Migration Method Based on Dynamic Awareness in Mobile Edge Computing

IEEE Symposium on Network Operations and Management

Authors: Menglei Zhang; Haoqiu Huang; LanLan Rui; Guo Hui; Ying Wang; Xuesong Qiu (State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China; Information Department, China Aerospace Science and Industry Corporation Limited Network, Beijing, China)

ISBN (digital): 9781728149738

ISBN (print): 9781728149745

Cloud computing technologies cannot satisfy the requirements of applications on mobile terminals because of their disadvantages in delay, link load, and energy. Mobile Edge Computing (MEC) has therefore been proposed as a novel computing technology. Service migration is an important research direction in MEC, but existing methods are limited: they cannot learn migration paths or adapt to dynamic situations and user movement. In this paper, we propose a novel service migration policy method based on reinforcement learning. We first investigate user movement, four different edge network situations, and traditional migration policies. We then formulate the system requirements in Satisfiability Modulo Theory (SMT) logic to acquire the migration policy space. We further propose a dynamic-awareness deep Q-learning algorithm that selects paths from the policy space iteratively and uses dynamic awareness to adjust the learning rate adaptively. The optimal convergence of our algorithm is proved theoretically. Finally, the experimental results demonstrate the effectiveness of our method in terms of migration success rate, service interruption time, and load balance compared with the other solutions.

Manifold Regularized Reinforcement Learning

IEEE Transactions on Neural Networks and Learning Systems, 2018, vol. 29, no. 4, pp. 932-943

Authors: Li, Hongliang; Liu, Derong; Wang, Ding (Tencent Inc, AI Platform Dept, Shenzhen 518057, Peoples R China; Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China; Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China)

This paper introduces a novel manifold regularized reinforcement learning scheme for continuous Markov decision processes. Smooth feature representations for value function approximation can be automatically learned using the unsupervised manifold regularization method. The learned features are data-driven, and can be adapted to the geometry of the state space. Furthermore, the scheme provides a direct basis representation extension for novel samples during policy learning and control. The performance of the proposed scheme is evaluated on two benchmark control tasks, i.e., the inverted pendulum and the energy storage problem. Simulation results illustrate the concepts of the proposed scheme and show that it can obtain excellent performance.

Keywords: adaptive dynamic programming; approximate dynamic programming; approximate policy iteration (API); manifold regularization; reinforcement learning (RL)

AVAC: A Machine Learning Based Adaptive RRAM Variability-Aware Controller for Edge Devices

IEEE International Symposium on Circuits and Systems (ISCAS)

Authors: Shikhar Tuli; Shreshth Tuli (Department of Electrical Engineering, Indian Institute of Technology Delhi; Department of Computer Science and Engineering, Indian Institute of Technology Delhi)

ISBN (digital): 9781728133201

ISBN (print): 9781728133218

Recently, the Edge Computing paradigm has gained significant popularity in both industry and academia. Researchers increasingly aim to improve the performance and reduce the energy consumption of such devices. Some recent efforts focus on using emerging RRAM technologies to improve energy efficiency, thanks to their no-leakage property and high integration density. As the complexity and dynamism of applications supported by such devices escalate, it has become difficult for static RRAM controllers to maintain ideal performance. Machine learning provides a promising solution, and hence this work focuses on extending such controllers to allow dynamic parameter updates. We propose an adaptive RRAM Variability-Aware Controller, AVAC, which periodically updates Wait Buffer and batch sizes using on-the-fly learning models and gradient ascent. AVAC allows Edge devices to adapt to different applications and their stages, improving computation performance and reducing energy consumption. Simulations using traces of real-life healthcare applications on a Raspberry Pi based Edge deployment demonstrate that the proposed model can provide up to a 29% increase in performance and a 19% decrease in energy compared to static controllers.

Keywords: Programming; Feature extraction; Buffer storage; Performance evaluation; Machine learning; Adaptation models; Switches

Learning Run-Time Compositions of Interacting Adaptations

SEAMS International Workshop on Software Engineering for Adaptive and Self-Managing Systems, ICSE

Authors: Nicolás Cardozo; Ivana Dusparic (Systems and Computing Engineering Department, Universidad de los Andes, Colombia; School of Computer Science and Statistics, Trinity College Dublin, Ireland)

Self-adaptive systems continuously adapt to internal and external changes in their execution environment. In context-based self-adaptation, adaptations take place in response to the characteristics of the execution environment, captured as a context. However, in large-scale adaptive systems operating in dynamic environments, multiple contexts are often active at the same time, requiring simultaneous execution of multiple adaptations. Complex interactions between such adaptations might not have been foreseen or accounted for at design time. For example, adaptations can partially overlap, requiring only partial execution of each, or they can be conflicting, requiring some of the adaptations not to be executed at all, in order to preserve system execution. To ensure a correct composition of adaptations, we propose ComInA, a novel reinforcement learning based approach, which autonomously learns interactions between adaptations as well as the most appropriate adaptation composition for each combination of active contexts, as they arise. We present an initial evaluation of ComInA in an urban public transport network simulation, where multiple adaptations to buses, routes, and stations are required. Early results show that ComInA correctly identifies whether adaptations are compatible or conflicting and learns to execute adaptations which maximize system performance. However, further investigation is needed into how best to utilize such identified relationships to optimize a wider range of metrics and utilize more complex composition strategies.

A Dynamic Energy-saving Deployment Algorithm for Virtual Data Centers

4th IEEE International Conference on Smart Cloud (IEEE SmartCloud) / 3rd IEEE International Symposium on Reinforcement Learning (IEEE ISRL)

Authors: Han, Shujun; Li, Jun; Ma, Yuxiang; Dong, Qian; Wu, Di (Chinese Acad Sci, Comp Network Informat Ctr, Beijing 100190, Peoples R China; Univ Chinese Acad Sci, Beijing 100049, Peoples R China; Henan Univ, Sch Comp & Informat Engn, Kaifeng 475004, Peoples R China; Henan Univ, Henan Key Lab Big Data Anal & Proc, Kaifeng 475004, Peoples R China; Foshan Univ, Sch Elect Informat Engn, Foshan 528000, Peoples R China; Peoples Bank China, Chengdu Branch, Chengdu 610041, Peoples R China)

ISBN (print): 9781728155050

Network Function Virtualization (NFV) is a network technology that has evolved rapidly in recent years. The purpose of NFV is to use virtualization technology to softwarize network functions and to dynamically deploy virtual network functions (VNFs) according to the usage status of network links and the service requirements of users. NFV can increase the flexibility of network services and the utilization of network resources. In this paper, we analyze user data from urban computing and propose that the time and location of a user's use of a network service change in a regular pattern. Based on this judgment, we propose a new energy-saving deployment method for virtual data centers (vDCs). We formalize the vDC placement problem as a multicommodity flow problem and address it as an integer linear program (ILP). We design a centrality-based greedy algorithm and evaluate its effectiveness by comparing it with the ILP optimal solution. The evaluation results show that the proposed greedy algorithm obtains an approximately optimal solution to the ILP, and that its running time is shorter than that of the ILP solution as the number of network nodes increases.

Keywords: Network function virtualization

A Combined Policy Gradient and Q-learning Method for Data-driven Optimal Control Problems

9th International Conference on Information Science and Technology (ICIST)

Authors: Lin, Mingduo; Liu, Derong; Zhao, Bo; Dai, Qionghai; Dong, Yi (Guangdong Univ Technol, Sch Automat, Guangzhou, Peoples R China; Beijing Normal Univ, Sch Syst Sci, Beijing, Peoples R China; Tsinghua Univ, Dept Automat, Beijing, Peoples R China; Beijing Inst Technol, Sch Opt & Photon, Beijing, Peoples R China)

ISBN (print): 9781728121062

This paper focuses on data-driven controller design for optimal control problems of nonlinear nonaffine discrete-time systems. A novel policy gradient and Q-learning (PGQL) adaptive algorithm, which learns the optimal control policy from real empirical data, is developed without requiring the system dynamics. A policy iteration scheme is designed to iteratively update the approximate Q-function, and the control policy is improved via a gradient method until they converge to bounded regions of the optimal Q-function and the optimal control policy, respectively. Two neural networks (NNs) are employed to realize the developed algorithm. Moreover, the convergence analysis of the approximate Q-function is established. Since the control policy is parameterized, it can be improved by updating the actor-NN parameters in the direction of the performance gradient. Finally, simulation results are given to verify the performance of the developed PGQL adaptive algorithm.

Keywords: adaptive dynamic programming; optimal control; reinforcement learning; policy gradient; Q-learning; data-driven

Learning Without External Reward

IEEE Computational Intelligence Magazine, 2018, vol. 13, no. 3, pp. 48-54

Authors: He, Haibo; Zhong, Xiangnan (Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881, USA; Univ North Texas, Dept Elect Engn, Denton, TX 76203, USA)

In the traditional reinforcement learning paradigm, a reward signal is applied to define the goal of the task. Usually, the reward signal is a "hand-crafted" numerical value or a pre-defined function: it tells the agent how good or bad a specific action is. However, we believe there exist situations in which the environment cannot directly provide such a reward signal to the agent. The question, therefore, is whether an agent can still learn without the external reward signal. To this end, this article develops a self-learning approach which enables the agent to adaptively develop an internal reward signal based on a given ultimate goal, without requiring an explicit external reward signal from the environment. In this article, we aim to convey the self-learning idea in a broad sense, which could be used in a wide range of existing reinforcement learning and adaptive dynamic programming algorithms and architectures. We describe the idealized forms of this method mathematically, and also demonstrate its effectiveness through a triple-link inverted pendulum case study.

Keywords: Neural networks; Robot learning; Learning (artificial intelligence); Task analysis; Dynamic programming; Machine learning
