Search Results - Inner Mongolia University Library

SEAMS International Workshop on Software Engineering for Adaptive and Self-Managing Systems, ICSE

Authors: Mateo Sanabria; Ivana Dusparic; Nicolás Cardozo. Affiliations: Universidad de los Andes Colombia; Trinity College Dublin Ireland

ISBN (digital): 9798400705854

ISBN (print): 9798350363838

Self-healing systems depend on following a set of predefined instructions to recover from a known failure state. Failure states are generally detected based on domain-specific specialized metrics. Failure fixes are applied at predefined application hooks that are not sufficiently expressive to manage different failure types. Self-healing is usually applied in the context of distributed systems, where the detection of failures is constrained to communication problems, and resolution strategies often consist of replacing complete components. However, current complex systems may reach failure states at a fine granularity not anticipated by developers (for example, value range changes for data streaming in IoT systems), making them unsuitable for existing self-healing techniques. To counter these problems, in this paper we propose a new self-healing framework that learns recovery strategies for healing fine-grained system behavior at run time. Our proposal targets complex reactive systems, defining monitors as predicates specifying satisfiability conditions of system properties. Such monitors are functionally expressive and can be defined at run time to detect failure states at any execution point. Once failure states are detected, we use a reinforcement learning-based technique to learn a recovery strategy based on users' corrective sequences. Finally, to execute the learned strategies, we extract them as context-oriented programming variations that activate dynamically whenever the failure state is detected, overwriting the base system behavior with the recovery strategy for that state. We validate the feasibility and effectiveness of our framework through a prototypical reactive application for tracking mouse movements, and the DeltaIoT exemplar for self-healing systems. Our results demonstrate that with just the definition of monitors, the system is effective in detecting and recovering from failures in 55% to 92% of the cases in the first application, and at pa

Keywords: Measurement Adaptive systems Tracking Autonomous systems programming Proposals Complex systems

Effective Elastic Scaling of Deep Learning Workloads

28th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (IEEE MASCOTS)

Authors: Saxena, Vaibhav; Jayaram, K. R.; Basu, Saurav; Sabharwal, Yogish; Verma, Ashish. Affiliations: IBM Res Delhi India; Microsoft Hyderabad India

ISBN (print): 9781728192383

We examine the elastic scaling of deep learning (DL) jobs and propose a novel resource allocation strategy for DL training jobs, resulting in improved job run time performance as well as increased cluster utilization. We begin by analyzing DL workloads and exploit the fact that DL jobs can be run with a range of batch sizes without affecting their final accuracy. We formulate an optimization problem that explores a dynamic batch size allocation to individual DL jobs based on their scaling efficiency when running on multiple nodes. We design a fast dynamic-programming-based optimizer to solve this problem in real time to determine jobs that can be scaled up or down, and use this optimizer in an autoscaler to dynamically change the allocated resources and batch sizes of individual DL jobs. We demonstrate empirically that our elastic scaling algorithm can complete up to approximately 2x as many jobs as a strong baseline algorithm that also scales the number of GPUs but does not change the batch size, with average completion times up to approximately 10x faster.

Keywords: elasticity deep learning variable batch size

Revisiting Maximum Entropy Inverse Reinforcement Learning: New Perspectives and Algorithms

IEEE Symposium Series on Computational Intelligence (IEEE SSCI)

Authors: Snoswell, Aaron J.; Singh, Surya P. N.; Ye, Nan. Affiliations: Univ Queensland Sch Informat Technol & Elect Engn Brisbane Qld Australia; Intuit Surg Sunnyvale CA USA; Univ Queensland Sch Math & Phys Brisbane Qld Australia

ISBN (print): 9781728125473

We provide new perspectives and inference algorithms for Maximum Entropy (MaxEnt) Inverse Reinforcement Learning (IRL), which provides a principled method to find a most non-committal reward function consistent with given expert demonstrations, among many consistent reward functions. We first present a generalized MaxEnt formulation based on minimizing a KL-divergence instead of maximizing an entropy. This improves the previous heuristic derivation of the MaxEnt IRL model (for stochastic MDPs), allows a unified view of MaxEnt IRL and Relative Entropy IRL, and leads to a model-free learning algorithm for the MaxEnt IRL model. Second, a careful review of existing inference algorithms and implementations showed that they only approximately compute the marginals required for learning the model. We provide examples to illustrate this, and present an efficient and exact inference algorithm. Our algorithm can handle variable-length demonstrations; in addition, while a basic version takes time quadratic in the maximum demonstration length, an improved version of this algorithm reduces this to linear using a padding trick. Experiments show that our exact algorithm improves reward learning as compared to the approximate ones. Furthermore, our algorithm scales up to a large, real-world dataset involving driver behaviour forecasting. We provide an optimized implementation compatible with the OpenAI Gym interface. Our new insight and algorithms could possibly lead to further interest and exploration of the original MaxEnt IRL model.

Keywords: Inverse reinforcement learning reinforcement learning Maximum Entropy dynamic programming Algorithms

Trajectory Tracking of Underactuated Sea Vessels With Uncertain Dynamics: An Integral Reinforcement Learning Approach

IEEE International Conference on Systems, Man, and Cybernetics (SMC)

Authors: Abouheaf, Mohammed; Gueaieb, Wail; Miah, Md Suruz; Spinello, Davide. Affiliations: Univ Ottawa Sch Elect Engn & Comp Sci Ottawa ON Canada; Bradley Univ Dept Elect & Comp Engn Peoria IL USA; Univ Ottawa Dept Mech Engn Ottawa ON Canada

ISBN (print): 9781728185262

Underactuated systems like sea vessels have degrees of motion that are insufficiently matched by a set of independent actuation forces. In addition, the underlying trajectory-tracking control problems grow in complexity in order to decide the optimal rudder and thrust control signals. This enforces several difficult-to-solve constraints that are associated with the error dynamical equations using classical optimal tracking and adaptive control approaches. An online machine learning mechanism based on integral reinforcement learning is proposed to find a solution for a class of nonlinear tracking problems with partial prior knowledge of the system dynamics. The actuation forces are decided using innovative forms of temporal difference equations relevant to the vessel's surge and angular velocities. The solution is implemented using an online value iteration process which is realized by means of adaptive critics and gradient descent approaches. The adaptive learning mechanism exhibited well-functioning and interactive features in reaction to different desired reference-tracking scenarios.

Keywords: approximate dynamic programming Integral reinforcement learning Adaptive Critics Underactuated Vessels

Actor-Critic Algorithm for Optimal Synchronization of Kuramoto Oscillator

7th International Conference on Control, Decision and Information Technologies (CoDIT)

Authors: Vrushabh, D.; Shalini, K.; Sonam, K. Affiliation: Veermata Jijabai Technol Inst EED Mumbai Maharashtra India

ISBN (print): 9781728159539

This paper constructs a reinforcement learning (RL) based Actor-Critic (AC) algorithm for the optimal synchronization of the Kuramoto oscillator. This is accomplished through the Ott-Antonsen ansatz framework for the dynamics of large interactive unit networks. Besides, this approach reduces the infinite-dimensional dynamics to phase space flow, i.e., low-dimensional dynamics for certain systems of globally coupled phase oscillators. The resulting Hamilton-Jacobi-Bellman (HJB) expression is extremely difficult to solve in general; therefore, this paper introduces the AC method for learning approximate optimal control laws for the Kuramoto oscillator model. RL is regarded as one of the efficient methods for solving optimal control problems for non-linear systems. For a collection of non-homogeneous oscillators, the states are defined as phase angles, which is the modification of the model for a coupled Kuramoto oscillator. An admissible initial control policy for the Kuramoto oscillator model is designed and solved using RL, giving an approximate solution of the optimal control problem. Finally, local synchronization of the coupled Kuramoto oscillator model is supported through simulation analysis.

Keywords: reinforcement learning Hamilton-Jacobi-Bellman approximate dynamic programming Kuramoto oscillator Mean-field game Order parameter
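
For readers unfamiliar with the terms "Kuramoto oscillator" and "order parameter" in this record, the following is a minimal sketch of the uncontrolled, all-to-all Kuramoto model and its order parameter. It does not reproduce the paper's actor-critic controller; the function names, the forward-Euler integrator, and all parameter values are assumptions chosen purely for illustration.

```python
import cmath
import math


def order_parameter(phases):
    """Return r in [0, 1], the magnitude of the mean of exp(i * theta_j)."""
    return abs(sum(cmath.exp(1j * theta) for theta in phases) / len(phases))


def simulate_kuramoto(omegas, phases, coupling, dt=0.01, steps=5000):
    """Integrate d(theta_i)/dt = omega_i + K * r * sin(psi - theta_i) with forward Euler.

    r * exp(i * psi) is the mean field, so this equals the usual all-to-all
    form omega_i + (K / N) * sum_j sin(theta_j - theta_i).
    """
    phases = list(phases)
    for _ in range(steps):
        mean_field = sum(cmath.exp(1j * theta) for theta in phases) / len(phases)
        r, psi = abs(mean_field), cmath.phase(mean_field)
        phases = [
            theta + dt * (omega + coupling * r * math.sin(psi - theta))
            for theta, omega in zip(phases, omegas)
        ]
    return phases


if __name__ == "__main__":
    start = [0.0, 0.5, 1.0, 1.5, 2.0]  # hypothetical initial phases (radians)
    end = simulate_kuramoto([0.0] * 5, start, coupling=2.0)
    print(order_parameter(start), order_parameter(end))
```

With identical natural frequencies and positive coupling, phases that start within a half circle synchronize, so the order parameter moves toward 1 over the simulated interval.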

Editorial Special Issue on Adaptive Dynamic Programming and Reinforcement Learning

IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, vol. 50, no. 11, pp. 3944-3947

Authors: Liu, Derong; Lewis, Frank L.; Wei, Qinglai. Affiliations: School of Automation Guangdong University of Technology Guangzhou 510006 China; UTA Research Institute University of Texas at Arlington Fort Worth TX 76118 United States; State Key Laboratory of Management and Control for Complex Systems Institute of Automation Chinese Academy of Sciences Beijing 100190 China; University of Chinese Academy of Sciences Beijing 100049 China

The past decade has witnessed a surge in research activities related to adaptive dynamic programming (ADP) and reinforcement learning (RL), particularly for control applications. Several books [items 1)–5) in the Appendix] and survey papers [items 6)–10) in the Appendix] have been published on the subject. Both ADP and RL provide approximate solutions to dynamic programming problems. In a 1995 article, Barto et al. [item 11) in the Appendix] introduced the so-called “adaptive real-time dynamic programming,” which was specifically intended to apply ADP to real-time control. Later, in 2002, Murray et al. [item 12) in the Appendix] developed an ADP algorithm for optimal control of continuous-time affine nonlinear systems. On the other hand, the most famous algorithms in RL are the temporal difference algorithm [item 13) in the Appendix] and the Q-learning algorithm [items 14) and 15) in the Appendix].

Keywords: Special issues and sections reinforcement learning learning systems Control systems dynamic programming Real-time systems Optimal control
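
The editorial names temporal difference learning and Q-learning as the best-known RL algorithms. As a rough illustration only, the sketch below applies the standard tabular Q-learning update to a hypothetical four-state chain; the environment, the function names, and the step size and discount values are assumptions and are not taken from the editorial.

```python
import random


def q_learning_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the temporal-difference error."""
    best_next = max(q[next_state].values())
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error
    return td_error


def train_chain(episodes=500, n_states=4, seed=0):
    """Learn Q-values on a toy chain: start at state 0, reward 1 on reaching the last state."""
    rng = random.Random(seed)
    actions = ("left", "right")
    q = {s: {a: 0.0 for a in actions} for s in range(n_states)}
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            action = rng.choice(actions)  # purely exploratory behaviour policy
            next_state = max(state - 1, 0) if action == "left" else state + 1
            reward = 1.0 if next_state == goal else 0.0
            q_learning_update(q, state, action, reward, next_state)
            state = next_state
    return q


if __name__ == "__main__":
    for state, values in train_chain().items():
        print(state, values)
```

Because Q-learning is off-policy, the greedy policy read from the learned table prefers "right" in every non-terminal state even though the behaviour policy here is uniformly random.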

Resource Provisioning in Fog Computing through Deep Reinforcement Learning

IFIP/IEEE International Symposium on Integrated Network Management

Authors: José Santos; Tim Wauters; Bruno Volckaert; Filip De Turck. Affiliation: IDLab Ghent University - imec Gent Belgium

The massive growth of connected devices has made traditional cloud systems inadequate to sustain the scalability, mobility, and heterogeneous nature of the Internet of Things (IoT). Distributed clouds have become a potential business opportunity for many service providers, enabling the deployment of services on computational resources from the cloud up to the edge. However, challenges persist in fog-cloud infrastructures. One of them is known as Service Function Chaining (SFC), where providers benefit from network softwarization to create virtual chains of connected micro-services. Research has tackled SFC Allocation (SFCA) through theoretical modeling and heuristic algorithms, which often cannot cope with the dynamic behavior of the network. Recent works have addressed these challenges through Machine Learning (ML), which is capable of dynamically reconfiguring cloud-native service requirements over the continuum of virtual resources in next-generation networks. Thus, in this paper, a Deep Reinforcement Learning (DRL) approach is proposed for SFCA in Fog Computing focused on energy efficiency. Our agent learns about the best resource allocation decisions, focused on reducing costs from a previously presented Mixed-Integer Linear Programming (MILP) formulation. Results show that our agent achieves comparable performance to state-of-the-art MILP formulations during dynamic use cases, obtaining 95% of request acceptance.

Keywords: Training Radio frequency Cloud computing Service function chaining Heuristic algorithms Computational modeling Scalability

Bridging Hamilton-Jacobi Safety Analysis and Reinforcement Learning

IEEE International Conference on Robotics and Automation (ICRA)

Authors: Fisac, Jaime E.; Lugovoy, Neil E.; Rubies-Royo, Vicenc; Ghosh, Shromona; Tomlin, Claire J. Affiliation: Univ Calif Berkeley Dept Elect Engn & Comp Sci Berkeley CA 94720 USA

ISBN (print): 9781538660263

Safety analysis is a necessary component in the design and deployment of autonomous robotic systems. Techniques from robust optimal control theory, such as Hamilton-Jacobi reachability analysis, allow a rigorous formalization of safety as guaranteed constraint satisfaction. Unfortunately, the computational complexity of these tools for general dynamical systems scales poorly with state dimension, making existing tools impractical beyond small problems. Modern reinforcement learning methods have shown promising ability to find approximate yet proficient solutions to optimal control problems in complex and high-dimensional systems; however, their application has in practice been restricted to problems with an additive payoff over time, unsuitable for reasoning about safety. In recent work, we introduced a time-discounted modification of the problem of maximizing the minimum payoff over time, central to safety analysis, through a modified dynamic programming equation that induces a contraction mapping. Here, we show how a similar contraction mapping can render reinforcement learning techniques amenable to quantitative safety analysis as tools to approximate the safe set and optimal safety policy. This opens a new avenue of research connecting control-theoretic safety analysis and the reinforcement learning domain. We validate the correctness of our formulation by comparing safety results computed through Q-learning to analytic and numerical solutions, and demonstrate its scalability by learning safe sets and control policies for simulated systems of up to 18 state dimensions using value learning and policy gradient techniques.

Keywords: Safety Automation reinforcement learning Robots Optimal control Jacobian matrices Reachability analysis
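
The "modified dynamic programming equation" mentioned in the abstract is not reproduced in this listing. As a hedged sketch, a time-discounted backup for the maximize-the-minimum-payoff problem can be written in the following form, where the safety margin $l(x)$, the discrete-time dynamics $f(x,u)$, and the discount factor $\gamma \in [0,1)$ are notation assumed here rather than taken from the record.

```latex
V(x) = (1-\gamma)\, l(x) + \gamma \min\Bigl\{ l(x),\; \max_{u} V\bigl(f(x,u)\bigr) \Bigr\}
```

The right-hand side is a contraction with modulus $\gamma$ in the supremum norm, and letting $\gamma \to 1$ recovers the undiscounted backup $V(x) = \min\{ l(x), \max_{u} V(f(x,u)) \}$.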

Event-trigger-based robust control for nonlinear constrained-input systems using reinforcement learning method

Neurocomputing, 2019, vol. 340, pp. 158-170

Authors: Yang, Dongsheng; Li, Ting; Zhang, Huaguang; Xie, Xiangpeng. Affiliations: Northeastern Univ Coll Informat Sci & Engn Shenyang 110819 Liaoning Peoples R China; Nanjing Univ Posts & Telecommun Inst Adv Technol Nanjing 210023 Jiangsu Peoples R China

In this paper, an online integral reinforcement learning strategy is proposed to deal with robust constrained control problems using an event-triggered mechanism for nonlinear Continuous-Time (C-T) systems with external disturbances. The novel design of the constrained control law is addressed together with the adaptive event-triggered condition by guaranteeing the optimal performance and system stability. An adaptive online actor-critic Neural Network (NN) reinforcement learning scheme is developed to approximate the optimal solution of the complicated Hamilton-Jacobi-Isaacs equation. Meanwhile, the NN weight errors and the event-triggered closed-loop system are shown by Lyapunov analysis to be uniformly ultimately bounded under the proposed triggering condition. Moreover, event-triggered H-infinity tracking control with input constraints and limited network communication is also presented by establishing an augmented system. Finally, simulation results are provided to show the algorithm validity. (C) 2019 Elsevier B.V. All rights reserved.

Keywords: Event-triggered control Robust H-infinity control Hamilton-Jacobi-Isaacs (HJI) equation Neural networks Input constrains

Chic: Experience-driven Scheduling in Machine Learning Clusters

IEEE/ACM International Symposium on Quality of Service (IWQoS)

Authors: Gong, Yifan; Li, Baochun; Liang, Ben; Zhan, Zheng. Affiliations: Univ Toronto Dept Elect & Comp Engn Toronto ON Canada; Syracuse Univ Coll Engn & Comp Sci Syracuse NY 13244 USA

ISBN (digital): 9781450367783

ISBN (print): 9781450367783

Large-scale machine learning (ML) models are routinely trained in a distributed fashion, due to their increasing complexity and data sizes. In a shared cluster handling multiple distributed learning workloads with a parameter server framework, it is important to determine the adequate number of concurrent workers and parameter servers for each ML workload over time, in order to minimize the average completion time and increase resource utilization. Existing schedulers for machine learning workloads involve meticulously designed heuristics. However, as the execution environment is highly complex and dynamic, it is challenging to construct an accurate model to make online decisions. In this paper, we design an experience-driven approach that learns to manage the cluster directly from experience rather than using a mathematical model. We propose Chic, a scheduler that is tailored for scheduling machine learning workloads in a cluster by leveraging deep reinforcement learning techniques. With our design of the state space, action space, and reward function, Chic trains a deep neural network with a modified version of the cross-entropy method to approximate the policy for assigning workers and parameter servers for future workloads based on the experience of the agent. Furthermore, a simplified version named Chic-Pair, with a shorter training time for the policy, is proposed by assigning workers and parameter servers in a pair. We compare Chic and Chic-Pair with state-of-the-art heuristics, and our results show that Chic and Chic-Pair are able to reduce the average training time significantly for machine learning workloads under a wide variety of conditions.

Keywords: Distributed Machine learning Deep reinforcement learning Work-load Scheduling
