Self-adaptive systems continuously adapt to internal and external changes in their execution environment. In context-based self-adaptation, adaptations take place in response to the characteristics of the execution environment, captured as a context. In large-scale adaptive systems operating in dynamic environments, however, multiple contexts are often active at the same time, requiring the simultaneous execution of multiple adaptations. Complex interactions between such adaptations may not have been foreseen or accounted for at design time. For example, adaptations can partially overlap, requiring only partial execution of each, or they can conflict, requiring some of the adaptations not to be executed at all in order to preserve correct system execution. To ensure a correct composition of adaptations, we propose ComInA, a novel reinforcement learning-based approach that autonomously learns the interactions between adaptations as well as the most appropriate adaptation composition for each combination of active contexts as it arises. We present an initial evaluation of ComInA in an urban public transport network simulation, where multiple adaptations to buses, routes, and stations are required. Early results show that ComInA correctly identifies whether adaptations are compatible or conflicting and learns to execute the adaptations that maximize system performance. Further investigation is needed, however, into how best to utilize such identified relationships to optimize a wider range of metrics and to support more complex composition strategies.
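The abstract does not detail ComInA's learning machinery, so as a rough illustration of the problem shape only, here is a minimal tabular Q-learning sketch over (active-context set, adaptation-subset) pairs; the adaptation names, reward signal, and hyperparameters are all assumptions, not the paper's design.

```python
import itertools
import random
from collections import defaultdict

# Hypothetical adaptation names for a public-transport scenario.
ADAPTATIONS = ("reroute_bus", "add_bus", "close_station")
# An action is any subset of adaptations to execute together (a composition).
ACTIONS = [frozenset(c) for r in range(len(ADAPTATIONS) + 1)
           for c in itertools.combinations(ADAPTATIONS, r)]

Q = defaultdict(float)        # Q[(active_contexts, composition)] -> value
ALPHA, EPSILON = 0.1, 0.2

def choose_composition(active_contexts: frozenset) -> frozenset:
    """Epsilon-greedy pick of which adaptations to execute together."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(active_contexts, a)])

def learn(active_contexts: frozenset, composition: frozenset, reward: float) -> None:
    """Bandit-style update: move the estimate toward observed system performance."""
    key = (active_contexts, composition)
    Q[key] += ALPHA * (reward - Q[key])
```

Conflicting adaptations would surface in such a table as compositions whose learned value is consistently below that of their individual members.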
In robot-assisted rehabilitation, assist-as-needed (AAN) controllers have been proposed to promote subjects' active participation, which is thought to lead to better training outcomes. Most of these AAN controllers require patient-specific manual tuning of the parameters defining the underlying force field, which typically results in a tedious and time-consuming process. In this paper, we propose a reinforcement learning-based impedance controller that actively reshapes the stiffness of the force field according to the subject's performance, while providing assistance only when needed. This adaptability is made possible by correlating the subject's most recent performance with the ultimate control objective in real time. In addition, the proposed controller is built upon action-dependent heuristic dynamic programming with an actor-critic structure, and therefore does not require prior knowledge of the system model. The controller is experimentally validated with healthy subjects through a simulated ankle mobilization training session using a powered ankle-foot orthosis.
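As a loose illustration of the idea (not the paper's ADHDP formulation, which the abstract does not specify), the sketch below couples a linear critic on the tracking error with a stiffness update driven by the temporal-difference error; the features, gains, cost terms, and bounds are all assumed.

```python
import numpy as np

class AANStiffnessAdapter:
    """Toy actor-critic-style loop: a linear critic predicts cost from the
    tracking error, and the stiffness (the 'actor' output) is nudged by the
    temporal-difference error so assistance rises only when performance drops."""

    def __init__(self, k=40.0, lr_actor=2.0, lr_critic=0.05, gamma=0.9):
        self.k = k                               # force-field stiffness (assumed Nm/rad)
        self.lr_a, self.lr_c, self.gamma = lr_actor, lr_critic, gamma
        self.w = np.zeros(2)                     # linear critic weights

    @staticmethod
    def _phi(err):
        return np.array([abs(err), err * err])  # simple tracking-error features

    def step(self, err, err_next):
        # Assist-as-needed cost: penalize tracking error and assistance effort.
        cost = err_next ** 2 + 1e-4 * self.k ** 2
        td = cost + self.gamma * self.w @ self._phi(err_next) - self.w @ self._phi(err)
        self.w += self.lr_c * td * self._phi(err)            # critic update
        # Raise assistance when performance is worse than predicted, else relax,
        # keeping the stiffness within assumed safe bounds.
        self.k = float(np.clip(self.k + self.lr_a * np.sign(td) * abs(err), 0.0, 120.0))
        return self.k
```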
ISBN (print): 9781424477456
This paper presents an approximate/adaptive dynamic programming (ADP) algorithm that finds online the Nash equilibrium for two-player nonzero-sum differential games with linear dynamics and an infinite-horizon quadratic cost. Each of the game players uses the procedure of integral reinforcement learning (IRL) to calculate online the infinite-horizon value function associated with every given set of feedback control policies. It is shown that the online algorithm is mathematically equivalent to an offline iterative method, previously introduced in the literature, that solves the set of coupled algebraic Riccati equations (AREs) underlying the game problem using complete knowledge of the system dynamics. Here we show how the ADP techniques enhance the capabilities of the offline method, allowing an online solution without requiring complete knowledge of the system dynamics. The two participants in the continuous-time differential game compete in real time, and the feedback Nash control strategies are determined from data measured online from the system. The algorithm is built on the interplay between a learning phase, in which each player learns online the value it associates with a given set of play policies, and a policy update step, performed by each player to decrease its cost. The players learn concurrently. The feasibility of the ADP scheme is demonstrated in simulation.
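The offline iterative method the abstract refers to can be sketched as policy iteration on coupled Lyapunov equations; the toy dynamics and cost weights below are assumptions, and the paper's IRL contribution, replacing these model-based solves with online measured data, is not reproduced here.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, inv

# Toy two-player linear-quadratic game (all matrices are illustrative).
A = np.array([[0.0, 1.0], [-1.0, -2.0]])      # stable, so K=0 is admissible
B1 = np.array([[0.0], [1.0]])
B2 = np.array([[0.0], [0.5]])
Q1, Q2 = np.eye(2), 2 * np.eye(2)
R11 = R22 = np.eye(1)
R12 = R21 = 0.5 * np.eye(1)

K1 = np.zeros((1, 2))                          # initial stabilizing gains
K2 = np.zeros((1, 2))
for _ in range(30):
    Ac = A - B1 @ K1 - B2 @ K2                 # closed-loop matrix
    # Policy evaluation: Ac^T Pi + Pi Ac + Qi + Ki^T Rii Ki + Kj^T Rij Kj = 0
    P1 = solve_continuous_lyapunov(Ac.T, -(Q1 + K1.T @ R11 @ K1 + K2.T @ R12 @ K2))
    P2 = solve_continuous_lyapunov(Ac.T, -(Q2 + K2.T @ R22 @ K2 + K1.T @ R21 @ K1))
    K1 = inv(R11) @ B1.T @ P1                  # policy update, player 1
    K2 = inv(R22) @ B2.T @ P2                  # policy update, player 2

print("Approximate Nash feedback gains:", K1, K2)
```

Convergence of this iteration requires admissible (stabilizing) initial policies, which the trivially stable toy system satisfies with zero gains.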
ISBN (digital): 9798350327939
ISBN (print): 9798350327946
As the user count for video-related services continues to grow, ensuring a high quality of service (QoS) for them will become even more crucial in the future. Many studies have been conducted to enhance the quality of on-demand video streaming using adaptive bitrate (ABR) algorithms and artificial intelligence (AI). This study addresses a more complex challenge than on-demand video streaming: enhancing service quality in multi-party, full-duplex communication scenarios such as video conferences. We propose a deep reinforcement learning (DRL)-based video bitrate allocation framework for the media server in a video conferencing system. Our framework aims to increase overall QoS by applying an appropriate bitrate to each connection in a video conferencing call, considering the users' network conditions. We train the DRL model to maximize the aggregate QoS of users in a meeting by constructing a feedback loop between a media server and a DRL server. Our experimental results demonstrate that our framework can adaptively control the video bitrate according to changes in network conditions. As a result, it achieves higher video bitrates in the user application (approximately 5% under stable network conditions and 35% under highly dynamic network conditions) compared to an existing rule-based bandwidth allocation.
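As a hedged sketch of the media-server/agent feedback loop, the toy agent below picks a per-connection bitrate from a ladder and updates a value table from a QoS-style reward; the ladder, state discretization, and reward weights are assumptions, and the paper's actual model is a deep network rather than a table.

```python
import random

LADDER_KBPS = [150, 300, 600, 1200, 2500]   # assumed bitrate ladder

def qos_reward(bitrate, est_bandwidth, loss_rate):
    """Aggregate-QoS proxy: utility grows with bitrate, congestion and loss hurt."""
    util = bitrate / LADDER_KBPS[-1]
    congestion = max(0.0, bitrate - est_bandwidth) / LADDER_KBPS[-1]
    return util - 2.0 * congestion - 5.0 * loss_rate

class BitrateAgent:
    def __init__(self, eps=0.1, alpha=0.2):
        self.q = {}                          # (bandwidth bucket, bitrate) -> value
        self.eps, self.alpha = eps, alpha

    def _bucket(self, est_bandwidth):
        return min(int(est_bandwidth // 300), 10)   # coarse state discretization

    def act(self, est_bandwidth):
        """Epsilon-greedy bitrate choice for one connection."""
        s = self._bucket(est_bandwidth)
        if random.random() < self.eps:
            return random.choice(LADDER_KBPS)
        return max(LADDER_KBPS, key=lambda b: self.q.get((s, b), 0.0))

    def learn(self, est_bandwidth, bitrate, reward):
        key = (self._bucket(est_bandwidth), bitrate)
        self.q[key] = self.q.get(key, 0.0) + self.alpha * (reward - self.q.get(key, 0.0))
```

In a conference call, one such decision would be made per connection, with the per-user rewards summed to approximate the aggregate-QoS objective.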
ISBN (digital): 9798350351330
ISBN (print): 9798350351347
Utility-scale inverter-interfaced photovoltaic (PV) power plants integrated into weak grids experience several challenging issues, viz., poorly damped power oscillations, transient DC-bus over/under-voltage, and instability during low-voltage ride-through (LVRT), which ultimately result in the tripping of units. To overcome these challenges, a reinforcement learning (RL)-based adaptive optimal virtual inertia control (AOVIC) scheme is proposed in this paper. The implemented virtual inertia control (VIC) scheme consists of two parts, (1) a virtual governor and (2) virtual inertia, which are emulated using the stored energy of the DC-link capacitor. Small-signal modeling and stability analysis are also performed to demonstrate the effectiveness of such a VIC scheme for a weak grid-tied PV power plant under variations in grid strength. Thereafter, an adaptive dynamic programming (ADP) strategy is utilized to implement the RL-based AOVIC scheme, which optimally tunes the VIC loop in a model-free manner with the aim of enhancing power oscillation damping. Finally, the efficacy of the proposed AOVIC scheme is evaluated through numerical simulations on a detailed nonlinear model of a weak grid-tied PV power plant.
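The abstract does not give the ADP formulation, so the stand-in below is deliberately simpler: a model-free random-search loop that keeps gain perturbations which reduce a measured oscillation cost. It conveys only the "tune the VIC loop from data" idea, not the paper's ADP scheme; the cost function and step sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def oscillation_cost(power_trace, gains):
    """Assumed cost: energy of the measured power deviations plus an effort term."""
    return float(np.sum(np.square(power_trace)) + 1e-3 * np.sum(np.square(gains)))

def tune_vic(gains, run_and_measure, iters=50, step=0.05):
    """Model-free tuning loop. `run_and_measure(gains)` is assumed to run the
    plant (or a simulation) with the given virtual-governor/inertia gains and
    return the measured power-deviation trace; perturbations that lower the
    measured cost are kept."""
    gains = np.asarray(gains, dtype=float)
    best = oscillation_cost(run_and_measure(gains), gains)
    for _ in range(iters):
        trial = gains + step * rng.standard_normal(gains.shape)
        cost = oscillation_cost(run_and_measure(trial), trial)
        if cost < best:
            gains, best = trial, cost
    return gains
```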
In this paper, a new method for designing and implementing a coordinated wide-area controller architecture is presented and tested using real-time digital simulation on a benchmark two-area power system model for improved power system dynamic stability. The algorithm is an optimal Wide Area System-Centric Controller and Observer (WASCCO) based on reinforcement and temporal-difference learning, which allows the system to learn from interaction and predict future states. The controller design uses a powerful technique of the adaptive critic design (ACD) family called dual heuristic programming (DHP). The DHP controllers' training and testing are implemented on the Innovative Integration Picolo card, which carries a TMS320C28335 processor. The main advantage of this design is its ability to learn from the past using eligibility traces and to predict the optimal trajectory through temporal-difference learning in a receding horizon control (RHC) format. Results on the two-area system show a better response compared to conventional schemes.
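The eligibility-trace and temporal-difference machinery the design builds on can be illustrated with plain TD(lambda) value learning; the sketch below assumes linear features and caller-supplied episodes, and omits the DHP critic proper (which learns value gradients rather than values).

```python
import numpy as np

def td_lambda(episodes, n_features, alpha=0.05, gamma=0.95, lam=0.8):
    """TD(lambda) with accumulating eligibility traces over a linear value
    function. `episodes` yields (states, rewards) pairs, where `states` is a
    list of len(rewards)+1 feature vectors; all hyperparameters are assumed."""
    w = np.zeros(n_features)                 # value-function weights
    for states, rewards in episodes:
        e = np.zeros(n_features)             # eligibility trace
        for t in range(len(rewards)):
            phi, phi_next = states[t], states[t + 1]
            delta = rewards[t] + gamma * w @ phi_next - w @ phi   # TD error
            e = gamma * lam * e + phi        # extend credit to recent states
            w += alpha * delta * e           # update along the trace
    return w
```

The trace vector `e` is what lets the controller "learn from the past": a surprising outcome updates not just the current state's estimate but, with geometric decay, those of the states that led to it.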
Many traditional high-performance computing applications, including those that follow the Bulk Synchronous Parallel (BSP) communication paradigm, are increasingly being deployed in cloud-native, virtualized, multi-tenant container clusters. However, such a shared, virtualized platform limits the degree of control that BSP applications have over effectively allocating resources. This can adversely impact their performance, particularly when stragglers manifest in individual BSP supersteps. Existing BSP resource management solutions assume the same execution time for individual tasks at every superstep, which is not always the case. To address these limitations, we present a dynamic resource management middleware for cloud-native BSP applications comprising a heuristic algorithm that determines effective resource configurations across multiple supersteps while considering dynamic workloads per superstep and trading off performance improvements against reconfiguration costs. Moreover, we design dynamic programming and reinforcement learning approaches that can be used as pluggable strategies to determine whether and when to enforce a reconfiguration. Empirical evaluations of our solution show between 10% and 25% improvement in performance over a baseline static approach, even in the presence of a reconfiguration penalty.
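The whether/when-to-reconfigure decision lends itself to a small dynamic program over supersteps. The sketch below assumes predicted per-superstep execution costs for each candidate configuration and a fixed reconfiguration penalty, both inputs the middleware would have to estimate; it is a shape of the approach, not the paper's algorithm.

```python
import functools

def plan(exec_cost, penalty):
    """exec_cost[t][c]: predicted time of superstep t under configuration c;
    penalty: fixed cost of switching configurations between supersteps.
    Returns (total predicted time, configuration per superstep)."""
    T, C = len(exec_cost), len(exec_cost[0])

    @functools.lru_cache(maxsize=None)
    def best(t, cur):
        if t == T:
            return 0.0, ()
        options = []
        for nxt in range(C):
            switch = penalty if (cur is not None and nxt != cur) else 0.0
            tail, seq = best(t + 1, nxt)
            options.append((exec_cost[t][nxt] + switch + tail, (nxt,) + seq))
        return min(options)

    return best(0, None)   # first configuration is chosen freely

total, schedule = plan([[3.0, 5.0], [6.0, 2.0], [6.0, 2.0]], penalty=1.5)
# -> 8.5 with schedule (0, 1, 1): one paid switch beats staying put throughout.
```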
ISBN (digital): 9781665462808
ISBN (print): 9781665462815
Applications with autonomous robots play an important role in industry and in everyday life. Among them, the activities of manipulating and moving objects stand out for their wide variety of possible applications. In static and known environments, these activities can be implemented through logic planned by the developer, but this is not feasible in dynamic environments. Machine learning (ML) techniques such as reinforcement learning (RL) algorithms have sought to replace pre-defined programming by teaching the robot how to act. This paper presents the implementation of two RL algorithms, Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO), for orientation and position control of a 6-degree-of-freedom (6-DoF) robotic manipulator. The results demonstrate that DDPG converges faster on simpler tasks, but as the complexity of the problem increases it may fail to reach satisfactory behavior. PPO, on the other hand, can solve more complex problems, though it limits the rate of convergence toward the best result in order to avoid learning instability.
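Neither algorithm's training setup is given in the abstract, but both would optimize a pose-tracking reward of roughly the following shape; the weights, tolerances, and success bonus below are assumptions for illustration.

```python
import numpy as np

def pose_reward(pos, pos_goal, quat, quat_goal, w_ori=0.5, pos_tol=0.01, ori_tol=0.1):
    """Dense shaping reward for 6-DoF end-effector pose control.
    pos/pos_goal: 3-vectors (m); quat/quat_goal: unit quaternions."""
    pos_err = np.linalg.norm(np.asarray(pos) - np.asarray(pos_goal))
    # Quaternion geodesic distance; abs() handles the q/-q double cover.
    dot = np.clip(abs(np.dot(quat, quat_goal)), -1.0, 1.0)
    ori_err = 2.0 * np.arccos(dot)                   # radians
    reward = -(pos_err + w_ori * ori_err)            # dense penalty term
    if pos_err < pos_tol and ori_err < ori_tol:
        reward += 10.0                               # sparse success bonus
    return reward
```

The dense term gives both DDPG and PPO a gradient signal at every step, while the sparse bonus marks task completion; tuning their balance is typically where the convergence-versus-stability trade-off the paper observes shows up.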
Frequent charging and discharging seriously shortens battery life and thereby increases the power fluctuation in the distribution network. In this paper, a microgrid energy storage model combining superconducting magnetic energy storage (SMES) and battery energy storage technology is proposed. The energy storage efficiency and the application scenarios of superconducting energy storage are also analyzed. To optimize the performance of the proposed microgrid energy storage model, a reinforcement learning algorithm is used to solve for the operating strategy, and the feasibility of the energy storage model is verified through simulation analysis. The results show that the hybrid energy storage system is more conducive to the stable operation of the power grid.
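The complementary roles of the two stores can be sketched with a simple low-pass split of the net power demand: the SMES absorbs fast fluctuations and the battery handles the slow component, reducing battery cycling. This fixed split is a common baseline rather than the paper's RL-derived strategy; the smoothing factor alpha is an assumption.

```python
import numpy as np

def split_power(demand, alpha=0.1):
    """demand: net power the storage must absorb per time step (kW);
    alpha: smoothing factor of the exponential moving average."""
    demand = np.asarray(demand, dtype=float)
    battery = np.empty_like(demand)
    slow = demand[0]
    for i, p in enumerate(demand):
        slow = (1 - alpha) * slow + alpha * p   # slow trend -> battery
        battery[i] = slow
    smes = demand - battery                     # fast residual -> SMES
    return battery, smes
```

An RL layer like the paper's would, in effect, learn when and how much to shift this split in response to state of charge, prices, or grid conditions instead of holding alpha fixed.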
ISBN (digital): 9798350362244
ISBN (print): 9798350362251
In a vehicle edge computing network (VECN), how to deal with the shortage of computation and energy resources that roadside units (RSUs) encounter while performing delay-sensitive computation tasks is an important issue, especially during peak hours and when VECN conditions are dynamic. To complete the computation tasks on time at minimum expenditure, in this paper we investigate the problem of information-energy collaboration among RSUs, in which spectrum management is also involved. For the considered scenario, the RSUs' strategies for spectrum selection, computation task offloading, and energy sharing are derived from the formulated optimization problem. Since this problem is a highly complex mixed-integer nonlinear program and the strategies are coupled with each other, a multi-agent deep deterministic policy gradient (MADDPG)-based algorithm is proposed to find sub-optimal solutions quickly in a dynamic environment. Simulation results show that our approach is superior to existing schemes in terms of total system expenditure and spectral efficiency.
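A full MADDPG loop is beyond a sketch, but the joint per-RSU action and the expenditure each agent would minimize (the negative of its reward) can be outlined as below; every term, price, and penalty is an illustrative assumption, not the paper's formulation.

```python
from dataclasses import dataclass

@dataclass
class RSUAction:
    channel: int              # spectrum selection
    offload_fraction: float   # share of the task offloaded to neighbors [0, 1]
    energy_shared_kwh: float  # energy sold to (+) or received from neighbors

def expenditure(action, offload_price, energy_price, energy_bought_kwh,
                finish_time_s, deadline_s, late_penalty=10.0):
    """Per-RSU expenditure: offloading fees plus net energy cost, with a
    penalty when a delay-sensitive task misses its deadline."""
    cost = action.offload_fraction * offload_price
    cost += energy_price * (energy_bought_kwh - action.energy_shared_kwh)
    if finish_time_s > deadline_s:
        cost += late_penalty * (finish_time_s - deadline_s)
    return cost
```

In the multi-agent setting, each RSU's critic would also observe the other agents' actions during training, which is how MADDPG copes with the coupling between the spectrum, offloading, and energy-sharing strategies.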