ISBN (Print): 9781665478960
In this paper, an adaptive dynamic programming (ADP) algorithm with lifting technology is developed to solve the multi-rate optimal control problem for discrete-time linear systems. We use the lifting technique to convert the multi-rate sampled-data control problem into a single-rate problem over a uniform cycle, and propose a Q-learning-based approach that learns the optimal regulator through a value iteration (VI) algorithm. First, a class of continuous-time (CT) linear systems with multiple timescales is considered. Then, the convergence of the Q-learning-based algorithm is established: the iterative cost function is proven to converge precisely to the optimal value, and the control input likewise converges to the optimal input. Finally, a hardware-in-the-loop (HIL) system for a grinding process is presented to illustrate the effectiveness of the proposed method.
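As a concrete reference point, the recursion that such a Q-learning scheme approximates from data is the discrete-time Riccati value iteration. The sketch below runs it model-based on an illustrative two-state system; the matrices `A`, `B`, `Qc`, `R` are placeholders, not taken from the paper.

```python
import numpy as np

# Value iteration for the discrete-time LQR Riccati equation.
# The paper's Q-learning algorithm estimates this recursion from data;
# here the known model (A, B) is used purely for illustration.
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # hypothetical lifted system
B = np.array([[0.0], [0.1]])
Qc = np.eye(2)                            # state cost
R = np.eye(1)                             # input cost

P = np.zeros((2, 2))                      # VI starts from P0 = 0
for _ in range(1000):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # greedy gain
    P = Qc + A.T @ P @ (A - B @ K)        # VI Bellman update

# P converges to the stabilizing solution of the algebraic Riccati equation
residual = Qc + A.T @ P @ A \
    - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A) - P
print(np.linalg.norm(residual))           # close to zero after convergence
```

A practical appeal of the VI variant, as the abstract notes, is that the iteration starts from P = 0 and needs no initially stabilizing gain.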
ISBN (Digital): 9798350363203
ISBN (Print): 9798350363210
Traditional graph neural networks (GNNs) construct static graph structures, which cannot dynamically adapt to market changes. To address this challenge, this paper proposes an integrated prediction model combining GNNs and reinforcement learning (RL), enabling dynamic adjustment of the graph structure to enhance the model's adaptability and predictive ability in volatile markets. First, we use GNNs to extract complex inter-stock relationships from historical stock data, constructing a dynamic market structure. Second, through the strategies of the RL model, we continuously adjust trading decisions to maximize long-term returns and adapt to market changes. This combined model not only captures market trading information but also responds promptly to market dynamics, achieving more accurate predictions. Experimental results on multiple datasets of Taiwan 50 constituent stocks demonstrate that the proposed GNN-RL model outperforms traditional methods in accuracy and other metrics, proving its potential and superiority in practical applications.
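The abstract does not specify how the dynamic graph is built. One common construction, shown purely as an illustrative sketch, derives a time-varying adjacency matrix from rolling correlations of recent returns; all data below are synthetic placeholders.

```python
import numpy as np

# Dynamic graph construction from a rolling window of returns, a common
# proxy for inter-stock relationships (the paper's exact construction
# is not specified in the abstract).
rng = np.random.default_rng(0)
returns = rng.normal(size=(60, 5))        # 60 days x 5 hypothetical stocks

def dynamic_adjacency(window, threshold=0.2):
    """Edge between two stocks if their return correlation is strong."""
    corr = np.corrcoef(window.T)          # pairwise correlation matrix
    adj = (np.abs(corr) > threshold).astype(float)
    np.fill_diagonal(adj, 0.0)            # no self-loops
    return adj

A_t = dynamic_adjacency(returns[-30:])    # rebuilt each step as data arrives
print(A_t.shape)                          # (5, 5)
```

Rebuilding `A_t` at every step is what makes the structure "dynamic"; an RL component could instead learn the threshold or the edge weights directly.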
ISBN (Print): 9781728181929
This paper investigates the speed-change response of scheduled Q-learning adaptive control of switched reluctance motor (SRM) drives. This novel algorithm includes a scheduling approach that permits controlling the nonlinear domain of an SRM using a set of Q-learning cores, each of which is a Q-learning controller at a local linear operating point, which together span the nonlinear surface of the system. Despite the effective tracking performance of this algorithm, the main issue with using this controller for SRM applications is that the motor speed appears inside the machine model, and hence the Q-cores are directly impacted by the speed. To cope with this issue, the Q-matrices must be retrained whenever the rotational speed changes, which slows the speed-change response because of the learning process. In this paper, simulation and experimental results illustrate the speed-change response of the SRM at different stages of the operating conditions.
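A minimal sketch of the scheduled Q-learning idea: one tabular Q-learning "core" per speed bin, with the active core selected by the measured speed. The states, actions, and reward used here are illustrative placeholders, not the paper's SRM model.

```python
import random

# One Q-table ("core") per local operating point, indexed by speed bin.
N_STATES, N_ACTIONS, N_BINS = 10, 3, 4
cores = [[[0.0] * N_ACTIONS for _ in range(N_STATES)] for _ in range(N_BINS)]

def select_core(speed, max_speed=100.0):
    """Pick the Q-learning core for the current rotational speed."""
    return min(int(speed / max_speed * N_BINS), N_BINS - 1)

def update(core, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update on the selected core."""
    core[s][a] += alpha * (r + gamma * max(core[s_next]) - core[s][a])

random.seed(0)
q = cores[select_core(37.0)]              # core for the current speed bin
for _ in range(1000):
    s, a = random.randrange(N_STATES), random.randrange(N_ACTIONS)
    r = -abs(s - 5)                        # toy tracking-error penalty
    update(q, s, a, r, random.randrange(N_STATES))
```

The retraining burden the abstract describes corresponds to re-running the update loop for a core whenever `select_core` switches bins.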
This study presents an adaptive railway traffic controller for real-time operations based on approximate dynamic programming (ADP). By assessing requirements and opportunities, the controller aims to limit the consecutive delays caused by trains that enter a control area behind schedule by sequencing them at a critical location in a timely manner, thus reflecting the practical requirements of railway operations. The approach depends on an approximation to the value function of dynamic programming after optimisation from a specified state, which is estimated dynamically from operational experience using reinforcement learning techniques. By using this approximation, the ADP controller avoids extensive explicit evaluation of performance and so reduces the computational burden substantially. In this investigation, we explore formulations of the approximation function and variants of the learning techniques used to estimate it. Evaluation of the ADP methods in a stochastic simulation environment shows considerable improvements in consecutive delays by comparison with the current industry practice of First-Come-First-Served sequencing. We also find that the estimated parameters of the approximate value function are similar across a range of test scenarios with different mean train entry delays.
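The core mechanism, estimating a parametric value function from operational experience, can be sketched as a linear approximation updated by temporal-difference learning. The features and rewards below are synthetic stand-ins, not the railway formulation.

```python
import numpy as np

# Linear value-function approximation V(s) = phi(s) @ w, updated by
# TD(0) from observed transitions, the generic mechanism behind
# experience-driven ADP controllers of the kind described above.
rng = np.random.default_rng(1)
n_features = 4
w = np.zeros(n_features)                  # value-function weights

def td_update(w, phi, reward, phi_next, alpha=0.05, gamma=0.95):
    delta = reward + gamma * phi_next @ w - phi @ w   # TD error
    return w + alpha * delta * phi

for _ in range(2000):
    phi = rng.normal(size=n_features)     # toy state features
    phi_next = rng.normal(size=n_features)
    reward = -abs(phi[0])                 # toy consecutive-delay penalty
    w = td_update(w, phi, reward, phi_next)
print(w)
```

The study's observation that learned parameters are similar across scenarios is exactly a statement about the stability of a weight vector like `w` under different traffic distributions.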
ISBN (Digital): 9798350354232
ISBN (Print): 9798350354249
Multi-access edge computing (MEC) is seen as a vital component of forthcoming 6G wireless networks, aiming to support emerging applications that demand high service reliability and low latency. However, ensuring the ultra-reliable and low-latency performance of MEC networks poses a significant challenge due to uncertainties associated with wireless links, constraints imposed by communication and computing resources, and the dynamic nature of network traffic. Enabling ultra-reliable and low-latency MEC mandates efficient load balancing jointly with resource allocation. In this paper, we investigate the joint optimization of offloading decisions and computation and communication resource allocation to minimize the expected weighted sum of delivery latency and energy consumption in a non-orthogonal multiple access (NOMA)-assisted MEC network. Since the formulated problem is a mixed-integer non-linear program (MINLP), a new multi-agent federated deep reinforcement learning (FDRL) solution based on the double deep Q-network (DDQN) is developed to efficiently optimize the offloading strategies across the MEC network while accelerating the learning process of the Internet-of-Things (IoT) devices. Simulation results show that the proposed FDRL scheme can effectively reduce the weighted sum of delivery latency and energy consumption of IoT devices in the MEC network and outperforms the baseline approaches.
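The DDQN target computation at the heart of such a scheme, where the online network selects the next action and the target network evaluates it, can be sketched as follows; the Q-values are random placeholders standing in for network outputs.

```python
import numpy as np

# Double DQN target: the online network selects the next action, the
# target network evaluates it -- the key difference from vanilla DQN,
# which both selects and evaluates with the same (target) network.
rng = np.random.default_rng(0)
gamma = 0.99
q_online_next = rng.normal(size=(32, 4))   # batch of 32, 4 actions
q_target_next = rng.normal(size=(32, 4))
rewards = rng.normal(size=32)
done = np.zeros(32)                        # 1.0 marks terminal transitions

a_star = q_online_next.argmax(axis=1)              # select with online net
q_eval = q_target_next[np.arange(32), a_star]      # evaluate with target net
targets = rewards + gamma * (1.0 - done) * q_eval  # Bellman targets
print(targets.shape)                               # (32,)
```

Decoupling selection from evaluation reduces the overestimation bias of the max operator, which is why DDQN is a common choice for noisy network-control problems like this one.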
In this paper, we propose a model-free solution to the linear quadratic regulation (LQR) problem of continuous-time systems based on reinforcement learning using dynamic output feedback. The design objective is to learn the optimal control parameters by using only the measurable input-output data, without requiring model information. A state parametrization scheme is presented that reconstructs the system state from the filtered input and output signals. Based on this parametrization, two new output feedback adaptive dynamic programming Bellman equations are derived for the LQR problem based on policy iteration and value iteration (VI). Unlike the existing output feedback methods for continuous-time systems, the need to apply discrete approximation is obviated. In contrast with static output feedback controllers, the proposed method can also handle systems that are state feedback stabilizable but not static output feedback stabilizable. An advantage of this scheme is that it stands immune to the exploration bias issue. Moreover, it does not require a discounted cost function and thus ensures the closed-loop stability and the optimality of the solution. Compared with earlier output feedback results, the proposed VI method does not require an initially stabilizing policy. We show that the estimates of the control parameters converge to those obtained by solving the LQR algebraic Riccati equation. A comprehensive simulation study is carried out to verify the proposed algorithms.
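For reference, the model-based counterpart of the policy-iteration variant is Kleinman's algorithm, which the data-driven scheme reproduces without knowing A and B. A minimal sketch on an illustrative stable plant (so the initial gain K = 0 is admissible); all matrices are placeholders.

```python
import numpy as np

# Kleinman policy iteration for continuous-time LQR: alternate between
# solving a Lyapunov equation (policy evaluation) and updating the gain
# (policy improvement). The Lyapunov solve uses row-major vectorization.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])   # illustrative stable plant
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
I2 = np.eye(2)

K = np.zeros((1, 2))                        # stabilizing since A is stable
for _ in range(30):
    Ak = A - B @ K
    M = Q + K.T @ R @ K
    # Solve Ak' P + P Ak = -M for P via vectorization
    L = np.kron(Ak.T, I2) + np.kron(I2, Ak.T)
    P = np.linalg.solve(L, -M.flatten()).reshape(2, 2)
    K = np.linalg.solve(R, B.T @ P)         # policy improvement

# P now satisfies the continuous-time algebraic Riccati equation
res = A.T @ P + P @ A + Q - P @ B @ np.linalg.solve(R, B.T @ P)
print(np.linalg.norm(res))                  # small after convergence
```

The paper's VI variant removes the need for the stabilizing initial gain that this classical iteration assumes, which is one of its stated advantages.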
In this paper, we solve the optimal output regulation problem for discrete-time systems without precise knowledge of the system model. Drawing inspiration from reinforcement learning and adaptive dynamic programming, we develop a data-driven solution that enables asymptotic tracking and disturbance rejection. Notably, it is discovered that the proposed approach for discrete-time output regulation differs from the continuous-time approach in terms of the persistent excitation condition required for policy iteration to be unique and convergent. To address this issue, a new persistent excitation condition is introduced to ensure both uniqueness and convergence of the data-driven policy iteration. The efficacy of the proposed methodology is validated on an inverted pendulum on a cart.
ISBN (Digital): 9798350327939
ISBN (Print): 9798350327946
As the number of users of video-related services continues to grow, ensuring high quality of service (QoS) for them will become even more crucial in the future. Many studies have enhanced the quality of on-demand video streaming using adaptive bitrate (ABR) algorithms and artificial intelligence (AI). This study addresses a more complex challenge than on-demand streaming: enhancing service quality in multi-party, full-duplex communication scenarios such as video conferences. We propose a deep reinforcement learning (DRL)-based video bitrate allocation framework for the media server in a video conferencing system. Our framework aims to increase overall QoS by applying an appropriate bitrate to each connection in a video conferencing call, considering the users' network conditions. We train the DRL model to maximize the aggregate QoS of users in a meeting by constructing a feedback loop between a media server and a DRL server. Our experimental results demonstrate that our framework can adaptively control the video bitrate according to changes in network conditions. As a result, it achieves higher video bitrates in the user application (approximately 5% under stable network conditions and 35% under highly dynamic network conditions) compared to the existing rule-based bandwidth allocation.
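As a point of comparison, a rule-based allocator of the kind such a DRL policy is benchmarked against can be sketched in a few lines. The function name, the budget, and the bandwidth figures are illustrative, not values from the study.

```python
# Proportional bitrate allocation for a multi-party call: the media
# server splits a bitrate budget across connections according to each
# user's estimated available bandwidth, with a minimum per-connection
# floor. This is a simple rule-based stand-in for the DRL policy.
def allocate(budget_kbps, est_bandwidth_kbps, floor_kbps=100):
    total = sum(est_bandwidth_kbps)
    shares = [max(floor_kbps, budget_kbps * b / total)
              for b in est_bandwidth_kbps]
    scale = min(1.0, budget_kbps / sum(shares))   # never exceed the budget
    return [s * scale for s in shares]

print(allocate(3000, [1200, 800, 400]))   # → [1500.0, 1000.0, 500.0]
```

A learned policy improves on this by reacting to QoS feedback rather than allocating purely in proportion to instantaneous bandwidth estimates.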
The grid-connected inverter is a key energy conversion device for integrating new energy sources into the grid and is widely used in distributed power generation systems. However, the traditional control strategy has many limitations with respect to stability and the adjustment of system voltage and frequency, and a large amount of renewable energy connected to the grid may cause power and frequency oscillations that threaten the stable operation of the grid. To solve this problem, an adaptive optimal control method based on reinforcement learning and adaptive dynamic programming (ADP) is implemented for a VSG-based three-phase grid-connected inverter. The method establishes a mathematical model of the VSG-based grid-connected inverter and transforms it into a standard linear quadratic regulation (LQR) optimization problem. On the basis of VSG power-frequency control, the dynamic compensation term given by the ADP algorithm is introduced into the active power loop. During grid connection, the VSG output is optimally adjusted through the proposed adaptive optimal control strategy to reduce system frequency fluctuations. When the system dynamics are unknown in the complex environment of grid-connected inverters, the deep deterministic policy gradient (DDPG) is used in place of the ADP algorithm. Finally, the effectiveness of the method is verified on the Simulink platform.
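The way the compensation enters the loop can be sketched as follows: the nominal VSG command is augmented additively by a state-feedback correction u_adp = -Kx, where K would be the gain learned by ADP or DDPG. The gain and state values below are illustrative placeholders, not the paper's tuned design.

```python
import numpy as np

# Additive dynamic compensation in the active power loop: the nominal
# VSG droop command is corrected by a learned state-feedback term.
K = np.array([[2.0, 0.5]])                 # placeholder for the learned gain

def control(u_vsg, x):
    """Nominal VSG command plus the ADP compensation term."""
    u_adp = -(K @ x).item()                # dynamic compensation u_adp = -Kx
    return u_vsg + u_adp

x = np.array([0.1, -0.2])                  # e.g. frequency/power deviations
print(control(1.0, x))                     # nominal command minus correction
```

Because the compensation is additive, setting K = 0 recovers the conventional VSG power-frequency control, which makes the scheme easy to retrofit.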
ISBN (Digital): 9798350394665
ISBN (Print): 9798350394672
We present a blockchain-assisted mobile edge computing (MEC) architecture for adaptive resource distribution in wireless communication systems, where the blockchain acts as an overhead system that provides command and control functionalities. In this context, achieving consensus across nodes while also ensuring the functionality of both the MEC and blockchain systems is a significant challenge. Furthermore, resource distribution, frame size, and the number of sequential blocks generated by each contributor are important to blockchain-aided MEC functionality. As a result, a strategy for dynamic resource distribution and block creation is presented. To strengthen the efficiency of the overlapped blockchain system and enhance the quality of service (QoS) of clients in the MEC system, the spectrum allocation, frame size, and number of blocks generated by each contributor are framed as a joint optimization problem that takes into account time-varying communication channels and MEC server saturation. We use deep reinforcement learning (RAMBAN) to address this issue because standard approaches are ineffective. The simulation findings demonstrate the efficacy of the suggested strategy compared with various baseline approaches.