This article proposes a data-driven control framework to regulate an unknown stochastic linear dynamical system to the solution of a stochastic convex optimization problem. Despite the centrality of this problem, most of the available methods rely critically on precise knowledge of the system dynamics, and thus require offline system identification. To solve the control problem, we first show that the steady-state gain of the transfer function of a linear system can be computed directly from historical data generated by the open-loop system, which removes the need to first identify the full system dynamics. We leverage this data-driven representation of the steady-state gain to design a controller, inspired by stochastic gradient descent methods, that regulates the system to the solution of the prescribed optimization problem. A distinguishing feature of our method is that it does not require any knowledge of the system dynamics or of the possibly time-varying disturbances affecting them (or their distributions). Our technical analysis combines concepts from behavioral system theory, stochastic optimization with decision-dependent distributions, and Lyapunov stability. We illustrate the applicability of the framework in a case study on mobility-on-demand ride service scheduling in Manhattan.
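The idea of a gradient-descent-inspired controller driven by the steady-state gain can be illustrated with a minimal sketch. It is not the article's algorithm: it assumes a scalar, noise-free plant x(t+1) = a·x(t) + b·u(t), y = c·x, a quadratic cost f(y) = ½(y − r)², and a gain G that is simply computed from the (hypothetical) parameters a, b, c rather than from historical data. The names `regulate`, `eta`, and `r` are illustrative.

```python
def regulate(r=1.0, a=0.5, b=1.0, c=1.0, eta=0.05, steps=200):
    """Drive a scalar linear plant to the minimizer of f(y) = 0.5 * (y - r)**2.

    Assumption: the steady-state gain G is taken as known here; in the
    abstract above it would instead be obtained from open-loop data.
    """
    G = c * b / (1.0 - a)          # steady-state input-to-output gain
    x, u = 0.0, 0.0                # plant state and control input
    for _ in range(steps):
        y = c * x                  # measured output
        grad = y - r               # gradient of the cost with respect to y
        u = u - eta * G * grad     # gradient step on the input (chain rule via G)
        x = a * x + b * u          # plant evolves under the updated input
    return c * x, u


y_final, u_final = regulate()
```

With these assumed values the closed loop is stable for the small step size `eta`, and the output settles at the reference `r`. A larger step size can destabilize the interconnection, which is why such controllers are usually run slowly relative to the plant.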
ISBN: 9798350321050 (print)
This work investigates the trajectory tracking problem for nonlinearly parameterized uncertain systems with an unknown input deadzone. A Lyapunov-based iterative learning control (ILC) strategy is adopted for controller design. Unlike in traditional ILC methods, the initial condition of each iteration is allowed to take any bounded value. By combining the parameter separation technique with a signal replacement mechanism, a Lyapunov functional is constructed to design the ILC law and the learning laws. As the number of iterations increases, the system output tracks the predetermined desired trajectory over the entire interval, and all signals in the closed-loop system are guaranteed to be bounded. The effectiveness of the proposed method is verified by theoretical analysis and numerical results.
ISBN: 9798331540845; 9789887581598 (print)
In this article, we study the discrete-time stochastic linear quadratic problem in the presence of process and observation noise, in the average-cost setting, and explore the optimal policy under output feedback. We introduce a data-driven inverse reinforcement learning algorithm that reconstructs an unknown cost function and learns a near-optimal control policy based solely on observed optimal behavior trajectories (input-output pairs). We first present a model-based inverse reinforcement learning approach that assumes known model parameters, and then prove that this method is theoretically equivalent to the proposed data-driven approach. This equivalence validates the theoretical soundness of the data-driven method and ensures the convergence of the algorithm. Finally, numerical simulation experiments demonstrate the effectiveness of the proposed algorithm, confirming that it can reconstruct the cost function and learn an effective policy from demonstration trajectories when the cost function is unknown.
ISBN: 9798350321050 (print)
A visual servo algorithm based on a Siamese convolutional neural network is proposed for a manipulator, avoiding the feature extraction and feature matching required in traditional image-based visual servo (IBVS). The algorithm feeds the current image and the desired image into the network at the same time and outputs the relative pose difference between the two images. A closed-loop control system is constructed from this pose difference to drive the end-effector of the manipulator to the desired position and grasp the target workpiece. In addition, to supply the large amount of data needed to train the neural network, an algorithm that automatically generates the data set is proposed, which avoids manual collection and labeling and greatly reduces cost. Simulations show the effectiveness and accuracy of the proposed method in comparison with traditional feature-point-based IBVS, and a grasping experiment shows its feasibility in practice.
ISBN: 9798350321050 (print)
In this paper, the robust convergence problem of iterative learning control (ILC) is investigated in the frequency domain for two-dimensional (2-D) discrete systems with iteration-varying boundary states and errors. A classical P-type ILC law is designed. Using 2-D Z-transform analysis, a sufficient convergence condition for the ILC law is obtained. A rigorous mathematical proof shows that the ultimate ILC tracking error converges to a bounded region whose size depends on the upper bound of the boundary states and errors. In particular, when all the boundary states and errors are zero, the actual output precisely tracks the 2-D desired trajectory.
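A P-type ILC law corrects the input of the next trial in proportion to the tracking error of the current trial. The sketch below shows this update on a simplified one-dimensional discrete system with identical initial conditions in every trial, not on the 2-D systems with iteration-varying boundary states studied in the paper. The plant parameters `a`, `b`, `c`, the learning gain `gamma`, and the sinusoidal reference are assumptions chosen for illustration.

```python
import math


def simulate(u, a=0.5, b=1.0, c=1.0, x0=0.0):
    """Run one trial of x(t+1) = a*x(t) + b*u(t), y(t) = c*x(t)."""
    x = x0
    y = [c * x]
    for ut in u:
        x = a * x + b * ut
        y.append(c * x)
    return y


def p_type_ilc(y_d, gamma=0.8, iterations=40):
    """P-type ILC update: u_{k+1}(t) = u_k(t) + gamma * e_k(t+1)."""
    T = len(y_d) - 1
    u = [0.0] * T                  # first trial starts from zero input
    history = []                   # max absolute tracking error per trial
    for _ in range(iterations):
        y = simulate(u)
        e = [yd - yk for yd, yk in zip(y_d, y)]
        history.append(max(abs(v) for v in e))
        # The error one step ahead corrects the input that caused it.
        u = [u[t] + gamma * e[t + 1] for t in range(T)]
    return u, history


y_d = [math.sin(0.2 * t) for t in range(21)]   # desired trajectory, y_d(0) = 0
u, history = p_type_ilc(y_d)
```

For this assumed plant the classical condition |1 − gamma·c·b| < 1 holds (0.2 here), so the recorded error shrinks across trials. With nonzero or varying initial conditions, the error would instead settle in a bounded region, which is the situation the abstract addresses.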
This work studies the consensus learning control problem of nonlinear multi-agent systems (MASs) under false data injection (FDI) attacks. Applying the attacked output and consensus error information to the control la...
ISBN: 9798350321050 (print)
This paper proposes a cascaded generalized extended state observer-based control (CGESOBC) implementation scheme for a class of nonlinear servo systems with nonintegral-chain form and multiple matched and mismatched disturbances. In this approach, the total disturbance in each channel is reconstructed by designing a generalized extended state observer (GESO). A reference model is built from the estimated disturbances and the reference input, together with a state tracking error model containing the multiple residual disturbances. A second GESO is then devised to estimate the primary estimation errors, based on which a state feedback control law incorporating a dynamic compensator is formulated for robust stabilization of the state tracking error system. Lyapunov stability theory is applied to prove the bounded stability of the closed-loop system. Finally, the efficacy of the proposed control method is verified by a numerical example.
ISBN: 9798350321050 (print)
In this paper, a reinforcement learning environment based on a polynomial nonlinear model of an electro-hydraulic servo system is established, and an optimized state-space sparse reward function is designed. Random network distillation (RND) is used to improve the exploration ability of the soft actor-critic (SAC) algorithm under sparse rewards. The control performance of the resulting optimized SAC deep reinforcement learning controller is verified on a semi-physical simulation experiment platform, and time-varying signals are designed for different tasks to test the dynamic control performance of the controller under complex tasks. The experimental results show that the proposed optimized SAC deep reinforcement learning controller has good control performance and strong robustness.
This article investigates a data-efficient methodology for enhancing the adaptability of reinforcement learning (RL) techniques to environmental changes, by which a joint control protocol for multiagent systems (MASs) is learned using only data. All followers are thus able to synchronize with the leader and optimize their individual performance. To this end, an optimal synchronization problem of heterogeneous MASs is first formulated, and an arbitration RL mechanism is then developed to address two key challenges faced by current RL techniques: insufficient data and environmental changes. In the developed mechanism, an improved Q-function with an arbitration factor is designed to reflect the fact that control protocols tend to be shaped by both historical experience and instinctive decision-making, so that the degree of control over the agents' behaviors can be adaptively allocated between on-policy and off-policy RL techniques for the optimal multiagent synchronization problem. Finally, an arbitration RL algorithm with critic-only neural networks is proposed, and theoretical analysis and proofs of synchronization and performance optimality are provided. Simulation results verify the effectiveness of the proposed method.
The article investigates the fault estimation problem for a class of stochastic time-varying systems based on iterative learning methods. Unlike most traditional works, the system investigated in this paper is affecte...