This article investigates a learning predictive control framework for multiagent systems with unknown dynamics. Predictive control, which generates a temporal sequence of control inputs, has potential applications in observation loss scenarios. First, control causality is inversely extracted from time series analysis, and predictive control is characterized as sequential feedback control. Next, we focus on distributed communication and time consistency under this causality. Distributed predictive control is equivalently partitioned into spatial and temporal subgames. Spatial subgames achieve equivalence between global and local objectives through Nash equilibrium, while temporal subgames enforce local control causality to achieve stability and optimality with time consistency. Furthermore, a multistep reinforcement learning algorithm is proposed for data-driven implementation, which relies on interactive data instead of knowledge of the dynamics. The learning properties are discussed with theoretical proofs and parameter selection. Finally, numerical results demonstrate the effectiveness of the framework, and robotics experiments show its potential advantage under observation loss scenarios.
ISBN: 9798350321050 (print)
In today's industrial processes, data-driven soft sensors are frequently used to predict quality variables. The autoencoder (AE) is an unsupervised algorithm that can extract latent features from the original data. However, during feature extraction, the traditional autoencoder does not consider the correlation between the modeling input variables and the quality variables to be predicted. To solve this issue, a novel autoencoder based on variable correlation analysis (VCA-AE) is proposed. In VCA-AE, the correlation between the modeling input variables and the quality variables is evaluated by correlation analysis, and the input variables are divided into two parts, each of which is fed to its own sub-autoencoder to extract latent features. Within each sub-autoencoder, the input variables have the same correlation with the quality variables. Next, an extreme learning machine (ELM), a feedforward neural network, is used to develop a soft sensor model based on the extracted latent features and the quality variables. Finally, the effectiveness of the proposed soft sensor model combining VCA-AE and ELM is illustrated by an experiment on the industrial PTA process.
This paper applies a data-driven adaptive control method to a benchmark of vapor compression refrigeration system based on the controller dynamic linearization method. Two controllers, namely, a single-input single-ou...
This letter presents a direct minimum-variance (MV) data-driven safe control design approach for uncertain linear discrete-time stochastic systems. The superiority of the direct MV approach is shown by developing and comparing direct versus indirect learning approaches and MV versus certainty-equivalence (CE) approaches. First, it is shown that probabilistic safety can be guaranteed by ensuring probabilistic λ-contractivity of the safe set. Four data-based convex optimization algorithms are introduced to ensure the probabilistic λ-contractivity of the safe set (i.e., direct and indirect CE, direct and indirect MV). It is shown that while the CE approach results in a risk-neutral control design method with no robustness guarantees, the MV approach results in a risk-averse control design with probabilistic safety guarantees. This is because the MV approach aims at learning a control gain that minimizes the variance of the state of the closed-loop system with respect to the safe set, and thus minimizes the risk of safety violation. In addition, it is shown that the direct learning approach requires weaker data richness conditions (i.e., lower sample complexity) than the indirect learning approach. Two simulation examples are provided to verify that the direct MV learning approach outperforms the other three approaches, since it leads to low-complexity (i.e., low sample complexity and convex optimization) safe learning with a high probability of safety guarantees.
This letter presents a norm optimal-gain-arguable iterative learning control (NOGAILC) scheme for accurately tracking the trajectory of a single-link flexible robotic manipulator. The proposed approach is based on the...
ISBN: 9798350321050 (print)
Neural ordinary differential equations (Neural ODEs) interpret deep networks as discretizations of dynamical systems and have shown great promise in the physical sciences, in modeling irregular time series, and in mean field games. Training a Neural ODE takes a long time, which is arguably one of the main stumbling blocks to widespread adoption. To improve the convergence speed of training, in this paper we formulate the training task as a separable nonlinear optimization problem and propose a separable training algorithm based on a nonmonotone trust-region method. The proposed algorithm uses the variable projection strategy to reduce the dimension of the variables by solving a subproblem, and the trust-region method is then used to optimize the reduced function. To accelerate convergence, we introduce a nonmonotone strategy that makes the update of the trust-region radius elastic, and we employ an adaptive technique that uses the gradient information of the objective function to update the radius. Numerical results confirm the effectiveness of the proposed algorithm.
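The variable projection idea mentioned in this abstract can be illustrated on a toy separable least-squares problem. The sketch below is not the paper's algorithm: it assumes a hypothetical model y ≈ c·exp(-a·t) with one linear parameter c and one nonlinear parameter a, eliminates c in closed form, and minimizes the reduced function with a plain golden-section search standing in for the nonmonotone trust-region method.

```python
import math

def basis(a, ts):
    """Nonlinear basis function exp(-a * t) evaluated at every sample time."""
    return [math.exp(-a * t) for t in ts]

def best_linear_coeff(a, ts, ys):
    """Subproblem: for a fixed a, the least-squares optimal c in closed form."""
    phi = basis(a, ts)
    return sum(p * y for p, y in zip(phi, ys)) / sum(p * p for p in phi)

def reduced_loss(a, ts, ys):
    """Reduced function: squared error after projecting out the linear variable c."""
    c = best_linear_coeff(a, ts, ys)
    return sum((y - c * p) ** 2 for p, y in zip(basis(a, ts), ys))

def golden_section(f, lo, hi, tol=1e-8):
    """1-D minimizer used here only as a simple stand-in for a trust-region solver."""
    inv_phi = (math.sqrt(5) - 1) / 2
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:
            # Minimum lies in [lo, x2]; reuse x1 as the new upper probe.
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = f(x1)
        else:
            # Minimum lies in [x1, hi]; reuse x2 as the new lower probe.
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = f(x2)
    return (lo + hi) / 2

# Hypothetical noise-free data generated with c = 2.0 and a = 1.5.
ts = [0.1 * k for k in range(30)]
ys = [2.0 * math.exp(-1.5 * t) for t in ts]

a_hat = golden_section(lambda a: reduced_loss(a, ts, ys), 0.1, 5.0)
c_hat = best_linear_coeff(a_hat, ts, ys)
```

The outer search only ever sees the nonlinear parameter, which is the dimension reduction the abstract refers to; the linear parameter is recovered afterwards from the subproblem.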
ISBN: 9798350321050 (print)
This paper proposes a method for online nonparametric identification of nonlinear ship motion systems. First, we use Manner to generate a certain amount of ship motion data to train the LWPR model. The ship then travels along a set track. During this process, the sensors continuously obtain the distance, radial velocity and azimuth of the ship relative to the ship, which completes the construction of the simulation data. Next, the performance of the algorithm, which uses the Kalman filtering framework, is verified. Finally, the estimated value is used to update the LWPR model for online learning, and the updated model is used for the next prediction. The experimental results show that the proposed online modeling and tracking method achieves higher tracking accuracy than parameter estimation techniques.
ISBN: 9798350321050 (print)
An improved long short-term memory (LSTM) model based on ensemble empirical mode decomposition (EEMD) is designed for short-term passenger flow prediction, in view of the complex dynamics, uncertainty and prediction difficulty of subway inbound passenger flow. First, the raw data is decomposed into several stationary components and a residue by the EEMD method. Then, the Pearson correlation coefficient between each component and the raw data is calculated; the resulting combination of high-correlation components and combination of low-correlation components are joined with date features to form the input set of the LSTM neural network, and the predicted passenger flow data forms the output set. Finally, the trained EEMD-LSTM model performs better than the single LSTM model on the evaluation metrics, and its absolute error is significantly lower during peak passenger flows. The experimental results for Tiantongyuan Station on Beijing Metro Line 5 show that the improved model effectively improves prediction accuracy, which is conducive to the dynamic adjustment of station management plans.
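The component-grouping step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the EEMD decomposition itself is not performed, the two components are hypothetical synthetic signals, the cut-off of 0.5 is an assumed threshold, and "combination" is assumed to mean an element-wise sum.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def group_components(components, raw, threshold=0.5):
    """Sum components into a high-correlation and a low-correlation series.

    threshold is an assumed cut-off on |r|; the abstract does not state one.
    """
    high = [0.0] * len(raw)
    low = [0.0] * len(raw)
    for comp in components:
        r = pearson(comp, raw)
        target = high if abs(r) >= threshold else low
        for i, v in enumerate(comp):
            target[i] += v
    return high, low

# Hypothetical stand-ins for decomposed components: a slow trend and a fast ripple.
slow = [10.0 * math.sin(2 * math.pi * k / 20) for k in range(40)]
fast = [1.0 if k % 2 == 0 else -1.0 for k in range(40)]
raw = [s + f for s, f in zip(slow, fast)]

high, low = group_components([slow, fast], raw)
```

In this toy case the slow trend dominates the raw series and lands in the high-correlation group, while the ripple lands in the low-correlation group; both grouped series would then be fed to the predictor alongside the date features.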
ISBN: 9798350321050 (print)
With the wide application of multi-agent reinforcement learning (MARL), the field has become increasingly mature. Multi-agent Proximal Policy Optimization (MAPPO), an extension of the Proximal Policy Optimization (PPO) algorithm, has attracted the attention of researchers with its superior performance. However, as the number of agents in multi-agent cooperation tasks increases, the fixed clip range that limits the step size of updates leads to overfitting and suboptimal policies. In this paper, an algorithm called MAPPO via Non-fixed Value Clipping (NVC-MAPPO) is proposed based on MAPPO. Gaussian noise is introduced in the value function and in the clipping function, and the clipping function is rewritten into a form called the non-fixed value clipping function. Experiments on the StarCraft II Multi-Agent Challenge (SMAC) verify that the algorithm effectively prevents the step size from changing too much while enhancing the exploration ability of the agents, which improves performance compared with MAPPO.
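As a rough illustration of what a non-fixed clip range could look like, the sketch below perturbs the clip range of the standard PPO clipped surrogate with Gaussian noise. The abstract does not give the exact form of the non-fixed value clipping function, so the function names, the noise scale sigma, and the bounds eps_min and eps_max are all assumptions made for this example.

```python
import random

def clipped_surrogate(ratio, advantage, eps):
    """Standard PPO clipped surrogate for a single sample."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)

def noisy_clip_range(base_eps, sigma, rng, eps_min=0.05, eps_max=0.5):
    """Assumed form of a non-fixed clip range: base value plus Gaussian noise.

    The result is bounded so that the clip range stays positive and finite.
    """
    eps = base_eps + rng.gauss(0.0, sigma)
    return max(eps_min, min(eps_max, eps))

rng = random.Random(0)  # seeded for reproducibility
eps = noisy_clip_range(0.2, 0.05, rng)

# With a probability ratio of 1.5 and a positive advantage, the objective is capped
# at (1 + eps) * advantage, so the effective step limit varies with the sampled eps.
value = clipped_surrogate(1.5, 2.0, eps)
```

Sampling a fresh clip range per update keeps the step size bounded while letting the bound itself vary, which is one plausible reading of how noise in the clipping function could encourage exploration.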
ISBN: 9798350321050 (print)
For nonlinear leader-following stochastic multi-agent systems, a tube-based distributed model predictive control (MPC) algorithm for the followers is proposed in this paper to achieve containment control. Because nonlinearity may greatly increase the complexity of the optimization problems, some scholars have turned to linearization methods, in which the model error may lead to poor system performance. To compensate for the model error between the nonlinear system and the linear system, additive state decomposition is used to separate the original nonlinear system into two parts: a primary linear system and a secondary nonlinear system. Based on pole assignment, a feedback control algorithm for the secondary nonlinear system is proposed, ensuring primary state convergence. To ensure that the system constraints are satisfied, tightened constraints are constructed that account for the disturbance and the linearization error. Based on these constraints, the distributed tube-based MPC optimization problem is established for the primary system. Under the combined action of the two controllers, all followers are steered to the convex hull of the leaders under interference. Finally, the effectiveness of the proposed method for the nonlinear system is verified by numerical simulation.
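The additive state decomposition step can be illustrated with a scalar, deterministic, discrete-time toy system. Everything in the sketch is an assumption made for illustration: the dynamics f, the coefficients A and B, and the two feedback gains, which merely stand in for the MPC input and the pole-assignment controller. Disturbances and constraints are omitted.

```python
import math

A, B = 0.9, 0.5  # hypothetical linear part of the dynamics

def f(x, u):
    """Hypothetical nonlinear system: linear part plus a small nonlinearity."""
    return A * x + B * u + 0.1 * math.sin(x)

def step_primary(xp, up):
    """Primary system: the linear model that the MPC problem would use."""
    return A * xp + B * up

def step_secondary(xp, xs, up, us):
    """Secondary system: whatever the primary linear model does not capture."""
    return f(xp + xs, up + us) - step_primary(xp, up)

x = 1.0             # state of the original nonlinear system
xp, xs = 1.0, 0.0   # primary and secondary states, with x = xp + xs
gap = 0.0           # largest observed mismatch between x and xp + xs

for _ in range(20):
    up = -0.4 * xp  # stand-in for the MPC input of the primary system
    us = -0.2 * xs  # stand-in for the secondary feedback controller
    x_next = f(x, up + us)
    xp, xs = step_primary(xp, up), step_secondary(xp, xs, up, us)
    x = x_next
    gap = max(gap, abs(x - (xp + xs)))
```

By construction the sum of the primary and secondary states reproduces the original state at every step, so the linear part can be handled by an optimization-based controller while the nonlinear remainder is handled separately.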