Author:
Wang, Fei-Yue
Chinese Acad Sci, State Key Lab Management & Control Complex Syst, Inst Automat, Beijing 100190, Peoples R China
Welcome to the new issue of the IEEE Transactions on Computational Social Systems (TCSS). I am pleased to report that, as of April 9, 2020, the CiteScore of TCSS has reached 5.26, a new high. Many thanks to all of you for your great effort and support.
Reinforcement learning offers a promising route to self-learning control of an unknown system, but it raises the issues of policy evaluation and exploration, especially in continuous state spaces. In this study, these issues are addressed from a probabilistic perspective. The action value function is modeled as the latent variable of a Gaussian process, with the reward as the observed variable. An online approach is then proposed to update the action value function by Bayesian inference. Within this framework, prior knowledge can be incorporated into the action value function, which leads to an efficient exploration strategy. Finally, the Bayesian state-action-reward-state-action (SARSA) algorithm is tested on several benchmark problems, and the empirical results show its effectiveness.
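The abstract does not give the model equations. As a hedged sketch only, one common way in the Gaussian-process temporal-difference literature to treat the action value as latent and the reward as observed is shown below. The symbols $z_t = (s_t, a_t)$, the discount factor $\gamma$, the kernel $k$, the matrices $H$, $K$, $\Sigma$, and the noise $n_t$ are assumptions introduced for illustration, not notation taken from the paper.

```latex
% Assumed observation model: each reward is a noisy difference of latent action values
r_t = Q(z_t) - \gamma\, Q(z_{t+1}) + n_t, \qquad t = 1, \dots, T

% Stacked form, with H the T x (T+1) bidiagonal matrix holding 1 and -\gamma
\mathbf{r} = H \mathbf{q} + \mathbf{n}, \qquad
\mathbf{q} \sim \mathcal{N}(\mathbf{0}, K), \qquad
\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \Sigma)

% Gaussian posterior of the action value at a query point z,
% where \mathbf{k}(z) = [k(z_1, z), \dots, k(z_{T+1}, z)]^\top
\hat{Q}(z) = \mathbf{k}(z)^\top H^\top \left( H K H^\top + \Sigma \right)^{-1} \mathbf{r}

\operatorname{Var}[Q(z)] = k(z, z) - \mathbf{k}(z)^\top H^\top \left( H K H^\top + \Sigma \right)^{-1} H\, \mathbf{k}(z)
```

Under this assumed model, a non-zero prior mean would be one way to encode prior knowledge, and the posterior variance is a natural quantity for an exploration rule to use; whether the paper does either in this form is not stated in the abstract.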
A novel supervised Actor-Critic (SAC) approach to the adaptive cruise control (ACC) problem is proposed in this paper. The two key elements of the SAC algorithm, the Actor and the Critic, are each approximated by a feed-forward neural network. The output of the Actor and the state are fed to the Critic to approximate the performance index function. A Lyapunov stability analysis is presented to prove that the estimation errors of the neural networks are uniformly ultimately bounded. Moreover, a supervisory controller is used to pre-train the Actor to obtain a basic control policy, which improves training convergence and success rate. We apply this method to learn an approximately optimal control policy for the ACC problem. Experimental results in several driving scenarios demonstrate that the SAC algorithm performs well and is therefore feasible and effective for the ACC problem.
In this paper, we establish a neural-network-based decentralized control law to stabilize a class of continuous-time nonlinear interconnected large-scale systems using an online model-free integral policy iteration (PI) algorithm. The model-free PI approach can solve the decentralized control problem for the interconnected system which has unknown dynamics. The stabilizing decentralized control law is derived based on the optimal control policies of the isolated subsystems. The online model-free integral PI algorithm is developed to solve the optimal control problems for the isolated subsystems with unknown system dynamics. We use the actor-critic technique based on the neural network and the least squares implementation method to obtain the optimal control policies. Two simulation examples are given to verify the applicability of the decentralized control law. (C) 2015 Elsevier B.V. All rights reserved.
Authors:
Li, H.; Liu, D.
Chinese Acad Sci, State Key Lab Management & Control Complex Syst, Inst Automat, Beijing 100190, Peoples R China
In this study, the authors propose a novel adaptive dynamic programming scheme based on general value iteration (VI) to obtain near-optimal control for discrete-time affine non-linear systems with continuous state and control spaces. First, the initial value function is selected differently from traditional VI, and a new method is introduced to demonstrate the convergence property and convergence speed of the value function. Then, it is shown that the control law obtained at each iteration can stabilise the system under some conditions. Finally, an error-bound-based condition is derived that accounts for the approximation errors of the neural networks, so that the error between the optimal and approximated value functions can also be estimated. To facilitate the implementation of the iterative scheme, three neural networks trained with the Levenberg-Marquardt algorithm are used to approximate the unknown system, the value function and the control law. Two simulation examples are presented to demonstrate the effectiveness of the proposed scheme.
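To make the idea of value iteration from a non-standard initial value function concrete, here is a minimal sketch on a scalar linear-quadratic problem, which is a much simpler setting than the non-linear, neural-network-based scheme the abstract describes. The function name `general_value_iteration`, the system parameters and the quadratic form V_i(x) = p_i * x**2 are all assumptions chosen for illustration.

```python
def general_value_iteration(a, b, q, r, p0, iterations=200):
    """Value iteration for x_{k+1} = a*x_k + b*u_k with stage cost q*x**2 + r*u**2.

    The value function is assumed quadratic, V_i(x) = p_i * x**2, and p0 is an
    arbitrary non-negative initial coefficient (p0 = 0 gives traditional VI).
    """
    p = p0
    k = 0.0
    for _ in range(iterations):
        # Greedy control for the current value function: u = -k * x
        k = a * b * p / (r + b * b * p)
        # Bellman backup: stage cost plus value of the successor state
        p = q + r * k * k + p * (a - b * k) ** 2
    return p, k


# Hypothetical open-loop unstable system, started below and above the fixed point
A_SYS, B_SYS, Q_COST, R_COST = 1.2, 1.0, 1.0, 1.0
p_low, k_low = general_value_iteration(A_SYS, B_SYS, Q_COST, R_COST, p0=0.0)
p_high, k_high = general_value_iteration(A_SYS, B_SYS, Q_COST, R_COST, p0=10.0)
print(p_low, p_high, A_SYS - B_SYS * k_low)
```

In this toy case both initialisations reach the same fixed point of the Riccati recursion and the resulting closed-loop gain is stabilising, which mirrors, but does not reproduce, the convergence and stability claims made for the general scheme.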
Three-dimensional display technologies based on a lenticular sheet overlaid onto a spatial light modulator screen have been studied for decades. However, the quality of these displays still suffers from an insufficient number of views and from zone-jumping between views. In this paper, we present a subpixel multiplexing method. We propose to split mapping and alignment into two separate tasks, processed in parallel threads. The alignment thread computes the geometrical relationship between the lenticular sheet and the Liquid Crystal Display (LCD) panel for multiplexing. Afterwards, we carry out the multiplexing procedure through a box-constrained integer least squares algorithm. After multiplexing, each subpixel aggregated on the lenticular sheet is a multiplexed one that mixes a number of subpixels from a local region of the LCD plane. As a result, we multiplex subpixels on the synthetic image for up to 27 views at a resolution of 1080 x 1920, with a rendering speed of 73.34 frames per second (fps).
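The abstract names a box-constrained integer least squares problem but not the solver used. The sketch below only illustrates the problem statement (minimise ||A x - b||^2 over integer vectors x whose entries lie in a box) by exhaustive search, which is feasible only for tiny instances. The function name `box_integer_least_squares` and the data are hypothetical.

```python
from itertools import product


def box_integer_least_squares(A, b, lower, upper):
    """Return the integer vector x with lower <= x_j <= upper minimising ||A x - b||^2.

    Brute-force enumeration, for illustration only; it is not the paper's algorithm.
    """
    n = len(A[0])
    best_x, best_cost = None, float("inf")
    for x in product(range(lower, upper + 1), repeat=n):
        # Residual of each equation for this candidate
        residuals = [sum(row[j] * x[j] for j in range(n)) - bi
                     for row, bi in zip(A, b)]
        cost = sum(res * res for res in residuals)
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x, best_cost


# Hypothetical 3-equation, 2-unknown instance
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.4, 2.6, 4.2]
x_wide, cost_wide = box_integer_least_squares(A, b, lower=0, upper=3)
x_tight, cost_tight = box_integer_least_squares(A, b, lower=0, upper=2)
print(x_wide, cost_wide, x_tight, cost_tight)
```

Tightening the box changes the minimiser in this instance, which shows why the constraint has to be part of the search rather than applied by clipping an unconstrained solution afterwards.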
This paper is devoted to the mechatronic design and implementation of a dolphin-like robot capable of fast swimming. In the context of multiple coordinated control surfaces, a set of serially connected flapping modules is responsible for dorsoventral oscillations, an internal moving slider for pitch control and a yaw joint for lateral turns. To improve the swimming speed, an updated modular slider-crank-based flapping mechanism that fully capitalizes on the continuous high-speed rotation of the DC motor is proposed and constructed. With the proposed mechanisms, the resulting dolphin robot achieved a high level of propulsive speed, largely illustrating the validity of the present design scheme.
Residential energy scheduling is an active topic against the worldwide background of energy saving and environmental protection. To this end, a new residential energy scheduling algorithm is developed for energy management, based on action-dependent heuristic dynamic programming. The algorithm works under residential real-time pricing and with two adjacent housing units that exchange energy, which can reduce the overall cost and enhance renewable energy efficiency after long-term operation. It is designed to obtain the optimal control policy for managing the directions and amounts of electrical energy flow. The algorithm's architecture is built mainly on neural networks, with the learned characteristics held in the connections between layers. To stay close to real situations, many constraints, such as the maximum charging/discharging power of batteries, are taken into account. An absent energy penalty cost is introduced for the first time as part of the performance index function. When the environment changes, the residential energy scheduling algorithm gains new features and keeps adapting in real-time operation. Simulation results show that the developed algorithm is beneficial to energy conservation. (C) 2015 Elsevier Ltd. All rights reserved.
In this paper, a novel strategy is established to design a robust controller for a class of continuous-time nonlinear systems with uncertainties, based on an online policy iteration algorithm. The robust control problem is transformed into an optimal control problem by properly choosing a cost function that reflects the uncertainties, regulation, and control. An online policy iteration algorithm is presented to solve the Hamilton-Jacobi-Bellman (HJB) equation by constructing a critic neural network, from which an approximate expression of the optimal control policy can be derived directly. The closed-loop system is proved to be uniformly ultimately bounded. The equivalence of the neural-network-based HJB solution of the optimal control problem and the solution of the robust control problem is established as well. Two simulation examples are provided to verify the effectiveness of the present robust control scheme.
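As a rough illustration of the evaluate-then-improve loop behind policy iteration on the HJB equation, the sketch below uses a scalar linear system with quadratic cost, for which the HJB equation reduces to an algebraic Riccati equation and each step has a closed form. This is a model-based toy case, not the paper's online critic-network method for nonlinear systems. The function name `policy_iteration_scalar` and all parameter values are assumptions.

```python
import math


def policy_iteration_scalar(a, b, q, r, k0, iterations=20):
    """Policy iteration for dx/dt = a*x + b*u with cost integral of q*x**2 + r*u**2.

    Policies are linear, u = -k*x, and the value function is V(x) = p * x**2.
    The initial gain k0 must stabilise the system (a - b*k0 < 0).
    """
    if a - b * k0 >= 0:
        raise ValueError("initial gain must stabilise the system")
    k = k0
    p = None
    for _ in range(iterations):
        # Policy evaluation: solve 2*(a - b*k)*p + q + r*k**2 = 0 for p
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Policy improvement: minimising the Hamiltonian gives u = -(b*p/r)*x
        k = b * p / r
    return p, k


# Hypothetical open-loop unstable plant with unit weights
p_star, k_star = policy_iteration_scalar(a=1.0, b=1.0, q=1.0, r=1.0, k0=3.0)
print(p_star, k_star, 1.0 + math.sqrt(2.0))
```

For these values the Riccati equation 2p - p**2 + 1 = 0 has the positive root 1 + sqrt(2), so the printed coefficient can be compared against that closed form.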