In contemporary Multi-Agent Reinforcement Learning (MARL), effectively enhancing the expressive capacity of value functions has been a persistent research focus. Many studies have employed value decomposition methods;...
Event-triggered control has attracted considerable attention for its effectiveness in resource-restricted applications. To make event-triggered control an end-to-end solution, a key issue is how to effectively lear...
This paper considers distributed online convex optimization with time-varying constraints. In this setting, a network of agents makes decisions at each round, and then only a portion of the loss function and a coordin...
In this paper, we propose a hybrid algorithm that combines an improved Artificial Potential Field (APF) method with the Simulated Annealing (SA) algorithm for path planning of an electric power operation robot manipul...
This paper addresses the problem of generating optimal paths for curvature-constrained unmanned aerial vehicles (UAVs) performing surveillance of multiple ground targets. UAVs are modeled as Dubins vehicles so that their minimum turning radius constraint can be taken into account. In view of the effective surveillance range of the sensors carried by the UAVs, the problem is formulated as a Dubins traveling salesman problem with neighborhood (DTSPN). Because of its prohibitively high computational complexity, Dubins paths with terminal heading relaxation are introduced to simplify the calculation of the Dubins distance, and a boundary-based encoding scheme is proposed to determine the visiting point of every target neighborhood. An evolutionary algorithm is then used to derive the optimal Dubins tour. To further enhance the quality of the solutions, a local search strategy based on an approximate gradient is employed to improve the visiting points of the target neighborhoods. Finally, with a minor modification to the individual encoding, the algorithm is easily extended to two other, more sophisticated DTSPN variants (the multi-UAV scenario and the multiple-groups-of-targets scenario). The performance of the algorithm is demonstrated through comparative experiments with two other state-of-the-art DTSPN algorithms from the literature. Numerical simulations show that the proposed algorithm finds high-quality solutions to the DTSPN at lower computational cost and significantly outperforms the other algorithms.
Many real-world optimization problems involve multiple conflicting objectives. Such problems are called multiobjective optimization problems (MOPs). Typically, MOPs have a set of so-called Pareto optimal solutions rather than one unique optimal solution. To assist the decision maker (DM) in finding his/her most preferred solution, we propose an interactive multiobjective evolutionary algorithm (MOEA) called iDMOEA-εC, which uses the DM's preferences to compress the objective space directly and progressively in order to identify the DM's preferred region. The proposed algorithm employs a state-of-the-art decomposition-based MOEA called DMOEA-εC as its search engine. DMOEA-εC decomposes an MOP into a series of scalar constrained subproblems, using a set of evenly distributed upper bound vectors to approximate the entire Pareto front. To guide the population toward only the DM's preferred part of the Pareto front, an adaptive adjustment mechanism for the upper bound vectors and two-level feasibility rules are proposed and integrated into DMOEA-εC to control the spread of the population. To ease the DM's burden, only a small set of representative solutions is presented to the DM in each interaction, and the DM is expected to select a preferred one from the set. Furthermore, the proposed algorithm includes a two-stage selection procedure that elicits the DM's preferences as accurately as possible. To evaluate its performance, the proposed algorithm was compared with other interactive MOEAs in a series of experiments. The experimental results demonstrated the superiority of iDMOEA-εC over its competitors.
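The decomposition into scalar constrained subproblems can be illustrated with a plain ε-constraint sketch. This is not DMOEA-εC itself: it assumes a toy bi-objective problem, a scalar upper bound in place of an upper bound vector, and exhaustive search over a grid in place of an evolutionary search. All names (`epsilon_constraint_front`, `f1`, `f2`) are hypothetical.

```python
def epsilon_constraint_front(candidates, f_main, f_bound, upper_bounds):
    """Solve one scalar subproblem per upper bound.

    Each subproblem minimises f_main subject to f_bound(x) <= eps.
    Subproblems with no feasible candidate are skipped.
    """
    front = []
    for eps in upper_bounds:
        feasible = [x for x in candidates if f_bound(x) <= eps]
        if feasible:
            front.append(min(feasible, key=f_main))
    return front

def f1(x):
    return x * x              # first objective, minimised directly

def f2(x):
    return (x - 2.0) ** 2     # second objective, turned into a constraint

candidates = [i / 100 for i in range(201)]      # grid on [0, 2]
upper_bounds = [4 * k / 8 for k in range(9)]    # evenly spaced: 0, 0.5, ..., 4
solutions = epsilon_constraint_front(candidates, f1, f2, upper_bounds)
```

Each upper bound yields one point of the approximated front. Narrowing the range of upper bounds restricts the search to a part of the front, which is the effect the adaptive adjustment of the upper bound vectors aims for in the abstract above.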
A model-based offline policy iteration (PI) algorithm and a model-free online Q-learning algorithm are proposed for solving fully cooperative linear quadratic dynamic games. The PI-based adaptive Q-learning method can learn the feedback Nash equilibrium online using the state samples generated by behavior policies, without querying the system model. Unlike existing Q-learning methods, this novel Q-learning algorithm executes both policy evaluation and policy improvement in an adaptive manner. We prove the convergence of the offline PI algorithm by proving its equivalence to Newton's method for solving the game algebraic Riccati equation (GARE). Furthermore, we prove that the proposed Q-learning method converges to the Nash equilibrium under a small learning rate if it satisfies certain persistence of excitation conditions, which can be easily met by suitable behavior policies. Our simulation results demonstrate the good performance of the proposed online adaptive Q-learning algorithm.
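The alternation between policy evaluation and policy improvement can be sketched on a much simpler stand-in: a scalar, single-input, discrete-time linear quadratic regulator. This is an assumption made for illustration, not the paper's game formulation; the function names and the numerical values are hypothetical.

```python
def policy_iteration_lqr(a, b, q, r, k0, iterations=20):
    """Model-based policy iteration for x[t+1] = a*x + b*u, cost q*x^2 + r*u^2.

    k0 must be a stabilising gain for the policy u = -k0 * x.
    Returns the value coefficient p and the feedback gain k.
    """
    k = k0
    p = None
    for _ in range(iterations):
        closed = a - b * k
        if abs(closed) >= 1.0:
            raise ValueError("gain is not stabilising")
        # Policy evaluation: solve p = q + r*k^2 + closed^2 * p for p.
        p = (q + r * k * k) / (1.0 - closed * closed)
        # Policy improvement: gain that is greedy with respect to p.
        k = a * b * p / (r + b * b * p)
    return p, k

def riccati_residual(a, b, q, r, p):
    """Residual of the scalar discrete-time algebraic Riccati equation."""
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p) - p

p_star, k_star = policy_iteration_lqr(a=1.1, b=1.0, q=1.0, r=1.0, k0=0.5)
residual = riccati_residual(1.1, 1.0, 1.0, 1.0, p_star)
```

In this scalar setting the iterates approach the root of the Riccati equation rapidly from a stabilising initial gain, which is the behaviour one expects from a Newton-type iteration; the residual gives a direct check that the fixed point has been reached.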
Authors: Dai, Jun; Li, Xinbin; Han, Song; Yu, Junzhi; Liu, Zhixin
Yanshan University, Key Lab of Industrial Computer Control Engineering of Hebei Province, Qinhuangdao 066004, China
Yanshan University, Key Laboratory of Intelligent Rehabilitation and Neuromodulation of Hebei Province, Qinhuangdao 066004, China
Peking University, State Key Laboratory for Turbulence and Complex Systems, Department of Advanced Manufacturing and Robotics, BIC-ESAT, College of Engineering, Beijing 100871, China
This paper investigates the cooperative link configuration problem for Autonomous Underwater Vehicle (AUV) in Underwater Acoustic (UWA) sensor networks with Energy Harvesting (EH), which aims to maximize long-term cum...
A backstepping-based adaptive robust dead-zone compensation controller is proposed for electro-hydraulic servo systems (EHSSs) with an unknown dead-zone and uncertain system parameters. The variable load is treated as the sum of a constant part and a variable part. The constant part is regarded as a system parameter to be estimated in real time. The variable part, together with the friction, is treated as a disturbance, so that a robust term in the controller can be adopted to reject it. In contrast to the traditional dead-zone compensation method, a dead-zone compensator is incorporated in the EHSS without constructing a dead-zone inverse. Combined with the backstepping method, this yields an adaptive robust controller (ARC) with dead-zone compensation. An easy-to-use ARC tuning method is also proposed after a further analysis of the ARC structure. Simulations show that the proposed method achieves excellent tracking performance: all the uncertain parameters can be estimated, the disturbance is rejected, and the dead-zone term is well estimated and compensated.
This paper is concerned with the optimal fusion of sensors with cross-correlated sensor noises. By applying linear transformations to the measurements and the related parameters, new measurement models are established in which the sensor noises are decoupled. The centralized fusion with raw data, the centralized fusion with transformed data, and a distributed fusion estimation algorithm are introduced; they are shown to be equivalent to each other in estimation precision and are therefore globally optimal in the sense of linear minimum mean square error (LMMSE). It is shown that the centralized fusion with transformed data has lower communication requirements than the centralized fusion using raw data directly, and that the distributed fusion algorithm has the best flexibility and robustness, with moderate communication requirements and computational complexity among the three algorithms (and lower communication and computational complexity than the existing distributed Kalman filtering fusion algorithms). An example is given to illustrate the effectiveness of the proposed algorithms.
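The claim that fusing transformed, decoupled measurements loses no precision can be checked on a minimal static case. The sketch below assumes two scalar sensors that both observe the same unknown x with cross-correlated noises, and uses one particular decorrelating transformation (subtracting a scaled copy of the first measurement from the second). The paper's actual transformation and its dynamic filtering setting are not reproduced; all names and numbers are hypothetical.

```python
def fuse_raw(z1, z2, r11, r22, r12):
    """Weighted least-squares fusion using the full noise covariance."""
    weight_sum = r11 + r22 - 2.0 * r12
    estimate = ((r22 - r12) * z1 + (r11 - r12) * z2) / weight_sum
    variance = (r11 * r22 - r12 * r12) / weight_sum
    return estimate, variance

def fuse_transformed(z1, z2, r11, r22, r12):
    """Decorrelate the second measurement, then fuse as independent sensors."""
    c = r12 / r11
    z2_new = z2 - c * z1          # transformed measurement
    h2_new = 1.0 - c              # transformed measurement gain
    r22_new = r22 - c * r12       # its noise variance, uncorrelated with r11
    information = 1.0 / r11 + h2_new * h2_new / r22_new
    estimate = (z1 / r11 + h2_new * z2_new / r22_new) / information
    return estimate, 1.0 / information

raw = fuse_raw(1.2, 0.7, r11=0.5, r22=0.8, r12=0.2)
transformed = fuse_transformed(1.2, 0.7, r11=0.5, r22=0.8, r12=0.2)
```

Both routes return the same estimate and the same error variance here, because the transformation is invertible and therefore discards no information; it only makes the noise covariance diagonal so that the sensors can be processed one at a time.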