To improve the search performance of the multi-objective differential evolution algorithm, we use a reinforcement learning agent to control the dynamic mutation process. First, a novel multi-objective optimization framework with new mutation operators is developed. Then, a Q-learning algorithm is introduced to adaptively select mutation operators with different characteristics. Specifically, we design three feature functions to describe the state of the population observed by the agent. Based on the observed state, the agent dynamically selects the mutation operator in each generation, which substantially improves the search ability. We compare the performance of the proposed method with two state-of-the-art algorithms on benchmark functions and find that the proposed algorithm performs better on multiple indicators. At the same time, the component validity analysis also demonstrates the effectiveness of Q-learning and the framework introduced in this paper. Finally, the learning process shows that the agent can achieve asymptotic convergence. Copyright (c) 2022 The Authors. This is an open access article under the CC BY-NC-ND license (https://***/licenses/by-nc-nd/4.0/)
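
The abstract does not specify the concrete mutation operators, the three state feature functions, or the reward signal, so the following is only a minimal sketch of the control loop it describes: a Q-learning agent that, from an observed population state, picks one differential evolution mutation operator per generation. Everything concrete here is an illustrative assumption rather than the paper's method: a single-objective sphere objective (the paper is multi-objective), a single diversity-based state feature instead of three, DE/rand/1 and DE/best/1 as the operator pool, and best-fitness improvement as the reward.

```python
import numpy as np

# Hedged sketch only: operators, state features, and reward below are
# placeholders chosen to keep the example short and runnable, not the
# components described in the paper.

rng = np.random.default_rng(0)

def sphere(x):                        # placeholder objective (single-objective)
    return float(np.sum(x ** 2))

def rand_1(pop, F=0.5):               # DE/rand/1 mutation
    a, b, c = pop[rng.choice(len(pop), 3, replace=False)]
    return a + F * (b - c)

def best_1(pop, F=0.5):               # DE/best/1 mutation
    best = pop[np.argmin([sphere(x) for x in pop])]
    a, b = pop[rng.choice(len(pop), 2, replace=False)]
    return best + F * (a - b)

OPERATORS = [rand_1, best_1]

def state_of(pop, n_bins=5):
    # Stand-in for the paper's three feature functions:
    # normalized population spread, discretized into a few bins.
    spread = np.mean(np.std(pop, axis=0))
    return min(int(spread / 2.0 * n_bins), n_bins - 1)

def run(dim=10, pop_size=30, generations=200,
        alpha=0.1, gamma=0.9, eps=0.1, CR=0.9):
    pop = rng.uniform(-5, 5, size=(pop_size, dim))
    Q = np.zeros((5, len(OPERATORS)))             # Q-table: states x operators
    s = state_of(pop)
    best = min(sphere(x) for x in pop)
    for _ in range(generations):
        # epsilon-greedy choice of the mutation operator for this generation
        a = rng.integers(len(OPERATORS)) if rng.random() < eps else int(np.argmax(Q[s]))
        mutate = OPERATORS[a]
        for i in range(pop_size):
            donor = mutate(pop)
            mask = rng.random(dim) < CR           # binomial crossover
            trial = np.where(mask, donor, pop[i])
            if sphere(trial) <= sphere(pop[i]):   # greedy DE selection
                pop[i] = trial
        new_best = min(sphere(x) for x in pop)
        reward = best - new_best                  # improvement as reward (assumed)
        s_next = state_of(pop)
        Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])
        s, best = s_next, new_best
    return best

if __name__ == "__main__":
    print("best sphere value:", run())
```

In this toy version the Q-table is indexed by a discretized state and one action per generation, mirroring the described per-generation operator selection; extending it toward the paper would mean replacing the single feature with the three feature functions, the scalar reward with a multi-objective quality indicator, and the operator pool with the new operators in the proposed framework.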