ISBN (Print): 9798400704864
Existing deep reinforcement learning (DRL) methods for multi-objective vehicle routing problems (MOVRPs) typically decompose an MOVRP into subproblems with respective preferences and then train policies to solve the corresponding subproblems. However, this paradigm remains less effective at handling the intricate interactions among subproblems, which holds back the quality of the resulting Pareto solutions. To counteract this limitation, we introduce a collaborative deep reinforcement learning method. We first propose a preference-based attention network (PAN) that allows the DRL agents to reason out solutions to subproblems in parallel, where a shared encoder learns the instance embedding and a decoder is tailored for each agent by preference intervention to construct its respective solution. We then design a collaborative active search (CAS) to further improve solution quality, which updates only a part of the decoder parameters per instance during inference. In the CAS process, we also explicitly foster interactions among neighboring DRL agents by imitation learning, empowering them to exchange insights about elite solutions to similar subproblems. Extensive results on random and benchmark instances verify the efficacy of PAN and CAS, which is particularly pronounced on configurations (i.e., problem sizes or node distributions) beyond those seen during training. Our code is available at https://***/marmotlab/PAN-CAS.
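The abstract describes the architecture only at a high level. As a minimal, hypothetical sketch of the shared-encoder / preference-conditioned-decoder split it outlines: one encoder produces instance embeddings reused by all agents, while each agent's decoder is biased by its preference vector. All class names, dimensions, and the concrete form of the "preference intervention" (here, a linear projection of the preference vector into the decoder query) are assumptions, not the authors' implementation; see the linked repository for that.

```python
# Hypothetical sketch, not the authors' code: shared encoder with
# preference-conditioned decoders for decomposed MOVRP subproblems.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """One encoder learns node embeddings reused by every agent."""
    def __init__(self, node_dim=3, embed_dim=128, n_layers=3, n_heads=8):
        super().__init__()
        self.proj = nn.Linear(node_dim, embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, nodes):                   # nodes: (B, N, node_dim)
        return self.encoder(self.proj(nodes))   # (B, N, embed_dim)

class PreferenceDecoder(nn.Module):
    """Per-agent decoder: the preference vector biases the attention query,
    so each agent constructs a solution for its own subproblem."""
    def __init__(self, embed_dim=128, n_objectives=2, n_heads=8):
        super().__init__()
        # "Preference intervention" modeled here as a learned projection.
        self.pref_proj = nn.Linear(n_objectives, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)

    def forward(self, node_emb, preference):
        # preference: (B, n_objectives), e.g. weights summing to 1
        q = self.pref_proj(preference).unsqueeze(1)   # (B, 1, embed_dim)
        _, weights = self.attn(q, node_emb, node_emb)
        return weights.squeeze(1)  # per-node attention weights, usable as selection probabilities
```

In a CAS-style inference loop, one might freeze the shared encoder and optimize only a subset of decoder parameters per test instance, e.g. `torch.optim.Adam(decoder.pref_proj.parameters(), lr=1e-4)`; the subset chosen here is illustrative only.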
This paper presents a stochastic partially optimized cyclic shift crossover operator for the optimization of the multi-objective vehicle routing problem with time windows using genetic algorithms. The aim of the paper is to show how the combination of simple stochastic rules and sequential appendage policies addresses a common limitation of the traditional genetic algorithm when optimizing complex combinatorial problems. The limitation in question is the inability of the traditional genetic algorithm to perform local optimization. A series of tests based on the Solomon benchmark instances shows the level of competitiveness of the newly introduced crossover operator. (C) 2016 Elsevier B.V. All rights reserved.
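The abstract does not spell out the operator itself, but a minimal sketch of the underlying idea (order-preserving recombination with a stochastic cyclic shift on a permutation-encoded tour) might look like the following. The function name and the OX-style splice are assumptions, and the paper's "partially optimized" sequential-appendage local search step is deliberately omitted.

```python
# Illustrative sketch only: a cyclic-shift variant of order crossover (OX)
# on a permutation-encoded giant tour; not the paper's exact operator.
import random

def cyclic_shift_crossover(parent1, parent2, rng=random):
    """Copy a random segment from parent1, then fill the remaining positions
    with the leftover customers in the order they appear in parent2 after
    a random cyclic shift. The child is always a valid permutation."""
    n = len(parent1)
    i, j = sorted(rng.sample(range(n), 2))        # random segment bounds
    segment = parent1[i:j + 1]
    kept = set(segment)
    shift = rng.randrange(n)                      # stochastic cyclic shift
    shifted = parent2[shift:] + parent2[:shift]   # rotate parent2
    rest = [c for c in shifted if c not in kept]  # keep permutation feasibility
    return rest[:i] + segment + rest[i:]

if __name__ == "__main__":
    p1 = [1, 2, 3, 4, 5, 6, 7, 8]
    p2 = [8, 6, 4, 2, 7, 5, 3, 1]
    print(cyclic_shift_crossover(p1, p2, random.Random(0)))  # a valid permutation of 1..8
```

In a full GA for the time-windowed problem, such a child would still need route splitting and feasibility repair against capacity and time-window constraints before evaluation.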