ISBN (digital): 9783030361891
ISBN (print): 9783030361891; 9783030361884
Object tracking, as a low-level vision task, has long been a hot topic in computer vision. It is well known that challenges such as background clutter, fast object motion, and occlusion degrade the robustness and accuracy of existing object tracking methods. This paper proposes a reinforcement learning model based on the twin delayed deep deterministic policy gradient algorithm (TD3) for single object tracking. The model builds on the Actor-Critic (AC) deep reinforcement learning framework, in which the Actor network predicts a continuous action that moves the target bounding box from its position in the previous frame to the object position in the current frame and adapts it to the object size. The Critic network evaluates the confidence of the new bounding box online to determine whether the Critic model needs to be updated or re-initialized. Furthermore, our model uses the TD3 algorithm to further optimize the AC model: two Critic networks jointly predict the bounding-box confidence, and the smaller of the two predicted values is taken as the label for updating the network parameters. This keeps the Critic network from accumulating excessive estimation bias, accelerates the convergence of the loss function, and yields more accurate predictions. In addition, a small amount of bounded random noise is added to the action produced by the Actor model, and the search area is reasonably expanded during offline learning to improve the robustness of the tracker under strong background interference and fast object motion. The Critic model also guides the Actor model to select the best action and continuously update the state of the tracked object. Comprehensive experimental results on the OTB-2013 and OTB-2015 benchmarks demonstrate that our tracker performs best in precision, robustness, and efficiency when compared with state-of-the-art methods.
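The core of the TD3 update described above is the clipped double-Q target: bounded random noise is added to the target action, both Critic networks score the same next state-action pair, and the smaller estimate serves as the regression label, which suppresses the overestimation bias a single critic would accumulate. Below is a minimal PyTorch sketch of that target computation; the critic architecture, the noise scale `noise_std`, and the clip range `noise_clip` are illustrative assumptions, not the paper's exact networks or hyperparameters.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Hypothetical critic head: scores a (state, action) pair, where the
    action is a continuous bounding-box move such as (dx, dy, dscale)."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def td3_target(critic1_tgt, critic2_tgt, actor_tgt, next_state,
               reward, not_done, gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """Clipped double-Q target: add bounded noise to the target action,
    then take the element-wise minimum of the two critic estimates."""
    with torch.no_grad():
        next_action = actor_tgt(next_state)
        noise = (torch.randn_like(next_action) * noise_std
                 ).clamp(-noise_clip, noise_clip)
        next_action = (next_action + noise).clamp(-1.0, 1.0)
        q1 = critic1_tgt(next_state, next_action)
        q2 = critic2_tgt(next_state, next_action)
        target_q = reward + not_done * gamma * torch.min(q1, q2)
    return target_q
```

Both critics are then regressed toward the same `target_q`; because the label is the pessimistic minimum, occasional overestimates by one critic do not propagate through bootstrapping.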
Prioritized experience replay mechanisms have achieved remarkable success in accelerating the convergence of reinforcement learning algorithms. However, applying a traditional prioritized experience replay mechanism directly to asynchronous reinforcement learning leads to slow convergence, because it is difficult for an agent to exploit the valuable experiences gathered by other agents interacting with the environment. To address this issue, we propose a Multi-pool Prioritized experience replay-based asynchronous Twin Delayed Deep Deterministic policy gradient algorithm (MP-TD3). Specifically, a multi-pool prioritized experience replay mechanism is proposed to strengthen the exchange of experience among different agents and thereby accelerate network convergence. Then, a global-pool self-cleaning mechanism based on sample diversity and a global-pool self-cleaning mechanism based on TD-errors are designed to counter the high redundancy and low information content, respectively, of samples in the global pool. Finally, a multi-batch sampling mechanism is investigated to further reduce the training time. Extensive experiments validate that the proposed MP-TD3 significantly improves convergence speed and performance compared with state-of-the-art methods.
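To make the multi-pool idea above concrete: each asynchronous agent writes to its own local pool, transitions with large TD-errors are additionally promoted to a shared global pool, and each training batch mixes samples from both so that one agent can reuse another agent's high-value experience. The sketch below is a minimal NumPy illustration under assumed details; the pool capacities, the promotion threshold, and the proportional-priority sampling are illustrative choices, not the exact MP-TD3 design (which also includes the self-cleaning and multi-batch mechanisms).

```python
import numpy as np

class PrioritizedPool:
    """Simple proportional prioritized replay pool (priority ~ |TD-error|)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data, self.priorities = [], []

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:  # drop the oldest entry when full
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(abs(td_error) + 1e-6)

    def sample(self, batch_size):
        p = np.asarray(self.priorities)
        idx = np.random.choice(len(self.data), batch_size, p=p / p.sum())
        return [self.data[i] for i in idx]

class MultiPoolReplay:
    """One local pool per agent plus a shared global pool; transitions whose
    TD-error exceeds a threshold are also promoted to the global pool."""
    def __init__(self, n_agents, local_cap=10_000, global_cap=50_000,
                 promote_threshold=1.0):
        self.locals = [PrioritizedPool(local_cap) for _ in range(n_agents)]
        self.global_pool = PrioritizedPool(global_cap)
        self.promote_threshold = promote_threshold

    def add(self, agent_id, transition, td_error):
        self.locals[agent_id].add(transition, td_error)
        if abs(td_error) > self.promote_threshold:
            self.global_pool.add(transition, td_error)

    def sample(self, agent_id, batch_size, global_ratio=0.5):
        # Mix the agent's own experience with globally shared experience.
        n_global = int(batch_size * global_ratio) if self.global_pool.data else 0
        batch = self.global_pool.sample(n_global) if n_global else []
        batch += self.locals[agent_id].sample(batch_size - n_global)
        return batch
```

The `global_ratio` mixing parameter (an assumption here) controls how strongly agents lean on shared experience versus their own trajectories, which is the interaction the paper identifies as missing when a traditional single-pool scheme is used asynchronously.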