ISBN: (print) 9783319074559; 9783319074542
Learning Automata (LA) can be reckoned to be the founding algorithms on which the field of Reinforcement Learning has been built. Among the families of LA, Estimator Algorithms (EAs) are certainly the fastest, and of these, the family of Pursuit Algorithms (PAs) is the pioneering work. It has recently been reported that the previous proofs of ε-optimality for all the reported algorithms in the family of PAs are flawed¹. We applaud the researchers who discovered this flaw, and who further proceeded to rectify the proof for the Continuous Pursuit Algorithm (CPA). The latter proof, although it requires the learning parameter to be continuously changing, is, to the best of our knowledge, the current best and only way to prove the CPA's ε-optimality. However, for all the algorithms with absorbing states, for example, the Absorbing Continuous Pursuit Algorithm (ACPA) and the Discretized Pursuit Algorithm (DPA), the constraint of a continuously changing learning parameter can be removed. In this paper, we provide a new method to prove the ε-optimality of the Discretized Pursuit Algorithm which does not require this constraint. We believe that our proof is both unique and pioneering. It can also form the basis for formally showing the ε-optimality of the other EAs with absorbing states.
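For readers unfamiliar with the DPA, the following is a minimal sketch of a discretized pursuit scheme in its commonly described reward-inaction form. It is an illustration only and is not taken from the paper. The function name `run_dpa`, its parameters, the Bernoulli environment, and the initial sampling of each action are all assumptions. Action probabilities are stored as integer multiples of the step 1/(rN), so a unit vector is reached exactly and acts as an absorbing state.

```python
import random


def run_dpa(reward_probs, resolution=50, init_pulls=10, max_steps=100_000, seed=0):
    """Hypothetical sketch of a discretized pursuit automaton (reward-inaction)."""
    rng = random.Random(seed)
    r = len(reward_probs)
    total = r * resolution      # probabilities are counts / total; step size is 1 / total
    counts = [resolution] * r   # uniform start: every action has probability 1 / r
    wins = [0] * r              # number of rewards received per action
    pulls = [0] * r             # number of times each action was chosen

    def pull(i):
        # Simulated environment (assumption): Bernoulli reward with probability reward_probs[i].
        rewarded = rng.random() < reward_probs[i]
        pulls[i] += 1
        wins[i] += rewarded
        return rewarded

    # Seed the reward estimates by sampling every action a few times.
    for i in range(r):
        for _ in range(init_pulls):
            pull(i)

    for _ in range(max_steps):
        if max(counts) == total:    # unit vector reached: absorbing state
            break
        action = rng.choices(range(r), weights=counts)[0]
        if pull(action):            # probabilities change only on reward
            # Pursue the action with the highest current reward estimate.
            best = max(range(r), key=lambda i: wins[i] / pulls[i])
            for j in range(r):
                if j != best:
                    counts[j] = max(counts[j] - 1, 0)
            counts[best] = total - sum(counts[j] for j in range(r) if j != best)

    return [c / total for c in counts]


if __name__ == "__main__":
    print(run_dpa([0.9, 0.1]))
```

The sketch uses a fixed resolution throughout, which is the setting the abstract refers to when it says the constraint of a continuously changing learning parameter can be removed for algorithms with absorbing states. The code demonstrates only the update rule, not the proof.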