ISBN (print): 9781538661116
A central aspect of the development of visual perception is the autonomous calibration of various kinds of eye movements, including saccadic, pursuit, and vergence eye movements. An important but less well-studied class of eye movements is so-called torsional eye movements, in which the eyes rotate around the line of sight. In humans, torsional eye movements obey lawful relationships such as Listing's Law. However, it is still an open question how these eye movements develop and what learning processes may contribute to their development. Here we propose a model of the development of torsional eye movements based on the active efficient coding (AEC) framework. AEC models the joint development of sensory encoding and movements of the sense organs so as to maximize the overall coding efficiency of the perceptual system. Our results demonstrate that optimizing coding efficiency in this way leads to torsional eye movements consistent with Listing's Law, which describes torsional eye movements in humans. This suggests that humanoid robots aiming to maximize the coding efficiency of their visual systems could also benefit from physical or simulated torsional eye movements.
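To make the AEC idea concrete, here is a minimal sketch of its alternating loop; it is not the paper's model, and names such as `observe` and `coding_cost` are hypothetical. The agent first picks the motor command, here a torsional offset between the two eyes, that its current sensory code represents most efficiently, and then adapts the code to the input that this behavior produced. A toy 1-D circular signal stands in for the retinal image, and cyclorotation is modeled as a circular shift:

```python
# Minimal sketch of an Active Efficient Coding (AEC) loop on a toy
# 1-D "cyclorotation" stimulus. Illustrative only, not the paper's model.
import numpy as np

rng = np.random.default_rng(0)
N = 32                      # samples per eye (circular orientation signal)
K = 4                       # basis vectors: deliberately limited coding capacity
ANGLES = [0, 8, 16, 24]     # candidate torsional offsets (circular shifts)

def observe(scene, torsion):
    """Binocular input: left eye fixed, right eye cyclorotated by `torsion`."""
    return np.concatenate([scene, np.roll(scene, torsion)])

def coding_cost(D, x):
    """Return (reconstruction error, code) for x under dictionary D."""
    c = np.linalg.lstsq(D, x, rcond=None)[0]
    return np.sum((x - D @ c) ** 2), c

D = rng.normal(size=(2 * N, K))   # random initial sensory dictionary

for t in range(200):
    scene = rng.normal(size=N)    # a new visual scene on each fixation
    # Behavior update: pick the torsion the current code handles most efficiently.
    costs = [coding_cost(D, observe(scene, a))[0] for a in ANGLES]
    torsion = ANGLES[int(np.argmin(costs))]
    # Perception update: gradient step of the dictionary toward the chosen input.
    x = observe(scene, torsion)
    _, c = coding_cost(D, x)
    D += 0.01 * np.outer(x - D @ c, c)

print("torsional offset selected after co-adaptation:", torsion)
```

This toy only illustrates the alternating structure of AEC, in which sensory coding and behavior co-adapt to a self-consistent torsional strategy; the paper's full model operates on binocular images and rotations in 3-D, which is what yields behavior consistent with Listing's Law.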
A novel class of Approximate Policy Iteration (API) algorithms has recently demonstrated impressive practical performance (e.g., ExIt [1], AlphaGo-Zero [2]). This new family of algorithms maintains, and alternately optimizes, two policies: a fast, reactive policy (e.g., a deep neural network) deployed at test time, and a slow, non-reactive policy (e.g., tree search) that can plan multiple steps ahead. The reactive policy is updated under supervision from the non-reactive policy, while the non-reactive policy is improved via guidance from the reactive policy. In this work we study this class of Dual Policy Iteration (DPI) strategies in an alternating optimization framework and provide a convergence analysis that extends existing API theory. We also develop a special instance of this framework that reduces the update of the non-reactive policy to model-based optimal control using learned local models, providing a theoretically sound way of unifying model-free and model-based RL approaches with unknown dynamics. We demonstrate the efficacy of our approach on various continuous-control Markov decision processes.
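A tabular toy version of the alternating loop may help fix ideas. It is a sketch under strong simplifying assumptions (a deterministic chain MDP with a known model; names like `lookahead_policy` and `rollout_value` are illustrative), not the paper's algorithm. The slow, non-reactive policy is a one-step lookahead whose values come from rolling out the fast policy in the model; the fast, reactive policy is a tabular softmax updated by a cross-entropy step toward the slow policy's choice:

```python
# Toy sketch of the Dual Policy Iteration (DPI) loop on a deterministic
# chain MDP with a known model. Illustrative only, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
S, A, GAMMA, H = 8, 2, 0.95, 10   # states, actions, discount, rollout horizon

def step(s, a):
    """Known model: action 0 moves left, action 1 moves right; reward at the end."""
    s2 = max(0, s - 1) if a == 0 else min(S - 1, s + 1)
    return s2, float(s2 == S - 1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rollout_value(theta, s):
    """Estimate V(s) by rolling out the fast (reactive) policy in the model."""
    v, g = 0.0, 1.0
    for _ in range(H):
        a = rng.choice(A, p=softmax(theta[s]))
        s, r = step(s, a)
        v += g * r
        g *= GAMMA
    return v

def lookahead_policy(theta, s):
    """Slow, non-reactive policy: one-step lookahead scored by fast-policy rollouts."""
    q = [r + GAMMA * rollout_value(theta, s2)
         for s2, r in (step(s, a) for a in range(A))]
    return int(np.argmax(q))

theta = np.zeros((S, A))              # fast reactive policy (tabular softmax)
for it in range(500):
    s = rng.integers(S)
    a_star = lookahead_policy(theta, s)   # slow policy proposes an action
    # Supervised update: cross-entropy step pulling the fast policy toward it.
    grad = -softmax(theta[s])
    grad[a_star] += 1.0
    theta[s] += 0.5 * grad

print("greedy fast-policy action per state:", theta.argmax(axis=1))
# should favor action 1 (step right toward the rewarding terminal state)
```

The design mirrors DPI's division of labor: lookahead produces better actions than the reactive policy alone, and distilling those actions back into the reactive policy improves the rollouts that guide the next round of lookahead.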