Author affiliations: State Key Laboratory of Intelligent Technology and Systems, Tsinghua University, Beijing 100084, China; Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China; Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Publication: Journal of Southeast University (Natural Science Edition) / Dongnan Daxue Xuebao (Ziran Kexue Ban)
Year/Volume/Issue: 2009, Vol. 39, Suppl. 1
Pages: 146-151
Abstract: Based on the policy search algorithm for partially observable Markov decision processes (POMDPs), an optimal policy search algorithm is proposed, and an algorithm leading to the matching law is then derived from it. The subject's aim is to find a policy parameter that maximizes the expected value of a value function, and this parameter is updated from the subject's experience. Under the Markov assumption for the environment, the optimal policy algorithm is obtained by computing the gradient of the expected value of the value function. Theoretical analysis and simulation results show that the decision behavior produced by this algorithm satisfies the matching law. The matching law is met when a subject tries to maximize the expected value of the value function under the simple assumption that past choice behaviors affect neither the expected value of the value function nor the current policy. This reveals the relationship between matching behavior and the optimal policy search algorithm, and suggests that matching behavior is a suboptimal decision behavior.
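To make the idea concrete, below is a minimal sketch (not the paper's actual algorithm) of a policy-gradient update on a two-alternative choice task with an assumed concurrent variable-interval reward schedule. The update treats the reward as depending only on the current choice, i.e. it ignores how past choices shape future rewards, which corresponds to the abstract's simplifying assumption; its stationary point equalizes the return per choice across options, which is equivalent to the matching law. All parameter names and the schedule itself are illustrative assumptions.

```python
# Minimal sketch: REINFORCE-style policy-gradient choice on a concurrent
# variable-interval schedule, illustrating drift toward matching behavior.
import numpy as np

rng = np.random.default_rng(0)

def p_choose_0(theta):
    """Probability of choosing option 0 under a 2-action softmax policy."""
    return 1.0 / (1.0 + np.exp(-theta))

# Concurrent variable-interval (VI) schedules (assumed environment):
# each option is baited with probability lam[i] per trial, and a baited
# reward stays available until that option is next chosen.
lam = np.array([0.2, 0.05])
baited = np.array([False, False])

theta, alpha = 0.0, 0.05          # policy parameter and learning rate
choices = np.zeros(2)
rewards = np.zeros(2)

for t in range(200_000):
    baited |= rng.random(2) < lam  # baiting is independent of past choices

    p0 = p_choose_0(theta)
    a = 0 if rng.random() < p0 else 1
    r = 1.0 if baited[a] else 0.0
    baited[a] = False

    # Gradient-ascent update on the expected immediate reward:
    # d/dtheta log pi(a) = (1 - p0) for a = 0, and -p0 for a = 1.
    grad_logp = (1.0 - p0) if a == 0 else -p0
    theta += alpha * r * grad_logp

    choices[a] += 1
    rewards[a] += r

print("choice fraction (option 0):", choices[0] / choices.sum())
print("reward fraction (option 0):", rewards[0] / rewards.sum())
# Under this schedule the two fractions end up close to each other,
# i.e. the simulated subject exhibits matching behavior.
```

The fixed point of this update requires the expected reward per choice to be equal across the two options; dividing through by the total reward shows that the fraction of choices allocated to an option then equals the fraction of rewards obtained from it, which is the matching law. Because the VI schedule makes rewards depend on choice history, this point generally differs from the true reward-maximizing policy, consistent with the abstract's conclusion that matching is suboptimal.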