Least Square Policy Iteration in Reinforcement Learning
- 10.2991/lemcs-15.2015.272
- Policy iteration; Least Square; Reinforcement learning; Sample-effective; Policy improvement
Policy iteration is the core procedure for solving reinforcement learning problems. It evaluates a policy by estimating its value function and then derives an improved policy from that value function. In classical policy iteration, value functions and policies are stored exactly in tables. Such representations are not suitable for problems with large or continuous spaces (for example, continuous action spaces), so approximate policy iteration is often used instead. Constructing an approximate value function for the current policy is an important part of approximate policy iteration. The policy is not stored explicitly; its action is computed on demand from the approximate value function. Least-squares methods are sample-efficient in solving for the parameters of the approximate value function: the larger the sample size, the faster the solution is approached. This paper discusses online least-squares policy iteration algorithms in reinforcement learning.
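As an illustration of the evaluate-then-improve loop described in the abstract, here is a minimal sketch of batch least-squares policy iteration with an LSTD-Q evaluation step. It is not the authors' online algorithm. The two-state toy problem, the one-hot features, the small ridge term, and all names (`features`, `greedy_action`, `lstdq`, `lspi`, `step`) are assumptions made for this example.

```python
import numpy as np

GAMMA = 0.9                      # discount factor (assumed for the toy problem)
N_STATES, N_ACTIONS = 2, 2       # hypothetical two-state, two-action MDP


def features(s, a):
    """One-hot feature vector for a state-action pair (assumed basis)."""
    phi = np.zeros(N_STATES * N_ACTIONS)
    phi[s * N_ACTIONS + a] = 1.0
    return phi


def greedy_action(w, s):
    """The policy is implicit: compute its action on demand from Q(s, a) = phi(s, a) . w."""
    q = [features(s, a) @ w for a in range(N_ACTIONS)]
    return int(np.argmax(q))


def lstdq(samples, w_policy, ridge=1e-8):
    """Least-squares evaluation of the policy that is greedy with respect to w_policy."""
    k = N_STATES * N_ACTIONS
    A = ridge * np.eye(k)        # small ridge term keeps A invertible
    b = np.zeros(k)
    for s, a, r, s_next in samples:
        phi = features(s, a)
        phi_next = features(s_next, greedy_action(w_policy, s_next))
        A += np.outer(phi, phi - GAMMA * phi_next)
        b += phi * r
    return np.linalg.solve(A, b)


def lspi(samples, max_iter=20, tol=1e-6):
    """Alternate evaluation and implicit greedy improvement until the weights settle."""
    w = np.zeros(N_STATES * N_ACTIONS)
    for _ in range(max_iter):
        w_new = lstdq(samples, w)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w


def step(s, a):
    """Toy dynamics: action 0 stays, action 1 switches state; landing in state 1 pays 1."""
    s_next = s if a == 0 else 1 - s
    reward = 1.0 if s_next == 1 else 0.0
    return reward, s_next


# One sample (s, a, r, s') for every state-action pair.
samples = []
for s in range(N_STATES):
    for a in range(N_ACTIONS):
        r, s_next = step(s, a)
        samples.append((s, a, r, s_next))

w = lspi(samples)
policy = [greedy_action(w, s) for s in range(N_STATES)]
```

On this toy problem the learned policy switches out of state 0 and stays in state 1. An online variant would update `A` and `b` as samples arrive and improve the policy every few steps rather than after a full batch.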
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Haifei Zhang
AU - Hailong Deng
AU - Bin Zhao
AU - Ying Hong
PY - 2015/07
DA - 2015/07
TI - Least Square Policy Iteration in Reinforcement Learning
BT - Proceedings of the International Conference on Logistics, Engineering, Management and Computer Science
PB - Atlantis Press
SP - 1365
EP - 1370
SN - 1951-6851
UR - https://doi.org/10.2991/lemcs-15.2015.272
DO - 10.2991/lemcs-15.2015.272
ID - Zhang2015/07
ER -