Least Square Policy Iteration in Reinforcement Learning

Haifei Zhang; Hailong Deng; Bin Zhao; Ying Hong

doi:10.2991/lemcs-15.2015.272



Corresponding Author

Haifei Zhang

Available Online July 2015.

DOI: 10.2991/lemcs-15.2015.272
Keywords: Policy iteration; Least Square; Reinforcement learning; Sample-effective; Policy improvement
Abstract: Policy iteration is the core procedure for solving reinforcement learning problems. It evaluates policies by estimating their value functions and then uses these value functions to derive new, improved policies. In classic policy iteration, value functions and policies are represented exactly in tabular form. However, tabular representations are not suitable for reinforcement learning problems with large and continuous spaces, such as continuous action spaces. Approximate policy iteration is therefore often used to solve such problems. Constructing an approximate value function for the current policy is an important part of approximate policy iteration. The policy is not represented explicitly; instead, its actions are computed on demand from the approximate value function. Least-squares methods are sample-efficient when solving for the parameters of the approximate value function: the larger the sample size, the faster the solution is approached. This paper discusses online least-squares policy iteration algorithms in reinforcement learning.
Copyright: © 2015, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
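
The loop described in the abstract (evaluate the current policy with a least-squares fit, then act greedily on the fitted value function) can be illustrated with a minimal sketch. This is the standard batch LSTD-Q form of least-squares policy iteration, not the online algorithm the paper itself discusses, and it is not taken from the paper. The names `lstdq`, `lspi`, `phi`, the ridge term, and the two-state toy problem are all assumptions made for illustration.

```python
import numpy as np

def lstdq(samples, phi, policy, n_features, gamma=0.9, ridge=1e-6):
    """Least-squares fit of the Q-function weights for a fixed policy.

    Accumulates A = sum f (f - gamma f')^T and b = sum r f over the
    samples (s, a, r, s_next), then solves A w = b.
    The small ridge term (an assumption) keeps A invertible.
    """
    A = ridge * np.eye(n_features)
    b = np.zeros(n_features)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)

def lspi(samples, phi, actions, n_features, gamma=0.9, n_iters=20, tol=1e-6):
    """Alternate policy evaluation (lstdq) and greedy policy improvement."""
    w = np.zeros(n_features)
    for _ in range(n_iters):
        # The policy is implicit: actions are computed from the current weights.
        def greedy(s, w=w):
            return max(actions, key=lambda a: phi(s, a) @ w)
        w_new = lstdq(samples, phi, greedy, n_features, gamma)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w

# Hypothetical toy problem: two states, two actions; action a moves to
# state a, and the reward is 1 when the next state is 1, else 0.
actions = (0, 1)

def phi(s, a):
    f = np.zeros(4)          # one-hot feature per (state, action) pair
    f[2 * s + a] = 1.0
    return f

samples = [(s, a, float(a == 1), a) for s in (0, 1) for a in actions]
w = lspi(samples, phi, actions, n_features=4)
policy = [max(actions, key=lambda a: phi(s, a) @ w) for s in (0, 1)]
print(w, policy)
```

With one-hot features the fit reduces to the tabular case, which makes the result easy to check by hand; replacing `phi` with a lower-dimensional basis gives the approximate setting the abstract refers to.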


Volume Title: Proceedings of the International Conference on Logistics, Engineering, Management and Computer Science
Series: Advances in Intelligent Systems Research
Publication Date: July 2015
ISBN: 978-94-6252-102-5
ISSN: 1951-6851

Cite this article


TY  - CONF
AU  - Haifei Zhang
AU  - Hailong Deng
AU  - Bin Zhao
AU  - Ying Hong
PY  - 2015/07
DA  - 2015/07
TI  - Least Square Policy Iteration in Reinforcement Learning
BT  - Proceedings of the International Conference on Logistics, Engineering, Management and Computer Science
PB  - Atlantis Press
SP  - 1365
EP  - 1370
SN  - 1951-6851
UR  - https://doi.org/10.2991/lemcs-15.2015.272
DO  - 10.2991/lemcs-15.2015.272
ID  - Zhang2015/07
ER  -


International Conference on Logistics Engineering, Management and Computer Science (LEMCS 2015)
