Proceedings of the International Conference on Logistics, Engineering, Management and Computer Science

Least Square Policy Iteration in Reinforcement Learning

Authors
Haifei Zhang, Hailong Deng, Bin Zhao, Ying Hong
Corresponding Author
Haifei Zhang
Available Online July 2015.
DOI
10.2991/lemcs-15.2015.272
Keywords
Policy iteration; Least Square; Reinforcement learning; Sample-effective; Policy improvement
Abstract

Policy iteration is a core procedure for solving reinforcement learning problems. It evaluates a policy by estimating that policy's value function and then uses the value function to derive an improved policy. In classical policy iteration, value functions and policies are represented exactly, in tabular form. However, tabular representations are not suitable for reinforcement learning problems with large or continuous spaces, such as a continuous action space. Approximate policy iteration is therefore often used to solve such problems. Constructing an approximate value function for the current policy is an important part of approximate policy iteration. The policy is not stored explicitly; instead, its actions are computed on demand from the approximate value function. Least-squares methods are sample-efficient when solving for the parameters of the approximate value function: the larger the sample size, the faster the solution is approached. This paper discusses online least-squares policy iteration algorithms in reinforcement learning.
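
The loop described in the abstract (fit an approximate value function for the current policy by least squares, then act greedily on it) can be illustrated with a minimal batch sketch. This is not the online algorithm from the paper; the 4-state chain, its rewards, the one-hot features, the discount factor, the small ridge term and every function name below are assumptions chosen only to keep the example self-contained.

```python
# Minimal batch LSPI sketch on a hypothetical 4-state chain (pure Python).
# All names and the toy environment are illustrative assumptions.
N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9
K = N_STATES * N_ACTIONS  # number of features


def step(s, a):
    """Assumed toy dynamics: action 0 moves left, action 1 moves right.
    Reward is 1 whenever the next state is the rightmost state."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)


def phi(s, a):
    """One-hot features over (state, action) pairs (simplest linear basis)."""
    v = [0.0] * K
    v[s * N_ACTIONS + a] = 1.0
    return v


def q_value(w, s, a):
    """Approximate action value Q(s, a) = phi(s, a) . w"""
    return sum(wi * fi for wi, fi in zip(w, phi(s, a)))


def greedy(w, s):
    """The policy is implicit: its action is computed from Q on demand."""
    return max(range(N_ACTIONS), key=lambda a: q_value(w, s, a))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def lstdq(samples, w):
    """Policy evaluation: least-squares fit of Q for the greedy policy of w."""
    A = [[1e-6 if i == j else 0.0 for j in range(K)] for i in range(K)]
    b = [0.0] * K
    for s, a, r, s2 in samples:
        f, f2 = phi(s, a), phi(s2, greedy(w, s2))
        for i in range(K):
            b[i] += f[i] * r
            for j in range(K):
                A[i][j] += f[i] * (f[j] - GAMMA * f2[j])
    return solve(A, b)


def lspi(samples, iterations=20, tol=1e-8):
    """Alternate evaluation (lstdq) and implicit greedy improvement."""
    w = [0.0] * K
    for _ in range(iterations):
        new_w = lstdq(samples, w)
        converged = max(abs(x - y) for x, y in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    return w


# One sample per (state, action) pair, generated from the toy dynamics.
samples = [(s, a, r, s2)
           for s in range(N_STATES) for a in range(N_ACTIONS)
           for s2, r in [step(s, a)]]
w = lspi(samples)
policy = [greedy(w, s) for s in range(N_STATES)]
print(policy)
```

On this toy chain the greedy policy should settle on moving right in every state. One-hot features make the fit equivalent to the tabular case; a coarser basis (for example, a few polynomial or radial features) would turn the same loop into a genuinely approximate method.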

Copyright
© 2015, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the International Conference on Logistics, Engineering, Management and Computer Science
Series
Advances in Intelligent Systems Research
Publication Date
July 2015
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Haifei Zhang
AU  - Hailong Deng
AU  - Bin Zhao
AU  - Ying Hong
PY  - 2015/07
DA  - 2015/07
TI  - Least Square Policy Iteration in Reinforcement Learning
BT  - Proceedings of the International Conference on Logistics, Engineering, Management and Computer Science
PB  - Atlantis Press
SP  - 1365
EP  - 1370
SN  - 1951-6851
UR  - https://doi.org/10.2991/lemcs-15.2015.272
DO  - 10.2991/lemcs-15.2015.272
ID  - Zhang2015/07
ER  -