GQ(lambda): A general gradient algorithm for temporal-difference prediction learning with eligibility traces

Hamid Reza Maei; Richard S. Sutton

doi:10.2991/agi.2010.22

Corresponding Author

Hamid Reza Maei

Available Online June 2010.

DOI: 10.2991/agi.2010.22
Abstract: A new family of gradient temporal-difference learning algorithms has recently been introduced by Sutton, Maei, and others, in which function approximation is much more straightforward. In this paper, we introduce the GQ(lambda) algorithm, which can be seen as an extension of that work to a more general setting including eligibility traces and off-policy learning of temporally abstract predictions. These extensions bring us closer to the ultimate goal of this work: the development of a universal prediction learning algorithm suitable for learning experientially grounded knowledge of the world. Eligibility traces are essential to this goal because they bridge the temporal gaps between cause and effect when experience is processed at a temporally fine resolution. Temporally abstract predictions are also essential as the means for representing abstract, higher-level knowledge about courses of action, or options. GQ(lambda) can be thought of as an extension of Q-learning. We extend existing convergence results for policy evaluation to this setting and carry out a forward-view/backward-view analysis to derive and prove the validity of the new algorithm.
Copyright: © 2010, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title: Proceedings of the 3d Conference on Artificial General Intelligence (2010)
Series: Advances in Intelligent Systems Research
Publication Date: June 2010
ISBN: 978-90-78677-36-9
ISSN: 1951-6851

Cite this article

TY  - CONF
AU  - Hamid Reza Maei
AU  - Richard S. Sutton
PY  - 2010/06
DA  - 2010/06
TI  - GQ(lambda): A general gradient algorithm for temporal-difference prediction learning with eligibility traces
BT  - Proceedings of the 3d Conference on Artificial General Intelligence (2010)
PB  - Atlantis Press
SP  - 100
EP  - 105
SN  - 1951-6851
UR  - https://doi.org/10.2991/agi.2010.22
DO  - 10.2991/agi.2010.22
ID  - Maei2010/06
ER  -
