Proceedings of the 2nd Conference on Artificial General Intelligence (2009)

Feature Markov Decision Processes

Authors
Marcus Hutter
Corresponding Author
Marcus Hutter
Available Online June 2009.
DOI
10.2991/agi.2009.30
Abstract

General purpose intelligent learning agents cycle through (complex, non-MDP) sequences of observations, actions, and rewards. On the other hand, reinforcement learning is well-developed for small finite state Markov Decision Processes (MDPs). So far it is an art performed by human designers to extract the right state representation out of the bare observations, i.e. to reduce the agent setup to the MDP framework. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in the companion article [Hut09].
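The abstract's central idea, selecting a feature map that reduces the raw history to MDP states by minimizing a formal criterion, can be illustrated with a toy sketch. The paper's actual criterion is a Bayesian code-length cost; the snippet below substitutes a crude sequential Laplace code length for the induced state and reward sequences, and the names `mdp_cost` and `best_phi` are illustrative, not from the paper.

```python
import math
from collections import defaultdict

def mdp_cost(states, actions, rewards):
    """Sequential Laplace code length (in nats) of the state and reward
    sequences under the MDP induced by a candidate feature map.
    A simplified stand-in for the paper's code-length criterion."""
    n_s = len(set(states))   # state alphabet size under this map
    n_r = len(set(rewards))  # reward alphabet size
    trans = defaultdict(lambda: defaultdict(int))  # (s, a) -> s' counts
    rew = defaultdict(lambda: defaultdict(int))    # (s, a, s') -> r counts
    cost = 0.0
    for t in range(len(states) - 1):
        s, a, s2, r = states[t], actions[t], states[t + 1], rewards[t + 1]
        n = sum(trans[(s, a)].values())
        cost -= math.log((trans[(s, a)][s2] + 1) / (n + n_s))   # code next state
        m = sum(rew[(s, a, s2)].values())
        cost -= math.log((rew[(s, a, s2)][r] + 1) / (m + n_r))  # code reward
        trans[(s, a)][s2] += 1
        rew[(s, a, s2)][r] += 1
    return cost

def best_phi(history, candidates):
    """history: list of (observation, action, reward) triples.
    candidates: feature maps from observation prefixes to states.
    Returns the map whose induced MDP compresses the history best."""
    obs, acts, rews = zip(*history)
    def cost_of(phi):
        states = [phi(obs[:t + 1]) for t in range(len(obs))]
        return mdp_cost(states, list(acts), list(rews))
    return min(candidates, key=cost_of)
```

For example, on a history whose reward equals the last observation, a map that keeps the last observation as the state yields near-deterministic dynamics and a lower cost than a map that collapses everything to a single state.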

Copyright
© 2009, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2nd Conference on Artificial General Intelligence (2009)
Series
Advances in Intelligent Systems Research
Publication Date
June 2009
ISSN
1951-6851
DOI
10.2991/agi.2009.30

Cite this article

TY  - CONF
AU  - Marcus Hutter
PY  - 2009/06
DA  - 2009/06
TI  - Feature Markov Decision Processes
BT  - Proceedings of the 2nd Conference on Artificial General Intelligence (2009)
PB  - Atlantis Press
SP  - 138
EP  - 143
SN  - 1951-6851
UR  - https://doi.org/10.2991/agi.2009.30
DO  - 10.2991/agi.2009.30
ID  - Hutter2009/06
ER  -