On Evaluating Agent Performance in a Fixed Period of Time
- 10.2991/agi.2010.41
Evaluating several agents on a given task over a finite period of time is a very common problem in experimental design, statistics, computer science, economics and, in general, any experimental science. It is also crucial for intelligence evaluation. In reinforcement learning, the task is formalised as an interactive environment with observations, actions and rewards. Typically, the agent's decision is a choice among a set of actions, cycle after cycle. However, in real evaluation scenarios, time can be intentionally modulated by the agent. Consequently, agents choose not only an action but also the time at which to perform it. This is natural in biological systems, but it is also an issue in control. In this paper we revisit the classical reward aggregating functions commonly used in reinforcement learning and related areas, analyse their problems, and propose a modification of the average reward to obtain a consistent measurement for continuous time.
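To make the issue concrete, the following is a minimal sketch, not taken from the paper, of three classical ways to aggregate rewards when each action carries a timestamp. The function names, the `Event` type and the two toy agents are all hypothetical, and the per-time average shown here is only a naive normalisation by the evaluation period, not the modification the paper proposes.

```python
from typing import List, Tuple

# Hypothetical representation: each event is (time_of_action, reward).
Event = Tuple[float, float]


def total_reward(events: List[Event]) -> float:
    """Cumulative reward: the plain sum of all rewards received."""
    return sum(reward for _, reward in events)


def average_per_action(events: List[Event]) -> float:
    """Average reward per action (per cycle), ignoring elapsed time."""
    return total_reward(events) / len(events) if events else 0.0


def average_per_time(events: List[Event], horizon: float) -> float:
    """Naive average reward per unit of time over a fixed period."""
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    return total_reward(events) / horizon


# Two toy agents evaluated over the same period (illustrative data only).
horizon = 10.0
steady = [(float(t), 0.5) for t in range(1, 11)]  # acts every time unit
one_shot = [(1.0, 1.0)]                           # acts once, then idles

for name, events in (("steady", steady), ("one_shot", one_shot)):
    print(
        name,
        total_reward(events),
        average_per_action(events),
        average_per_time(events, horizon),
    )
```

In this toy setting the agent that acts once and then waits ranks higher under the per-action average, while the steadily acting agent ranks higher under the total and the per-time average, which illustrates why the choice of aggregating function matters once agents can modulate time.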
- © 2010, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - José Hernandez-Orallo
PY  - 2010/06
DA  - 2010/06
TI  - On Evaluating Agent Performance in a Fixed Period of Time
BT  - Proceedings of the 3d Conference on Artificial General Intelligence (2010)
PB  - Atlantis Press
SP  - 194
EP  - 199
SN  - 1951-6851
UR  - https://doi.org/10.2991/agi.2010.41
DO  - 10.2991/agi.2010.41
ID  - Hernandez-Orallo2010/06
ER  -