previous  home  search  PostScript (564kb)   PDF (152kb)   Html/Gif   contact    up    next  

Gradient-based Reinforcement Planning in Policy-Search Methods


Authors: Ivo Kwee, Marcus Hutter, Juergen Schmidhuber (2001)
Comments: Extended version: 9 pages
Subj-class: Artificial Intelligence; Learning;

ACM-class: I.2; I.2.6; I.2.8;
Reference: Proceedings of the 5th European Workshop on Reinforcement Learning (EWRL-5), pages 27-29, Onderwijsinstituut CKI

Keywords: Artificial intelligence, reinforcement learning, direct policy search, planning, gradient descent.

Abstract: We introduce a learning method called "gradient-based reinforcement planning" (GREP). Unlike traditional DP methods that improve their policy backwards in time, GREP is a gradient-based method that plans ahead and improves its policy before it actually acts in the environment. We derive formulas for the exact policy gradient that maximizes the expected future reward and confirm our ideas with numerical experiments.
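
The idea of improving a policy by planning on a model before acting can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two-state model `P` and `R`, the softmax parameterization, the finite horizon, and the use of central finite differences as a numerical stand-in for the exact gradient formulas the authors derive.

```python
import math

# Hypothetical 2-state, 2-action model (illustrative values, not from the paper).
# P[s][a][s2] is a transition probability, R[s][a] an expected immediate reward.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]
R = [
    [0.0, 0.0],
    [1.0, 2.0],
]
GAMMA = 0.9
HORIZON = 30


def softmax(prefs):
    """Turn action preferences into action probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def expected_return(theta, start=0):
    """Discounted expected reward, computed by rolling the model forward.

    No environment interaction or sampling is involved: the state
    distribution is propagated through the known model.
    """
    policy = [softmax(row) for row in theta]
    dist = [0.0] * len(P)
    dist[start] = 1.0
    total, discount = 0.0, 1.0
    for _ in range(HORIZON):
        nxt = [0.0] * len(P)
        for s, ps in enumerate(dist):
            for a, pa in enumerate(policy[s]):
                total += discount * ps * pa * R[s][a]
                for s2, pt in enumerate(P[s][a]):
                    nxt[s2] += ps * pa * pt
        dist = nxt
        discount *= GAMMA
    return total


def gradient(theta, eps=1e-5):
    """Central finite differences (assumption: stand-in for an analytic gradient)."""
    grad = [[0.0] * len(row) for row in theta]
    for s in range(len(theta)):
        for a in range(len(theta[s])):
            up = [row[:] for row in theta]
            down = [row[:] for row in theta]
            up[s][a] += eps
            down[s][a] -= eps
            grad[s][a] = (expected_return(up) - expected_return(down)) / (2 * eps)
    return grad


def plan(theta, steps=50, lr=0.5):
    """Gradient ascent on the policy parameters, using only the model."""
    for _ in range(steps):
        g = gradient(theta)
        theta = [[t + lr * d for t, d in zip(tr, gr)] for tr, gr in zip(theta, g)]
    return theta


theta0 = [[0.0, 0.0], [0.0, 0.0]]  # uniform initial policy
theta1 = plan(theta0)
before, after = expected_return(theta0), expected_return(theta1)
```

In this toy model the second action is better in both states, so `after` should exceed `before` and the planned policy should favor that action, all before a single step is taken in any real environment.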



BibTeX Entry

@Article{Hutter:01grep,
  author =       "Ivo Kwee and Marcus Hutter and Juergen Schmidhuber",
  institution =  "Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA)",
  title =        "Gradient-based Reinforcement Planning in Policy-Search Methods",
  month =        oct,
  year =         "2001",
  pages =        "27--29",
  address =      "Manno (Lugano), CH",
  journal =      "Proceedings of the 5th European Workshop on Reinforcement Learning (EWRL-5)",
  number =       "27",
  editor =       "Marco A. Wiering",
  publisher =    "Onderwijsinstituut CKI - Utrecht University",
  series =       "Cognitieve Kunstmatige Intelligentie",
  ISBN =         "90-393-2874-9",
  ISSN =         "1389-5184",
  keywords =     "Artificial intelligence, reinforcement learning, direct policy search,
                  planning, gradient descent.",
  url =          "http://www.hutter1.net/ai/pgrep.htm",
  categories =   "I.2.   [Artificial Intelligence],
                  I.2.6. [Learning],
                  I.2.8. [Problem Solving, Control Methods and Search]",
  abstract =     "We introduce a learning method called ``gradient-based reinforcement
                  planning'' (GREP). Unlike traditional DP methods that improve their
                  policy backwards in time, GREP is a gradient-based method that plans
                  ahead and improves its policy {\em before} it actually acts in the
                  environment. We derive formulas for the exact policy gradient that
                  maximizes the expected future reward and confirm our ideas
                  with numerical experiments.",
}