

PostScript (564kb) PDF (152kb) Html/Gif


Gradient-based Reinforcement Planning in Policy-Search Methods

Author: Ivo Kwee, Marcus Hutter, Juergen Schmidhuber (2001)

Comments: Extended version: 9 pages

Subj-class: Artificial Intelligence; Learning;

ACM-class: I.2; I.2.6; I.2.8;

Reference: Proceedings of the 5th European Workshop on Reinforcement Learning (EWRL-5), pages 27-29, Onderwijsinstituut CKI, Utrecht University

Keywords: Artificial intelligence, reinforcement learning, direct policy search, planning, gradient descent.

Abstract: We introduce a learning method called "gradient-based reinforcement planning" (GREP). Unlike traditional dynamic programming (DP) methods, which improve their policy backwards in time, GREP is a gradient-based method that plans ahead and improves its policy before it actually acts in the environment. We derive formulas for the exact policy gradient that maximizes the expected future reward and confirm our ideas with numerical experiments.
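The abstract's central idea, computing an exact policy gradient from a model and improving the policy before acting, can be illustrated with a minimal sketch. It assumes a small, fully known finite MDP (the tables P and R, the discount GAMMA and the start distribution D0 are hypothetical, not taken from the paper) and a tabular softmax policy, and it uses the standard policy-gradient-theorem identity dJ/dtheta(s,a) = d(s) * pi(a|s) * (Q(s,a) - V(s)), where d is the discounted state visitation. This is not the authors' GREP derivation; all helper names are illustrative.

```python
import math

# Hypothetical 2-state, 2-action MDP (illustrative only, not from the paper).
# P[s][a][t] = probability of moving from state s to state t under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.05, 0.95]],
]
R = [[0.0, 0.0], [0.0, 1.0]]  # R[s][a] = expected immediate reward
GAMMA = 0.9                   # discount factor (assumed)
D0 = [1.0, 0.0]               # start-state distribution (assumed)


def softmax_policy(theta):
    """Tabular softmax: pi[s][a] = exp(theta[s][a]) / sum_b exp(theta[s][b])."""
    pi = []
    for row in theta:
        m = max(row)  # subtract the max for numerical stability
        e = [math.exp(x - m) for x in row]
        z = sum(e)
        pi.append([x / z for x in e])
    return pi


def evaluate(pi, iters=500):
    """Iterative policy evaluation on the known model; returns (V, Q)."""
    n_s, n_a = len(P), len(P[0])
    V = [0.0] * n_s
    for _ in range(iters):
        Q = [[R[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        V = [sum(pi[s][a] * Q[s][a] for a in range(n_a)) for s in range(n_s)]
    return V, Q


def visitation(pi, iters=500):
    """Discounted visitation d(t) = sum_k GAMMA^k Pr(s_k = t), via d = D0 + GAMMA * P_pi^T d."""
    n_s, n_a = len(P), len(P[0])
    d = D0[:]
    for _ in range(iters):
        d = [D0[t] + GAMMA * sum(d[s] * pi[s][a] * P[s][a][t]
                                 for s in range(n_s) for a in range(n_a))
             for t in range(n_s)]
    return d


def objective(theta):
    """Expected discounted return J(theta) = sum_s D0[s] * V(s)."""
    V, _ = evaluate(softmax_policy(theta))
    return sum(D0[s] * V[s] for s in range(len(P)))


def exact_gradient(theta):
    """Exact dJ/dtheta[s][a] = d(s) * pi(a|s) * (Q(s,a) - V(s)) for the softmax policy."""
    pi = softmax_policy(theta)
    V, Q = evaluate(pi)
    d = visitation(pi)
    return [[d[s] * pi[s][a] * (Q[s][a] - V[s]) for a in range(len(pi[s]))]
            for s in range(len(pi))]


def plan(theta, steps=100, lr=1.0):
    """Plan ahead: gradient ascent on J using only the model, before any real action."""
    theta = [row[:] for row in theta]
    for _ in range(steps):
        g = exact_gradient(theta)
        theta = [[theta[s][a] + lr * g[s][a] for a in range(len(theta[s]))]
                 for s in range(len(theta))]
    return theta
```

For example, `plan([[0.0, 0.0], [0.0, 0.0]])` starts from the uniform policy and returns parameters with a higher `objective`; `exact_gradient` can be checked against central finite differences of `objective`. Iterative evaluation stands in for a linear solve only to keep the sketch free of dependencies.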

Table of Contents

Introduction

Derivation of the policy gradient

Computation of the optimal policy

Numerical experiments

Conclusions

Implicit policies

Monte Carlo gradient sampling


BibTeX Entry

@InProceedings{Hutter:01grep,
  author =       "Ivo Kwee and Marcus Hutter and Juergen Schmidhuber",
  institution =  "Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA)",
  title =        "Gradient-based Reinforcement Planning in Policy-Search Methods",
  month =        oct,
  year =         "2001",
  pages =        "27--29",
  address =      "Manno (Lugano), CH",
  booktitle =    "Proceedings of the 5th European Workshop on Reinforcement Learning (EWRL-5)",
  number =       "27",
  editor =       "Marco A. Wiering",
  publisher =    "Onderwijsinstituut CKI - Utrecht University",
  series =       "Cognitieve Kunstmatige Intelligentie",
  ISBN =         "90-393-2874-9",
  ISSN =         "1389-5184",
  keywords =     "Artificial intelligence, reinforcement learning, direct policy search,
                  planning, gradient descent.",
  url =          "http://www.hutter1.net/ai/pgrep.htm",
  categories =   "I.2.   [Artificial Intelligence],
                  I.2.6. [Learning],
                  I.2.8. [Problem Solving, Control Methods and Search]",
  abstract =     "We introduce a learning method called ``gradient-based reinforcement
                  planning'' (GREP). Unlike traditional DP methods that improve their
                  policy backwards in time, GREP is a gradient-based method that plans
                  ahead and improves its policy {\em before} it actually acts in the
                  environment. We derive formulas for the exact policy gradient that
                  maximizes the expected future reward and confirm our ideas
                  with numerical experiments.",
}
