Convergence and Loss Bounds for Bayesian Sequence Prediction
Keywords: Bayesian sequence prediction;
general loss function and bounds;
convergence; mixture distributions.
Abstract: The probability of observing $x_t$ at time $t$, given past
observations $x_1...x_{t-1}$, can be computed with Bayes' rule if
the true generating distribution $\mu$ of the sequences
$x_1x_2x_3...$ is known. If $\mu$ is unknown but known to belong
to a class $M$, one can base one's prediction on the Bayes mix
$\xi$, defined as a weighted sum of the distributions $\nu\in M$.
Various results on the convergence of the mixture posterior $\xi_t$
to the true posterior $\mu_t$ are presented. In particular, a new
(elementary) derivation of the convergence $\xi_t/\mu_t\to 1$ is
provided, which additionally yields the rate of convergence. A
general sequence predictor is allowed to choose an action $y_t$
based on $x_1...x_{t-1}$ and receives loss $\ell_{x_t y_t}$ if
$x_t$ is the next symbol of the sequence. No assumptions are made
on the structure of $\ell$ (apart from boundedness) or of $M$.
The Bayes-optimal prediction scheme $\Lambda_\xi$ based on the
mixture $\xi$ and the Bayes-optimal informed prediction scheme
$\Lambda_\mu$ are defined, and the total loss $L_\xi$ of
$\Lambda_\xi$ is bounded in terms of the total loss $L_\mu$ of
$\Lambda_\mu$. It is shown that $L_\xi$ is bounded for bounded
$L_\mu$ and that $L_\xi/L_\mu\to 1$ for $L_\mu\to \infty$.
Convergence of the instantaneous losses is also proven.
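To make the abstract's objects concrete, here is a standard formalization
of the Bayes mix and of the posteriors $\xi_t$ and $\mu_t$; the prior
weights $w_\nu$ and the normalization convention below are the usual
choices and are not spelled out in the abstract itself:

$$\xi(x_1...x_t) \;=\; \sum_{\nu\in M} w_\nu\,\nu(x_1...x_t), \qquad w_\nu>0, \quad \sum_{\nu\in M} w_\nu \le 1,$$
$$\xi_t \;\equiv\; \xi(x_t|x_1...x_{t-1}) \;=\; \frac{\xi(x_1...x_t)}{\xi(x_1...x_{t-1})}, \qquad \mu_t \;\equiv\; \mu(x_t|x_1...x_{t-1}).$$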
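The prediction schemes and total losses admit the following reading,
consistent with the abstract (the paper's exact definitions may differ in
detail): for $\rho\in\{\xi,\mu\}$, the scheme $\Lambda_\rho$ picks the
action that minimizes the $\rho$-expected instantaneous loss, and
$L_\rho$ is its cumulative loss in $\mu$-expectation:

$$y_t^{\Lambda_\rho} \;:=\; \arg\min_{y_t} \sum_{x_t} \rho(x_t|x_1...x_{t-1})\,\ell_{x_t y_t}, \qquad L_\rho \;:=\; \sum_t \mathbf{E}_\mu\!\left[\ell_{x_t y_t^{\Lambda_\rho}}\right].$$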
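For illustration only, a minimal Python sketch of the scheme
$\Lambda_\xi$ for a finite class $M$ of i.i.d. Bernoulli distributions
under a bounded loss matrix. The class, prior weights, and loss matrix
are hypothetical choices for this example, not taken from the paper:

import numpy as np

def bayes_mixture_predictor(xs, thetas, weights, loss):
    """Run the Bayes-optimal mixture scheme Lambda_xi on a binary sequence.

    xs      : observed binary sequence x_1 ... x_n (0/1 entries)
    thetas  : candidate Bernoulli parameters, one per distribution nu in M
    weights : prior weights w_nu (positive, summing to <= 1)
    loss    : loss[x, y] = bounded loss when symbol x occurs after action y
    Returns the total loss incurred and the final posterior weights.
    """
    w = np.array(weights, dtype=float)      # posterior over M, starts at prior
    thetas = np.array(thetas, dtype=float)
    total_loss = 0.0
    for x in xs:
        # Mixture predictive probability xi_t(x_t = 1 | x_<t)
        p1 = float(w @ thetas) / w.sum()
        p = np.array([1.0 - p1, p1])
        # Bayes-optimal action: minimize the xi-expected instantaneous loss
        y = int(np.argmin(p @ loss))
        total_loss += loss[x, y]
        # Bayesian update of the weights given the observed symbol x
        likelihood = thetas if x == 1 else 1.0 - thetas
        w = w * likelihood
    return total_loss, w / w.sum()

# Example usage: three candidate coins, 0/1 loss (error counting).
rng = np.random.default_rng(0)
xs = (rng.random(1000) < 0.7).astype(int)   # true mu: Bernoulli(0.7)
loss = np.array([[0.0, 1.0], [1.0, 0.0]])   # loss[x, y], bounded in [0, 1]
L, posterior = bayes_mixture_predictor(xs, [0.3, 0.5, 0.7],
                                       [1/3, 1/3, 1/3], loss)
print(L, posterior)                          # posterior concentrates on 0.7

Since the true $\mu$ lies in the chosen class, the posterior weights
concentrate on it, which illustrates the convergence $\xi_t/\mu_t\to 1$
and the bounded excess loss claimed in the abstract.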
BibTeX Entry
@Article{Hutter:02spupper,
author = "Marcus Hutter",
_number = "IDSIA-09-01",
title = "Convergence and Loss Bounds for {Bayesian} Sequence Prediction",
volume = "49",
number = "8",
year = "2003",
pages = "2061--2067",
journal = "IEEE Transactions on Information Theory",
http = "http://www.hutter1.net/ai/spupper.htm",
url = "http://arxiv.org/abs/cs.LG/0301014",
ftp = "ftp://ftp.idsia.ch/pub/techrep/IDSIA-09-01.ps.gz",
keywords = "Bayesian sequence prediction;
general loss function and bounds;
convergence; mixture distributions.",
abstract = "The probability of observing $x_t$ at time $t$, given past
observations $x_1...x_{t-1}$ can be computed with Bayes rule if
the true generating distribution $\mu$ of the sequences
$x_1x_2x_3...$ is known. If $\mu$ is unknown, but known to belong
to a class $M$ one can base ones prediction on the Bayes mix
$\xi$ defined as a weighted sum of distributions $\nu\in M$.
Various convergence results of the mixture posterior $\xi_t$ to
the true posterior $\mu_t$ are presented. In particular a new
(elementary) derivation of the convergence $\xi_t/\mu_t\to 1$ is
provided, which additionally gives the rate of convergence. A
general sequence predictor is allowed to choose an action $y_t$
based on $x_1...x_{t-1}$ and receives loss $\ell_{x_t y_t}$ if
$x_t$ is the next symbol of the sequence. No assumptions are made
on the structure of $\ell$ (apart from being bounded) and $M$.
The Bayes-optimal prediction scheme $\Lambda_\xi$ based on mixture
$\xi$ and the Bayes-optimal informed prediction scheme
$\Lambda_\mu$ are defined and the total loss $L_\xi$ of
$\Lambda_\xi$ is bounded in terms of the total loss $L_\mu$ of
$\Lambda_\mu$. It is shown that $L_\xi$ is bounded for bounded
$L_\mu$ and $L_\xi/L_\mu\to 1$ for $L_\mu\to \infty$. Convergence
of the instantaneous losses is also proven.",
}