Convergence and Loss Bounds for Bayesian Sequence Prediction
Keywords: Bayesian sequence prediction;
general loss function and bounds;
convergence; mixture distributions.
Abstract: The probability of observing $x_t$ at time $t$, given past
observations $x_1...x_{t-1}$, can be computed with Bayes' rule if
the true generating distribution $\mu$ of the sequences
$x_1x_2x_3...$ is known. If $\mu$ is unknown but known to belong
to a class $M$, one can base one's prediction on the Bayes mix
$\xi$, defined as a weighted sum of the distributions $\nu\in M$.
Various results on the convergence of the mixture posterior $\xi_t$
to the true posterior $\mu_t$ are presented. In particular, a new
(elementary) derivation of the convergence $\xi_t/\mu_t\to 1$ is
provided, which additionally yields the rate of convergence. A
general sequence predictor is allowed to choose an action $y_t$
based on $x_1...x_{t-1}$ and receives loss $\ell_{x_t y_t}$ if
$x_t$ is the next symbol of the sequence. No assumptions are made
on the structure of $\ell$ (apart from boundedness) or of $M$.
The Bayes-optimal prediction scheme $\Lambda_\xi$ based on the
mixture $\xi$ and the Bayes-optimal informed prediction scheme
$\Lambda_\mu$ are defined, and the total loss $L_\xi$ of
$\Lambda_\xi$ is bounded in terms of the total loss $L_\mu$ of
$\Lambda_\mu$. It is shown that $L_\xi$ is bounded for bounded
$L_\mu$ and that $L_\xi/L_\mu\to 1$ for $L_\mu\to \infty$.
Convergence of the instantaneous losses is also proven.
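To make the abstract's objects concrete, here is a standard formalization
of the Bayes mix and of the posteriors $\xi_t$ and $\mu_t$; the prior
weights $w_\nu$ and the normalization convention below are the usual
choices and are not spelled out in the abstract itself:

$$\xi(x_1...x_t) \;=\; \sum_{\nu\in M} w_\nu\,\nu(x_1...x_t), \qquad w_\nu>0, \quad \sum_{\nu\in M} w_\nu \le 1,$$
$$\xi_t \;\equiv\; \xi(x_t|x_1...x_{t-1}) \;=\; \frac{\xi(x_1...x_t)}{\xi(x_1...x_{t-1})}, \qquad \mu_t \;\equiv\; \mu(x_t|x_1...x_{t-1}).$$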
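The prediction schemes and total losses admit the following reading,
consistent with the abstract (the paper's exact definitions may differ in
detail): for $\rho\in\{\xi,\mu\}$, the scheme $\Lambda_\rho$ picks the
action that minimizes the $\rho$-expected instantaneous loss, and
$L_\rho$ is its cumulative loss in $\mu$-expectation:

$$y_t^{\Lambda_\rho} \;:=\; \arg\min_{y_t} \sum_{x_t} \rho(x_t|x_1...x_{t-1})\,\ell_{x_t y_t}, \qquad L_\rho \;:=\; \sum_t \mathbf{E}_\mu\!\left[\ell_{x_t y_t^{\Lambda_\rho}}\right].$$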
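For illustration only, a minimal Python sketch of the scheme
$\Lambda_\xi$ for a finite class $M$ of i.i.d. Bernoulli distributions
under a bounded loss matrix. The class, prior weights, and loss matrix
are hypothetical choices for this example, not taken from the paper:

import numpy as np

def bayes_mixture_predictor(xs, thetas, weights, loss):
    """Run the Bayes-optimal mixture scheme Lambda_xi on a binary sequence.

    xs      : observed binary sequence x_1 ... x_n (0/1 entries)
    thetas  : candidate Bernoulli parameters, one per distribution nu in M
    weights : prior weights w_nu (positive, summing to <= 1)
    loss    : loss[x, y] = bounded loss when symbol x occurs after action y
    Returns the total loss incurred and the final posterior weights.
    """
    w = np.array(weights, dtype=float)      # posterior over M, starts at prior
    thetas = np.array(thetas, dtype=float)
    total_loss = 0.0
    for x in xs:
        # Mixture predictive probability xi_t(x_t = 1 | x_<t)
        p1 = float(w @ thetas) / w.sum()
        p = np.array([1.0 - p1, p1])
        # Bayes-optimal action: minimize the xi-expected instantaneous loss
        y = int(np.argmin(p @ loss))
        total_loss += loss[x, y]
        # Bayesian update of the weights given the observed symbol x
        likelihood = thetas if x == 1 else 1.0 - thetas
        w = w * likelihood
    return total_loss, w / w.sum()

# Example usage: three candidate coins, 0/1 loss (error counting).
rng = np.random.default_rng(0)
xs = (rng.random(1000) < 0.7).astype(int)   # true mu: Bernoulli(0.7)
loss = np.array([[0.0, 1.0], [1.0, 0.0]])   # loss[x, y], bounded in [0, 1]
L, posterior = bayes_mixture_predictor(xs, [0.3, 0.5, 0.7],
                                       [1/3, 1/3, 1/3], loss)
print(L, posterior)                          # posterior concentrates on 0.7

Since the true $\mu$ lies in the chosen class, the posterior weights
concentrate on it, which illustrates the convergence $\xi_t/\mu_t\to 1$
and the bounded excess loss claimed in the abstract.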
BibTeX Entry
@Article{Hutter:02spupper,
author = "Marcus Hutter",
_number = "IDSIA-09-01",
title = "Convergence and Loss Bounds for {Bayesian} Sequence Prediction",
volume = "49",
number = "8",
year = "2003",
pages = "2061--2067",
journal = "IEEE Transactions on Information Theory",
http = "http://www.hutter1.net/ai/spupper.htm",
url = "http://arxiv.org/abs/cs.LG/0301014",
ftp = "ftp://ftp.idsia.ch/pub/techrep/IDSIA-09-01.ps.gz",
keywords = "Bayesian sequence prediction;
general loss function and bounds;
convergence; mixture distributions.",
abstract = "The probability of observing $x_t$ at time $t$, given past
observations $x_1...x_{t-1}$ can be computed with Bayes rule if
the true generating distribution $\mu$ of the sequences
$x_1x_2x_3...$ is known. If $\mu$ is unknown, but known to belong
to a class $M$ one can base ones prediction on the Bayes mix
$\xi$ defined as a weighted sum of distributions $\nu\in M$.
Various convergence results of the mixture posterior $\xi_t$ to
the true posterior $\mu_t$ are presented. In particular a new
(elementary) derivation of the convergence $\xi_t/\mu_t\to 1$ is
provided, which additionally gives the rate of convergence. A
general sequence predictor is allowed to choose an action $y_t$
based on $x_1...x_{t-1}$ and receives loss $\ell_{x_t y_t}$ if
$x_t$ is the next symbol of the sequence. No assumptions are made
on the structure of $\ell$ (apart from being bounded) and $M$.
The Bayes-optimal prediction scheme $\Lambda_\xi$ based on mixture
$\xi$ and the Bayes-optimal informed prediction scheme
$\Lambda_\mu$ are defined and the total loss $L_\xi$ of
$\Lambda_\xi$ is bounded in terms of the total loss $L_\mu$ of
$\Lambda_\mu$. It is shown that $L_\xi$ is bounded for bounded
$L_\mu$ and $L_\xi/L_\mu\to 1$ for $L_\mu\to \infty$. Convergence
of the instantaneous losses is also proven.",
}