General Loss Bounds for Universal Sequence Prediction
Keywords: Bayesian sequence prediction; general loss function;
Solomonoff induction; Kolmogorov complexity; learning; universal
probability; loss bounds; games of chance; partial and delayed
prediction; classification.
Abstract: The Bayesian framework is ideally suited for
induction problems. The probability of observing xk
at time k, given past observations x1...xk-1
can be computed with Bayes' rule if the true distribution µ of
the sequences x1x2x3... is
known. The problem, however, is that in many cases one does not even
have a reasonable estimate of the true distribution. In order to
overcome this problem a universal distribution ß is
defined as a weighted sum of distributions µi
in M, where M is any countable set of distributions
including µ. This is a generalization of Solomonoff
induction, in which M is the set of all enumerable
semi-measures. Systems which predict yk, given
x1...xk-1 and which receive loss lxk
yk if xk is the true next
symbol of the sequence are considered. It is proven that using the
universal ß as a prior is nearly as good as using the
unknown true distribution µ. Furthermore, games of
chance, defined as a sequence of bets, observations, and rewards are
studied. The time needed to reach the winning zone is estimated.
Extensions to arbitrary alphabets, partial and delayed prediction, and
more active systems are discussed.
Table of Contents
- Introduction
- Induction
- Universal Sequence Prediction
- Contents
- Setup and Convergence
- Strings and Probability Distributions
- Universal Prior Probability Distribution
- (Probability Classes)
- Loss Bounds
- Unit Loss Function
- Proof Sketch of Loss Bound
- General Loss
- Application to Games of Chance
- Introduction/Example
- Games of Chance
- Information-Theoretic Interpretation
- Outlook
- General Alphabet
- Partial Prediction, Delayed Prediction, Classification
- More Active Systems
- The Weighted Majority Algorithm(s)
- Miscellaneous
- Summary
BibTeX Entry
@Article{Hutter:01loss,
author = "Marcus Hutter",
title = "General Loss Bounds for Universal Sequence Prediction",
number = "IDSIA-03-01",
institution = "Istituto Dalle Molle di Studi sull'Intelligenza Artificiale",
address = "Manno(Lugano), CH",
month = june,
year = "2001",
pages = "210--217",
journal = "Proceedings of the $18^{th}$ International Conference
on Machine Learning (ICML-2001)",
publisher = "Morgan Kaufmann"
url = "http://www.hutter1.net/official/ploss.htm",
url2 = "http://arxiv.org/abs/cs.AI/0101019",
ftp = "ftp://ftp.idsia.ch/pub/techrep/IDSIA-03-01.ps.gz",
categories = "I.2. [Artificial Intelligence],
I.2.6. [Learning],
I.2.8. [Problem Solving, Control Methods and Search],
F.1.3. [Complexity Classes].",
keywords = "Bayesian and deterministic prediction; general loss function;
Solomonoff induction; Kolmogorov complexity; leaning; universal
probability; loss bounds; games of chance; partial and delayed
prediction; classification.",
abstract = "The Bayesian framework is ideally suited for induction problems.
The probability of observing $x_k$ at time $k$, given past
observations $x_1...x_{k-1}$ can be computed with Bayes' rule if
the true distribution $\mu$ of the sequences $x_1x_2x_3...$ is
known. The problem, however, is that in many cases one does not
even have a reasonable estimate of the true distribution. In order
to overcome this problem a universal distribution $\xi$ is defined
as a weighted sum of distributions $\mu_i\in M$, where $M$ is
any countable set of distributions including $\mu$. This is a
generalization of Solomonoff induction, in which $M$ is the set of
all enumerable semi-measures. Systems which predict $y_k$, given
$x_1...x_{k-1}$ and which receive loss $l_{x_k y_k}$ if $x_k$ is
the true next symbol of the sequence are considered. It is proven
that using the universal $\xi$ as a prior is nearly as good as
using the unknown true distribution $\mu$. Furthermore, games of
chance, defined as a sequence of bets, observations, and rewards
are studied. The time needed to reach the winning zone is
estimated. Extensions to arbitrary alphabets, partial and delayed
prediction, and more active systems are discussed.",
}