
## Optimality of Universal Bayesian Sequence Prediction

Author: Marcus Hutter (2001)
Comments: 15 LaTeX pages
Subj-class: Artificial Intelligence; Learning
ACM-class: I.2; I.2.6; I.2.8; F.1.3; F.2
Paper: PostScript (420kb) - PDF (293kb) - Html/Gif
Slides: PostScript - PDF
Remark: Technical Report

Keywords: Bayesian sequence prediction; mixture distributions; Solomonoff induction; Kolmogorov complexity; learning; universal probability; tight loss and error bounds; Pareto-optimality.

Abstract: Various optimality properties of universal sequence prediction based on Bayes-mixtures in general, and Solomonoff's prediction scheme in particular, are studied. The probability of observing $x_t$ at time $t$, given past observations $x_1...x_{t-1}$, can be computed with Bayes' rule if the true generating distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. If $\mu$ is unknown, but known to belong to a class $\mathcal{M}$, one can base one's prediction on the Bayes-mix $\xi$ defined as a $w_\nu$-weighted sum of distributions $\nu\in\mathcal{M}$. The number of additional prediction errors $E_\xi$ made by the optimal universal prediction scheme based on $\xi$, minus the number of errors $E_\mu$ of the optimal informed prediction scheme based on $\mu$, is bounded by $O(\sqrt{E_\mu})$. We show that the bound is tight and that no other predictor can lead to smaller bounds. Furthermore, for various performance measures we show Pareto-optimality of $\xi$ in the sense that there is no other predictor which performs at least as well in all environments $\nu\in\mathcal{M}$ and strictly better in at least one. So, optimal predictors can (w.r.t. most performance measures, in expectation) be based on the mixture $\xi$. Finally, we give an Occam's razor argument that Solomonoff's choice $w_\nu\sim 2^{-K(\nu)}$ for the weights is optimal, where $K(\nu)$ is the length of the shortest program describing $\nu$.
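The mixture predictor described in the abstract can be illustrated with a small numerical sketch. The code below is not from the paper: it uses a hypothetical finite class $\mathcal{M}$ of Bernoulli environments with uniform prior weights (rather than Solomonoff's $w_\nu\sim 2^{-K(\nu)}$), and compares the error count of the $\xi$-based predictor against the informed $\mu$-based predictor on one sampled sequence.

```python
import random

# Illustrative sketch (not the paper's setup): a finite class M of
# Bernoulli(nu) environments with uniform prior weights w_nu. The true
# distribution mu is a member of M. All numeric choices are assumptions.
random.seed(0)

M = [0.1, 0.3, 0.5, 0.7, 0.9]       # class of Bernoulli parameters nu
weights = [1.0 / len(M)] * len(M)   # prior weights w_nu (uniform here)
mu = 0.7                            # true generating distribution, mu in M

def predict_xi(w):
    """Mixture probability that the next bit is 1: sum over nu of w_nu * nu."""
    return sum(wn * nu for wn, nu in zip(w, M))

def bayes_update(w, bit):
    """Bayes' rule: reweight each nu by its likelihood of the observed bit."""
    post = [wn * (nu if bit == 1 else 1.0 - nu) for wn, nu in zip(w, M)]
    z = sum(post)
    return [p / z for p in post]

errors_xi = errors_mu = 0
for t in range(1000):
    x = 1 if random.random() < mu else 0        # observation drawn from mu
    # Optimal deterministic predictors output the more probable next bit.
    if (1 if predict_xi(weights) >= 0.5 else 0) != x:
        errors_xi += 1
    if (1 if mu >= 0.5 else 0) != x:
        errors_mu += 1
    weights = bayes_update(weights, x)          # posterior after seeing x

print(errors_mu, errors_xi)
```

Because the posterior weights quickly concentrate on $\mu$, the excess error count $E_\xi - E_\mu$ stays small relative to $E_\mu$, in line with the $O(\sqrt{E_\mu})$ bound stated above (here for one sampled run, not in expectation).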


## Table of Contents

- Introduction
- Setup and Convergence
- Error & Loss Bounds
- Lower Error Bound
- Pareto Optimality of ξ
- On the Optimal Choice of Weights
- Conclusions


@TechReport{Hutter:02splower,
  author      = "Marcus Hutter",
  number      = "IDSIA-09-01",
  institution = "Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA)",
  title       = "Optimality of Universal {Bayesian} Sequence Prediction",
  year        = "2002",
  pages       = "15",
  address     = "Manno (Lugano), CH",
  abstract    = "Various optimality properties of universal sequence prediction based on Bayes-mixtures in general, and Solomonoff's prediction scheme in particular, are studied. The probability of observing $x_t$ at time $t$, given past observations $x_1...x_{t-1}$, can be computed with Bayes' rule if the true generating distribution $\mu$ of the sequences $x_1x_2x_3...$ is known. If $\mu$ is unknown, but known to belong to a class $\mathcal{M}$, one can base one's prediction on the Bayes-mix $\xi$ defined as a $w_\nu$-weighted sum of distributions $\nu\in\mathcal{M}$. The number of additional prediction errors $E_\xi$ made by the optimal universal prediction scheme based on $\xi$, minus the number of errors $E_\mu$ of the optimal informed prediction scheme based on $\mu$, is bounded by $O(\sqrt{E_\mu})$. We show that the bound is tight and that no other predictor can lead to smaller bounds. Furthermore, for various performance measures we show Pareto-optimality of $\xi$ in the sense that there is no other predictor which performs at least as well in all environments $\nu\in\mathcal{M}$ and strictly better in at least one. So, optimal predictors can (w.r.t. most performance measures, in expectation) be based on the mixture $\xi$. Finally, we give an Occam's razor argument that Solomonoff's choice $w_\nu\sim 2^{-K(\nu)}$ for the weights is optimal, where $K(\nu)$ is the length of the shortest program describing $\nu$.",
  url         = "http://www.hutter1.net/ai/splower.htm",
}
