%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% On the Existence and Convergence Computable Universal Priors %%
%% Marcus Hutter: Start: 01.08.02 LastEdit: 29.05.03 %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%-------------------------------%
% Document-Style %
%-------------------------------%
\documentclass[12pt,twoside]{article}
\pagestyle{myheadings}
\markboth{\sc Marcus Hutter, Technical Report IDSIA-05-03
}{\sc Computable Universal Priors}
\setcounter{tocdepth}{4} \setcounter{secnumdepth}{2}
\topmargin=0mm \oddsidemargin=5mm \evensidemargin=5mm
\textwidth=15cm \textheight=22cm
\sloppy
%-------------------------------%
% My Math-Spacings %
%-------------------------------%
\def\,{\mskip 3mu} \def\>{\mskip 4mu plus 2mu minus 4mu} \def\;{\mskip 5mu plus 5mu} \def\!{\mskip-3mu}
\def\dispmuskip{\thinmuskip= 3mu plus 0mu minus 2mu \medmuskip= 4mu plus 2mu minus 2mu \thickmuskip=5mu plus 5mu minus 2mu}
\def\textmuskip{\thinmuskip= 0mu \medmuskip= 1mu plus 1mu minus 1mu \thickmuskip=2mu plus 3mu minus 1mu}
\textmuskip
\def\beq{\dispmuskip\begin{equation}} \def\eeq{\end{equation}\textmuskip}
\def\beqn{\dispmuskip\begin{displaymath}}\def\eeqn{\end{displaymath}\textmuskip}
\def\bqa{\dispmuskip\begin{eqnarray}} \def\eqa{\end{eqnarray}\textmuskip}
\def\bqan{\dispmuskip\begin{eqnarray*}} \def\eqan{\end{eqnarray*}\textmuskip}
%-------------------------------%
% Macro-Definitions %
%-------------------------------%
\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{definition}[theorem]{Definition}
\newenvironment{keywords}{\centerline{\bf\small
Keywords}\vspace{-1ex}\begin{quote}\small}{\par\end{quote}\vskip
1ex}
\newtheorem{tablex}[theorem]{Table}
\newtheorem{figurex}[equation]{Figure}
\def\ftheorem#1#2#3{\begin{theorem}[#2]\label{#1} #3 \end{theorem} }
\def\fcorollary#1#2#3{\begin{corollary}[#2]\label{#1} #3 \end{corollary} }
\def\flemma#1#2#3{\begin{lemma}[#2]\label{#1} #3 \end{lemma} }
\def\fdefinition#1#2#3{\begin{definition}[#2]\label{#1} #3 \end{definition} }
\def\ftablex#1#2#3{\begin{tablex}[#2]\label{#1} #3 \end{tablex} }
\def\ffigurex#1#2#3#4{{#4}\begin{figurex}[#2]\label{#1}#3\end{figurex}}
\def\idx#1{\index{#1}#1} %\idx{name} for also in text
\def\indxs#1#2{\index{#1!#2}\index{#2!#1}} %\idx{name} for also in text
\def\paragraph#1{\vspace{1ex}\noindent{\bf{#1.}}}
\def\paranodot#1{\vspace{1ex}\noindent{\bf{#1}}}
\def\myparskip{\vspace{1.5ex plus 1ex minus 1ex}\noindent}
\def\ff{\Longrightarrow}
\def\gdw{\Longleftrightarrow}
\def\toinfty#1{\stackrel{#1\to\infty}{\longrightarrow}}
\def\nq{\hspace{-1em}}
\def\qed{\hspace*{\fill}$\Box\quad$}
\def\odt{{\textstyle{1\over 2}}}
\def\odf{{\textstyle{1\over 4}}}
\def\eps{\varepsilon} % for small positive number
\def\epstr{\epsilon} % for empty string
\def\blank{{\,_\sqcup\,}} % blank position
\def\pfx{`} %prefix code
\def\qmbox#1{{\quad\mbox{#1}\quad}}
\def\argmax{\mathop{\rm arg\,max}} % maxarg
\def\argmin{\mathop{\rm arg\,min}} % minarg
\def\eqm{\stackrel\times=} % for some reason
\def\leqm{\stackrel\times\leq}
\def\geqm{\stackrel\times\geq}
\def\odn{{\textstyle{1\over n}}}
\def\v#1{{\bf #1}}
\def\l{{l}} % length of string or program
\def\M{{\cal M}} % Set of prob. distributions
\def\X{{\cal X}} % input/perception set/alphabet
\def\Y{{\cal Y}} % output/action set/alphabet
\def\R{{\cal R}} % reward set subset of reals
\def\F{{\cal F}} % Generic performance measure
\def\I{{\cal I}} % some set
\def\S{{\cal S}} % some set
\def\Q{{\cal Q}}
\def\E{{\bf E}} % Expectation value
\def\P{{\bf P}} % Expectation value
\def\B{\{0,1\}} % Binary set (or \set{B})
\def\MM{M} % Solomonoff's prior
\def\th{\theta}
\def\Set#1{{\if#1Q{I\!\!\!#1}\else\if#1Z{Z\!\!\!Z}\else{I\!\!#1}\fi\fi}}
\def\lb{\log}
\def\sumprime{\mathop{{\sum\nolimits'}}}
\begin{document}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% T i t l e - P a g e %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\title{\vskip -15mm\normalsize\sc Technical Report \hfill IDSIA-05-03
\vskip 2mm\bf\LARGE\hrule height5pt \vskip 3mm
\sc On the Existence and Convergence \\ of Computable Universal Priors\thanks{%
This work was supported by SNF grant 2000-61847.00 to J\"{u}rgen
Schmidhuber.}
\vskip 6mm \hrule height2pt \vskip 5mm}
\author{{\bf Marcus Hutter}\\[3mm]
\normalsize IDSIA, Galleria 2, CH-6928\ Manno-Lugano, Switzerland\\
\normalsize marcus@idsia.ch \hspace{8.5ex} http://www.idsia.ch/$^{_{_\sim}}\!$marcus}
\date{29 May 2003}
\maketitle
\begin{abstract}
\noindent Solomonoff unified Occam's razor and Epicurus' principle
of multiple explanations to one elegant, formal, universal theory
of inductive inference, which initiated the field of algorithmic
information theory. His central result is that the posterior of
his universal semimeasure $\MM$ converges rapidly to the true
sequence generating posterior $\mu$, if the latter is computable.
Hence, $M$ is eligible as a universal predictor in case of unknown
$\mu$.
%
We investigate the existence and convergence of computable
universal (semi)measures for a hierarchy of computability classes:
finitely computable, estimable, enumerable, and approximable.
For instance, $\MM$ is known to be enumerable, but not finitely
computable, and to dominate all enumerable semimeasures.
%
We define seven classes of (semi)measures based on these four
computability concepts. Each class may or may not contain a
(semi)measure which dominates all elements of another class. The
analysis of these 49 cases can be reduced to four basic cases, two
of them being new. The results hold for discrete and continuous
semimeasures.
%
We also investigate more closely the types of convergence, possibly
implied by universality: in difference and in ratio, with probability
1, in mean sum, and for Martin-L{\"o}f random sequences.
%
We introduce a generalized concept of randomness for individual
sequences and use it to exhibit difficulties regarding these
issues.
\end{abstract}
\begin{keywords}
Sequence prediction;
Algorithmic Information Theory;
Solomonoff's prior;
universal probability;
mixture distributions;
posterior convergence;
computability concepts;
Martin-L{\"o}f randomness.
\end{keywords}
\pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}\label{secIntro}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
All induction problems can be phrased as sequence prediction
tasks. This is, for instance, obvious for time series prediction,
but also includes classification tasks. Having observed data $x_t$
at times $t0$, the important dominance $\xi_\M(x)\geq w_\nu
\nu(x)\,\forall\nu\in\M$ is satisfied. The question is what
properties does $\xi_\M$ possess. The distinguishing property of
$\MM=\xi_{\M_{enum}^{semi}}$ is that it is itself
an element of $\M_{enum}^{semi}$.
%
On the other hand, for prediction $\xi_\M\in\M$ is not by itself
an important property. What matters is whether $\xi_\M$ is
computable (in one of the senses defined) to avoid
getting into the (un)realm of non-constructive math.
%Goal of this work
The intention of this work is to investigate the existence,
computability and convergence of universal (semi)measures for
various computability classes: finitely computable $\subset$
estimable $\subset$ enumerable $\subset$ approximable (see
Definition \ref{defCompFunc}). For instance, $\MM(x)$ is
enumerable, but not finitely computable. The research in this work
was motivated by recent generalizations of Kolmogorov complexity
and Solomonoff's prior by Schmidhuber \cite{Schmidhuber:02gtm} to
approximable (and others not here discussed) cases.
%------------------------------%
\paragraph{Contents}
%------------------------------%
In Section \ref{secCC} we review various computability concepts
and discuss their relation.
%
In Section \ref{secUniM} we define the prefix Kolmogorov
complexity $K$, the concept of (semi)measures, Solomonoff's
universal prior $\MM$, and explain its universality.
%
Section \ref{secUSP} summarizes Solomonoff's major convergence
result, discusses general mixture distributions and the important
universality property -- multiplicative dominance.
%
In Section \ref{secUSM} we define seven classes of (semi)measures
based on four computability concepts. Each class may or may not
contain a (semi)measures which dominates all elements of another
class. We reduce the analysis of these 49 cases to four basic
cases. Domination (essentially by $\MM$) is known to be true for
two cases. The two new cases do not allow for domination.
%
In Section \ref{secConv} we investigate more closely the type of
convergence implied by universality. We summarize the result on
posterior convergence in difference $(\xi-\mu\to 0)$ and improve
the previous result \cite{Li:97} on the convergence in ratio
$\xi/\mu\to 1$ by showing rapid convergence without use
of Martingales.
%
In Section \ref{secMLconv} we investigate whether convergence for
all Martin-L{\"o}f random sequences could hold. We define a
generalized concept of randomness for individual sequences and use
it to show that proofs based on universality cannot decide this
question.
%
Section \ref{secConc} concludes the paper. Proofs will be
presented elsewhere.
%------------------------------%
\paragraph{Notation}
%------------------------------%
%Strings
We denote strings of length $n$ over finite alphabet $\X$ by
$x=x_1x_2...x_n$ with $x_t\in\X$ and further abbreviate
$x_{1:n}:=x_1x_2...x_{n-1}x_n$ and $x_{0
\eeq
dominates all $\nu\in\M$. This dominance is
necessary for the desired convergence $\xi\to\mu$ similarly to
(\ref{eukdist}). The question is what properties $\xi$ possesses.
The distinguishing property of $\M_{enum}^{semi}$ is that $\xi$ is
itself an element of $\M_{enum}^{semi}$. When concerned with
predictions, $\xi_\M\in\M$ is not by itself an important property,
but whether $\xi$ is computable in one of the senses of Definition
\ref{defCompFunc}. We define
\bqan
\M_1\geqm\M_2 & :\Leftrightarrow &
\mbox{there is an element of $\M_1$ which dominates all elements of
$\M_2$} \\
& :\Leftrightarrow &
\exists\rho\!\in\!\M_1\;\forall\nu\!\in\!\M_2\;\exists w_\nu\!>\!0
\;\forall x:\rho(x)\!\geq\!w_\nu\nu(x).
\eqan
$\geqm $ is transitive (but not necessarily reflexive) in the
sense that $\M_1 \geqm \M_2 \geqm \M_3$ implies $\M_1 \geqm \M_3$
and $\M_0 \supseteq \M_1 \geqm \M_2 \supseteq \M_3$ implies $\M_0
\geqm \M_3$.
%
For the computability concepts introduced in Section \ref{secCC}
we have the following proper set inclusions
\beqn
\begin{array}{ccccccc}
\M_{comp}^{msr} & \subset & \M_{est}^{msr} & \equiv & \M_{enum}^{msr} & \subset & \M_{appr}^{msr} \\
\cap & & \cap & & \cap & & \cap \\
\M_{comp}^{semi} & \subset & \M_{est}^{semi} & \subset & \M_{enum}^{semi} & \subset & \M_{appr}^{semi}
\end{array}
\eeqn
%
where $\M^{msr}_c$ stands for the set of all probability measures
of appropriate computability type $c\in\{$comp=finitely
computable, est=estimable, enum=enumerable,
appr=approximable$\}$, and similarly for semimeasures
$\M^{semi}_c$. From an enumeration of a measures $\rho$ on can
construct a co-enumeration by exploiting
$\rho(x_{1:n})=1-\sum_{y_{1:n}\neq x_{1:n}}\rho(y_{1:n})$. This
shows that every enumerable measure is also co-enumerable, hence
estimable, which proves the identity $\equiv$ above.
With this notation, Theorem \ref{thUniM} implies
$\M_{enum}^{semi}\geqm\M_{enum}^{semi}$. Transitivity allows to
conclude, for instance, that
$\M_{appr}^{semi}\geqm\M_{comp}^{msr}$, i.e.\ that there is an
approximable semimeasure which dominates all computable measures.
The standard ``diagonalization'' way of proving
$\M_1\stackrel\times{\not\geq}\M_2$ is to take an arbitrary
$\mu\in\M_1$ and ``increase'' it to $\rho$ such that
$\mu\stackrel\times{\not\geq}\rho$ and show that $\rho\in\M_2$.
There are $7\times 7$ combinations of (semi)measures $\M_1$ with
$\M_2$ for which $\M_1\geqm\M_2$ could be true or false. There are
four basic cases, explicated in the following theorem, from which
the other 49 combinations displayed in Table \ref{tabUniSMsr}
follow by transitivity.
%------------------------------%
\ftheorem{thNoUniApp}{Universal (semi)measures}{
%------------------------------%
A semimeasure $\rho$ is said to be universal for $\M$ if it
multiplicatively dominates all elements of $\M$ in the sense
$\forall\nu\exists w_\nu>0:\rho(x)\geq w_\nu\nu(x)\forall x$. The
following holds true:
\begin{itemize}
\item[$o)$]
$\exists\rho:\{\rho\}\geqm\M$: For every countable set
of (semi)measures $\M$, there is a (semi)measure which dominates
all elements of $\M$.
\item[$i)$]
$\M_{enum}^{semi}\geqm\M_{enum}^{semi}$:
The class of enumerable semimeasures {\em contains}
a universal element.
\item[$ii)$]
$\M_{appr}^{msr}\geqm\M_{enum}^{semi}$:
There {\em is} an approximable measure which dominates all enumerable
semimeasures.
\item[$iii)$]
$\M_{est}^{semi}\stackrel\times{\not\geq}\M_{comp}^{msr}$: There is
{\em no} estimable semimeasure which dominates all computable
measures.
\item[$iv)$]
$\M_{appr}^{semi}\stackrel\times{\not\geq}\M_{appr}^{msr}$: There is
{\em no} approximable semimeasure which dominates all approximable
measures.
\end{itemize}
}%------------------------------%
\begin{table}[thb]
\ftablex{tabUniSMsr}{Existence of universal (semi)measures}{%
The entry in row $r$ and column $c$ indicates whether there is a
$r$-able (semi)measure $\rho$ for the set $\M$ which contains all
$c$-able (semi)measures, where $r,c\in\{$comput, estimat, enumer,
approxim$\}$. Enumerable measures are estimable. This is the
reason why the enum. row and column in case of measures is
missing. The superscript indicates from which part of Theorem
\ref{thNoUniApp} the answer follows. For the bold face entries
directly, for the others using transitivity of $\geqm $.
\begin{center}
\begin{tabular}{|c|c||c|c|c|c||c|c|c|}\hline
$\nwarrow$ & $\M$ & \multicolumn{4}{c||}{semimeasure} & \multicolumn{3}{c|}{measure}\\ \hline
$\rho$&$\searrow$& comp. & est. & enum. & appr. & comp. & est. & appr. \\ \hline\hline
s & comp. & no$^{iii}$ & no$^{iii}$ & no$^{iii}$ & no$^{iv}$ & no$^{iii}$ & no$^{iii}$ & no$^{iv}$ \\ \cline{2-9}
e & est. & no$^{iii}$ & no$^{iii}$ & no$^{iii}$ & no$^{iv}$ & {\bf no}$^{\bf iii}$& no$^{iii}$ & no$^{iv}$ \\ \cline{2-9}
m & enum. & yes$^{i}$ & yes$^{i}$ & {\bf yes}$^{\bf i}$ & no$^{iv}$ & yes$^{i}$ & yes$^{i}$ & no$^{iv}$ \\ \cline{2-9}
i &appr. & yes$^{i}$ & yes$^{i}$ & yes$^{i}$ & no$^{iv}$ & yes$^{i}$ & yes$^{i}$ & {\bf no}$^{\bf iv}$\\ \hline\hline
m & comp. & no$^{iii}$ & no$^{iii}$ & no$^{iii}$ & no$^{iv}$ & no$^{iii}$ & no$^{iii}$ & no$^{iv}$ \\ \cline{2-9}
s & est. & no$^{iii}$ & no$^{iii}$ & no$^{iii}$ & no$^{iv}$ & no$^{iii}$ & no$^{iii}$ & no$^{iv}$ \\ \cline{2-9}
r &appr. & yes$^{ii}$ & yes$^{ii}$ & {\bf yes}$^{\bf ii}$& no$^{iv}$ & yes$^{ii}$ & yes$^{ii}$ & no$^{iv}$ \\ \hline
\end{tabular}
\end{center}}
\end{table}
\noindent If we ask for a universal (semi)measure which at least satisfies
the weakest form of computability, namely being approximable, we
see that the largest dominated set among the 7 sets defined above
is the set of enumerable semimeasures. This is the reason why
$\M_{enum}^{semi}$ plays a special role.
On the other hand, $\M_{enum}^{semi}$ is not the largest set
dominated by an approximable semimeasure, and indeed no such
largest set exists. One may, hence, ask for ``natural'' larger
sets $\M$. One such set, namely the set of cumulatively enumerable
semimeasures $\M_{CEM}$, has recently been discovered by
Schmidhuber \cite{Schmidhuber:02gtm}, for which even
$\xi_{CEM}\in\M_{CEM}$ holds.
\noindent Theorem \ref{thNoUniApp} also holds for {\em discrete
(semi)measures} $P$ defined as follows:
\index{semimeasure!enumerable}
%------------------------------%
\fdefinition{defDSemi}{Discrete (Semi)measures}{
%------------------------------%
$P(x)$ denotes the probability of $x\in\Set N$. We call
$P:\Set{N}\to[0,1]$ a discrete (semi)measure if $\sum_{x\in\Set{N}}
P(x)\stackrel{(<)}=1$.
}%------------------------------%
\noindent Theorem \ref{thNoUniApp}
%$(o)$ is elementary,
$(i)$ is Levin's major result \cite[Th.4.3.1 \& Th.4.5.1]{Li:97}, %
$(ii)$ is due to Solomonoff \cite{Solomonoff:78}, %
the proof of
$\M_{comp}^{semi}\stackrel\times{\not\geq}\M_{comp}^{semi}$ in
\cite[p249]{Li:97} contains minor errors and is not extensible to
$(iii)$ and the proof in \cite[p276]{Li:97} only applies to
infinite alphabet and not to the binary/finite case considered
here. A complete proof of $(o)-(iv)$ for discrete and continuous
(semi)measures is given elsewhere.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Posterior Convergence}\label{secConv}
%\subsection{Definition of Random Sequences}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
We have investigated in detail the computational properties of
various mixture distributions $\xi$. A mixture $\xi_\M$
multiplicatively dominates all distributions in $\M$. We have
mentioned that dominance implies posterior convergence. In this
section we present in more detail what dominance implies and what
not.
Convergence of $\xi(x_t|x_{