Distribution of Mutual Information
Keywords: Mutual Information, Cross Entropy, Dirichlet distribution, Second order distribution, expectation and variance of mutual information.
Abstract: The mutual information of two random variables i and j with joint
probabilities t_ij is commonly used in learning Bayesian nets as
well as in many other fields. The chances t_ij are usually
estimated by the empirical sampling frequency n_ij/n leading to a
point estimate I(n_ij/n) for the mutual information. To answer
questions like "is I(n_ij/n) consistent with zero?" or "what is
the probability that the true mutual information is much larger
than the point estimate?" one has to go beyond the point estimate.
In the Bayesian framework one can answer these questions by
utilizing a (second order) prior distribution p(t) comprising
prior information about t. From the prior p(t) one can compute the
posterior p(t|n), from which the distribution p(I|n) of the mutual
information can be calculated. We derive reliable and quickly
computable approximations for p(I|n). We concentrate on the mean,
variance, skewness, and kurtosis, and non-informative priors. For
the mean we also give an exact expression. Numerical issues and
the range of validity are discussed.
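For reference, the mutual information appearing in the point estimate I(n_ij/n) is the standard quantity (writing r and s for the number of values that i and j can take is an assumption about notation, not from the abstract):

    I(t) = \sum_{i=1}^{r} \sum_{j=1}^{s} t_{ij} \,\ln \frac{t_{ij}}{t_{i+}\, t_{+j}},
    \qquad t_{i+} = \sum_{j} t_{ij}, \quad t_{+j} = \sum_{i} t_{ij},

and the point estimate is obtained by plugging in the empirical frequencies t_ij = n_ij/n.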
Table of Contents
- Introduction
- Mutual Information Distribution
- Results for I under the Dirichlet P(oste)rior
- Approximation of Expectation and Variance of I
- The Second Order Dirichlet Distribution
- Exact Value for E[I]
- Generalizations
- Numerics
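Before the bibliographic entry, a minimal numerical sketch of the quantities discussed above. It uses brute-force Monte Carlo sampling from a symmetric Dirichlet posterior rather than the paper's analytic approximations, so it serves only as an illustration or a cross-check; the contingency table `counts` and the prior pseudocount `alpha` are illustrative assumptions, not values from the paper.

    import numpy as np

    def mutual_information(t):
        # I(t) = sum_ij t_ij * ln( t_ij / (t_i+ * t_+j) ), with 0*ln(0) = 0
        ti = t.sum(axis=1, keepdims=True)   # row marginals t_i+
        tj = t.sum(axis=0, keepdims=True)   # column marginals t_+j
        mask = t > 0
        return float(np.sum(t[mask] * np.log(t[mask] / (ti * tj)[mask])))

    counts = np.array([[12.0, 3.0], [5.0, 20.0]])   # illustrative table n_ij, n = 40
    alpha = 0.5                                      # assumed non-informative pseudocount per cell

    # Plug-in point estimate I(n_ij/n)
    point = mutual_information(counts / counts.sum())

    # Draw joint probability tables t from the Dirichlet posterior with
    # parameters n_ij + alpha and evaluate I on each draw.
    rng = np.random.default_rng(0)
    draws = rng.dirichlet((counts + alpha).ravel(), size=50000)
    I_samples = np.array([mutual_information(t.reshape(counts.shape)) for t in draws])

    print("point estimate I(n_ij/n):", round(point, 4))
    print("posterior mean of I     :", round(I_samples.mean(), 4))
    print("posterior std of I      :", round(I_samples.std(), 4))

The paper's contribution is to replace such sampling with closed-form expressions (exact for the mean, approximate for the variance and higher moments), which are much faster to compute and whose numerical behavior and range of validity are discussed there.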
BibTeX Entry
@InProceedings{Hutter:01xentropy,
author = "Marcus Hutter",
title = "Distribution of Mutual Information",
_number = "IDSIA-13-01",
booktitle = "Advances in Neural Information Processing Systems 14",
editor = "T. G. Dietterich and S. Becker and Z. Ghahramani",
publisher = "MIT Press",
address = "Cambridge, MA",
pages = "399--406",
year = "2002",
url = "http://www.hutter1.net/ai/xentropy.htm",
url2 = "http://arxiv.org/abs/cs.AI/0112019",
ftp = "ftp://ftp.idsia.ch/pub/techrep/IDSIA-13-01.ps.gz",
categories = "I.2. [Artificial Intelligence]",
keywords = "Mutual Information, Cross Entropy, Dirichlet distribution, Second
order distribution, expectation and variance of mutual
information.",
abstract = "The mutual information of two random variables i and j with joint
probabilities t_ij is commonly used in learning Bayesian nets as
well as in many other fields. The chances t_ij are usually
estimated by the empirical sampling frequency n_ij/n leading to a
point estimate I(n_ij/n) for the mutual information. To answer
questions like ``is I(n_ij/n) consistent with zero?'' or ``what is
the probability that the true mutual information is much larger
than the point estimate?'' one has to go beyond the point estimate.
In the Bayesian framework one can answer these questions by
utilizing a (second order) prior distribution p(t) comprising
prior information about t. From the prior p(t) one can compute the
posterior p(t|n), from which the distribution p(I|n) of the mutual
information can be calculated. We derive reliable and quickly
computable approximations for p(I|n). We concentrate on the mean,
variance, skewness, and kurtosis, and non-informative priors. For
the mean we also give an exact expression. Numerical issues and
the range of validity are discussed.",
}