Distribution of Mutual Information
Keywords: Mutual Information, Cross Entropy, Dirichlet distribution, Second order distribution, expectation and variance of mutual information.
Abstract: The mutual information of two random variables i and j with joint
probabilities t_ij is commonly used in learning Bayesian nets as
well as in many other fields. The chances t_ij are usually
estimated by the empirical sampling frequency n_ij/n leading to a
point estimate I(n_ij/n) for the mutual information. To answer
questions like "is I(n_ij/n) consistent with zero?" or "what is
the probability that the true mutual information is much larger
than the point estimate?" one has to go beyond the point estimate.
In the Bayesian framework one can answer these questions by
utilizing a (second order) prior distribution p(t) comprising
prior information about t. From the prior p(t) one can compute the
posterior p(t|n), from which the distribution p(I|n) of the mutual
information can be calculated. We derive reliable and quickly
computable approximations for p(I|n). We concentrate on the mean,
variance, skewness, and kurtosis, and non-informative priors. For
the mean we also give an exact expression. Numerical issues and
the range of validity are discussed.
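For reference, the mutual information appearing in the point estimate I(n_ij/n) is the standard quantity (writing r and s for the number of values that i and j can take is an assumption about notation, not from the abstract):

    I(t) = \sum_{i=1}^{r} \sum_{j=1}^{s} t_{ij} \,\ln \frac{t_{ij}}{t_{i+}\, t_{+j}},
    \qquad t_{i+} = \sum_{j} t_{ij}, \quad t_{+j} = \sum_{i} t_{ij},

and the point estimate is obtained by plugging in the empirical frequencies t_ij = n_ij/n.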
Table of Contents
- Introduction
- Mutual Information Distribution
- Results for I under the Dirichlet P(oste)rior
- Approximation of Expectation and Variance of I
- The Second Order Dirichlet Distribution
- Exact Value for E[I]
- Generalizations
- Numerics
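Before the bibliographic entry, a minimal numerical sketch of the quantities discussed above. It uses brute-force Monte Carlo sampling from a symmetric Dirichlet posterior rather than the paper's analytic approximations, so it serves only as an illustration or a cross-check; the contingency table `counts` and the prior pseudocount `alpha` are illustrative assumptions, not values from the paper.

    import numpy as np

    def mutual_information(t):
        # I(t) = sum_ij t_ij * ln( t_ij / (t_i+ * t_+j) ), with 0*ln(0) = 0
        ti = t.sum(axis=1, keepdims=True)   # row marginals t_i+
        tj = t.sum(axis=0, keepdims=True)   # column marginals t_+j
        mask = t > 0
        return float(np.sum(t[mask] * np.log(t[mask] / (ti * tj)[mask])))

    counts = np.array([[12.0, 3.0], [5.0, 20.0]])   # illustrative table n_ij, n = 40
    alpha = 0.5                                      # assumed non-informative pseudocount per cell

    # Plug-in point estimate I(n_ij/n)
    point = mutual_information(counts / counts.sum())

    # Draw joint probability tables t from the Dirichlet posterior with
    # parameters n_ij + alpha and evaluate I on each draw.
    rng = np.random.default_rng(0)
    draws = rng.dirichlet((counts + alpha).ravel(), size=50000)
    I_samples = np.array([mutual_information(t.reshape(counts.shape)) for t in draws])

    print("point estimate I(n_ij/n):", round(point, 4))
    print("posterior mean of I     :", round(I_samples.mean(), 4))
    print("posterior std of I      :", round(I_samples.std(), 4))

The paper's contribution is to replace such sampling with closed-form expressions (exact for the mean, approximate for the variance and higher moments), which are much faster to compute and whose numerical behavior and range of validity are discussed there.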
BibTeX Entry
@InProceedings{Hutter:01xentropy,
author = "Marcus Hutter",
title = "Distribution of Mutual Information",
_number = "IDSIA-13-01",
booktitle = "Advances in Neural Information Processing Systems 14",
editor = "T. G. Dietterich and S. Becker and Z. Ghahramani",
publisher = "MIT Press",
address = "Cambridge, MA",
pages = "399--406",
year = "2002",
url = "http://www.hutter1.net/ai/xentropy.htm",
url2 = "http://arxiv.org/abs/cs.AI/0112019",
ftp = "ftp://ftp.idsia.ch/pub/techrep/IDSIA-13-01.ps.gz",
categories = "I.2. [Artificial Intelligence]",
keywords = "Mutual Information, Cross Entropy, Dirichlet distribution, Second
order distribution, expectation and variance of mutual
information.",
abstract = "The mutual information of two random variables i and j with joint
probabilities t_ij is commonly used in learning Bayesian nets as
well as in many other fields. The chances t_ij are usually
estimated by the empirical sampling frequency n_ij/n leading to a
point estimate I(n_ij/n) for the mutual information. To answer
questions like ``is I(n_ij/n) consistent with zero?'' or ``what is
the probability that the true mutual information is much larger
than the point estimate?'' one has to go beyond the point estimate.
In the Bayesian framework one can answer these questions by
utilizing a (second order) prior distribution p(t) comprising
prior information about t. From the prior p(t) one can compute the
posterior p(t|n), from which the distribution p(I|n) of the mutual
information can be calculated. We derive reliable and quickly
computable approximations for p(I|n). We concentrate on the mean,
variance, skewness, and kurtosis, and non-informative priors. For
the mean we also give an exact expression. Numerical issues and
the range of validity are discussed.",
}