Bayesian Treatment of Incomplete Discrete Data applied to Mutual Information and Feature Selection
Keywords: incomplete data, Bayesian statistics, expectation maximization,
global optimization, mutual information, cross entropy, Dirichlet
distribution, second-order distribution, credible intervals,
expectation and variance of mutual information, missing data,
robust feature selection, filter approach, naive Bayes classifier.
Abstract: Given the joint chances of a pair of random variables, one can
compute quantities of interest, such as the mutual information. The
Bayesian treatment of unknown chances involves computing, from a
second-order prior distribution and the data likelihood, a
posterior distribution of the chances. A common treatment of
incomplete data is to assume ignorability and to determine the
chances by the expectation-maximization (EM) algorithm. The two
methods above are well established but typically treated
separately. This paper joins the two approaches in the case of
Dirichlet priors, and derives efficient approximations for the
mean, mode, and (co)variance of the chances and of the mutual
information. Furthermore, we prove the unimodality of the
posterior distribution, from which follows the important property
that EM converges to the global maximum in the chosen framework.
These results are applied to the problem of selecting features for
incremental learning and naive Bayes classification. A fast filter
based on the distribution of mutual information is shown to
outperform the traditional filter based on empirical mutual
information on a number of incomplete real data sets.
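The two ingredients of the abstract can be illustrated with a short sketch: an EM loop that completes a two-way contingency table under the ignorability (MAR) assumption with a symmetric Dirichlet prior, and the closed-form posterior mean plus leading-order variance of the mutual information that a distribution-based filter can use. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names (em_completed_counts, mi_mean_var), the prior strength alpha, the use of posterior-mean chances in the M-step instead of the exact mode, and the Gaussian credible-interval cutoff at the end are all illustrative choices; the digamma formula for E[I] and the O(1/N) variance are standard results for Dirichlet posteriors (cf. Hutter, "Distribution of Mutual Information", NIPS 2002).

```python
import numpy as np
from scipy.special import digamma


def em_completed_counts(counts, miss_row, miss_col, alpha=1.0,
                        iters=500, tol=1e-10):
    """Complete a two-way contingency table by EM under ignorability (MAR).

    counts   : (r, s) array of fully observed joint counts n_ij
    miss_row : length-r array of records where only the row value was seen
    miss_col : length-s array of records where only the column value was seen
    alpha    : symmetric Dirichlet pseudo-count per cell (illustrative choice)

    E-step: spread each partially observed record over the compatible cells
    in proportion to the current chances.  M-step: re-estimate the chances
    from the completed counts (posterior mean here; the paper also treats
    the exact mode).  Returns the completed counts including the prior.
    """
    counts = np.asarray(counts, dtype=float)
    miss_row = np.asarray(miss_row, dtype=float)
    miss_col = np.asarray(miss_col, dtype=float)
    pi = (counts + alpha) / (counts + alpha).sum()        # initial chances
    for _ in range(iters):
        n = counts + alpha
        n = n + miss_row[:, None] * pi / pi.sum(axis=1, keepdims=True)
        n = n + miss_col[None, :] * pi / pi.sum(axis=0, keepdims=True)
        pi_new = n / n.sum()
        if np.max(np.abs(pi_new - pi)) < tol:
            pi = pi_new
            break
        pi = pi_new
    return n


def mi_mean_var(n):
    """Posterior mean and leading-order variance of I(X;Y) for a Dirichlet
    posterior with cell counts n_ij (data plus prior), N = sum of n_ij."""
    N = n.sum()
    ni = n.sum(axis=1, keepdims=True)                     # row marginals
    nj = n.sum(axis=0, keepdims=True)                     # column marginals
    # Exact posterior mean of the mutual information (digamma form).
    mean = np.sum(n / N * (digamma(n + 1) - digamma(ni + 1)
                           - digamma(nj + 1) + digamma(N + 1)))
    # Leading-order O(1/N) variance from the second-order expansion.
    log_ratio = np.log(n * N / (ni * nj))
    J = np.sum(n / N * log_ratio)
    K = np.sum(n / N * log_ratio ** 2)
    var = max((K - J ** 2) / N, 0.0)
    return mean, var


# Hypothetical usage as a filter: keep a feature if the lower end of an
# approximate 95% credible interval for its MI with the class exceeds 0.
counts = np.array([[30, 5], [4, 25]], dtype=float)        # (feature, class)
miss_row = np.array([6.0, 3.0])                           # class missing
miss_col = np.array([2.0, 4.0])                           # feature missing
n = em_completed_counts(counts, miss_row, miss_col)
mean, var = mi_mean_var(n)
keep = mean - 1.96 * np.sqrt(var) > 0.0
print(f"E[I]={mean:.4f}  sd={np.sqrt(var):.4f}  keep={keep}")
```

In a filter of this kind, features whose whole credible interval sits above the threshold are kept, so noisy features with a large empirical MI but wide uncertainty are not selected; the 1.96 cutoff is just the usual Gaussian approximation, not the paper's specific criterion.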
BibTeX Entry
@InProceedings{Hutter:03mimiss,
author = "Marcus Hutter and Marco Zaffalon",
title = "Bayesian Treatment of Incomplete Discrete Data applied
to Mutual Information and Feature Selection",
year = "2003",
pages = "396--406",
series = "Lecture Notes in Computer Science",
volume = "2821",
booktitle = "Proceedings of the 26th German Conference on Artificial Intelligence (KI-2003)",
editor = "A. G{\"u}nter and R. Kruse and B. Neumann",
publisher = "Springer",
address = "Heidelberg",
http = "http://www.hutter1.net/ai/mimiss.htm",
url = "http://arxiv.org/abs/cs.LG/0306126",
ftp = "ftp://ftp.idsia.ch/pub/techrep/IDSIA-15-03.ps.gz",
keywords = "incomplete data, Bayesian statistics, expectation maximization,
global optimization, mutual information, cross entropy, Dirichlet
distribution, second-order distribution, credible intervals,
expectation and variance of mutual information, missing data,
robust feature selection, filter approach, naive Bayes classifier.",
abstract = "Given the joint chances of a pair of random variables, one can
compute quantities of interest, such as the mutual information. The
Bayesian treatment of unknown chances involves computing, from a
second-order prior distribution and the data likelihood, a
posterior distribution of the chances. A common treatment of
incomplete data is to assume ignorability and to determine the
chances by the expectation-maximization (EM) algorithm. The two
methods above are well established but typically treated
separately. This paper joins the two approaches in the case of
Dirichlet priors, and derives efficient approximations for the
mean, mode, and (co)variance of the chances and of the mutual
information. Furthermore, we prove the unimodality of the
posterior distribution, from which follows the important property
that EM converges to the global maximum in the chosen framework.
These results are applied to the problem of selecting features for
incremental learning and naive Bayes classification. A fast filter
based on the distribution of mutual information is shown to
outperform the traditional filter based on empirical mutual
information on a number of incomplete real data sets.",
}