Bayesian Entropy Estimation for Countable Discrete Distributions

Archer, Evan; Park, Il Memming; Pillow, Jonathan

Computer Science > Information Theory

arXiv:1302.0328 (cs)

[Submitted on 2 Feb 2013 (v1), last revised 9 Apr 2014 (this version, v3)]

Abstract: We consider the problem of estimating Shannon's entropy $H$ from discrete data in cases where the number of possible symbols is unknown or even countably infinite. The Pitman-Yor process, a generalization of the Dirichlet process, provides a tractable prior distribution over the space of countably infinite discrete distributions and has found major applications in Bayesian nonparametric statistics and machine learning. Here we show that it also provides a natural family of priors for Bayesian entropy estimation, because the moments of the induced posterior distribution over $H$ can be computed analytically. We derive formulas for the posterior mean (the Bayes least squares estimate) and variance under Dirichlet and Pitman-Yor process priors. Moreover, we show that a fixed Dirichlet or Pitman-Yor process prior implies a narrow prior distribution over $H$, meaning that the prior strongly determines the entropy estimate in the under-sampled regime. We derive a family of continuous mixing measures such that the resulting mixture of Pitman-Yor processes produces an approximately flat prior over $H$. We show that the resulting Pitman-Yor Mixture (PYM) entropy estimator is consistent for a large class of distributions. We explore the theoretical properties of the resulting estimator and show that it performs well both in simulation and in application to real data.
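The abstract states that the posterior mean of $H$ has a closed form under Dirichlet and Pitman-Yor priors. As a rough illustration only (this is not the authors' code), the sketch below computes posterior means in nats from expressions derived here from two standard facts: the Dirichlet identity $\mathbb{E}[-p_k \log p_k] = \frac{a_k}{A}\left(\psi(A+1) - \psi(a_k+1)\right)$, and the fact that a Pitman-Yor posterior splits into a Dirichlet over the observed symbols plus a new Pitman-Yor process on the unobserved mass. The function names and the pure-Python `digamma` are hypothetical helpers introduced for this sketch, the formulas should be checked against the paper before use, and the PYM mixture over $(d, \alpha)$ is not implemented.

```python
import math
from typing import Sequence


def digamma(x: float) -> float:
    """Digamma psi(x) for x > 0 via recurrence plus an asymptotic series (hypothetical helper)."""
    if x <= 0:
        raise ValueError("this sketch only supports x > 0")
    result = 0.0
    # psi(x) = psi(x + 1) - 1/x: shift x up until the asymptotic series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6) + 1/(240x^8) - 1/(132x^10)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 * (1 / 240 - inv2 / 132))))
    return result + math.log(x) - 0.5 * inv - series


def dirichlet_prior_mean(num_symbols: int, alpha: float) -> float:
    """Prior mean of H (nats) under a symmetric Dirichlet(alpha) on num_symbols symbols."""
    return digamma(num_symbols * alpha + 1) - digamma(alpha + 1)


def dirichlet_posterior_mean(counts: Sequence[int], alpha: float) -> float:
    """Posterior mean of H (nats) under a symmetric Dirichlet(alpha); K = len(counts) is fixed."""
    if alpha <= 0 or any(n < 0 for n in counts):
        raise ValueError("need alpha > 0 and non-negative counts")
    a = [n + alpha for n in counts]  # posterior Dirichlet parameters
    total = sum(a)
    return digamma(total + 1) - sum(ak / total * digamma(ak + 1) for ak in a)


def pitman_yor_prior_mean(d: float, alpha: float) -> float:
    """Prior mean of H (nats) under PY(d, alpha), with discount d and concentration alpha."""
    return digamma(alpha + 1) - digamma(1 - d)


def pitman_yor_posterior_mean(counts: Sequence[int], d: float, alpha: float) -> float:
    """Posterior mean of H (nats) under PY(d, alpha).

    counts lists only the symbols actually observed (each count > 0); the number of
    unseen symbols is never specified, which is the point of the nonparametric prior.
    """
    if not (0 <= d < 1) or alpha <= -d:
        raise ValueError("need 0 <= d < 1 and alpha > -d")
    if any(n <= 0 for n in counts):
        raise ValueError("counts must list observed symbols only (all > 0)")
    n_total = sum(counts)
    if n_total == 0:
        return pitman_yor_prior_mean(d, alpha)  # no data: posterior equals prior
    k = len(counts)
    total = alpha + n_total  # total posterior Dirichlet mass, including unseen mass alpha + k*d
    observed = sum((n - d) * digamma(n - d + 1) for n in counts)
    return digamma(total + 1) - (alpha + k * d) / total * digamma(1 - d) - observed / total
```

Evaluating `dirichlet_prior_mean(1000, alpha)` over a few values of `alpha` is a quick way to see how strongly the prior mean of $H$ is pinned by a single concentration parameter, which is the behavior the PYM mixture is designed to counteract. With many samples, both posterior means should approach the plug-in entropy, e.g. close to $\log 4$ for four equally frequent symbols.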

Comments:	38 pages LaTeX. Revised and resubmitted to JMLR
Subjects:	Information Theory (cs.IT)
Cite as:	arXiv:1302.0328 [cs.IT]
	(or arXiv:1302.0328v3 [cs.IT] for this version)
	https://doi.org/10.48550/arXiv.1302.0328

Submission history

From: Il Memming Park
[v1] Sat, 2 Feb 2013 01:04:11 UTC (1,503 KB)
[v2] Tue, 16 Apr 2013 21:55:47 UTC (1,511 KB)
[v3] Wed, 9 Apr 2014 21:47:32 UTC (1,719 KB)