$\ell_p$ Testing and Learning of Discrete Distributions

Waggoner, Bo

doi:10.1145/2688073.2688095

arXiv:1412.2314 (cs)

[Submitted on 7 Dec 2014 (v1), last revised 21 Mar 2015 (this version, v4)]

Title: $\ell_p$ Testing and Learning of Discrete Distributions

Authors: Bo Waggoner

Abstract: The classic problems of testing uniformity of and learning a discrete distribution, given access to independent samples from it, are examined under general $\ell_p$ metrics. The intuitions and results often contrast with the classic $\ell_1$ case. For $p > 1$, we can learn and test with a number of samples that is independent of the support size of the distribution: With an $\ell_p$ tolerance $\epsilon$, $O(\max\{ \sqrt{1/\epsilon^q}, 1/\epsilon^2 \})$ samples suffice for testing uniformity and $O(\max\{ 1/\epsilon^q, 1/\epsilon^2\})$ samples suffice for learning, where $q=p/(p-1)$ is the conjugate of $p$. As this parallels the intuition that $O(\sqrt{n})$ and $O(n)$ samples suffice for the $\ell_1$ case, it seems that $1/\epsilon^q$ acts as an upper bound on the "apparent" support size.
For some $\ell_p$ metrics, uniformity testing becomes easier over larger supports: a 6-sided die requires fewer trials to test for fairness than a 2-sided coin, and a card-shuffler requires fewer trials than the die. In fact, this inverse dependence on support size holds if and only if $p > \frac{4}{3}$. The uniformity testing algorithm simply thresholds the number of "collisions" or "coincidences" and has an optimal sample complexity up to constant factors for all $1 \leq p \leq 2$. Another algorithm gives order-optimal sample complexity for $\ell_{\infty}$ uniformity testing. Meanwhile, the most natural learning algorithm is shown to have order-optimal sample complexity for all $\ell_p$ metrics.
The author thanks Clément Canonne for discussions and contributions to this work.
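The collision-thresholding idea mentioned in the abstract can be sketched as follows. This is an illustration only, not the paper's algorithm: the abstract does not give the threshold, so the `slack` parameter and both function names are assumptions. The sketch relies on the standard fact that two independent samples from a distribution $P$ collide with probability $\|P\|_2^2$, which is at least $1/n$ on a support of size $n$, with equality exactly for the uniform distribution.

```python
from collections import Counter
from math import comb


def count_collisions(samples) -> int:
    """Number of unordered pairs of samples that are equal."""
    counts = Counter(samples)
    return sum(comb(c, 2) for c in counts.values())


def collision_uniformity_test(samples, n: int, slack: float) -> bool:
    """Accept (return True) if the collision count is near the uniform level.

    Under the uniform distribution on n items, the expected number of
    colliding pairs among m samples is C(m, 2) / n. The multiplicative
    `slack` is a hypothetical tuning parameter standing in for the
    threshold, which the abstract does not specify.
    """
    m = len(samples)
    expected_uniform = comb(m, 2) / n
    return count_collisions(samples) <= (1.0 + slack) * expected_uniform


if __name__ == "__main__":
    print(collision_uniformity_test([0, 1, 2, 3], n=4, slack=0.5))
    print(collision_uniformity_test([0, 0, 0, 0], n=4, slack=0.5))
```

In practice the slack would have to be chosen as a function of $p$, $\epsilon$, and $n$ to obtain the guarantees claimed in the abstract; the fixed value above is only for demonstration.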

Comments:	This is the full version of the paper appearing at ITCS 2015. Two columns. 24 pages, of which 14 appendix
Subjects:	Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Statistics Theory (math.ST)
ACM classes:	F.2.0; G.3
Cite as:	arXiv:1412.2314 [cs.DS]
	(or arXiv:1412.2314v4 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.1412.2314
Related DOI:	https://doi.org/10.1145/2688073.2688095

Submission history

From: Bo Waggoner
[v1] Sun, 7 Dec 2014 03:57:29 UTC (260 KB)
[v2] Thu, 8 Jan 2015 17:53:34 UTC (261 KB)
[v3] Mon, 19 Jan 2015 13:34:20 UTC (262 KB)
[v4] Sat, 21 Mar 2015 17:30:44 UTC (263 KB)
