Communication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing Inequality

Braverman, Mark; Garg, Ankit; Ma, Tengyu; Nguyen, Huy L.; Woodruff, David P.

arXiv:1506.07216 (cs)

[Submitted on 24 Jun 2015 (v1), last revised 10 May 2016 (this version, v3)]

Abstract: We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions. In the distributed sparse Gaussian mean estimation problem, each of the $m$ machines receives $n$ data points from a $d$-dimensional Gaussian distribution with unknown mean $\theta$, which is promised to be $k$-sparse. The machines communicate by message passing and aim to estimate the mean $\theta$. We provide a tight (up to logarithmic factors) tradeoff between the estimation error and the number of bits communicated between the machines. This directly leads to a lower bound for the distributed \textit{sparse linear regression} problem: to achieve the statistical minimax error, the total communication is at least $\Omega(\min\{n,d\}m)$, where $n$ is the number of observations that each machine receives and $d$ is the ambient dimension. These lower bounds improve upon [Sha14, SD'14] by allowing a multi-round iterative communication model. We also give the first optimal simultaneous protocol in the dense case for mean estimation.
As our main technique, we prove a \textit{distributed data processing inequality}, a generalization of the usual data processing inequalities, which might be of independent interest and useful for other problems.
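
To make the problem setup in the abstract concrete, the following is a minimal sketch of the distributed sparse Gaussian mean estimation setting, with a naive one-round baseline in which every machine sends its full local mean. It is not the protocol from the paper. The function names, the unit variance, the particular values of $m$, $n$, $d$, $k$, and the top-$k$ selection step are all assumptions made for illustration.

```python
import random


def sample_machine_data(theta, n, rng, sigma=1.0):
    """Draw n points from N(theta, sigma^2 I) for one machine (sigma is assumed)."""
    return [[t + rng.gauss(0.0, sigma) for t in theta] for _ in range(n)]


def local_mean(points):
    """Coordinate-wise average of one machine's points (its single message)."""
    n = len(points)
    d = len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]


def naive_estimate(messages, k):
    """Average the m local means, then keep the k largest-magnitude coordinates."""
    m = len(messages)
    d = len(messages[0])
    avg = [sum(msg[j] for msg in messages) / m for j in range(d)]
    keep = set(sorted(range(d), key=lambda j: abs(avg[j]), reverse=True)[:k])
    return [avg[j] if j in keep else 0.0 for j in range(d)]


def squared_error(est, theta):
    """Squared Euclidean distance between the estimate and the true mean."""
    return sum((a - b) ** 2 for a, b in zip(est, theta))


rng = random.Random(0)
m, n, d, k = 8, 50, 20, 3            # illustrative sizes, not taken from the paper
theta = [1.0] * k + [0.0] * (d - k)  # a k-sparse mean chosen for the demo
messages = [local_mean(sample_machine_data(theta, n, rng)) for _ in range(m)]
estimate = naive_estimate(messages, k)
print(squared_error(estimate, theta))
```

In this baseline each machine transmits $d$ real numbers, so communication is not restricted at all. The tradeoff studied in the abstract concerns how the achievable error changes once the total number of bits exchanged is bounded.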

Comments:	To appear at STOC 2016. Fixed typos in Theorem 4.5 and incorporated reviewers' suggestions
Subjects:	Machine Learning (cs.LG); Computational Complexity (cs.CC); Information Theory (cs.IT); Machine Learning (stat.ML)
Cite as:	arXiv:1506.07216 [cs.LG]
	(or arXiv:1506.07216v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1506.07216

Submission history

From: Tengyu Ma
[v1] Wed, 24 Jun 2015 01:01:41 UTC (1,589 KB)
[v2] Sun, 22 Nov 2015 23:37:03 UTC (58 KB)
[v3] Tue, 10 May 2016 00:58:29 UTC (459 KB)
