A Theory of Statistical Inference for Ensuring the Robustness of Scientific Results

Coker, Beau; Rudin, Cynthia; King, Gary

doi:10.1287/mnsc.2020.3818

Statistics > Machine Learning

arXiv:1804.08646 (stat)

[Submitted on 23 Apr 2018 (v1), last revised 13 Oct 2020 (this version, v2)]

Abstract: Inference is the process of using facts we know to learn about facts we do not know. A theory of inference gives assumptions necessary to get from the former to the latter, along with a definition for and summary of the resulting uncertainty. Any one theory of inference is neither right nor wrong, but merely an axiom that may or may not be useful. Each of the many diverse theories of inference can be valuable for certain applications. However, no existing theory of inference addresses the tendency to choose, from the range of plausible data analysis specifications consistent with prior evidence, those that inadvertently favor one's own hypotheses. Since the biases from these choices are a growing concern across scientific fields, and in a sense the reason the scientific community was invented in the first place, we introduce a new theory of inference designed to address this critical problem. We introduce hacking intervals, which are the range of a summary statistic one may obtain given a class of possible endogenous manipulations of the data. Hacking intervals require no appeal to hypothetical data sets drawn from imaginary superpopulations. A scientific result with a small hacking interval is more robust to researcher manipulation than one with a larger interval, and is often easier to interpret than a classical confidence interval. Some versions of hacking intervals turn out to be equivalent to classical confidence intervals, which means they may also provide a more intuitive and potentially more useful interpretation of classical confidence intervals.
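
The definition of a hacking interval as "the range of a summary statistic one may obtain given a class of possible endogenous manipulations of the data" can be illustrated with a toy sketch. This is not the authors' algorithm. It assumes a deliberately simple, hypothetical manipulation class (dropping at most `max_removed` observations) and uses the sample mean as the summary statistic. The function name `hacking_interval_mean` and the example data are made up for illustration, and the brute-force enumeration is only practical for very small data sets.

```python
from itertools import combinations
from statistics import mean


def hacking_interval_mean(data, max_removed):
    """Range of the sample mean over every way of dropping at most
    `max_removed` observations (a hypothetical manipulation class)."""
    n = len(data)
    if max_removed < 0 or max_removed >= n:
        raise ValueError("max_removed must satisfy 0 <= max_removed < len(data)")

    achievable = []
    for k in range(max_removed + 1):
        # Enumerate every set of k indices that a researcher could drop.
        for dropped in combinations(range(n), k):
            dropped_set = set(dropped)
            kept = [x for i, x in enumerate(data) if i not in dropped_set]
            achievable.append(mean(kept))

    # The interval is the smallest and largest statistic reachable
    # within the allowed class of manipulations.
    return min(achievable), max(achievable)


# Illustrative data only; not taken from the paper.
data = [2.0, 4.0, 4.0, 5.0, 10.0]
low, high = hacking_interval_mean(data, max_removed=1)
print(low, high)
```

Under this assumed manipulation class, a narrow `(low, high)` range means the reported mean changes little no matter which single observation is dropped, which matches the abstract's reading that a smaller hacking interval indicates a result more robust to researcher manipulation.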

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:1804.08646 [stat.ML]
	(or arXiv:1804.08646v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1804.08646
Journal reference:	Management Science, March 2021
Related DOI:	https://doi.org/10.1287/mnsc.2020.3818

Submission history

From: Beau Coker
[v1] Mon, 23 Apr 2018 18:13:41 UTC (4,738 KB)
[v2] Tue, 13 Oct 2020 02:28:23 UTC (1,562 KB)
