Supersparse Linear Integer Models for Optimized Medical Scoring Systems

Ustun, Berk; Rudin, Cynthia

doi:10.1007/s10994-015-5528-6

arXiv:1502.04269 (stat)

[Submitted on 15 Feb 2015 (v1), last revised 26 Jan 2016 (this version, v3)]

Abstract: Scoring systems are linear classification models that only require users to add, subtract and multiply a few small numbers in order to make a prediction. These models are in widespread use by the medical community, but are difficult to learn from data because they need to be accurate and sparse, have coprime integer coefficients, and satisfy multiple operational constraints. We present a new method for creating data-driven scoring systems called a Supersparse Linear Integer Model (SLIM). SLIM scoring systems are built by solving an integer program that directly encodes measures of accuracy (the 0-1 loss) and sparsity (the $\ell_0$-seminorm) while restricting coefficients to coprime integers. SLIM can seamlessly incorporate a wide range of operational constraints related to accuracy and sparsity, and can produce highly tailored models without parameter tuning. We provide bounds on the testing and training accuracy of SLIM scoring systems, and present a new data reduction technique that can improve scalability by eliminating a portion of the training data beforehand. Our paper includes results from a collaboration with the Massachusetts General Hospital Sleep Laboratory, where SLIM was used to create a highly tailored scoring system for sleep apnea screening.
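
As a rough illustration of the objective described in the abstract (0-1 loss plus an $\ell_0$ penalty over small integer coefficients), the following minimal sketch fits a tiny scoring system on toy data. It is not the authors' implementation: the abstract states that SLIM solves an integer program, whereas this sketch swaps in exhaustive enumeration, which is only feasible for a handful of features. The function names, the toy data, the trade-off parameter `c0`, and the coefficient bounds are all assumptions made for illustration, and the coprimality requirement is not enforced here.

```python
from itertools import product

def score(coefs, intercept, x):
    """Integer score of one example: intercept plus weighted sum of features."""
    return intercept + sum(c * v for c, v in zip(coefs, x))

def predict(coefs, intercept, x):
    """Predict +1 when the score is positive, otherwise -1."""
    return 1 if score(coefs, intercept, x) > 0 else -1

def objective(coefs, intercept, X, y, c0):
    """0-1 loss (fraction misclassified) plus c0 times the number of nonzero coefficients."""
    errors = sum(predict(coefs, intercept, x) != label for x, label in zip(X, y))
    l0 = sum(1 for c in coefs if c != 0)
    return errors / len(X) + c0 * l0

def brute_force_fit(X, y, c0=0.01, bound=2, intercept_bound=3):
    """Enumerate every small integer model and keep the first one with the lowest objective.

    Assumption: enumeration stands in for the integer program mentioned in the
    abstract; it scales exponentially in the number of features.
    """
    d = len(X[0])
    best = None
    for coefs in product(range(-bound, bound + 1), repeat=d):
        for b in range(-intercept_bound, intercept_bound + 1):
            value = objective(coefs, b, X, y, c0)
            if best is None or value < best[0]:
                best = (value, coefs, b)
    return best

# Hypothetical toy data: three binary features; the label is +1 only when the
# first two features are both 1, so the third feature is irrelevant.
X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [1 if (a == 1 and b == 1) else -1 for (a, b, c) in X]

best_value, best_coefs, best_intercept = brute_force_fit(X, y)
```

On this toy problem, any model that classifies every example correctly must use both of the first two features, so the penalty steers the search toward a model that ignores the third. Several coefficient vectors tie for the best objective value, and the sketch simply returns the first one it encounters.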

Comments:	This version reflects our findings on SLIM as of January 2016 (arXiv:1306.5860 and arXiv:1405.4047 are out-of-date). The final published version of this article is available at this http URL
Subjects:	Machine Learning (stat.ML); Discrete Mathematics (cs.DM); Machine Learning (cs.LG); Applications (stat.AP); Methodology (stat.ME)
Cite as:	arXiv:1502.04269 [stat.ML]
	(or arXiv:1502.04269v3 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1502.04269
Related DOI:	https://doi.org/10.1007/s10994-015-5528-6

Submission history

From: Berk Ustun
[v1] Sun, 15 Feb 2015 01:26:41 UTC (1,537 KB)
[v2] Mon, 18 May 2015 14:46:40 UTC (219 KB)
[v3] Tue, 26 Jan 2016 17:34:21 UTC (215 KB)
