DOI: 10.1145/3318464.3380604
Short Paper

Learning to Validate the Predictions of Black Box Classifiers on Unseen Data

Published: 31 May 2020
Abstract

    Machine Learning (ML) models are difficult to maintain in production settings. In particular, deviations of the unseen serving data (for which we want to compute predictions) from the source data (on which the model was trained) pose a central challenge, especially when model training and prediction are outsourced via cloud services. Errors or shifts in the serving data can affect the predictive quality of a model, but are hard to detect for engineers operating ML deployments.
    We propose a simple approach to automate the validation of deployed ML models by estimating the model's predictive performance on unseen, unlabeled serving data. In contrast to existing work, we do not require explicit distributional assumptions on the dataset shift between the source and serving data. Instead, we rely on a programmatic specification of typical cases of dataset shift and data errors. We use this information to learn a performance predictor for a pretrained black box model that automatically raises alarms when it detects performance drops on unseen serving data.
    We experimentally evaluate our approach on various datasets, models and error types. We find that it reliably predicts the performance of black box models in the majority of cases, and outperforms several baselines even in the presence of unspecified data errors.
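The approach described in the abstract can be sketched in a few steps: programmatically corrupt held-out data to simulate typical shifts and errors, featurize each corrupted variant through the black box model's output distribution, train a regressor that maps those features to the model's observed accuracy, and raise an alarm when the predicted accuracy on unlabeled serving data drops below a threshold. The following is a minimal illustrative sketch, not the authors' implementation; all function names, the single scaling corruption, the probability-histogram featurization, and the 0.8 alarm threshold are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

# A stand-in "black box" classifier trained on source data.
X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
X_train, X_held, y_train, y_held = train_test_split(X, y, test_size=0.5, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X_train, y_train)

def scale_shift(X, strength):
    # Example programmatic corruption: rescale one feature,
    # e.g. mimicking a unit change in an upstream data source.
    Xc = X.copy()
    Xc[:, 0] *= (1.0 + strength)
    return Xc

def featurize(X):
    # Summarize the black box's output distribution on a dataset
    # as a histogram of predicted class-1 probabilities.
    probs = black_box.predict_proba(X)[:, 1]
    return np.histogram(probs, bins=10, range=(0.0, 1.0), density=True)[0]

# Meta-training set: (features of corrupted held-out data, observed accuracy).
meta_X, meta_y = [], []
for strength in np.linspace(0.0, 5.0, 50):
    Xc = scale_shift(X_held, strength)
    meta_X.append(featurize(Xc))
    meta_y.append(black_box.score(Xc, y_held))

# The performance predictor: maps output-distribution features to accuracy.
perf_predictor = RandomForestRegressor(random_state=0).fit(meta_X, meta_y)

# At serving time, labels are unavailable; estimate accuracy and alarm on drops.
X_serving = scale_shift(X_held, 4.0)  # simulated shifted serving data
est_acc = perf_predictor.predict([featurize(X_serving)])[0]
alarm = est_acc < 0.8  # hypothetical alarm threshold
```

The key property is that the alarm at serving time needs only unlabeled data: the labels are consumed once, at meta-training time, to associate output-distribution features with accuracy under the specified corruptions.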

    Supplementary Material

    MP4 File (3318464.3380604.mp4)
    Presentation Video




      Published In

      SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
      June 2020
      2925 pages
      ISBN:9781450367356
      DOI:10.1145/3318464
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. model monitoring
      2. performance prediction
      3. shift detection


      Conference

      SIGMOD/PODS '20

      Acceptance Rates

      Overall Acceptance Rate 785 of 4,003 submissions, 20%
