DOI: 10.1145/3318464.3380604
Short Paper

Learning to Validate the Predictions of Black Box Classifiers on Unseen Data

Published: 31 May 2020
Abstract

    Machine Learning (ML) models are difficult to maintain in production settings. In particular, deviations of the unseen serving data (for which we want to compute predictions) from the source data (on which the model was trained) pose a central challenge, especially when model training and prediction are outsourced via cloud services. Errors or shifts in the serving data can affect the predictive quality of a model, but are hard to detect for engineers operating ML deployments.
    We propose a simple approach to automate the validation of deployed ML models by estimating the model's predictive performance on unseen, unlabeled serving data. In contrast to existing work, we do not require explicit distributional assumptions on the dataset shift between the source and serving data. Instead, we rely on a programmatic specification of typical cases of dataset shift and data errors. We use this information to learn a performance predictor for a pretrained black box model that automatically raises alarms when it detects performance drops on unseen serving data.
    We experimentally evaluate our approach on various datasets, models and error types. We find that it reliably predicts the performance of black box models in the majority of cases, and outperforms several baselines even in the presence of unspecified data errors.
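The approach described in the abstract can be sketched in a few steps: programmatically corrupt held-out data to simulate typical shifts and errors, featurize each corrupted variant through the black box model's output distribution, train a regressor that maps those features to the model's observed accuracy, and raise an alarm when the predicted accuracy on unlabeled serving data drops below a threshold. The following is a minimal illustrative sketch, not the authors' implementation; all function names, the single scaling corruption, the probability-histogram featurization, and the 0.8 alarm threshold are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

# A stand-in "black box" classifier trained on source data.
X, y = make_classification(n_samples=3000, n_features=10, random_state=0)
X_train, X_held, y_train, y_held = train_test_split(X, y, test_size=0.5, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X_train, y_train)

def scale_shift(X, strength):
    # Example programmatic corruption: rescale one feature,
    # e.g. mimicking a unit change in an upstream data source.
    Xc = X.copy()
    Xc[:, 0] *= (1.0 + strength)
    return Xc

def featurize(X):
    # Summarize the black box's output distribution on a dataset
    # as a histogram of predicted class-1 probabilities.
    probs = black_box.predict_proba(X)[:, 1]
    return np.histogram(probs, bins=10, range=(0.0, 1.0), density=True)[0]

# Meta-training set: (features of corrupted held-out data, observed accuracy).
meta_X, meta_y = [], []
for strength in np.linspace(0.0, 5.0, 50):
    Xc = scale_shift(X_held, strength)
    meta_X.append(featurize(Xc))
    meta_y.append(black_box.score(Xc, y_held))

# The performance predictor: maps output-distribution features to accuracy.
perf_predictor = RandomForestRegressor(random_state=0).fit(meta_X, meta_y)

# At serving time, labels are unavailable; estimate accuracy and alarm on drops.
X_serving = scale_shift(X_held, 4.0)  # simulated shifted serving data
est_acc = perf_predictor.predict([featurize(X_serving)])[0]
alarm = est_acc < 0.8  # hypothetical alarm threshold
```

The key property is that the alarm at serving time needs only unlabeled data: the labels are consumed once, at meta-training time, to associate output-distribution features with accuracy under the specified corruptions.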

    Supplementary Material

    MP4 File (3318464.3380604.mp4)
    Presentation Video




      Published In

      SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
      June 2020
      2925 pages
      ISBN:9781450367356
      DOI:10.1145/3318464
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. model monitoring
      2. performance prediction
      3. shift detection


      Conference

      SIGMOD/PODS '20

      Acceptance Rates

      Overall Acceptance Rate 785 of 4,003 submissions, 20%
