research-article

Beat the Machine: Challenging Humans to Find a Predictive Model's “Unknown Unknowns”

Published: 04 March 2015

Abstract

    We present techniques for gathering data that expose errors of automatic predictive models. In certain common settings, traditional methods for evaluating predictive models tend to miss rare but important errors—most importantly, cases for which the model is confident of its prediction (but wrong). In this article, we present a system that, in a game-like setting, asks humans to identify cases that will cause the predictive model-based system to fail. Such techniques are valuable in discovering problematic cases that may not reveal themselves during the normal operation of the system and may include cases that are rare but catastrophic. We describe the design of the system, including design iterations that did not quite work. In particular, the system incentivizes humans to provide examples that are difficult for the model to handle by providing a reward proportional to the magnitude of the predictive model's error. The humans are asked to “Beat the Machine” and find cases where the automatic model (“the Machine”) is wrong. Experiments show that the humans using Beat the Machine identify more errors than do traditional techniques for discovering errors in predictive models, and, indeed, they identify many more errors where the machine is (wrongly) confident it is correct. Furthermore, those cases the humans identify seem to be not simply outliers, but coherent areas missed completely by the model. Beat the Machine identifies the “unknown unknowns.” Beat the Machine has been deployed at an industrial scale by several companies. The main impact has been that firms are changing their perspective on and practice of evaluating predictive models.
    “There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know.”
    --Donald Rumsfeld
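
The abstract states that contributors are paid a reward proportional to the magnitude of the predictive model's error. The abstract does not give the exact payout formula, so the following is only a minimal sketch under stated assumptions: a binary task, a model that outputs a probability for the positive class, and a verified true label for each submitted case. The function names, the `max_reward` scale, and the 0.9 confidence threshold are all hypothetical and are not taken from the article.

```python
def beat_the_machine_reward(predicted_prob, true_label, max_reward=1.0):
    """Hypothetical payout that grows with the size of the model's error.

    predicted_prob: model's estimated probability of the positive class (0..1).
    true_label: verified label for the submitted case, 0 or 1.
    max_reward: assumed payout for a maximally wrong prediction.
    """
    if true_label not in (0, 1):
        raise ValueError("true_label must be 0 or 1")
    if not 0.0 <= predicted_prob <= 1.0:
        raise ValueError("predicted_prob must lie in [0, 1]")
    # Error magnitude is the distance between the label and the estimate.
    error_magnitude = abs(true_label - predicted_prob)
    return max_reward * error_magnitude


def is_confident_error(predicted_prob, true_label, confidence=0.9):
    """True when the model picks the wrong class and is at least `confidence` sure."""
    predicted_label = 1 if predicted_prob >= 0.5 else 0
    model_confidence = max(predicted_prob, 1.0 - predicted_prob)
    return predicted_label != true_label and model_confidence >= confidence
```

Under this sketch, a case the model scores at 0.05 but whose true label is 1 earns nearly the full reward and is flagged as a confident error, while a correctly classified case earns close to nothing. That is the incentive shape the abstract describes: the payout steers contributors toward cases where the model is wrong while confident.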



    Information & Contributors

    Information

    Published In

    Journal of Data and Information Quality  Volume 6, Issue 1
    March 2015
    23 pages
    ISSN:1936-1955
    EISSN:1936-1963
    DOI:10.1145/2742852
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 March 2015
    Accepted: 01 December 2014
    Revised: 01 November 2014
    Received: 01 February 2014
    Published in JDIQ Volume 6, Issue 1


    Author Tags

    1. Crowdsourcing
    2. incentives
    3. machine learning evaluation
    4. model assessment
    5. risk identification
    6. system design

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • George Kellner Faculty Fellowship
    • Andre Meyer Faculty Fellowship
    • Google Focused Award
    • NEC Faculty Fellowship
    • Moore-Sloan Data Science Environment at NYU


    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 112
    • Downloads (Last 6 weeks): 8

    Citations

    Cited By

    • (2024) Wikibench: Community-Driven Data Curation for AI Evaluation on Wikipedia. Proceedings of the CHI Conference on Human Factors in Computing Systems, pp. 1-24. DOI: 10.1145/3613904.3642278. Online publication date: 11-May-2024
    • (2024) Design Patterns for Machine Learning-Based Systems With Humans in the Loop. IEEE Software 41:4, pp. 151-159. DOI: 10.1109/MS.2023.3340256. Online publication date: Jul-2024
    • (2024) Commercial Dispute Resolution and AI. The Cambridge Handbook of Private Law and Artificial Intelligence, pp. 511-533. DOI: 10.1017/9781108980197.027. Online publication date: 21-Mar-2024
    • (2024) Corporate and Commercial Law. The Cambridge Handbook of Private Law and Artificial Intelligence, pp. 407-596. DOI: 10.1017/9781108980197.021. Online publication date: 21-Mar-2024
    • (2024) On monitorability of AI. AI and Ethics. DOI: 10.1007/s43681-024-00420-x. Online publication date: 6-Feb-2024
    • (2023) Combining human intelligence and machine learning for fact-checking: Towards a hybrid human-in-the-loop framework. Intelligenza Artificiale 17:2, pp. 163-172. DOI: 10.3233/IA-230011. Online publication date: 20-Dec-2023
    • (2023) Working Together, Forever? Project Evaluation, Ai, and Managerial Redundancy. SSRN Electronic Journal. DOI: 10.2139/ssrn.4657010. Online publication date: 2023
    • (2023) Human-AI Ensembles: When Can They Work? Journal of Management. DOI: 10.1177/01492063231194968. Online publication date: 3-Oct-2023
    • (2023) Supporting Human-AI Collaboration in Auditing LLMs with LLMs. Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, pp. 913-926. DOI: 10.1145/3600211.3604712. Online publication date: 8-Aug-2023
    • (2023) Capturing Humans’ Mental Models of AI: An Item Response Theory Approach. Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, pp. 1723-1734. DOI: 10.1145/3593013.3594111. Online publication date: 12-Jun-2023
