DOI: 10.1145/3411764.3445306
research-article
Open access

Whither AutoML? Understanding the Role of Automation in Machine Learning Workflows

Published: 07 May 2021

Abstract

Efforts to make machine learning more widely accessible have led to a rapid increase in Auto-ML tools that aim to automate the process of training and deploying machine learning. To understand how Auto-ML tools are used in practice today, we performed a qualitative study with participants ranging from novice hobbyists to industry researchers who use Auto-ML tools. We present insights into the benefits and deficiencies of existing tools, as well as the respective roles of the human and automation in ML workflows. Finally, we discuss design implications for the future of Auto-ML tool development. We argue that instead of full automation being the ultimate goal of Auto-ML, designers of these tools should focus on supporting a partnership between the user and the Auto-ML tool. This means that a range of Auto-ML tools will need to be developed to support varying user goals such as simplicity, reproducibility, and reliability.


    Published In

    CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems
    May 2021
    10862 pages
    ISBN:9781450380966
    DOI:10.1145/3411764
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    CHI '21
    Acceptance Rates

    Overall Acceptance Rate 6,199 of 26,314 submissions, 24%

    Article Metrics

    • Downloads (Last 12 months): 991
    • Downloads (Last 6 weeks): 135
    Reflects downloads up to 03 Jan 2025

    Citations

    Cited By

    • (2024) AutoML Insights: Gaining Confidence to Operationalize Predictive Models. The New Era of Business Intelligence [Working Title]. DOI: 10.5772/intechopen.1004861. Online publication date: 5-Jun-2024
    • (2024) Position. Proceedings of the 41st International Conference on Machine Learning, 30566-30584. DOI: 10.5555/3692070.3693301. Online publication date: 21-Jul-2024
    • (2024) The Adaptable and Resilient Safety System: The Human Factor in Future In-Time Aviation Safety Management Systems. AIAA SCITECH 2024 Forum. DOI: 10.2514/6.2024-1603. Online publication date: 4-Jan-2024
    • (2024) Application of Artificial Intelligence Technology for Prompt Diagnosis of Cast Iron Mechanical Properties. Devices and Methods of Measurements 15:3, 231-239. DOI: 10.21122/2220-9506-2024-15-3-231-239. Online publication date: 6-Nov-2024
    • (2024) How Do Categorical Duplicates Affect ML? A New Benchmark and Empirical Analyses. Proceedings of the VLDB Endowment 17:6, 1391-1404. DOI: 10.14778/3648160.3648178. Online publication date: 3-May-2024
    • (2024) Unlocking AutoML: Enhancing Data with Deep Learning Algorithms for Medical Imaging. Journal of Data and Information Quality 16:4, 1-17. DOI: 10.1145/3705896. Online publication date: 26-Nov-2024
    • (2024) "We Have No Idea How Models will Behave in Production until Production": How Engineers Operationalize Machine Learning. Proceedings of the ACM on Human-Computer Interaction 8:CSCW1, 1-34. DOI: 10.1145/3653697. Online publication date: 26-Apr-2024
    • (2024) Towards Feature Engineering with Human and AI’s Knowledge: Understanding Data Science Practitioners’ Perceptions in Human&AI-Assisted Feature Engineering Design. Proceedings of the 2024 ACM Designing Interactive Systems Conference, 1789-1804. DOI: 10.1145/3643834.3661517. Online publication date: 1-Jul-2024
    • (2024) Towards a Non-Ideal Methodological Framework for Responsible ML. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-17. DOI: 10.1145/3613904.3642501. Online publication date: 11-May-2024
    • (2024) Visualization and Automation in Data Science: Exploring the Paradox of Humans-in-the-Loop. 2024 IEEE Visualization in Data Science (VDS), 1-5. DOI: 10.1109/VDS63897.2024.00005. Online publication date: 14-Oct-2024
