research-article

Open access

Whither AutoML? Understanding the Role of Automation in Machine Learning Workflows

Authors:

Doris Jung-Lin Lee,

Niloufar Salehi,

Aditya Parameswaran

CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems

Article No.: 83, Pages 1 - 16

https://doi.org/10.1145/3411764.3445306

Published: 07 May 2021

Abstract

Efforts to make machine learning more widely accessible have led to a rapid increase in Auto-ML tools that aim to automate the process of training and deploying machine learning. To understand how Auto-ML tools are used in practice today, we performed a qualitative study with participants ranging from novice hobbyists to industry researchers who use Auto-ML tools. We present insights into the benefits and deficiencies of existing tools, as well as the respective roles of the human and automation in ML workflows. Finally, we discuss design implications for the future of Auto-ML tool development. We argue that instead of full automation being the ultimate goal of Auto-ML, designers of these tools should focus on supporting a partnership between the user and the Auto-ML tool. This means that a range of Auto-ML tools will need to be developed to support varying user goals such as simplicity, reproducibility, and reliability.

Index Terms

Whither AutoML? Understanding the Role of Automation in Machine Learning Workflows
1. Human-centered computing
  1. Human computer interaction (HCI)

Index terms have been assigned to the content through auto-classification.

Published In

CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems

May 2021

10862 pages

ISBN:9781450380966

DOI:10.1145/3411764

General Chairs: Yoshifumi Kitamura (Tohoku University, Japan), Aaron Quigley (University of New South Wales, Australia)

Program Chairs: Katherine Isbister (University of California Santa Cruz, USA), Takeo Igarashi (The University of Tokyo, Japan)

Publications Chairs: Pernille Bjørn (University of Copenhagen, Denmark), Steven Drucker (Microsoft Research, USA)

Copyright © 2021 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGCHI: ACM Special Interest Group on Computer-Human Interaction

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Science Foundation

Conference

CHI '21

Sponsor: SIGCHI

CHI '21: CHI Conference on Human Factors in Computing Systems

May 8 - 13, 2021

Yokohama, Japan

Acceptance Rates

Overall Acceptance Rate 6,199 of 26,314 submissions, 24%
