Human-AI Collaboration in Data Science: Exploring Data Scientists' Perceptions of Automated AI

Published: 07 November 2019

Abstract

The rapid advancement of artificial intelligence (AI) is changing our lives in many ways. One application domain is data science. New techniques for automating the creation of AI, known as AutoAI or AutoML, aim to automate the work practices of data scientists. AutoAI systems are capable of autonomously ingesting and pre-processing data, engineering new features, and creating and scoring models based on a target objective (e.g., accuracy or run-time efficiency). Although AutoAI is not yet widely adopted, we are interested in understanding how it will impact the practice of data science. We conducted interviews with 20 data scientists who work at a large, multinational technology company and practice data science in various business settings. Our goal is to understand their current work practices and how these practices might change with AutoAI. Reactions were mixed: while informants expressed concerns about the trend of automating their jobs, they also strongly felt it was inevitable. Despite these concerns, they remained optimistic about their future job security because they viewed the future of data science work as a collaboration between humans and AI systems, in which both automation and human expertise are indispensable.

    Published In

    Proceedings of the ACM on Human-Computer Interaction, Volume 3, Issue CSCW
    November 2019
    5026 pages
    EISSN: 2573-0142
    DOI: 10.1145/3371885
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. ai design ai
    2. autoai
    3. automation
    4. automl
    5. data science
    6. data scientist
    7. domain experts
    8. future of work
    9. human-ai collaboration
    10. human-centered ai
    11. human-in-the-loop ai
    12. machine learning

    Qualifiers

    • Research-article
