
How Data Scientists Use Computational Notebooks for Real-Time Collaboration

Published: 07 November 2019

Abstract

Effective collaboration in data science can leverage domain expertise from each team member and thus improve the quality and efficiency of the work. Computational notebooks give data scientists a convenient interactive solution for sharing and keeping track of the data exploration process through a combination of code, narrative text, visualizations, and other rich media. In this paper, we report how synchronous editing in computational notebooks changes the way data scientists work together compared to working on individual notebooks. We first conducted a formative survey with 195 data scientists to understand their past experience with collaboration in the context of data science. Next, we carried out an observational study of 24 data scientists working in pairs remotely to solve a typical data science predictive modeling problem, working on either notebooks supported by synchronous groupware or individual notebooks in a collaborative setting. The study showed that working on the synchronous notebooks improves collaboration by creating a shared context, encouraging more exploration, and reducing communication costs. However, the current synchronous editing features may lead to unbalanced participation and activity interference without strategic coordination. The synchronous notebooks may also amplify the tension between quick exploration and clear explanations. Building on these findings, we propose several design implications aimed at better supporting collaborative editing in computational notebooks, and thus improving efficiency in teamwork among data scientists.

Supplementary Material

ZIP File (cscw039aux.zip)




    Published In

    Proceedings of the ACM on Human-Computer Interaction  Volume 3, Issue CSCW
    November 2019
    5026 pages
    EISSN: 2573-0142
    DOI: 10.1145/3371885
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. collaborative systems
    2. computational notebooks
    3. data science

    Qualifiers

    • Research-article


