
Visualising data science workflows to support third-party notebook comprehension: an empirical study

Published: 23 March 2023

Abstract

Data science is an exploratory and iterative process that often produces complex and unstructured code. This code is usually poorly documented and, consequently, hard for a third party to understand. In this paper, we first collect empirical evidence for the non-linearity of data science code from real-world Jupyter notebooks, confirming the need for new approaches that aid interaction with and comprehension of data science code. Second, we propose a visualisation method that elucidates implicit workflow information in data science code and assists data scientists in navigating the so-called garden of forking paths in non-linear code. Based on cell annotations, the visualisation also provides information such as the rationale behind each cell and the data science pipeline step it belongs to. We conducted a user experiment with data scientists to evaluate the proposed method, assessing the influence of (i) different workflow visualisations and (ii) cell annotations on code comprehension. Our results show that visualising the exploration helps users obtain an overview of the notebook, significantly improving code comprehension. Furthermore, our qualitative analysis provides further insight into the difficulties faced during data science code comprehension.
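To make the idea of annotation-based workflow information concrete, the sketch below (an illustration only, not the authors' tool) shows how a pipeline outline could be recovered from a notebook whose cells carry step annotations. It assumes, as a hypothetical convention mirroring the .ipynb JSON layout, that each annotated code cell stores its pipeline-step label as a tag in `metadata.tags`:

```python
# Hypothetical sketch: recover an implicit pipeline outline from a
# notebook dict shaped like the .ipynb JSON format. The "step" tags
# (e.g. "data-loading", "modelling") are an assumed annotation scheme,
# not the paper's actual annotation vocabulary.
notebook = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["data-loading"]},
         "source": ["df = pd.read_csv('data.csv')"]},
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Exploration notes"]},
        {"cell_type": "code", "metadata": {"tags": ["modelling"]},
         "source": ["model.fit(X, y)"]},
        # A loading step reappearing after modelling hints at the
        # non-linear, iterative exploration the paper studies.
        {"cell_type": "code", "metadata": {"tags": ["data-loading"]},
         "source": ["df2 = pd.read_csv('more.csv')"]},
    ]
}

def pipeline_outline(nb):
    """Return (cell_index, step_tag) pairs for the notebook's code cells."""
    outline = []
    for i, cell in enumerate(nb["cells"]):
        if cell["cell_type"] != "code":
            continue  # markdown cells carry narrative, not pipeline steps
        tags = cell.get("metadata", {}).get("tags", [])
        outline.append((i, tags[0] if tags else "unlabelled"))
    return outline

steps = pipeline_outline(notebook)
# A step label that recurs after a later pipeline stage (here,
# "data-loading" after "modelling") marks a fork in the exploration.
```

Plotting such an outline as a graph, with edges whenever a step label recurs out of pipeline order, gives a minimal version of the workflow overview the paper evaluates.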


Cited By

  • (2024) NotePlayer: Engaging Computational Notebooks for Dynamic Presentation of Analytical Processes. In: Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, pp 1–20. https://doi.org/10.1145/3654777.3676410. Online publication date: 13-Oct-2024
  • (2024) WaitGPT: Monitoring and Steering Conversational LLM Agent in Data Analysis with On-the-Fly Code Visualization. In: Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, pp 1–14. https://doi.org/10.1145/3654777.3676374. Online publication date: 13-Oct-2024

Published In

Empirical Software Engineering  Volume 28, Issue 3
May 2023
845 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 23 March 2023
Accepted: 05 January 2023

Author Tags

  1. Notebook comprehension
  2. Program comprehension
  3. Data science code
  4. Data science workflow
  5. Workflow visualisation
  6. Garden of forking paths
  7. Software visualisation
  8. Jupyter notebooks

Qualifiers

  • Research-article
